dataeng-zoomcamp's People
Forkers
abz-aaron pdanchenko ahairshi mpumi-py damiiete shiva-mantri mbaeum amine-elkostali amitkooner albancapitant morshedmasud charansagar ealexisaraujo mldurga egorshishkovets haris-m-aslam ab2021 chanukyapatnaik ahmadabdul592 maryann-agofure alarmeluandiappan syedahiraamjad ythie andisugandi suddhasatwa guelormusavuli aul-lab isatyalytics agyenton nialaisa analytics-sam wahyu-triu jastyn olaa003 mohamedelfallah osabutey sfwajero emoloic sudipadh anyfactor marianangelica olubunkayo senny10 akmalmnaim filipetheanalyst chisomloius ayocks leovantoji jeniffermukami n1klaus zekuva bigrainlin rohanpatankar926 suchitsanghvi yousaff2022skipq oguzerdo thesekyi kunmuli zaherweb mcherreal mario-renau-a akinpadeas jpreyes25 newbiesoc davekim917 aimanarifaminnudin sudan45 daiyaharsh90 aaabdulkadir kiddojazz arifdeen balurc abhirajput1194 rahul-netizen rayu711 arifsamii world-citizen-vivian diogoalexandreramalho sivachandanc capsu86 vijay-lionorbit jamessandy bkarsli01 ssime-git amaboh paulonye rhubarbhwy ayehninnkhine lingzhi-wlz josebrida gazielu mahdi-moosa programmer1188 rehamunnisha shivampanwar islam-miko vbiff teacherc heytyrone drpbaksh
dataeng-zoomcamp's Issues
Schema Registry Issues
When I run producer.py in /avro_example/, it runs smoothly, and so does consumer.py. However, when I check the Schema Registry at localhost:8081, it is null, and so is the Confluent UI (localhost:9021).
Is there anything missing?
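One way to narrow this down is to ask the Schema Registry directly which subjects it holds: its REST API has a `GET /subjects` endpoint that returns a JSON array of subject names. Below is a minimal standard-library sketch, assuming the registry is reachable at `http://localhost:8081` as described above. `list_subjects` is a hypothetical helper name, not part of the course code, and `rides-value` is only an illustrative subject name.

```python
import json
import urllib.request


def list_subjects(base_url: str = "http://localhost:8081") -> list:
    """Return the subject names registered in a Confluent Schema Registry.

    Calls GET {base_url}/subjects, which responds with a JSON array of
    subject names (a hypothetical example: ["rides-value"]).
    """
    # Strip a trailing slash so both "http://host:8081" and "http://host:8081/" work.
    url = base_url.rstrip("/") + "/subjects"
    with urllib.request.urlopen(url, timeout=5) as resp:
        # json.load accepts the binary response stream directly.
        return json.load(resp)
```

With the stack running, `print(list_subjects())` shows what the registry actually contains. An empty list suggests the producer never registered a schema with this registry instance (for example, because its `schema.registry.url` points somewhere else). Under the default subject naming strategy, a topic's value schema would appear as `<topic>-value`.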
NYC dataset changed format and S3 URL
NYC.gov has converted all of its trip data files to Parquet, and the CSV files are no longer available through the provided S3 links.
The new link is https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2021-01.parquet
Following along now requires some additional processing. This mostly applies to the video DE Zoomcamp 1.2.2 - Ingesting NY Taxi Data to Postgres, but it may come up elsewhere in the course.
First, install pyarrow:
pip install pyarrow
Then load the Parquet file into a pandas DataFrame:
import pyarrow.parquet as pq
trips = pq.read_table('yellow_tripdata_2021-01.parquet')
df = trips.to_pandas()
Finally, run this command and wait. It will take a while, then return a number when it finishes.
# engine is the SQLAlchemy engine connected to Postgres, created earlier in the video
df.to_sql(name='yellow_taxi_data', con=engine, if_exists='replace', chunksize=100000)
Alternatively, the CSV files could be added to the repo and the course links pointed at those instead.