ziritrion / dataeng-zoomcamp Goto Github PK


303.0 303.0 165.0 9.16 MB

Jupyter Notebook 79.32% Dockerfile 1.12% Python 17.43% HCL 1.88% Shell 0.25%

dataeng-zoomcamp's People


abz-aaron pdanchenko ahairshi mpumi-py damiiete shiva-mantri mbaeum amine-elkostali amitkooner albancapitant morshedmasud charansagar ealexisaraujo mldurga egorshishkovets haris-m-aslam ab2021 chanukyapatnaik ahmadabdul592 maryann-agofure alarmeluandiappan syedahiraamjad ythie andisugandi suddhasatwa guelormusavuli aul-lab isatyalytics agyenton nialaisa analytics-sam wahyu-triu jastyn olaa003 mohamedelfallah osabutey sfwajero emoloic sudipadh anyfactor marianangelica olubunkayo senny10 akmalmnaim filipetheanalyst chisomloius ayocks leovantoji jeniffermukami n1klaus zekuva bigrainlin rohanpatankar926 suchitsanghvi yousaff2022skipq oguzerdo thesekyi kunmuli zaherweb mcherreal mario-renau-a akinpadeas jpreyes25 newbiesoc davekim917 aimanarifaminnudin sudan45 daiyaharsh90 aaabdulkadir kiddojazz arifdeen balurc abhirajput1194 rahul-netizen rayu711 arifsamii world-citizen-vivian diogoalexandreramalho sivachandanc capsu86 vijay-lionorbit jamessandy bkarsli01 ssime-git amaboh paulonye rhubarbhwy ayehninnkhine lingzhi-wlz josebrida gazielu mahdi-moosa programmer1188 rehamunnisha shivampanwar islam-miko vbiff teacherc heytyrone drpbaksh

dataeng-zoomcamp's Issues

Schema Registry Issues

When I run producer.py in /avro_example/, it runs smoothly, and so does consumer.py. However, when I check the Schema Registry at localhost:8081, it is null, and so is the Confluent UI (localhost:9021).
Is anything missing?
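One way to narrow this down is to ask the Schema Registry directly which subjects it holds, using its REST endpoint GET /subjects. The sketch below assumes the registry listens on http://localhost:8081, as in the issue. The helper names parse_subjects and fetch_subjects are hypothetical and are not part of the course code.

```python
import json
import urllib.request

# Assumption: the Schema Registry from the course setup is reachable here.
SCHEMA_REGISTRY_URL = "http://localhost:8081"


def parse_subjects(body: bytes) -> list[str]:
    """Decode the JSON array of subject names returned by GET /subjects."""
    subjects = json.loads(body.decode("utf-8"))
    if not isinstance(subjects, list):
        raise ValueError(f"unexpected response: {subjects!r}")
    return [str(s) for s in subjects]


def fetch_subjects(base_url: str = SCHEMA_REGISTRY_URL, timeout: float = 5.0) -> list[str]:
    """Ask the Schema Registry which subjects it currently knows about."""
    with urllib.request.urlopen(f"{base_url}/subjects", timeout=timeout) as resp:
        return parse_subjects(resp.read())


if __name__ == "__main__":
    try:
        subjects = fetch_subjects()
    except OSError as exc:  # URLError, connection refused and timeouts are all OSErrors
        print(f"Schema Registry not reachable: {exc}")
    else:
        # An empty list means no schema has been registered with this registry,
        # which can happen if the producer is pointed at a different registry URL.
        print(subjects or "no subjects registered")
```

If the producer registers its Avro schemas here, the output should contain subject names such as <topic>-value (the default naming for value schemas). An empty list suggests the schemas were never registered with this registry instance.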

NYC dataset changed format and S3 url

NYC.gov has changed all their files to Parquet. The CSV files are no longer available through the provided S3 links.
The new link is https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2021-01.parquet, but following along now requires some additional processing. This mostly applies to the video DE Zoomcamp 1.2.2 - Ingesting NY Taxi Data to Postgres, but it may pop up in other places throughout the course.

First, install pyarrow:
pip install pyarrow

Then load the Parquet file into a pandas DataFrame:

import pyarrow.parquet as pq
trips = pq.read_table('yellow_tripdata_2021-01.parquet')
df = trips.to_pandas()

Finally, run this command (engine is the SQLAlchemy engine created earlier in the video) and wait. It will take a while, then return a number when it is finished.
df.to_sql(name='yellow_taxi_data', con=engine, if_exists='replace', chunksize=100000)

Alternatively, the CSV files could be added to the repo, with the links pointing to those instead.
