meteo-spark is an open-source project that aims to simplify climate data analysis using PySpark, which allows processing very large files stored in the cloud (S3, GCS, ...) on a large Spark cluster managed by YARN or Kubernetes.
👨🏼‍💻 I'm a Senior Data Engineer on the Ads Network team at Voodoo in Paris, and a committer and PMC member of Apache Airflow
💡 I design, develop, and maintain data platforms, especially modern lakehouse architectures and stream processing applications
🔬 I am responsible for serving ML models and improving their performance, along with the feature store
🔒 I ensure the security of user data on the data platform, in compliance with the regulations in force (GDPR, ePrivacy)
🤝🏻 I contribute to several popular open-source projects (Airflow, Iceberg, Hudi, ...) and maintain my own (spark-on-k8s, airflow-duckdb, and async-batcher)
⚡ Fun fact: I am always seeking new opportunities to learn. I also love to cook, swim, and watch movies and TV series.
Projects I am currently working on:
Improving the performance of a real-time bidding system by implementing a batching mechanism to reduce resource consumption and latency, and by optimizing the feature store's structure and data freshness
Designing and developing a new lakehouse architecture using Spark, Iceberg, Airflow, dbt, and S3, to store the company's data in a modern, low-cost data store that complies with the GDPR
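The batching idea behind the first project can be sketched in plain Python: individual requests are queued and grouped, so that one call to the backend (e.g. a model inference) serves a whole batch. This is a generic micro-batching sketch, not the production system; the `MicroBatcher` class, the `batch_fn` callback, and the size/latency limits are all illustrative names and values.

```python
import asyncio

class MicroBatcher:
    """Groups individual requests into batches, then processes each
    batch with a single call to batch_fn (e.g. one batched inference)."""

    def __init__(self, batch_fn, max_batch_size=8, max_wait_s=0.01):
        self.batch_fn = batch_fn            # processes a list of inputs at once
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s        # extra latency budget for batching
        self.queue = asyncio.Queue()
        self._worker = None

    async def submit(self, item):
        # Lazily start the background worker on the first request.
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def _run(self):
        while True:
            # Block until at least one request arrives.
            item, fut = await self.queue.get()
            batch = [(item, fut)]
            deadline = asyncio.get_running_loop().time() + self.max_wait_s
            # Fill the batch until it is full or the wait budget is spent.
            while len(batch) < self.max_batch_size:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            # One backend call for the whole batch, then fan results back out.
            results = self.batch_fn([i for i, _ in batch])
            for (_, f), r in zip(batch, results):
                f.set_result(r)

async def main():
    # Toy batch_fn: doubles every input in a single call.
    batcher = MicroBatcher(lambda xs: [x * 2 for x in xs], max_batch_size=4)
    results = await asyncio.gather(*(batcher.submit(i) for i in range(6)))
    print(results)  # [0, 2, 4, 6, 8, 10]

asyncio.run(main())
```

The trade-off is the `max_wait_s` budget: a larger value yields fuller batches (lower resource cost per request) at the price of added tail latency, which is why the two limits are tuned together.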