meteo-spark is an open-source project that aims to simplify climate data analysis using PySpark, which allows processing very large files stored in the cloud (S3, GCS, ...) on a large Spark cluster managed by YARN or Kubernetes.
👨🏼‍💻 I'm a Senior Data Engineer on the Ads Network team at Voodoo in Paris, and a committer and PMC member of Apache Airflow
💡 I design, develop, and maintain data platforms, especially modern lakehouse architectures and stream processing applications
🔬 I am responsible for serving ML models and improving their performance, along with the feature store
🔒 I ensure the security of user data on the data platform, in compliance with the regulations in force (GDPR, ePrivacy)
🤝🏻 I contribute to several popular open-source projects (Airflow, Iceberg, Hudi, ...) and maintain my own (spark-on-k8s, airflow-duckdb, and async-batcher)
⚡ Fun fact: I am always seeking new opportunities to learn. I also love to cook, swim, and watch movies and TV series.
Projects I am currently working on:
Improving the performance of a real-time bidding system by implementing a batching mechanism to reduce resource consumption and latency, and by optimizing the feature store's structure and data freshness
Designing and developing a new lakehouse architecture using Spark, Iceberg, Airflow, dbt, and S3, to store the company's data in a modern, low-cost data store that complies with the GDPR
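The batching idea behind the first project can be sketched in plain Python: individual requests are queued and grouped, so that one call to the backend (e.g. a model inference) serves a whole batch. This is a generic micro-batching sketch, not the production system; the `MicroBatcher` class, the `batch_fn` callback, and the size/latency limits are all illustrative names and values.

```python
import asyncio

class MicroBatcher:
    """Groups individual requests into batches, then processes each
    batch with a single call to batch_fn (e.g. one batched inference)."""

    def __init__(self, batch_fn, max_batch_size=8, max_wait_s=0.01):
        self.batch_fn = batch_fn            # processes a list of inputs at once
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s        # extra latency budget for batching
        self.queue = asyncio.Queue()
        self._worker = None

    async def submit(self, item):
        # Lazily start the background worker on the first request.
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def _run(self):
        while True:
            # Block until at least one request arrives.
            item, fut = await self.queue.get()
            batch = [(item, fut)]
            deadline = asyncio.get_running_loop().time() + self.max_wait_s
            # Fill the batch until it is full or the wait budget is spent.
            while len(batch) < self.max_batch_size:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            # One backend call for the whole batch, then fan results back out.
            results = self.batch_fn([i for i, _ in batch])
            for (_, f), r in zip(batch, results):
                f.set_result(r)

async def main():
    # Toy batch_fn: doubles every input in a single call.
    batcher = MicroBatcher(lambda xs: [x * 2 for x in xs], max_batch_size=4)
    results = await asyncio.gather(*(batcher.submit(i) for i in range(6)))
    print(results)  # [0, 2, 4, 6, 8, 10]

asyncio.run(main())
```

The trade-off is the `max_wait_s` budget: a larger value yields fuller batches (lower resource cost per request) at the price of added tail latency, which is why the two limits are tuned together.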