elasticlabs / airflow-jupyter-docker-compose

License: MIT License

airflow-jupyter-docker-compose

Orchestrate data science and earth observation models in Apache Airflow, scale up with the Celery Executor, and experiment in Jupyter Notebook, all within a Docker Compose stack. Based on the work at https://github.com/puckel/docker-airflow.

Dependencies

  • docker-compose: 1.27.4

Commands to deploy the stack behind an automated HTTPS proxy:

  • Ensure that a DNS record for the Airflow base URL exists and resolves correctly.
  • Ensure that your automated nginx-proxy is up and running.
  • Create the airflow-proxy network -> sudo docker network create airflow-proxy
  • Attach the new network to the existing nginx-proxy container so the proxy can reach the stack -> sudo docker network connect airflow-proxy <nginx-proxy container name>
  • Build and bring up the whole stack -> sudo docker-compose up -d --build
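
The deployment steps can be collected into a small helper. This is a minimal sketch, not part of the repository: the `run` and `deploy_stack` functions and the `NGINX_PROXY_CONTAINER` and `DRY_RUN` variables are hypothetical, and the helper assumes it is sourced from the directory containing the compose file.

```shell
#!/usr/bin/env bash
# deploy.sh: hypothetical helper wrapping the deployment steps (not in the repo).
# Assumed variables:
#   NGINX_PROXY_CONTAINER  name of the running nginx-proxy container
#   DRY_RUN=1              only print the commands, do not execute them

# Execute a command, or just echo it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

deploy_stack() {
  local proxy="${NGINX_PROXY_CONTAINER:-}"
  if [ -z "$proxy" ]; then
    echo "deploy_stack: set NGINX_PROXY_CONTAINER first" >&2
    return 1
  fi
  # Shared network between the proxy and the Airflow stack
  run sudo docker network create airflow-proxy || return 1
  # Let the existing nginx-proxy container reach the stack's services
  run sudo docker network connect airflow-proxy "$proxy" || return 1
  # Build the images and start every service in the background
  run sudo docker-compose up -d --build
}
```

For example, `source deploy.sh && DRY_RUN=1 NGINX_PROXY_CONTAINER=nginx-proxy deploy_stack` prints the three commands; drop `DRY_RUN=1` to execute them. The network steps only need to run once: on a second run `docker network create` fails because the network already exists and the function stops, so later redeployments only need `sudo docker-compose up -d --build`.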

Stack management

  • Stop and remove the containers: sudo docker-compose down
  • List running containers: sudo docker ps
  • Open a shell inside a container: sudo docker-compose exec <service-id> bash
  • See the logs of a service: sudo docker-compose logs <service-id>
  • Monitor container resource usage: sudo docker stats
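
`docker-compose down` removes the containers and the networks created by the compose file, but never a network declared as external, so the hand-made airflow-proxy network survives it. A minimal full-teardown sketch, assuming airflow-proxy is declared as an external network in the compose file; `run`, `teardown_stack`, `NGINX_PROXY_CONTAINER` and `DRY_RUN` are hypothetical names, not part of the repository.

```shell
#!/usr/bin/env bash
# teardown.sh: hypothetical helper undoing the deployment (not in the repo).

# Execute a command, or just echo it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

teardown_stack() {
  local proxy="${NGINX_PROXY_CONTAINER:-}"
  if [ -z "$proxy" ]; then
    echo "teardown_stack: set NGINX_PROXY_CONTAINER first" >&2
    return 1
  fi
  # Stop and remove the stack's containers and compose-managed networks
  run sudo docker-compose down || return 1
  # The external network is still attached to the proxy: detach it first...
  run sudo docker network disconnect airflow-proxy "$proxy" || return 1
  # ...then remove it, since docker refuses to remove a network in use
  run sudo docker network rm airflow-proxy
}
```

Named volumes are kept by `docker-compose down` unless `-v` is added, so persisted data survives this teardown.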

Available URL list

  • airflow.<your-awesome-domain.ltd> -> Airflow web UI
  • airflow.<your-awesome-domain.ltd>/flower -> Flower, the Celery workers web UI
  • airflow.<your-awesome-domain.ltd>/pgadmin -> pgAdmin 4
  • airflow.<your-awesome-domain.ltd>/jupyter -> Jupyter Notebook (default password: notebook)
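
Once the stack is up, each entry point can be checked from the command line. A minimal sketch: the `service_urls` and `check_urls` functions are hypothetical, and HTTPS is assumed because the stack sits behind the HTTPS proxy.

```shell
#!/usr/bin/env bash
# Print the public entry points of the stack for a given domain.
service_urls() {
  if [ -z "${1:-}" ]; then
    echo "usage: service_urls <your-awesome-domain.ltd>" >&2
    return 1
  fi
  local base="https://airflow.$1"
  printf '%s\n' "$base" "$base/flower" "$base/pgadmin" "$base/jupyter"
}

# Print the HTTP status code returned by each entry point (000 = no HTTP response).
check_urls() {
  local urls url code
  urls="$(service_urls "$1")" || return 1
  for url in $urls; do
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")"
    echo "$code $url"
  done
}
```

A 200 or a 3xx (for example a redirect to a login page) shows that the proxy route reaches the service; 000 usually points to DNS, certificate, or network attachment problems.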

Deployed libraries

Below is the selection of included libraries, with reference URLs for documentation and examples.

Essential Python libraries for data analysis

Library | Description | Resources
bokeh | The Bokeh visualization library. | https://bokeh.org/
bottleneck | Bottleneck is a collection of fast, NaN-aware NumPy array functions written in C. Works with pandas and xarray. | https://github.com/pydata/bottleneck
dask | Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love. | https://dask.org/
matplotlib-base | Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. | https://matplotlib.org/
numpy | The fundamental package for scientific computing with Python. | https://numpy.org/
panel | A high-level app and dashboarding solution for Python. | https://panel.holoviz.org/index.html
pytables | PyTables is a package for managing hierarchical datasets, designed to cope efficiently and easily with extremely large amounts of data. | https://www.pytables.org/
scipy | SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. | https://www.scipy.org/
scikit-image | Image processing in Python. | https://scikit-image.org/
scikit-learn | Machine learning in Python. | https://scikit-learn.org/stable/
seaborn | Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. | http://seaborn.pydata.org/
statsmodels | | 
xarray | xarray (formerly xray) is an open source project and Python package that makes working with labelled multi-dimensional arrays simple, efficient, and fun! | http://xarray.pydata.org/en/stable/

Jupyter and Airflow framework specific libraries

Library | Description | Resources
papermill | Papermill is a tool for parameterizing and executing Jupyter Notebooks. | https://papermill.readthedocs.io/en/latest/
psycopg2 | Psycopg is the most popular PostgreSQL database adapter for the Python programming language. | https://www.psycopg.org/docs/

JupyterLab extensions

Library | Description | Resources
appmode | A Jupyter extension that turns notebooks into web applications. | https://github.com/oschuett/appmode
ipywidgets | Widgets are eventful Python objects that have a representation in the browser, often as a control like a slider, textbox, etc. | https://ipywidgets.readthedocs.io/en/stable/examples/Widget%20List.html
ipyleaflet | Interactive maps in the Jupyter notebook. | https://ipyleaflet.readthedocs.io/en/latest/
jupyterlab-manager | A JupyterLab extension for Jupyter/IPython widgets. | https://github.com/jupyter-widgets/ipywidgets/tree/master/packages/jupyterlab-manager
jupyter_bokeh | An extension for rendering Bokeh content in JupyterLab notebooks. | https://bokeh.org/
jupyter-matplotlib | An extension for rendering Matplotlib content in JupyterLab notebooks. | https://matplotlib.org/
jupyterlab-plotly | The plotly Python library is an interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and 3-dimensional use-cases. | https://plotly.com/python/getting-started/
jupyterlab-voyager | A JupyterLab extension to visualize data with Voyager. | https://data-voyager.gitbook.io/voyager/

Geo / EO / Weather specific

Library | Description | Resources
cartopy | Cartopy is a Python package designed for geospatial data processing in order to produce maps and other geospatial data analyses. | https://scitools.org.uk/cartopy/docs/latest/
cmocean | This package contains colormaps for commonly-used oceanographic variables. | https://matplotlib.org/cmocean/
descartes | Use Shapely or GeoJSON-like geometric objects as matplotlib paths and patches. | https://pypi.org/project/descartes/
ecmwf-api-client | ECMWF WebAPI is a set of services developed by ECMWF to allow users from the outside to access some internal features and data of the centre. | https://confluence.ecmwf.int/display/WEBAPI/ECMWF+Web+API+Home
iris | A powerful, format-agnostic, community-driven Python library for analysing and visualising Earth science data. | https://scitools.org.uk/iris/docs/latest/
iris-grib | The library iris-grib provides functionality for converting between weather and climate datasets that are stored as GRIB files and Iris cubes. | https://iris-grib.readthedocs.io/en/stable/
geos | GEOS (Geometry Engine - Open Source) is a C++ port of the JTS Topology Suite (JTS). It aims to contain the complete functionality of JTS in C++. | https://trac.osgeo.org/geos/
geopandas | GeoPandas is an open source project to make working with geospatial data in Python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. | https://geopandas.org/
metpy | MetPy is a collection of tools in Python for reading, visualizing, and performing calculations with weather data. | https://unidata.github.io/MetPy/latest/index.html
metview | Python interface to the Metview meteorological workstation and batch system. | https://software.ecmwf.int/metview
magics | Python interface to the Magics meteorological plotting package. | https://github.com/ecmwf/magics-python
netcdf4 | netcdf4-python is a Python interface to the netCDF C library. | http://unidata.github.io/netcdf4-python/netCDF4/index.html
protobuf | Protocol buffers are a language-neutral, platform-neutral extensible mechanism for serializing structured data. | https://developers.google.com/protocol-buffers/
pynio | PyNIO is a multi-format data I/O package with a NetCDF-style interface. | http://www.pyngl.ucar.edu/Nio.shtml
shapely | Manipulation and analysis of geometric objects in the Cartesian plane. | https://shapely.readthedocs.io/en/stable/project.html
siphon | A collection of Python utilities for retrieving atmospheric and oceanic data from remote sources, focusing on being able to retrieve data from Unidata data technologies, such as the THREDDS data server. | https://unidata.github.io/siphon/latest/examples/index.html
