DiscreETLy is an add-on dashboard service on top of Apache Airflow.

License: MIT License

discreetly's Introduction

DiscreETLy

No longer maintained!

DiscreETLy's development has been stopped and the repository has been put into archived, read-only mode.
We recommend looking into alternative Data Catalog solutions, like:

Legacy docs

DiscreETLy was an add-on dashboard service on top of Apache Airflow. It is a user-friendly UI showing the status of particular DAGs. Moreover, it allows users to map tasks within a particular DAG to tables available in any system (relational or non-relational) via a friendly YAML definition. DiscreETLy provides functionality for monitoring DAG status as well as optional communication with services such as Prometheus or InfluxDB.

Prerequisites

The minimal setup needed to run the dashboard requires Docker. Installation instructions can be found on the official Docker website.

The minimal setup also requires access to the Airflow MySQL instance (the MySQL version should be >= 8 and allow analytical functions).

Configuration

Before running or deploying DiscreETLy, a configuration file needs to be provided. The template for the configuration file can be found in the config folder: settings.py.template. Configuration is provided as a standard Python file, which makes it easy to define and change using Python APIs. The bare minimum configuration needed for the app to run is a secret key (a stub is provided in the template) and connection details for the Airflow database (currently only MySQL is supported).
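
As a rough illustration of the minimum described above, a settings file might look like the sketch below. The four AIRFLOW_* names are the ones mentioned in an issue report quoted later in this document; SECRET_KEY, REQUIRED and missing_settings are hypothetical names introduced here, so compare everything against config/settings.py.template before relying on it.

```python
import os

# Hypothetical settings.py sketch. Check every name against
# config/settings.py.template before relying on it.

# Secret key for the web app; the variable name is an assumption.
SECRET_KEY = os.environ.get("SECRET_KEY", "")

# Airflow database connection details (only MySQL is supported).
AIRFLOW_DB_HOST = os.environ.get("AIRFLOW_DB_HOST", "")
AIRFLOW_USERNAME = os.environ.get("AIRFLOW_USERNAME", "")
AIRFLOW_PASSWORD = os.environ.get("AIRFLOW_PASSWORD", "")
AIRFLOW_DATABASE = os.environ.get("AIRFLOW_DATABASE", "")

# Names that must be non-empty for the app to start (illustrative list).
REQUIRED = (
    "SECRET_KEY",
    "AIRFLOW_DB_HOST",
    "AIRFLOW_USERNAME",
    "AIRFLOW_PASSWORD",
    "AIRFLOW_DATABASE",
)


def missing_settings(values):
    """Return the names of required settings that are absent or empty."""
    return [name for name in REQUIRED if not values.get(name)]
```

Calling missing_settings(globals()) at the end of such a file would list any required value that was left unset, which is one way to fail early instead of at the first database query.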

Configuration options for InfluxDB and Prometheus are optional. If those services are not defined in the configuration file, they are simply ignored when the app runs.

If the environment is not specified, the application runs in DEBUG mode, so any errors are reported in the dashboard UI. If the environment variable ENV_VAR_PREFIX is set to PROD, or the appropriate option is changed in the settings.py file, the application serves 500 errors as defined in the dashboard template.
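
The rule above can be pictured with a small sketch. It assumes the variable is read directly from the process environment; the function name debug_enabled is hypothetical and is not taken from the DiscreETLy code base.

```python
import os


def debug_enabled(environ=None):
    """Return True unless the environment marks the deployment as PROD.

    Illustrative only: mirrors the README's description, not the app's code.
    """
    env = os.environ if environ is None else environ
    # An unset variable means "environment not specified", i.e. DEBUG mode.
    return env.get("ENV_VAR_PREFIX", "") != "PROD"
```

Passing a plain dict instead of os.environ makes the behaviour easy to check without touching the real environment.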

Views & Plugins

The basic configuration file is enough to run the dashboard. However, taking full advantage of the dashboard's features requires some additional steps. By default, only the ETL tab presents valuable information, allowing users to monitor the progress of particular Airflow DAGs and tasks. You can easily enable plugins that add new views. All plugins reside in the plugins directory and are enabled if the plugin name is present in ENABLED_PLUGINS in settings.py.
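
The enabling rule can be sketched as a simple filter. This is an assumption about the mechanism, not the actual loader: select_plugins and the lowercase plugin names used in the usage note are hypothetical.

```python
def select_plugins(available, enabled):
    """Keep only the available plugin names that are listed as enabled.

    `available` stands for the directory names found under plugins/,
    `enabled` for the ENABLED_PLUGINS value from settings.py.
    """
    enabled_set = set(enabled)
    # Preserve the order in which plugins were discovered.
    return [name for name in available if name in enabled_set]
```

For example, select_plugins(["tables", "reports", "streaming"], ["tables", "streaming"]) keeps the first and last entries and silently skips names in ENABLED_PLUGINS that have no matching directory.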

You can find more details on what each plugin provides and how to configure it in the following docs:

  • Tables - tables list: status monitoring and records count
  • Reports - monitoring of reports external to the Airflow DAGs (like Tableau, Mode)
  • Streaming - view on non-Airflow based streaming applications
  • Table Descriptions - table and column descriptions
  • S3 Usage - browser of the aggregated metadata of files stored inside multiple S3 buckets
  • Athena Usage - summaries of the data consumed by users' Athena queries
  • Hello World - DiscreETLy Developers Docs
  • Important links - update links in config/links.yaml

Running locally

See: https://hub.docker.com/r/fandom/discreetly/

Before running the container, the image first needs to be built so that it becomes available in the local Docker image store. Run the following command from the project's root directory.

docker build -t <image_name>:<image_version> .

Once the image is built, the application can be started by running:

docker run -e <env_name>=<env_value> --rm --name <container_name> -v <project_root_folder>:/app -p 8000:8000 <docker_image_name>:<image_version>

Let's dissect this command option by option:

  • -e sets the environment variables required to, e.g., configure the app. Most of those options can be hardcoded in the configuration file; however, passing them through the environment is recommended. For more details see the Configuration section of this README.
  • --rm removes the container after it stops. This ensures that the app always starts with a fresh version of the configuration and other features.
  • -v maps the folder containing the application from the local environment into the container. In development mode, this ensures that all changes applied to files on the local file system are immediately reflected in the container.
  • -p maps a port from the container to localhost.
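
The options listed above can also be assembled programmatically, which helps when the same invocation is scripted. The sketch below only builds the argument list; docker_run_command and its parameters are hypothetical helpers, while the flags themselves are the ones used in the command above.

```python
import shlex


def docker_run_command(image, project_root, env=None, name=None, port=8000):
    """Assemble the `docker run` invocation described above as an argv list."""
    argv = ["docker", "run"]
    # -e: one flag per environment variable passed to the app.
    for key, value in (env or {}).items():
        argv += ["-e", f"{key}={value}"]
    # --rm: remove the container once it stops.
    argv.append("--rm")
    if name:
        argv += ["--name", name]
    # -v: mount the project folder at /app; -p: publish the port.
    argv += ["-v", f"{project_root}:/app", "-p", f"{port}:{port}", image]
    return argv


def as_shell(argv):
    """Render an argv list as a single shell-safe command line."""
    return shlex.join(argv)
```

The resulting list can be handed to subprocess.run as is, or rendered with as_shell for logging.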

If some of the configuration options are already available through settings.py file the command for running the application can be significantly abbreviated (from project root folder):

docker run --rm -v $(pwd):/app -p 8000:8000 fandom/discreetly:latest

Remember to use docker image name and version provided during build stage.

Once the container is ready and running navigate to localhost:8000 in a browser and enjoy.

Testing

In order to run the tests, a Docker image needs to be built first. The Dockerfile is available in the dashboard/tests/ folder. To build the image, run the following command from the project's root directory:

docker build -t dashboard:tests -f dashboard/tests/Dockerfile .

Once the image is built, the tests can be run by typing:

docker-compose --file dashboard/tests/docker-compose.yml up --abort-on-container-exit && docker-compose --file dashboard/tests/docker-compose.yml down

The output of this command shows nicely formatted information on the number of tests performed and the success ratio (all tests are run with the pytest package).

When working iteratively, rebuilding the image every time some changes are made would be cumbersome. To avoid that, one can pass an additional parameter to subsequent runs (mapping the local project folder to the container destination):

docker run --rm -v <absolute_path_to_project_root_directory>:/tmp/dashboard/ dashboard:tests

Credits

DiscreETLy was maintained by Fandom's Data Engineering team


discreetly's Issues

Make the process of adding user custom views more robust

Currently, a user may inject their custom logic and visualization layer into the extra module in blueprints, which makes the dashboard flexible. However, there are some obstacles in doing so:

  1. The user needs to define the whole logic, including the OAuth function.
  2. The user needs to inject the whole file into an image.
  3. Any global changes to the logic in views.py would also need to be reflected on the user's side.

Possible solutions:

  • Inject some logic through configuration files (what about user specific imports? what about logic depending on and utilizing configuration options?)
  • Auto-generate logic (blueprints) from a list of html files and related python files.

I would like to discuss it here before going into PR mode.
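
One possible reading of the second proposal, sketched under the assumption that each custom view consists of a template and a Python module sharing the same base name. The function pair_views is hypothetical, and the sketch only covers discovery, not the registration of blueprints.

```python
from pathlib import PurePath


def pair_views(filenames):
    """Return base names that have both a .html template and a .py module."""
    kinds_by_stem = {}
    for filename in filenames:
        path = PurePath(filename)
        if path.suffix in (".html", ".py"):
            kinds_by_stem.setdefault(path.stem, set()).add(path.suffix)
    # A view is complete only when both halves are present.
    return sorted(
        stem for stem, kinds in kinds_by_stem.items() if kinds == {".html", ".py"}
    )
```

Files without a counterpart are ignored, so a stray helper module or an orphaned template would not produce a half-configured view.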

Marking task instance as success breaks the main view

Describe the bug
Marking a task instance as success breaks the main view. This is because such a task_instance does not contain an end_date (it's null), and the tables view relies on that value to get the "last timestamp of successful table update".

To Reproduce
Steps to reproduce the behavior:

  1. Create a new DAG with one task. This task should be connected to a table.
  2. Instead of waiting for the scheduler to execute the task, mark it as success
  3. Open discreETLy, it will break the main view and tables view.

Expected behavior
This task should not be listed as it was not executed.
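
A minimal sketch of the expected behavior, assuming task instances are available as dictionaries with state and end_date keys. That shape and the function name last_successful_update are hypothetical and do not reflect DiscreETLy's actual data model.

```python
from datetime import datetime


def last_successful_update(task_instances):
    """Return the latest end_date among successful runs that actually finished.

    Instances marked as success by hand have end_date set to None and are
    skipped, so they neither break the comparison nor count as an update.
    """
    finished = [
        ti["end_date"]
        for ti in task_instances
        if ti["state"] == "success" and ti["end_date"] is not None
    ]
    return max(finished) if finished else None
```

Returning None when no run has finished lets the caller decide whether to hide the table or show it without a timestamp.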

DiscreETLy (tables) "uses" field can't reference tables of other DAGs

Describe the bug
The Tables extension doesn't properly support tables that rely on ("uses") tables from a different DAG.
This causes issues on the "Tables managed by ..." view, where no graph is shown for 'dag2' (see the steps to reproduce).

To Reproduce
Steps to reproduce the behavior:

  1. In /config/tables.yaml declare two tables, belonging to two different DAGs, where one is using another e.g.:

- name: table1
  db: db
  dag_id: dag1
  task_id: create_table1

- name: rollup_of_table1
  db: db
  uses: db.table1
  dag_id: dag2
  task_id: rollup_of_table

  2. Run DiscreETLy, go to /etl, select the graph view for dag2
  3. No graph is shown

Expected behavior
The graph should be shown.
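
One way to obtain the expected graph is to resolve "uses" against all declared tables rather than only those of the selected DAG. The sketch below assumes the entries from tables.yaml have already been loaded into dictionaries and that "uses" may be a single name or a list; build_table_graph and dag_subgraph are hypothetical names, not functions from the project.

```python
def qualified_name(table):
    """Return the db.name identifier used in 'uses' references."""
    return f"{table['db']}.{table['name']}"


def build_table_graph(tables):
    """Map every table to the declared tables it uses, regardless of DAG."""
    known = {qualified_name(t) for t in tables}
    edges = {}
    for table in tables:
        uses = table.get("uses") or []
        if isinstance(uses, str):
            uses = [uses]
        # Keep only references that point at a declared table.
        edges[qualified_name(table)] = [u for u in uses if u in known]
    return edges


def dag_subgraph(tables, dag_id):
    """Edges for tables owned by dag_id, keeping upstream tables of other DAGs."""
    graph = build_table_graph(tables)
    owned = [qualified_name(t) for t in tables if t["dag_id"] == dag_id]
    return {name: graph[name] for name in owned}
```

With the two tables from the steps to reproduce, dag_subgraph(tables, "dag2") contains an edge from db.rollup_of_table1 to db.table1 even though db.table1 belongs to dag1.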

Docker container fails to start

Describe the bug
Docker container fails to start due to _mysql_exceptions.OperationalError: (2059, 'Plugin mysql could not be loaded: Error loading shared library lib/mariadb/plugin/mysql.so: No such file or directory')

To Reproduce
Steps to reproduce the behavior:

  1. git clone https://github.com/Wikia/discreETLy.git
  2. Create settings.py file that sets AIRFLOW_DB_HOST (set to mysql://<my_ip>:3306), AIRFLOW_USERNAME, AIRFLOW_PASSWORD and AIRFLOW_DATABASE. I have confirmed through DataGrip that these settings are valid.
  3. docker run --rm -v $(pwd):/app -p 8000:8000 fandom/discreetly:latest
  4. See error _mysql_exceptions.OperationalError: (2059, 'Plugin mysql could not be loaded: Error loading shared library lib/mariadb/plugin/mysql.so: No such file or directory')

Expected behavior
The docker container should start.
