abhioncbr / docker-superset

Repository for Docker Image of Apache-Superset. [Docker Image: https://hub.docker.com/r/abhioncbr/docker-superset]

Home Page: https://abhioncbr.github.io/docker-superset

License: Other

Shell 57.87% Dockerfile 24.29% Python 17.84%
docker-image docker-container docker-compose apache-superset celery redis mysql distributed-systems dashboard flower

docker-superset's Introduction

Inviting contributors for enhancing & maintaining the project.

docker-superset

Repository for building Docker container of Apache Superset.

  • To learn more about Superset, see the official website and join the chat at https://gitter.im/airbnb/superset
  • Similarly, for Docker, follow the curated list of resources.

Images

Image Pulls Tags
abhioncbr/docker-superset Docker Pulls tags

Superset components stack

  • Enhanced/modified version of the docker container of apache-superset.
  • Superset version: tags follow the notation X.YY.ZZzzz, where zzz is an optional suffix such as rc1. Available tags:
    • 0.36.0
    • 0.35.0, 0.35.1
    • 0.34.0, 0.34.0rc1
    • latest, 0.32.0rc2
    • 0.29.0rc8, 0.29.0rc7, 0.29.0rc5, 0.29.0rc4
    • 0.28.1, 0.28.0
  • Backend database: MySQL
  • SQL Lab async query mode: Celery
  • Task queue & query cache: Redis
  • Image contains all database plugin dependencies and Elasticsearch
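
The tag notation listed above can be checked mechanically. The following is a minimal sketch, assuming tags are three dot-separated numbers with an optional rc suffix; the pattern and the names `TAG_PATTERN`, `SupersetTag` and `parse_tag` are hypothetical and not part of the repository.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for tags such as 0.36.0 or 0.29.0rc8.
TAG_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)(rc\d+)?$")


class SupersetTag(NamedTuple):
    major: int
    minor: int
    patch: int
    suffix: Optional[str]  # e.g. "rc8", or None for a final release


def parse_tag(tag: str) -> Optional[SupersetTag]:
    """Parse a version tag; return None for non-version tags such as 'latest'."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    major, minor, patch, suffix = match.groups()
    return SupersetTag(int(major), int(minor), int(patch), suffix)
```

For example, `parse_tag("0.29.0rc8")` yields the numeric parts plus the suffix `rc8`, while `parse_tag("latest")` yields `None`, so a script can tell a pinned version from the floating tag.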

Superset ports

  • superset portal port: 8088
  • superset celery flower port: 5555

Salient features of the docker image

  • multiple ways to start a container: either with docker-compose or with the docker run command.
  • all Superset components, i.e. the web application, the celery worker and the celery flower UI, can run in the same container or in different containers.
  • the container's first run sets up the required database along with the examples and the Fabmanager user account with credentials username: admin & password: admin.
  • the Superset config file, i.e. superset_config.py, should be mounted into the container. There is no need to rebuild the image to change configurations.
  • the default configuration uses MySQL as the Superset metadata database and Redis as the cache & celery broker.
  • starting the container using docker-compose will start three containers: mysql5.7 as the database, redis3.4 as the cache & celery broker, and the superset container.
    • expects multiple environment variables defined in the docker-compose.yml file. Default environment variables are present in the .env file.
    • override the default environment variables either by editing the .env file or by passing them on the command line, e.g. SUPERSET_ENV.
    • the permissible values of SUPERSET_ENV are local and prod.
    • in local mode, one celery worker and the Flask-based Superset web application run.
    • in prod mode, two celery workers and the Gunicorn-based Superset web application run.
  • starting the container using docker run can be used for a completely distributed setup; it requires the database & Redis URLs at startup.
    • a single server container or multiple server containers (behind a load balancer) can be spawned. In the server, the Gunicorn-based Superset web application runs.
    • multiple celery worker containers can run on the same or on different machines. In a worker, the celery worker & flower UI run.
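
The default configuration described in the list above (MySQL for metadata, Redis for the cache and the Celery broker) could be derived from environment variables roughly as follows. This is only a sketch: the function `build_config` is hypothetical, the environment variable names are assumed to be the ones docker-compose expects for this project, and the repository's actual superset_config.py may be organised differently. In a real superset_config.py these values would be module-level settings rather than a returned dict.

```python
from urllib.parse import quote_plus


def build_config(env):
    """Build Superset-style settings from a mapping of environment variables.

    Hypothetical helper; the variable names are assumptions.
    """
    # SQLAlchemy URI for the MySQL metadata database; credentials are URL-quoted.
    metadata_uri = "mysql://{user}:{password}@{host}:{port}/{database}".format(
        user=quote_plus(env["MYSQL_USER"]),
        password=quote_plus(env["MYSQL_PASSWORD"]),
        host=env["MYSQL_HOST"],
        port=env["MYSQL_PORT"],
        database=env["MYSQL_DATABASE"],
    )
    # One Redis instance serves as cache, Celery broker and result backend.
    redis_url = "redis://{host}:{port}/0".format(
        host=env["REDIS_HOST"], port=env["REDIS_PORT"]
    )
    return {
        "SQLALCHEMY_DATABASE_URI": metadata_uri,
        "CACHE_CONFIG": {"CACHE_TYPE": "redis", "CACHE_REDIS_URL": redis_url},
        "BROKER_URL": redis_url,
        "CELERY_RESULT_BACKEND": redis_url,
    }
```

Calling `build_config(os.environ)` inside the container would pick up the values that docker-compose passes in; because the file is mounted rather than baked into the image, such settings can change without a rebuild.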

How to build the image

  • The Dockerfile uses superset-version as a build-arg, for example: 0.28.0 or 0.29.0rc4
  • build the image using the docker build command
    docker build -t abhioncbr/docker-superset:<version-tag> --build-arg SUPERSET_VERSION=<superset-version> -f ~/docker-superset/docker-files/Dockerfile .

How to run using Kitematic

  • Simplest way for exploration purposes, using Kitematic (run containers through a simple, yet powerful graphical user interface).
    • Search for the abhioncbr/docker-superset image on docker-hub.

    • Start a container through the Kitematic UI.

How to run using docker commands

  • Through general docker commands -

    • first pull a docker-superset image from docker-hub using either

      docker pull abhioncbr/docker-superset

      or, for a specific Superset version, by providing the version tag

      docker pull abhioncbr/docker-superset:<version-tag>
    • Copy the superset_config.py, docker-compose.yml, and .env files. The commands below assume the following directory structure:

      docker-superset
           |_ config
           |    |_superset_config.py
           |
           |_docker-files
           |    |_docker-compose.yml
           |    |_.env
      
      
    • using docker-compose:

      • starting a superset image as a superset container in a local mode:

        cd docker-superset/docker-files/ && docker-compose up -d

        or, to pass different environment variable values:

        cd docker-superset/docker-files/ && SUPERSET_ENV=local SUPERSET_VERSION=<version-tag> docker-compose up -d
      • starting a superset image as a superset container in a prod mode:

        cd docker-superset/docker-files/ && SUPERSET_ENV=prod SUPERSET_VERSION=<version-tag> docker-compose up -d
    • using docker run:

      • starting a superset image as a server container:
        cd docker-superset && docker run -p 8088:8088 -v config:/home/superset/config/ abhioncbr/docker-superset:<version-tag> cluster server <superset_metadata_db_url> <redis_url>
      • starting a superset image as a worker container:
         cd docker-superset && docker run -p 5555:5555 -v config:/home/superset/config/ abhioncbr/docker-superset:<version-tag> cluster worker <superset_metadata_db_url> <redis_url>


Distributed execution of superset

  • As mentioned above, the docker image of superset can be used to run a completely distributed setup:
    • a load balancer in front, routing each client request to one of the server containers.
    • multiple docker-superset containers in server mode, serving the Superset UI.
    • multiple docker-superset containers in worker mode, executing SQL queries asynchronously using the celery executor.
    • a centralised Redis container or Redis cluster, serving as the cache layer and the celery task queue for the workers.
    • a centralised superset metadata database.
  • Image below depicts the docker-superset distributed platform: Distributed-Superset

Published Posts

docker-superset's People

Contributors

abhioncbr, timlinux, tssujt

docker-superset's Issues

Docker won't rebuild

Describe the bug

# docker-compose build
redis uses an image, skipping
mysql uses an image, skipping
superset uses an image, skipping

# docker-compose images
ERROR: no such image: sha256:13b0147efffc26b489f6aafda0370c3ef3afd936548a6cbe75ef9d84158fb7f1: No such image: sha256:13b0147efffc26b489f6aafda0370c3ef3afd936548a6cbe75ef9d84158fb7f1

I manually removed the image because the docker-compose build would not rebuild. Now the image is gone and it still won't build.

To Reproduce
Steps to reproduce the behavior:

  1. docker-compose build won't do anything
  2. docker rmi 5874cfce8261 will remove abhioncbr/docker-superset
  3. docker-compose build still won't do anything

Expected behavior
Rebuilding the container if it is no longer present for whatever reason.

Desktop (please complete the following information):

  • OS: Ubuntu Server 20.04
  • Browser n/a
  • Version latest

Additional context
Superset is not at the latest version. See issue #41.

Add default configurations in Dockerfile for starting container from Kitematic.

Is your feature request related to a problem? Please describe.
Apart from running the Docker container through 'docker-compose' & 'docker run', many users start the container through a UI tool like Kitematic. Add default configurations to the Dockerfile so that the container can start from Kitematic.

Connect superset docker to local postgresql database

Describe the bug
Deploying through Kitematic, I'm unable to connect to a Postgres db running on localhost. psycopg2 seems to be successfully installed.

Loaded your LOCAL configuration at [/home/superset/config/superset_config.py]
/usr/local/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
Recognized Database Authentications.
Admin User admin created.
Loaded your LOCAL configuration at [/home/superset/config/superset_config.py]
/usr/local/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.

To Reproduce
Steps to reproduce the behavior:
Add new database
Enter URI as
postgresql://username:@localhost:5432/dbname

Expected behavior
Test connection should be successful; instead I receive this error:

ERROR: {"error": "Connection failed!\n\nThe error message returned was:\n(psycopg2.OperationalError) could not connect to server: Connection refused\n\tIs the server running on host \"127.0.0.1\" and accepting\n\tTCP/IP connections on port 5432?

Db is accessible through that address
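
One possible cause, offered as an assumption rather than a confirmed diagnosis: inside a container, localhost refers to the container itself, so a database listening on the host machine is not reachable at that address. A small reachability check run from inside the container can confirm or rule this out. The helper `can_connect` below is hypothetical and uses only the Python standard library.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Running `can_connect("localhost", 5432)` in a Python shell inside the container shows whether anything listens there. On Docker Desktop for Mac the host is usually reachable under the name host.docker.internal, so `can_connect("host.docker.internal", 5432)` is worth trying as well; whether that resolves depends on the Docker version in use.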

Smartphone (please complete the following information):

  • Device: Desktop
  • OS: Mac
  • Browser Chrome
  • Version 75.0.3770.142 (Official Build) (64-bit)

cannot import name 'Any'

I'm getting the below error while running the docker image.

Docker : docker pull abhioncbr/docker-superset:0.36.0
Command : docker run -p 8088:8088 -v config:/home/superset/config/ abhioncbr/docker-superset:0.36.0 cluster server localhost:3307 localhost:6379

Reproduce:

  1. Execute the above Docker pull command
  2. Then execute the Docker run command
  • I have MySQL and Redis docker containers running locally

flask.cli.NoAppException while starting superset container in local mode using docker-compose command.

Describe the bug
With the Superset image of version '0.29.0rc7', Superset raised 'flask.cli.NoAppException' in local mode using docker-compose.

To Reproduce
Steps to reproduce the behavior:

  1. pull superset image of version '0.29.0rc7'.
  2. start the container using docker-compose command and in 'local' mode.
  3. access superset home page using url localhost:8088
  4. see 'flask.cli.NoAppException'

Expected behavior
The error page should not appear. The home page of Superset should be rendered.

elasticsearch connection error

Describe the bug
error:
ERROR: {"error": "Connection failed!\n\nThe error message returned was:\nCan't load plugin: sqlalchemy.dialects:elasticsearch.https"}

when adding elasticsearch data source

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'Sources'
  2. Click on 'Databases'
  3. Click on Add Database
  4. Add Elasticsearch HTTPS with Authentication
  5. Click on Test Connection

Expected behavior
success with the test connection

Latest Superset install using the docker-compose

This endpoint requires the datasource **, database or `all_datasource_access` permission

When trying to create a chart on data source I get error :
This endpoint requires the datasource , database or all_datasource_access permission

With this stacktrace

Traceback (most recent call last):
  File "/home/superset/superset/views/base.py", line 114, in wraps
    return f(self, *args, **kwargs)
  File "/home/superset/superset/views/core.py", line 1256, in explore_json
    samples=samples,
  File "/home/superset/superset/views/core.py", line 1169, in generate_json
    security_manager.assert_datasource_permission(viz_obj.datasource)
  File "/home/superset/superset/security.py", line 470, in assert_datasource_permission
    self.get_datasource_access_link(datasource),
superset.exceptions.SupersetSecurityException: This endpoint requires the datasource poc-test, database or
            `all_datasource_access` permission

Happens in the latest docker image and also in the version before: 0.29.0rc7

Can't load Athena JDBC plugin

I'm trying to connect Athena JDBC plugin to Superset

I get an error
ERROR: {"error": "Connection failed!\n\nThe error message returned was:\nCan't load plugin: sqlalchemy.dialects:awsathena.jdbc"}

Same with rest plugin:
ERROR: {"error": "Connection failed!\n\nThe error message returned was:\nCan't load plugin: sqlalchemy.dialects:awsathena.rest"}

Am I missing something? should I load the plugin manually?
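
A "Can't load plugin" error from SQLAlchemy generally means that the Python package providing that dialect is not importable in the running environment. The sketch below checks this from inside the container. The mapping from URI scheme to module name is an assumption (PyAthena is assumed to install `pyathena`, PyAthenaJDBC `pyathenajdbc`), and the helper names are hypothetical.

```python
import importlib.util

# Assumed mapping from SQLAlchemy URI scheme to the module that provides it.
DRIVER_MODULES = {
    "awsathena+rest": "pyathena",
    "awsathena+jdbc": "pyathenajdbc",
}


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def missing_drivers(drivers=DRIVER_MODULES):
    """List the URI schemes whose driver module is not installed."""
    return [scheme for scheme, module in drivers.items()
            if not module_available(module)]
```

If `missing_drivers()` returns a non-empty list, installing the corresponding packages in the image (or in a derived image) is the likely fix.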

Support for query result cache frameworks other than Redis.

Is your feature request related to a problem? Please describe.
Superset supports multiple frameworks for query caching; however, in docker-superset only Redis-based caching is available. Support for other caching frameworks is required.

Describe the solution you'd like
The files superset_config.py and docker-entrypoint.sh need to be enhanced to support other caching frameworks.

Specifying the number of Celery workers in environment variables

Is your feature request related to a problem? Please describe.
I simply wanted to add more workers in a prod environment, so I had to edit the docker-entrypoint.sh starting script. Then rebuild the entire container because I couldn't change that script inside the container without root privilege.

Describe the solution you'd like
Inside the .env file, have a variable defined as:

# Number of workers
CELERY_WORKERS=2

which will automatically generate the number of workers instead of having it hard-coded to 2.
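
The requested behaviour can be sketched as follows. The real startup script is docker-entrypoint.sh (a shell script), so this Python version only illustrates the logic: read CELERY_WORKERS, fall back to the current default of 2, and derive worker names in the worker1, worker2 style. The function name `worker_names` is hypothetical.

```python
def worker_names(env, default=2):
    """Derive Celery worker names from the CELERY_WORKERS variable.

    `env` is any mapping of environment variables, e.g. os.environ.
    """
    raw = env.get("CELERY_WORKERS", "").strip()
    count = int(raw) if raw else default  # ValueError if not an integer
    if count < 1:
        raise ValueError("CELERY_WORKERS must be at least 1")
    return ["worker{}".format(i) for i in range(1, count + 1)]
```

With `CELERY_WORKERS=4` in the .env file this yields four names, and with the variable unset it keeps today's two workers, so existing setups would not change.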

Describe alternatives you've considered
Manually changing docker-entrypoint.sh and rebuilding everything.

Build

Hello, first of all I congratulate you and I thank you for the contribution of your repository, it has made it much easier for me to start with the concept tests that I must do on Superset.

I'm not the most expert in Docker so the question may sound a little stupid.

I managed to start the container using docker-compose and it works perfectly, however I must make some adjustments from the beginning so I modified the Dockerfile and generated the image with the "docker build" as it is specified in the documentation. However, I can not get the container to start, it always remains in an error state. I'm not clear about the command that I should use. It must be this:

"cd docker-superset && docker run -p 8088:8088 -v config:/home/superset/config/ abhioncbr/docker-superset: cluster server <superset_metadata_db_url> <redis_url>"

the parameter "superset_metadata_db_url" is the host of the metadata DB with the structure user @ host .....?

When running the "docker build" does not generate the redis container, I must generate it separately and in "redis_url" I send the URL of this one?

Basically what I need is to use the Superset container but to automatically execute a Sh that modifies the Superset configuration, also creates some folders in the container and copies some monitoring routines, but that allows us to use Superset updates, that I thought about modifying the Dockerfile and executing the Sh in the "docker-entrypoint.sh", in that case the best option is to use the docker build of your container? or is there some other simpler way?

I would very much appreciate any help you can give me!

Thank you.

gevent not found when running 0.34 builds

Describe the bug
The gevent lib is not installed in the container, and when the worker starts I see the following error.
I do not understand how you managed to run this version without seeing this error.

To Reproduce
Steps to reproduce the behavior:

  1. use latest images with tag 0.34.0 or 0.34.0rc1 from dockerhub
  2. Running on prod and compose parameters
  3. After workers started I see log
    Started Celery workers[worker1, worker2] and Flower UI.
  4. After server starts I see in the log
    File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/ggevent.py", line 22, in
    raise RuntimeError("You need gevent installed to use this worker.")

Expected behavior
If I switch back to 0.33.0rc1, which is the current latest, it works with everything else the same.

Additional context
Running on aws ECS, but should not have an effect on this error.

Use of environment file for docker-compose instead of passing through command

Is your feature request related to a problem? Please describe.
Currently, to run docker-superset using the compose command, we need to pass environment variables (SUPERSET_ENV=local SUPERSET_VERSION=); a .env file would be much better.

Describe the solution you'd like
Wrap all environment variables in a .env file, which will be picked up by the docker-compose command.

No such image

I'm getting a "ERROR: no such image: abhioncbr/docker-superset" on docker-compose up -d.
Is this a temporary error? I thought I've tried it before with more luck...

Customizing the container assets and pushing to docker-hub

Hi. Sorry for skipping the standard protocol. This isn't really a bug, but more of an inquiry. I wanted to know how to go about pushing an image or container to docker-hub after customizing the assets in the container. Any help/advice would be appreciated.

Options to edit Username, Email address and add Email Reporting

Thanks for the docker. Saved so much time and works without any issues!
Especially there are several issues with the original superset container like entering restarting loop with exit code 243, missing CSS styling. This worked like a charm without editing any files.

However, now I'm out of sync with the original superset documentation. Is there a way to edit username, email address of the user before/after the docker is deployed?

Also say for example, if I need features like email reporting, how to get it to work?

Ansible script for setting up environment and starting docker-superset.

Is your feature request related to a problem? Please describe.
Currently, the image must be pulled manually from docker-hub, and the config & compose files must then be copied before starting the docker-superset container. An automation script is needed for these manual steps.

Describe the solution you'd like
Ansible-based scripts will install Docker if it is not present and then start the docker-superset container.

Describe alternatives you've considered
A shell script is an option, but the Ansible way is a cleaner and more generic approach.

Additional context
Consider terraform based scripts for setting up the infrastructure.

Python SSL error when connecting to Bigquery

Describe the bug
When I attempt to add a bigquery database connection I received the following error in the UI
ERROR: {"error": "Connection failed!\n\nThe error message returned was:\nHTTPSConnectionPool(host='oauth2.googleapis.com', port=443): Max retries exceeded with url: /token (Caused by SSLError(SSLError(\"bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)\",),))"}

Upon further investigation, I attached to the docker container as root:
docker exec -u 0 -it <id> /bin/bash
and then ran python.

Terminal session:
Python 3.6.8 (default, Mar 27 2019, 08:49:59)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import requests
>>> requests.get('https://www.google.com')
From cffi callback <function _verify_callback at 0x7f5b1eccc378>:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/OpenSSL/SSL.py", line 309, in wrapper
    _lib.X509_up_ref(x509)
AttributeError: module 'lib' has no attribute 'X509_up_ref'
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/urllib3/contrib/pyopenssl.py", line 441, in wrap_socket
    cnx.do_handshake()
  File "/usr/local/lib/python3.6/site-packages/OpenSSL/SSL.py", line 1915, in do_handshake
    self._raise_ssl_error(self._ssl, result)
  File "/usr/local/lib/python3.6/site-packages/OpenSSL/SSL.py", line 1647, in _raise_ssl_error
    _raise_current_error()
  File "/usr/local/lib/python3.6/site-packages/OpenSSL/_util.py", line 54, in exception_from_error_queue
    raise exception_type(errors)
OpenSSL.SSL.Error: [('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')]
...

I found that I can successfully connect to bigquery or any other https connection via python by running this pip removal.

pip uninstall pyOpenSSL

To Reproduce
Steps to reproduce the behavior:

  1. Go to add databases
  2. Click add a bigquery://
  3. see error message

Expected behavior
I install the docker-compose template and everything works without hackery.

I'm not sure what the solution is here, but thought I'd report it.

Install BigQuery and Hive packages in image

Is your feature request related to a problem? Please describe.
To access BigQuery and Hive databases through Superset, additional packages need to be installed.

Describe the solution you'd like
pip install pybigquery pyhive

Additional context
Minor update in Dockerfile.

superset_config.py should not be part of the docker image. It should be mounted into the container.

Is your feature request related to a problem? Please describe.
Superset reads its configuration from the superset_config.py file present in the Python path. A minor change in the file requires building a new docker image. It should be easily modifiable.

Describe the solution you'd like
superset_config.py should not be part of the docker image. It should be mounted into the container.

Describe alternatives you've considered
All those configuration variables can be provided through the docker-compose.yml file, but that requires a lot of logical processing in the superset_config.py file.

Additional context
More configurable docker image for generalization.

Superset error sqlalchemy

Describe the bug
Superset error sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1054, "Unknown column 'dbs.allow_run_sync' in 'field list'")

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'SOURCES'
  2. Click on 'DATABASES'
  3. See error

Expected behavior
Open DATABASE connection page.

Dashboard Metadata in one Container

Hi Abhishek,

Thanks for your detailed blog and process to setup Apache superset.

I am deploying Superset in a container; however, I am not able to import a dashboard once the image you provided is running.

Also, I am trying to deploy the dashboard metadata database in one container itself.

Could you please share the process if you are aware of one, or direct me to a link, as I am working on a POC and am new to Superset.

Regards,
Tausif.

Install database dependencies python plugins in docker images

Is your feature request related to a problem? Please describe.
Currently, the docker image only contains the dependency plugins for MySQL, Postgres, and BigQuery. For connectivity to other databases, the docker image needs to be rebuilt. Install plugins for all common databases.

Describe the solution you'd like
Install the database dependencies listed in the Superset manual

Update docker image to Superset 0.36

Is your feature request related to a problem? Please describe.

Superset 0.36 uses SMART_NUMBER when displaying graphs, but the current image does not support this data type.

Describe the solution you'd like

Update image to the latest Superset tag

Describe alternatives you've considered

Other than not use this solution? We haven't considered anything at this point.

Additional context

We just started using Superset, and we moved from the incubator after learning that it was not suitable outside of development.

ENV not set?

I'm using your .env file, but I get:

WARNING: The MYSQL_USER variable is not set. Defaulting to a blank string.
WARNING: The MYSQL_PASSWORD variable is not set. Defaulting to a blank string.
WARNING: The MYSQL_DATABASE variable is not set. Defaulting to a blank string.
WARNING: The MYSQL_ROOT_PASSWORD variable is not set. Defaulting to a blank string.
WARNING: The SUPERSET_VERSION variable is not set. Defaulting to a blank string.
WARNING: The MYSQL_HOST variable is not set. Defaulting to a blank string.
WARNING: The MYSQL_PORT variable is not set. Defaulting to a blank string.
WARNING: The REDIS_HOST variable is not set. Defaulting to a blank string.
WARNING: The REDIS_PORT variable is not set. Defaulting to a blank string.
WARNING: The SUPERSET_ENV variable is not set. Defaulting to a blank string.
WARNING: The GOOGLE_APPLICATION_CREDENTIALS variable is not set. Defaulting to a blank string.
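
Warnings like these usually mean docker-compose did not find a .env file in the directory it was run from, or that the file lacks the listed names. A minimal sketch for checking a .env file's contents follows. The parser handles only plain KEY=VALUE lines and # comments, which is an assumption about the file's format, and the helper names are hypothetical.

```python
# Variable names taken from the warnings above.
REQUIRED = [
    "MYSQL_USER", "MYSQL_PASSWORD", "MYSQL_DATABASE", "MYSQL_ROOT_PASSWORD",
    "SUPERSET_VERSION", "MYSQL_HOST", "MYSQL_PORT", "REDIS_HOST",
    "REDIS_PORT", "SUPERSET_ENV", "GOOGLE_APPLICATION_CREDENTIALS",
]


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_variables(text, required=REQUIRED):
    """Return the required names that the .env text does not define."""
    defined = parse_env(text)
    return [name for name in required if name not in defined]
```

Passing the contents of docker-files/.env to `missing_variables` lists any names that would trigger the warning. An empty result suggests the file is fine and that docker-compose was started from a different directory.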
