This repo is inspired by the Docker for Data Science book. It provides a Docker image with a data science environment based on jupyter/datascience-notebook, with pandas, matplotlib, SciPy, seaborn and scikit-learn pre-installed.
- Clone this repo
Create a new folder named after your project, cd into it, and run:
git init
git pull https://github.com/glebmikha/data-science-project-template.git
Alternatively, download the repo as a zip archive and use it without git.
- Add the Python packages you need to ./docker/jupyter/requirements.txt, one per line. For example:
xgboost
tensorflow==1.6.0
Alternatively, run pip install directly in a Jupyter notebook cell (don't forget the ! in front of the command):
!pip install your_package
- Start the containers:
docker-compose up
- Copy the Jupyter URL from the terminal output and open it in your browser, or just click it if you are using VS Code.
- Open the examples.ipynb notebook in the ipynb folder, then create your own notebooks.
- Copy your data into ./data and read it from Jupyter. You can also upload data into PostgreSQL, which runs in its own container alongside Jupyter (see the examples notebook for details).
- Press Ctrl+C or close the terminal to stop Jupyter and Postgres.
- Clean up Docker's mess by removing dangling (untagged) images:
docker rmi -f $(docker images -qf dangling=true)
Sometimes it is useful to remove all unused Docker data (stopped containers, unused networks, dangling images and build cache). This is not a safe operation, so be careful:
docker system prune
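After adding packages to requirements.txt or installing them with !pip install, it can help to confirm from inside a notebook which of them the kernel actually sees. Below is a minimal sketch, assuming Python 3.8+ in the image (for importlib.metadata). The helpers parse_requirements and check_installed are hypothetical names for illustration, not part of this repo, and the parser only understands the simple forms shown in the example (a bare name or an exact == pin).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_requirements(text):
    """Return (name, pinned_version_or_None) for each package line.

    Only simple lines are handled: a bare name such as "xgboost" or an
    exact pin such as "tensorflow==1.6.0". Other specifiers (">=", etc.)
    keep the package name but report no pin.
    """
    reqs = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options like -r
            continue
        match = re.match(r"[A-Za-z0-9._-]+", line)  # leading distribution name
        if match is None:
            continue
        _, _, pinned = line.partition("==")  # exact pin, if any
        reqs.append((match.group(0), pinned.strip() or None))
    return reqs


def check_installed(reqs):
    """Map each name to (pinned, installed); installed is None if missing."""
    status = {}
    for name, pinned in reqs:
        try:
            installed = version(name)  # looks up the installed distribution
        except PackageNotFoundError:
            installed = None
        status[name] = (pinned, installed)
    return status


if __name__ == "__main__":
    example = "xgboost\ntensorflow==1.6.0\n"
    for name, (pinned, installed) in check_installed(parse_requirements(example)).items():
        print(f"{name}: wanted {pinned or 'any'}, installed {installed or 'missing'}")
```

importlib.metadata looks packages up by distribution name (scikit-learn, not sklearn), which is the same name you write in requirements.txt, so the two lists line up without a mapping table.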
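For the step that reads data from ./data: a notebook's working directory is normally the folder it lives in (here, ipynb), so a plain "./data" path will usually not resolve from a notebook. The sketch below walks up the directory tree to find the data folder, assuming it is mounted in the container at the same relative position as in the repo. It also builds a connection URL for the Postgres container. On the default docker-compose network, containers reach each other by service name, so host should be the Postgres service name from docker-compose.yml. The helper names, the "postgres" host, and the credentials used in the example are hypothetical placeholders; the examples notebook has the real connection details.

```python
from pathlib import Path
from urllib.parse import quote


def find_data_dir(start=None, name="data"):
    """Return the first `name` folder found in `start` or any of its parents.

    `start` defaults to the current working directory, which for a notebook
    is usually the folder containing the .ipynb file.
    """
    here = Path(start or Path.cwd()).resolve()
    for folder in (here, *here.parents):
        candidate = folder / name
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"no '{name}' folder found at or above {here}")


def postgres_url(user, password, host, db, port=5432):
    """Build a SQLAlchemy-style Postgres URL.

    User and password are percent-encoded so characters such as '@' or '/'
    cannot break the URL. 5432 is Postgres's default port.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{db}"
    )
```

In a notebook this could be used as df = pd.read_csv(find_data_dir() / "my_data.csv") followed by df.to_sql("my_table", sqlalchemy.create_engine(postgres_url(...)), index=False), where my_data.csv and my_table are hypothetical names. This assumes SQLAlchemy and a Postgres driver such as psycopg2 are available in the image; if they are not, add them to requirements.txt.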
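About the cleanup command: when there are no dangling images, $(docker images -qf dangling=true) expands to nothing and docker rmi -f fails because it needs at least one image argument. Docker's own docker image prune removes dangling images without this problem. The sketch below instead mirrors the original command from Python and skips the removal when the list is empty. The run parameter, which defaults to subprocess.run, is a hypothetical hook so the function can be tried without a Docker daemon.

```python
import subprocess


def remove_dangling_images(run=subprocess.run):
    """Force-remove dangling images, doing nothing if there are none.

    Returns the list of image IDs passed to `docker rmi`.
    """
    listed = run(
        ["docker", "images", "-q", "-f", "dangling=true"],
        capture_output=True,
        text=True,
        check=True,
    )
    ids = sorted(set(listed.stdout.split()))  # dedupe, stable order
    if ids:  # avoid calling `docker rmi` with no arguments
        run(["docker", "rmi", "-f", *ids], check=True)
    return ids
```

Passing the runner in as a parameter keeps the default behaviour identical to calling the Docker CLI, while letting a test substitute a fake that records the commands instead of executing them.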