Rugby Data Environment

This environment fills the gap between a direct installation via The Littlest JupyterHub and managing thousands of users via JupyterHub with Kubernetes. If you have a team of 10 to 15 people in your organisation and want to quickly stand up JupyterHub plus a Postgres database with MADlib and PL/Python, you are in the right place.

But why the Rugby name? This environment comes pre-installed with PyRugga, a library to help analyse Opta super scout files. That was the motivation behind this project: I wanted Jupyter set up on my home NAS to analyse rugby matches. You can do much more with it!

Quick Start

Before you begin you will need to generate an SSL certificate. Place your .key and .crt files into the secrets folder. You can either use a self-signed cert or get a free cert from SSL For Free. The files need to be named 'jupyterhub.crt' and 'jupyterhub.key'. In production I would recommend running the environment behind NGINX.

For the lazy, I have provided SSL certs that are not valid. You will get a warning in your browser, but you can ignore it and everything should still work.
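If you would rather generate your own self-signed certificate, the following is a minimal sketch using openssl. It assumes you run it from the root of the repo and that the file names match the ones the environment expects; the CN=localhost subject and the 365-day validity are placeholders, not values taken from this project.

```shell
# Create the folder the certificate files are expected in.
mkdir -p secrets

# Generate an unencrypted 2048-bit RSA key and a self-signed certificate.
# CN=localhost is a placeholder; use the hostname you will browse to.
openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
    -subj "/CN=localhost" \
    -keyout secrets/jupyterhub.key \
    -out secrets/jupyterhub.crt

# Print the subject and expiry date to confirm the certificate was written.
openssl x509 -in secrets/jupyterhub.crt -noout -subject -enddate
```

Your browser will still show a warning for a self-signed certificate, just as it does for the bundled ones.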

Finally, there is a file called userlist. This file contains a list of users you would like to have set up at the start. I would recommend having just one admin user and then creating new users from the admin account. Note that the first time you log in as admin you will be asked to create a password. The default user is master_user.

The environment can be built by running the build.sh script:

cp env.txt .env
bash build.sh
docker-compose up

You can access your environment via JupyterHub. To take the environment down, run:

docker-compose down

Database Setup

By default there is no password for Postgres, so we recommend changing these connection details and configuring Postgres to your needs. Details of how to do this can be found in the Postgres section.

Port = 5432
Host = 127.0.0.1
Username = postgres
Password = 

You can access Postgres with any SQL client; we use pgAdmin 4.
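To check the connection from a terminal instead, something like the sketch below may help. It assumes a psql client is installed on the host and that the defaults listed above are unchanged. NEW_PG_PASSWORD is a hypothetical variable introduced here for illustration; it is not part of this project.

```shell
# Connection details from the defaults above; adjust if you changed them.
PGHOST=127.0.0.1
PGPORT=5432
PGUSER=postgres
PGDATABASE=postgres
CONN_URI="postgresql://${PGUSER}@${PGHOST}:${PGPORT}/${PGDATABASE}"
echo "Connecting to ${CONN_URI}"

if command -v psql >/dev/null 2>&1; then
    # -w: never prompt for a password, so the check fails instead of waiting.
    psql -w "$CONN_URI" -c "SELECT version();" \
        || echo "Could not connect; is the environment up?"

    # Set a password only when NEW_PG_PASSWORD (hypothetical) is provided.
    if [ -n "${NEW_PG_PASSWORD:-}" ]; then
        psql -w "$CONN_URI" \
            -c "ALTER ROLE postgres WITH PASSWORD '${NEW_PG_PASSWORD}';" \
            || echo "Could not change the password."
    fi
else
    echo "psql not found; install a Postgres client first."
fi
```

Setting a password on the role does not by itself enforce it: Postgres only asks for the password if its pg_hba.conf requires password authentication for the connection.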

MADlib and PL/Python can additionally be installed as follows:

sudo docker exec -it "madlib_postgres" bash

psql -d postgres -c "CREATE EXTENSION plpythonu;"
psql -d postgres -c "CREATE EXTENSION plpython3u;"

MAKEFLAGS='-j1' pgxn install madlib \
    && pgxn load madlib

/usr/local/madlib/bin/madpack -s madlib -p postgres -c postgres@localhost:5432/postgres install
/usr/local/madlib/bin/madpack -s madlib -p postgres -c postgres@localhost:5432/postgres install-check

After this you will need to add users. Read this tutorial.
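As a rough sketch of what adding a user might look like, the snippet below writes the SQL to a file and then applies it from inside the container. The role name analyst, the password, and the file name create_analyst.sql are placeholders invented for this example. The madlib schema name follows the -s madlib option passed to madpack above.

```shell
# Write the SQL for a new login role to a file so it can be reviewed first.
# "analyst" and the password are placeholders, not names from this project.
cat > create_analyst.sql <<'SQL'
CREATE ROLE analyst LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE postgres TO analyst;
GRANT USAGE ON SCHEMA madlib TO analyst;
SQL

if command -v psql >/dev/null 2>&1; then
    # Confirm MADlib is installed before granting access to its schema.
    psql -w -U postgres -d postgres -c "SELECT madlib.version();" \
        || echo "MADlib check failed."

    # ON_ERROR_STOP makes psql exit non-zero if any statement fails.
    psql -w -U postgres -d postgres -v ON_ERROR_STOP=1 -f create_analyst.sql \
        || echo "Could not create the role."
fi
```

Keeping the statements in a file makes it easy to review the grants before running them and to re-create the role after rebuilding the database.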

Modifying your environment?

  • docker-compose.yml contains the ports your environment runs on, along with some environment variables.

  • Perhaps the most important file of the configuration is jupyter_config.py. This file controls key aspects of the environment such as which user management system to use and what notebook servers can be started.

  • A key file that is not currently provided as part of the repo is .env, which holds important variables used throughout the build process. An example of how to set up this file is in env.txt.

  • There are three notebook services packaged with the environment. They can be found in the folders beginning with ds_. Each contains a Dockerfile, which can be modified to your taste, along with a README with instructions on how to do so.

    • Default (this will load the first time you log in)
    • TensorFlow (PyMC3)
    • R
  • Finally, there is a Postgres database, located in the postgres folder. Some setup will be required after you have stood up your environment for the first time.

License

Modified BSD License
