NEXT is a machine learning system that runs in the cloud and makes it easy to develop, evaluate, and apply active learning in the real-world. Ask better questions. Get better results. Faster. Automated.

Home Page: http://nextml.org

License: Apache License 2.0

Python 76.37% Shell 1.47% HTML 15.88% JavaScript 5.89% CSS 0.29% Dockerfile 0.10%

next's Introduction

Have a question? Ask us on Gitter! We encourage asking the dev team questions.

Documentation: https://github.com/nextml/NEXT/wiki

Website: http://nextml.org

NEXT is a system that makes it easy to develop, evaluate, and apply active learning.

Talks give a good, brief introduction to NEXT at the highest level. For scientists and developers, we most recommend the PyData Ann Arbor talk, an enhanced and refined version of the SciPy talk.

Venue | Audience | Length | Link
PyData Ann Arbor | Scientists and developers | 1 hour | https://www.youtube.com/watch?v=rTyu4QTXZTc
SciPy 2017 | Scientific Python developers | 30 minutes | https://www.youtube.com/watch?v=blPjDYCvppY
Simons Institute conference on Interactive Learning | Machine learning researchers | 30 minutes | https://youtu.be/ESXgbZQ1ZTk?t=1732

We give more detail on launching experiments and getting set up in the SciPy 2017 proceedings: http://conference.scipy.org/proceedings/scipy2017/pdfs/scott_sievert.pdf

This readme contains a quick start to launch the NEXT system on EC2, and to replicate and launch the experiments from the NEXT paper. There are more detailed launch instructions here.

For more information, in-depth tutorials, and API docs, we recommend visiting our GitHub wiki here. You can contact us at [email protected]

We have an experimental AMI that can be used to run NEXT in a purely application-based environment rather than a development environment. Included in the AMI is a basic version of our frontend. The AMI is still highly experimental, and we give no guarantees that it is up to date with the current code. For more info please visit here.

Testing

Run py.test from NEXT/next. Tests will be run from your local machine but will ping an EC2 server to simulate a client.

Individual files can also be run with py.test. Running py.test test_api.py will only run test_api.py and allow relative imports (which allows from next.utils import timeit).

By default py.test captures stdout; pass the -s flag to disable capturing and see printed output as the tests run.

pytest is installable with pip install pytest and has a strict backwards compatibility policy.

Getting the code

You can download the latest version of NEXT from GitHub with the following clone command:

$ git clone https://github.com/nextml/NEXT.git

We are actively working to develop and improve NEXT, but users should be aware of the following caveats:

  • NEXT currently supports only UNIX-based operating systems (i.e., Windows compatibility is not yet available).
  • An Amazon Web Services account is needed to launch NEXT on EC2; we have worked hard to make this process as simple as possible, at the cost of making it harder to run the full NEXT stack on a local machine. We plan to make NEXT usable on a personal computer in the future.

Launching NEXT on EC2

First, you must set your Amazon Web Services (AWS) account credentials as environment variables. If you don't already have an AWS account, you can follow our AWS account quickstart here or, for an in-depth introduction, the official AWS account set-up guide here. Make sure to have access to

  • AWS access key id
  • AWS secret access key
  • Key Pair (pem file)

Make sure to note the region in which your key pair was created. By default, the script assumes the region is Oregon (us-west-2). If you choose a different region, specify it with --region=<region> every time you use the next_ec2.py script (e.g., --region=us-west-2). The region code is shown on the EC2 dashboard; for example, after selecting the region "Oregon," the dashboard shows us-west-2. If another region is used, an --ami option must also be included. For ease, we recommend using the Oregon region.

Export your AWS credentials as environment variables using:

$ export AWS_SECRET_ACCESS_KEY=[your_secret_aws_access_key_here]
$ export AWS_ACCESS_KEY_ID=[your_aws_access_key_id_here]

Note that you'll need your AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID again later, so save them in a secure place for convenient reference.
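Before running any next_ec2.py command, it can help to confirm that the credentials are actually exported in the current shell. The following is a minimal sketch and is not part of NEXT: the helper name missing_aws_variables is hypothetical, only the variable names come from this readme, and Python 3 is assumed.

```python
import os

# Variable names taken from this readme. AWS_BUCKET_NAME is only known
# after the createbucket step, so it is not included here.
REQUIRED_VARIABLES = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_variables(environ=None, required=REQUIRED_VARIABLES):
    """Return the names of required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_aws_variables()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("AWS credentials found in the environment.")
```

An empty list means both variables are set; the check says nothing about whether the keys are valid.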

Install the local python packages needed for NEXT:

$ cd NEXT
$ sudo pip install -r local_requirements.txt

Throughout the rest of this tutorial, we will use the next_ec2.py startup script heavily. For more options and instructions, run python next_ec2.py without any arguments. Additionally, python next_ec2.py -h will list the available options.

For persistent data storage, we first need to create a bucket in AWS S3 using:

$ cd ec2
$ python next_ec2.py --key-pair=[keypair] --identity-file=[key-file] createbucket [cluster-name]

where:

  • [keypair] is the name of your EC2 key pair
  • [key-file] is the private key file for your key pair
  • [cluster-name] is the custom name you create and assign to your cluster

This will print out another environment variable command export AWS_BUCKET_NAME=[bucket_uid]. Copy and paste this command into your terminal.

You will also need your bucket_uid later, so save it in a file alongside your AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID for later reference.

Now you are ready to fire up the NEXT system using our launch command. This command will create a new EC2 instance, pull the NEXT repository to that instance, install all of the relevant Docker images, and finally run all Docker containers.

WARNING: This script launches a single m3.large machine, the current default NEXT EC2 instance type, which costs $0.14 per hour to run. For more detailed EC2 pricing information, refer to this AWS page. You can specify a different instance type with the --instance-type option.

$ python next_ec2.py --key-pair=[keypair] --identity-file=[key-file] launch [cluster-name]

Once your terminal shows a stream of many multi-colored docker appliances, you are successfully running the NEXT system!
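Since the instance bills for as long as it runs, a rough estimate of the running cost can be useful. The sketch below simply multiplies the $0.14 hourly rate quoted in the warning by the running time; the function name is hypothetical, the rate may be out of date, and storage and data transfer charges are ignored.

```python
# m3.large hourly price as quoted in this readme; check current AWS pricing.
HOURLY_RATE_USD = 0.14


def estimated_cost(hours, instances=1, hourly_rate=HOURLY_RATE_USD):
    """Rough on-demand cost in USD, ignoring storage and data transfer."""
    if hours < 0 or instances < 0:
        raise ValueError("hours and instances must be non-negative")
    return round(hours * instances * hourly_rate, 2)


if __name__ == "__main__":
    print("One day:   $%.2f" % estimated_cost(24))
    print("One month: $%.2f" % estimated_cost(24 * 30))
```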

Replicating NEXT adaptive learning experiments

Because NEXT aims to make it easy to reproduce empirical active learning results, we provide a simple command to initialize the experiments performed in this study.

First, in a new terminal, export your AWS credentials and use get-master to obtain your public EC2 DNS.

$ export AWS_BUCKET_NAME=[your_aws_bucket_name_here]
$ cd NEXT/ec2
$ python next_ec2.py --key-pair=[keypair] --identity-file=[key-file] get-master [cluster-name]

Then export this public EC2 DNS.

$ export NEXT_BACKEND_GLOBAL_HOST=[your_public_ec2_DNS_here]
$ export NEXT_BACKEND_GLOBAL_PORT=8000

Now you can execute run_examples.py to initialize and launch the NEXT experiments.

$ cd ../examples
$ python run_examples.py

Once initialized, this script will return a link that you can distribute yourself or post as a HIT on Mechanical Turk. Visit:

http://your_public_ec2_DNS_here:8000/query/query_page/query_page/[exp_uid]/[exp_key]

where [exp_uid] and [exp_key] are unique identifiers for each of the Dueling Bandits Pure Exploration, Active Non-Metric Multidimensional Scaling (MDS), and Tuple Bandits Pure Exploration experiments. See this wiki page for a little more information.
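If you need to rebuild a query link later, for example to distribute it again, the URL format above can be filled in from the exported variables. This is a sketch only: query_page_url is a hypothetical helper, not part of NEXT, and the fallback port of 8000 is an assumption based on the export shown earlier.

```python
import os


def query_page_url(exp_uid, exp_key, host=None, port=None):
    """Build the participant-facing query link in the format shown above."""
    # Fall back to the variables exported earlier in this section.
    host = host or os.environ["NEXT_BACKEND_GLOBAL_HOST"]
    port = port or os.environ.get("NEXT_BACKEND_GLOBAL_PORT", "8000")
    return "http://{}:{}/query/query_page/query_page/{}/{}".format(
        host, port, exp_uid, exp_key
    )
```

Calling query_page_url("my_uid", "my_key") with both variables exported returns a link in the same form as the one printed by run_examples.py.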

Navigate to the strange_fruit_triplet query link (the last one that printed out to your terminal) and answer some questions! Doing so will provide the system with data you can view and interact with in the next step.

Accessing NEXT experiment results, dashboards, and data visualizations

You can access interactive experiment dashboards and data visualizations by clicking on an experiment at:

  • http://your_public_ec2_DNS:8000/dashboard/experiment_list

To obtain all logs for an experiment through our RESTful API, visit:

  • http://your_public_ec2_DNS:8000/api/experiment/[exp_uid]/[exp_key]/logs

Where, again, [exp_uid] corresponds to the unique Experiment ID shown on the experiment dashboard pages.
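The logs endpoint can also be queried from a script rather than a browser. The sketch below uses only the Python 3 standard library; the helper names are hypothetical, and treating the response body as JSON is an assumption rather than something this readme states.

```python
import json
from urllib.request import urlopen


def logs_url(host, exp_uid, exp_key, port=8000):
    """Build the logs URL in the format shown above."""
    return "http://{}:{}/api/experiment/{}/{}/logs".format(
        host, port, exp_uid, exp_key
    )


def fetch_logs(host, exp_uid, exp_key, port=8000, timeout=30):
    """Download the logs; assumes the endpoint returns a JSON body."""
    url = logs_url(host, exp_uid, exp_key, port)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```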

If you'd like to back up your database to access your data later, refer to this wiki for detailed steps.

Finally, you can terminate your EC2 instance and shut down NEXT using:

$ cd ../ec2
$ python next_ec2.py --key-pair=[keypair] --identity-file=[key-file] destroy [cluster-name]
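The createbucket, launch, get-master, and destroy invocations in this readme all share one shape, so a script that drives several of them can assemble the argument list in one place. This is a sketch under assumptions: the helper names are hypothetical, only flags that appear in this readme are used, and run_next_ec2 assumes it is given the path to the NEXT/ec2 directory.

```python
import subprocess


def next_ec2_command(action, cluster_name, key_pair, identity_file,
                     region=None, instance_type=None):
    """Return the argument list for one next_ec2.py invocation."""
    command = [
        "python", "next_ec2.py",
        "--key-pair={}".format(key_pair),
        "--identity-file={}".format(identity_file),
    ]
    # Optional flags are only added when a value is given.
    if region is not None:
        command.append("--region={}".format(region))
    if instance_type is not None:
        command.append("--instance-type={}".format(instance_type))
    command.extend([action, cluster_name])
    return command


def run_next_ec2(command, ec2_dir="ec2"):
    """Run the command from the ec2 directory; raises on a non-zero exit."""
    return subprocess.check_call(command, cwd=ec2_dir)
```

For example, next_ec2_command("destroy", "my-cluster", "my-keypair", "my-keypair.pem") reproduces the destroy command above with the placeholders filled in.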

next's People

Contributors

ayonsn017, barzamin, daniel3735928559, dconatha, dconathan, dependabot[bot], flownandez, kgjamieson, lalitkumarj, liamim, stsievert

next's Issues

Spot instance ability

I think the spark script has the ability to launch a spot instance. I'd like to flesh this out for users.

dashboard enhancements: beautification

Right now the plots use the matplotlib defaults and are pretty ugly. We should make these plots prettier.

Scott said:
'''
On (3), look at seaborn, a statistical package from Stanford to make matplotlib plots attractive. With two lines of code, it'll temporarily make new matplotlib defaults to make the plots look really good. Check out their example gallery. Because it writes new defaults, it'll work with mpld3.fig_to_dict.

Making plots look sexy is easy: all that's required is

import seaborn as sns
sns.set() # new attractive defaults!
'''

Include troubleshooting guide

If the setup/anything doesn't go as planned, it'd be useful to have a troubleshooting resource (e.g., the setup issue found in #11). I'd recommend a wiki page titled 'troubleshooting' (with links in the readme!).

Clarify setup docs with startup script

While debugging #11, I made a script to debug more easily. @lalitkumarj suggested guiding users through making this script, which might make the install/setup process easier.

# if no IAM user:
#   * AWS > Identity and Access Management > Users > Add User
# if IAM user:
#   * AWS > IAM > Users > select username > User Actions > Manage Access Keys >
#     Create Access Key > Download credentials
# more details found at [1]
# [1]:http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSGettingStartedGuide/AWSCredentials.html
export AWS_ACCESS_KEY_ID=<access>
export AWS_SECRET_ACCESS_KEY=<secret>

# AWS > EC2 > Key Pairs > Create Key Pair
# KEY_FILE is the path to the downloaded file
# KEY_PAIR is the name entered in Amazon of the key
# (i.e, if "key" created, KEY_PAIR="key" not "key.pem")
export KEY_FILE=/Users/scott/Classes/security/AWS/SS_NEXT.pem
export KEY_PAIR=SS_NEXT

chmod 400 $KEY_FILE

cd /Users/scott/Desktop/NEXT/ec2
source activate py27

# This might be a useful alias
alias next_ec2='python next_ec2.py --key-pair=$KEY_PAIR --identity-file=$KEY_FILE'

# Launching a bucket into S3. Either run the command below or define AWS_BUCKET_NAME
#python next_ec2.py --key-pair=$KEY_PAIR --identity-file=$KEY_FILE createbucket SS_next
export AWS_BUCKET_NAME=<bucket uid>
python next_ec2.py --key-pair=$KEY_PAIR --identity-file=$KEY_FILE launch SS_next

# after startup prints red and green `rabbitmqmonitor_1` lines, go to URL [2]
# launch scripts and experiments
#
# [2]:http://ec2-52-88-225-126.us-west-2.compute.amazonaws.com:8000/dashboard/experiment_list

Widget app_id

We seem to be overriding the app_id in the widget library. So it does not need to be sent through nextwidget.js! Make sure to remove this from any query pages.

Cleanup Dashboard CSS

There are like 50 million CSS imports without CDNs. We should try to remove these even if it requires re-styling.

Remove emotion_video_triplets

This is not an experiment in the next paper. It should not be in our repo. I'm fine with it being in a different repo...

Testing AWS account setup is mandatory

There's a line in the tutorial that says:

We now have all of the AWS credentials we need to launch NEXT. To verify that your account is set up, let's create an S3 bucket in AWS.

It doesn't give any indication that this section is mandatory and that future parts of the tutorial will depend on having the s3 bucket set up.

Generate a website

If we want to release this publicly, we should have an accompanying website. This website can detail the features, installation and documentation of NEXT.

GitHub offers hosting for stuff like this. If a new branch called gh-pages is created, that branch is visible at https://kgjamieson.github.io/next. GitHub has this process documented at GitHub Pages.

A generated site should use a static site generator, probably Hugo, Jekyll or Pelican (and there are dozens of themes for each of these). There's a full list of frameworks/generators listed at staticgen.

first launch using ec2 scripts does not succeed

Using the ec2 scripts to launch NEXT, the containers appear to build successfully, but when they're launched I get the following error for nextbackenddocker. Since that fails to launch, everything that depends on it also fails. This happens with absolutely no interaction on my part, so it should be easily reproducible. Funny thing is, when I docker_login, remove all the containers (docker rm -fv $(docker ps -a -q)), and then run docker-compose up, everything works great.

nextbackenddocker_1 | [2015-10-30 16:20:11 +0000] [1] [DEBUG] Current configuration:
nextbackenddocker_1 |   proxy_protocol: False
nextbackenddocker_1 |   worker_connections: 1000
nextbackenddocker_1 |   statsd_host: None
nextbackenddocker_1 |   max_requests_jitter: 0
nextbackenddocker_1 |   post_fork: <function post_fork at 0x7f172c05a2a8>
nextbackenddocker_1 |   pythonpath: None
nextbackenddocker_1 |   enable_stdio_inheritance: True
nextbackenddocker_1 |   worker_class: gevent
nextbackenddocker_1 |   ssl_version: 3
nextbackenddocker_1 |   suppress_ragged_eofs: True
nextbackenddocker_1 |   syslog: False
nextbackenddocker_1 |   syslog_facility: user
nextbackenddocker_1 |   when_ready: <function when_ready at 0x7f172c053f50>
nextbackenddocker_1 |   pre_fork: <function pre_fork at 0x7f172c05a140>
nextbackenddocker_1 |   cert_reqs: 0
nextbackenddocker_1 |   preload_app: False
nextbackenddocker_1 |   keepalive: 2
nextbackenddocker_1 |   accesslog: None
nextbackenddocker_1 |   group: 0
nextbackenddocker_1 |   graceful_timeout: 30
nextbackenddocker_1 |   do_handshake_on_connect: False
nextbackenddocker_1 |   spew: False
nextbackenddocker_1 |   workers: 5
nextbackenddocker_1 |   proc_name: None
nextbackenddocker_1 |   sendfile: True
nextbackenddocker_1 |   pidfile: None
nextbackenddocker_1 |   umask: 0
nextbackenddocker_1 |   on_reload: <function on_reload at 0x7f172c053de8>
nextbackenddocker_1 |   pre_exec: <function pre_exec at 0x7f172c05a848>
nextbackenddocker_1 |   worker_tmp_dir: None
nextbackenddocker_1 |   post_worker_init: <function post_worker_init at 0x7f172c05a410>
nextbackenddocker_1 |   limit_request_fields: 100
nextbackenddocker_1 |   on_exit: <function on_exit at 0x7f172c05aed8>
nextbackenddocker_1 |   config: None
nextbackenddocker_1 |   secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
nextbackenddocker_1 |   proxy_allow_ips: ['127.0.0.1']
nextbackenddocker_1 |   pre_request: <function pre_request at 0x7f172c05a9b0>
nextbackenddocker_1 |   post_request: <function post_request at 0x7f172c05aaa0>
nextbackenddocker_1 |   user: 0
nextbackenddocker_1 |   forwarded_allow_ips: ['127.0.0.1']
nextbackenddocker_1 |   worker_int: <function worker_int at 0x7f172c05a578>
nextbackenddocker_1 |   threads: 1
nextbackenddocker_1 |   max_requests: 0
nextbackenddocker_1 |   limit_request_line: 4094
nextbackenddocker_1 |   access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
nextbackenddocker_1 |   certfile: None
nextbackenddocker_1 |   worker_exit: <function worker_exit at 0x7f172c05ac08>
nextbackenddocker_1 |   chdir: /next_backend
nextbackenddocker_1 |   paste: None
nextbackenddocker_1 |   default_proc_name: next.api.api:app
nextbackenddocker_1 |   errorlog: -
nextbackenddocker_1 |   loglevel: debug
nextbackenddocker_1 |   logconfig: None
nextbackenddocker_1 |   syslog_addr: udp://localhost:514
nextbackenddocker_1 |   syslog_prefix: None
nextbackenddocker_1 |   daemon: False
nextbackenddocker_1 |   ciphers: TLSv1
nextbackenddocker_1 |   on_starting: <function on_starting at 0x7f172c053c80>
nextbackenddocker_1 |   worker_abort: <function worker_abort at 0x7f172c05a6e0>
nextbackenddocker_1 |   bind: ['0.0.0.0:8000']
nextbackenddocker_1 |   raw_env: []
nextbackenddocker_1 |   reload: True
nextbackenddocker_1 |   check_config: False
nextbackenddocker_1 |   limit_request_field_size: 8190
nextbackenddocker_1 |   nworkers_changed: <function nworkers_changed at 0x7f172c05ad70>
nextbackenddocker_1 |   timeout: 30
nextbackenddocker_1 |   ca_certs: None
nextbackenddocker_1 |   django_settings: None
nextbackenddocker_1 |   tmp_upload_dir: None
nextbackenddocker_1 |   keyfile: None
nextbackenddocker_1 |   backlog: 2048
nextbackenddocker_1 |   logger_class: simple
nextbackenddocker_1 |   statsd_prefix: 
nextbackenddocker_1 | [2015-10-30 16:20:11 +0000] [1] [INFO] Starting gunicorn 19.3.0
nextbackenddocker_1 | [2015-10-30 16:20:11 +0000] [1] [DEBUG] Arbiter booted
nextbackenddocker_1 | [2015-10-30 16:20:11 +0000] [1] [INFO] Listening at: http://0.0.0.0:8000 (1)
nextbackenddocker_1 | [2015-10-30 16:20:11 +0000] [1] [INFO] Using worker: gevent
nextbackenddocker_1 | [2015-10-30 16:20:11 +0000] [9] [INFO] Booting worker with pid: 9
nextbackenddocker_1 | [2015-10-30 16:20:11 +0000] [10] [INFO] Booting worker with pid: 10
nextbackenddocker_1 | [2015-10-30 16:20:11 +0000] [11] [INFO] Booting worker with pid: 11
nextbackenddocker_1 | [2015-10-30 16:20:11 +0000] [16] [INFO] Booting worker with pid: 16
nextbackenddocker_1 | [2015-10-30 16:20:11 +0000] [25] [INFO] Booting worker with pid: 25
nextbackenddocker_1 | [2015-10-30 16:20:11 +0000] [1] [DEBUG] 5 workers
nextbackenddocker_1 | [2015-10-30 16:20:11 +0000] [11] [ERROR] Exception in worker process:
nextbackenddocker_1 | Traceback (most recent call last):
nextbackenddocker_1 |   File "/usr/local/lib/python2.7/dist-packages/gunicorn/arbiter.py", line 507, in spawn_worker
nextbackenddocker_1 |     worker.init_process()
nextbackenddocker_1 |   File "/usr/local/lib/python2.7/dist-packages/gunicorn/workers/ggevent.py", line 192, in init_process
nextbackenddocker_1 |     super(GeventWorker, self).init_process()
nextbackenddocker_1 |   File "/usr/local/lib/python2.7/dist-packages/gunicorn/workers/base.py", line 118, in init_process
nextbackenddocker_1 |     self.wsgi = self.app.wsgi()
nextbackenddocker_1 |   File "/usr/local/lib/python2.7/dist-packages/gunicorn/app/base.py", line 67, in wsgi
nextbackenddocker_1 |     self.callable = self.load()
nextbackenddocker_1 |   File "/usr/local/lib/python2.7/dist-packages/gunicorn/app/wsgiapp.py", line 65, in load
nextbackenddocker_1 |     return self.load_wsgiapp()
nextbackenddocker_1 |   File "/usr/local/lib/python2.7/dist-packages/gunicorn/app/wsgiapp.py", line 52, in load_wsgiapp
nextbackenddocker_1 |     return util.import_app(self.app_uri)
nextbackenddocker_1 |   File "/usr/local/lib/python2.7/dist-packages/gunicorn/util.py", line 355, in import_app
nextbackenddocker_1 |     __import__(module)
nextbackenddocker_1 |   File "/next_backend/next/api/api.py", line 1, in <module>
nextbackenddocker_1 |     from next.api import api_blueprint
nextbackenddocker_1 |   File "/next_backend/next/api/api_blueprint.py", line 39, in <module>
nextbackenddocker_1 |     from next.api.resources.targets import Targets
nextbackenddocker_1 |   File "/next_backend/next/api/resources/targets.py", line 17, in <module>
nextbackenddocker_1 |     from next.api.widgets_library import widgetManager
nextbackenddocker_1 |   File "/next_backend/next/api/widgets_library/__init__.py", line 1, in <module>
nextbackenddocker_1 |     from widget_manager import widgetManager
nextbackenddocker_1 |   File "/next_backend/next/api/widgets_library/widget_manager.py", line 1, in <module>
nextbackenddocker_1 |     import next.apps.TupleBanditsPureExploration.widgets
nextbackenddocker_1 |   File "/next_backend/next/apps/TupleBanditsPureExploration/__init__.py", line 1, in <module>
nextbackenddocker_1 |     from .TupleBanditsPureExploration import *
nextbackenddocker_1 |   File "/next_backend/next/apps/TupleBanditsPureExploration/TupleBanditsPureExploration.py", line 23, in <module>
nextbackenddocker_1 |     from next.apps.TupleBanditsPureExploration.Dashboard import TupleBanditsPureExplorationDashboard
nextbackenddocker_1 |   File "/next_backend/next/apps/TupleBanditsPureExploration/Dashboard.py", line 15, in <module>
nextbackenddocker_1 |     import matplotlib.pyplot as plt
nextbackenddocker_1 |   File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 24, in <module>
nextbackenddocker_1 |     import matplotlib.colorbar
nextbackenddocker_1 |   File "/usr/lib/pymodules/python2.7/matplotlib/colorbar.py", line 29, in <module>
nextbackenddocker_1 |     import matplotlib.collections as collections
nextbackenddocker_1 |   File "/usr/lib/pymodules/python2.7/matplotlib/collections.py", line 23, in <module>
nextbackenddocker_1 |     import matplotlib.backend_bases as backend_bases
nextbackenddocker_1 |   File "/usr/lib/pymodules/python2.7/matplotlib/backend_bases.py", line 50, in <module>
nextbackenddocker_1 |     import matplotlib.textpath as textpath
nextbackenddocker_1 |   File "/usr/lib/pymodules/python2.7/matplotlib/textpath.py", line 11, in <module>
nextbackenddocker_1 |     import matplotlib.font_manager as font_manager
nextbackenddocker_1 |   File "/usr/lib/pymodules/python2.7/matplotlib/font_manager.py", line 1356, in <module>
nextbackenddocker_1 |     _rebuild()
nextbackenddocker_1 |   File "/usr/lib/pymodules/python2.7/matplotlib/font_manager.py", line 1343, in _rebuild
nextbackenddocker_1 |     pickle_dump(fontManager, _fmcache)
nextbackenddocker_1 |   File "/usr/lib/pymodules/python2.7/matplotlib/font_manager.py", line 939, in pickle_dump
nextbackenddocker_1 |     with open(filename, 'wb') as fh:
nextbackenddocker_1 | IOError: [Errno 2] No such file or directory: '/tmp/matplotlib-root/fontList.cache'
nextbackenddocker_1 | Traceback (most recent call last):
nextbackenddocker_1 |   File "/usr/local/lib/python2.7/dist-packages/gunicorn/arbiter.py", line 507, in spawn_worker
nextbackenddocker_1 |     worker.init_process()
nextbackenddocker_1 |   File "/usr/local/lib/python2.7/dist-packages/gunicorn/workers/ggevent.py", line 192, in init_process
nextbackenddocker_1 |     super(GeventWorker, self).init_process()
nextbackenddocker_1 |   File "/usr/local/lib/python2.7/dist-packages/gunicorn/workers/base.py", line 118, in init_process
nextbackenddocker_1 |     self.wsgi = self.app.wsgi()
nextbackenddocker_1 |   File "/usr/local/lib/python2.7/dist-packages/gunicorn/app/base.py", line 67, in wsgi
nextbackenddocker_1 |     self.callable = self.load()
nextbackenddocker_1 |   File "/usr/local/lib/python2.7/dist-packages/gunicorn/app/wsgiapp.py", line 65, in load
nextbackenddocker_1 |     return self.load_wsgiapp()
nextbackenddocker_1 |   File "/usr/local/lib/python2.7/dist-packages/gunicorn/app/wsgiapp.py", line 52, in load_wsgiapp
nextbackenddocker_1 |     return util.import_app(self.app_uri)
nextbackenddocker_1 |   File "/usr/local/lib/python2.7/dist-packages/gunicorn/util.py", line 355, in import_app
nextbackenddocker_1 |     __import__(module)
nextbackenddocker_1 |   File "/next_backend/next/api/api.py", line 1, in <module>
nextbackenddocker_1 |     from next.api import api_blueprint
nextbackenddocker_1 |   File "/next_backend/next/api/api_blueprint.py", line 39, in <module>
nextbackenddocker_1 |     from next.api.resources.targets import Targets
nextbackenddocker_1 |   File "/next_backend/next/api/resources/targets.py", line 17, in <module>
nextbackenddocker_1 |     from next.api.widgets_library import widgetManager
nextbackenddocker_1 |   File "/next_backend/next/api/widgets_library/__init__.py", line 1, in <module>
nextbackenddocker_1 |     from widget_manager import widgetManager
nextbackenddocker_1 |   File "/next_backend/next/api/widgets_library/widget_manager.py", line 1, in <module>
nextbackenddocker_1 |     import next.apps.TupleBanditsPureExploration.widgets
nextbackenddocker_1 |   File "/next_backend/next/apps/TupleBanditsPureExploration/__init__.py", line 1, in <module>
nextbackenddocker_1 |     from .TupleBanditsPureExploration import *
nextbackenddocker_1 |   File "/next_backend/next/apps/TupleBanditsPureExploration/TupleBanditsPureExploration.py", line 23, in <module>
nextbackenddocker_1 |     from next.apps.TupleBanditsPureExploration.Dashboard import TupleBanditsPureExplorationDashboard
nextbackenddocker_1 |   File "/next_backend/next/apps/TupleBanditsPureExploration/Dashboard.py", line 15, in <module>
nextbackenddocker_1 |     import matplotlib.pyplot as plt
nextbackenddocker_1 |   File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 24, in <module>
nextbackenddocker_1 |     import matplotlib.colorbar
nextbackenddocker_1 |   File "/usr/lib/pymodules/python2.7/matplotlib/colorbar.py", line 29, in <module>
nextbackenddocker_1 |     import matplotlib.collections as collections
nextbackenddocker_1 |   File "/usr/lib/pymodules/python2.7/matplotlib/collections.py", line 23, in <module>
nextbackenddocker_1 |     import matplotlib.backend_bases as backend_bases
nextbackenddocker_1 |   File "/usr/lib/pymodules/python2.7/matplotlib/backend_bases.py", line 50, in <module>
nextbackenddocker_1 |     import matplotlib.textpath as textpath
nextbackenddocker_1 |   File "/usr/lib/pymodules/python2.7/matplotlib/textpath.py", line 11, in <module>
nextbackenddocker_1 |     import matplotlib.font_manager as font_manager
nextbackenddocker_1 |   File "/usr/lib/pymodules/python2.7/matplotlib/font_manager.py", line 1356, in <module>
nextbackenddocker_1 |     _rebuild()
nextbackenddocker_1 |   File "/usr/lib/pymodules/python2.7/matplotlib/font_manager.py", line 1343, in _rebuild
nextbackenddocker_1 |     pickle_dump(fontManager, _fmcache)
nextbackenddocker_1 |   File "/usr/lib/pymodules/python2.7/matplotlib/font_manager.py", line 939, in pickle_dump
nextbackenddocker_1 |     with open(filename, 'wb') as fh:
nextbackenddocker_1 | IOError: [Errno 2] No such file or directory: '/tmp/matplotlib-root/fontList.cache'
nextbackenddocker_1 | [2015-10-30 16:20:11 +0000] [11] [INFO] Worker exiting (pid: 11)
nextbackenddocker_1 | [2015-10-30 16:20:12 +0000] [1] [DEBUG] 5 workers
nextbackenddocker_1 | [2015-10-30 16:20:13 +0000] [10] [INFO] Worker exiting (pid: 10)
nextbackenddocker_1 | [2015-10-30 16:20:13 +0000] [9] [INFO] Worker exiting (pid: 9)
nextbackenddocker_1 | [2015-10-30 16:20:13 +0000] [16] [INFO] Worker exiting (pid: 16)
nextbackenddocker_1 | [2015-10-30 16:20:14 +0000] [25] [INFO] Worker exiting (pid: 25)
nextbackenddocker_1 | [2015-10-30 16:20:14 +0000] [1] [INFO] Shutting down: Master
nextbackenddocker_1 | [2015-10-30 16:20:14 +0000] [1] [INFO] Reason: Worker failed to boot. 
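The traceback ends in matplotlib failing to write fontList.cache because the directory /tmp/matplotlib-root does not exist when the worker imports matplotlib.pyplot. One possible workaround, untested against NEXT, is to create a writable cache directory and point matplotlib at it through the MPLCONFIGDIR environment variable before the first matplotlib import. The helper name and default directory below are hypothetical, the root cause (for instance, several workers starting at once) is a guess, and the code assumes Python 3 although the log comes from Python 2.7.

```python
import os
import tempfile


def prepare_matplotlib_cache(cache_dir=None):
    """Create a writable cache directory and point matplotlib at it.

    Must run before the first `import matplotlib` in the process.
    """
    if cache_dir is None:
        # Hypothetical default location; any writable directory works.
        cache_dir = os.path.join(tempfile.gettempdir(), "matplotlib-cache")
    # exist_ok avoids an error if another worker created it first.
    os.makedirs(cache_dir, exist_ok=True)
    os.environ["MPLCONFIGDIR"] = cache_dir
    return cache_dir
```

The call would go at the top of the module that first imports matplotlib, before that import statement.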

Store commit version in database

Because some features of NEXT are not backwards compatible with previous versions, we should save the commit version / hash in the database whenever a NEXT system is launched. This way, given just a backup file of NEXT, we can restore using the correct version of the software.
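As a rough illustration of the idea, the commit hash can be read with git rev-parse HEAD at launch time and stored alongside a timestamp. This is a sketch only: the function names and the record's field names are hypothetical, how the record is written to the NEXT database is left out, and Python 3 is assumed.

```python
import subprocess
from datetime import datetime, timezone


def current_commit(repo_dir="."):
    """Return the checked-out commit hash, or None if it cannot be read."""
    try:
        output = subprocess.check_output(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, or repo_dir is not a usable git checkout.
        return None
    return output.decode("ascii").strip()


def launch_record(commit_hash, launched_at=None):
    """Build the record to store; the field names are hypothetical."""
    if launched_at is None:
        launched_at = datetime.now(timezone.utc)
    return {"commit": commit_hash, "launched_at": launched_at.isoformat()}
```

A restore script could then read the commit field from a backup and check out that version before loading the data.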

PEP-8 the code

It is important that we try to follow PEP-8 style guides.

I will start going through the code and PEP-8'ing various parts.

Comment resource client

While the Database_api.py is extensively documented, resource_client.py is devoid of any documentation.

Notate the dashboards

The plots on the dashboards have no explanation and are often meaningless unless one knows how the numbers were produced (e.g., how is network delay measured?).

On startup, no default output specified

While setting up NEXT EC2, I wasn't sure my system was working even though it was. I was receiving lines of output like below

rabbitmqmonitor_1   | [ 2015-09-22 23:04:15.668313 ] domains with active workers = [u'e46a4ad06c3a']
nextbackenddocker_1 | [2015-09-22 23:04:16 +0000] [1] [DEBUG] 1 workers
nextbackenddocker_1 | [2015-09-22 23:04:17 +0000] [1] [DEBUG] 1 workers
nextbackenddocker_1 | [2015-09-22 23:04:18 +0000] [1] [DEBUG] 1 workers

and didn't realize I could just go to http://ec2-52-88-225-126.us-west-2.compute.amazonaws.com:8000/dashboard/experiment_list.

I think it'd be useful to print both a message saying that it has launched successfully and the URL to go to. Something along the lines of below (and printed in green)

SUCCESS: dashboard launched at http://ec2-52-88-225-126.us-west-2.compute.amazonaws.com:8000/dashboard/experiment_list
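A message like this could be produced with ANSI escape codes, which most terminals render as color. The sketch below is illustrative only; success_message is a hypothetical helper and the default port of 8000 is taken from the URL above.

```python
# ANSI escape codes: bright green text, then reset to the default color.
GREEN = "\033[92m"
RESET = "\033[0m"


def success_message(host, port=8000):
    """Return the launch message wrapped in green ANSI color codes."""
    url = "http://{}:{}/dashboard/experiment_list".format(host, port)
    return "{}SUCCESS: dashboard launched at {}{}".format(GREEN, url, RESET)


if __name__ == "__main__":
    print(success_message("ec2-52-88-225-126.us-west-2.compute.amazonaws.com"))
```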

Query page - lockout input

Lock out the input during a query submit/response. This might require a bit of restructuring around next_widget.js.

Readme organization

In the readme of a project, I tend to include a high level overview, not specific details of how to install. I would do something similar for this project.

I tend to give an overview of the project and include links to other resources. I answer questions about what the project is about, what it can do, and give links to other resources. I might suggest something like:


NEXT

next_logo.png

NEXT website: http://nextml.org

NEXT documentation: https://github.com/nextml/NEXT/wiki

NEXT paper: http://www.cs.berkeley.edu/~kjamieson/resources/next.pdf

What is NEXT?

NEXT is a machine learning system that runs in the cloud and makes it easy to develop, evaluate, and apply active learning in the real world. Instead of asking random questions, NEXT bases questions on past responses and provides a GUI to test a suite of algorithms... and any adaptive machine learning algorithm can be added to NEXT!

For example, NEXT can be used to find the "best" element after making pairwise comparisons. You might be asking for the funniest caption for a New York Times comic. The NEXT system will ask you to make a series of pairwise comparisons and will determine the funniest caption. That is, it presents the following choice:

choice

After doing this many times (and making many pairwise comparisons), it might find "the last guy got flushed" as the funniest example.

How do I launch NEXT?

NEXT runs on a personal web server through the AWS EC2 service. Complete details can be found on the wiki.

Who can use NEXT?

We hope that anyone can use NEXT -- active machine learning researchers or biologists or psychologists. We have developed both a graphical user interface (GUI) and command line interface (CLI). The CLI instructions can be found at TODO and a screenshot of the GUI is visible below:

screen shot 2015-10-01 at 2 30 56 pm

dashboard improvements

I've started a new branch - board-improvements in an effort to convert our plotting library over to mpld3: http://mpld3.github.io/

The dashboards are considerably lighter than before and support zooming. Additional features like tooltips are easily added as well.

I've also moved AppDashboard.py into next/dashboard and added a basic.html page with all these common graphs. It is used by default if an app-specific dashboard is not specified in next/apps/{{ app_id }}/dashboard.

So far I've converted all of the existing basic plots common to all algorithms (quick and dirty, they're not the prettiest thing in the world). What's left:

  1. Add descriptions of each section describing what is being plotted, how it's being calculated, and why it's useful for the users to look at
  2. Tabbing for the different algorithms http://getbootstrap.com/javascript/#tabs
  3. Update logic in next/dashboard/dashboard.py so that it first looks for the dashboard html file in next/apps/{{ app_id }}/dashboard but if it fails it will default to next/dashboard/templates/basic.html (I think a try/except should work here)
  4. Make the activity histogram at the top look sexy and informative. I never found these histograms so useful, so at the very least they could look pretty
  5. Add links on basic.html for query view and to download participants so that the dashboard is the ultimate one-stop shop for all statistics. We could also move the "system" links to the dashboard - maybe put a real-time CPU plot there
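
The fallback in item 3 could be sketched roughly as below. This is only an illustration, not the code in next/dashboard/dashboard.py: the function name `resolve_dashboard_template` and the template filename `dashboard.html` are assumptions made for the example.

```python
import os


def resolve_dashboard_template(app_id,
                               apps_root="next/apps",
                               default_template="next/dashboard/templates/basic.html",
                               filename="dashboard.html"):
    """Return the app-specific dashboard template if it can be opened,
    otherwise fall back to the shared default template.

    `filename` is a hypothetical name; the real app folders may use another.
    """
    candidate = os.path.join(apps_root, app_id, "dashboard", filename)
    try:
        # Opening the file is the "try" step; a missing or unreadable
        # file raises IOError/OSError and triggers the fallback.
        with open(candidate):
            return candidate
    except (IOError, OSError):
        return default_template
```

Trying to open the file, rather than checking for it first, avoids a race between the check and the read, which is the usual reason to prefer try/except here.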

Debrief and Instruction accept html

April wants to be able to input arbitrary html. I think in addition to instructions, we need an "introduction" panel which is basically the modal. The alternative is custom query pages on a forked version...

Zip participant data before sending response

Currently when you request participant data it sends back a JSON object that is enormous and highly compressible. For example, the raw participant data I just pulled was 14.3 MB and after compressing it's just 1.4 MB.
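
A minimal sketch of the idea, using only the Python standard library. The helper name `gzip_json` and the record fields in the usage example are made up for illustration; they are not NEXT's actual participant schema or API.

```python
import gzip
import io
import json


def gzip_json(payload):
    """Serialize `payload` to JSON and gzip it.

    Returns (raw_bytes, compressed_bytes) so the two sizes can be compared.
    """
    raw = json.dumps(payload).encode("utf-8")
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as f:
        f.write(raw)
    return raw, buf.getvalue()


if __name__ == "__main__":
    # Hypothetical, repetitive participant records (field names are assumptions).
    participants = [
        {"participant_uid": "p%d" % i,
         "responses": [{"target_winner": i % 3, "response_time": 1.5}] * 20}
        for i in range(200)
    ]
    raw, compressed = gzip_json(participants)
    print("raw: %d bytes, gzipped: %d bytes" % (len(raw), len(compressed)))
```

If the compressed bytes are sent directly as the HTTP response body, the response would also need a `Content-Encoding: gzip` header so that clients decompress it transparently.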

consolidate dashboard code

  1. Move next/apps/AppDashboard.py to next/dashboard/AppDashboard.py
  2. Create default html page that has all the plots of AppDashboard (i.e. all the plots that are common to all apps no matter what). This html page would be superseded, or appended, by any custom dashboard in the specific app folder (e.g. next/apps/PoolBasedTripletsMDS)

Comment options in launch script

While the launch script examples work, it's not obvious how to change the behavior even a little without digging into the code.

For example, in dueling bandits it's not obvious that to not have a "context" all you do is comment it out or simply not attach it to the experiment object. It's also not obvious how to switch the context between an image and text.

Clarify parameters on "New experiment" page

On the NEXT frontend GUI, Dashboard > Select project > new experiment > * presents a bunch of options which are not clearly explained. At first glance, it's not clear what "context type" means or what impact it has.

I'm thinking of having an interface like below:

screen shot 2015-09-26 at 8 58 23 pm

The entire page needs clarification: from context to algorithm choice.

Database backup to local file

NEXT currently relies on AWS S3 to save backups of the database. For simplicity we should give the user the opportunity to download the database to a file. That is, for any running NEXT system, there should be a URL (with appropriate security keys given) that returns a zipped version of the entire database and everything necessary to relaunch the system on another machine.

In addition, upon launch of a NEXT system, perhaps we can have a URL with a very simple HTML page that allows us to upload the same file as above to restore the NEXT database.

This is all possible through S3 and the scripts but we should give people the opportunity to save and restore to and from a local file.
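
As a rough sketch of the local save and restore steps, assuming the database has already been dumped to a directory on disk (how that dump is produced, and the URL endpoints that would serve and accept the file, are outside this example). The function names are hypothetical.

```python
import shutil
import zipfile


def backup_to_zip(dump_dir, archive_base):
    """Zip everything under `dump_dir` into `archive_base` + '.zip'.

    Returns the path of the archive that was written. `archive_base`
    should live outside `dump_dir` so the archive does not include itself.
    """
    return shutil.make_archive(archive_base, "zip", root_dir=dump_dir)


def restore_from_zip(archive_path, target_dir):
    """Unpack a backup archive into `target_dir`."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target_dir)
```

A download URL could then stream the archive returned by `backup_to_zip`, and the upload page could pass the received file to `restore_from_zip` before reloading the database.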

dashboard enhancements: add endpoints

Add links on basic.html for query view and to download participants so that the dashboard is the ultimate one-stop shop for all statistics. We could also move the "system" links to the dashboard - maybe put a real-time CPU plot there

Config not optimized for default ec2 machine

The number of gunicorn workers and celery workers is not optimized for the small default machine on EC2. This is because we always used larger machines for actual experiments, but users of the system more often than not just use the default machine and get poor performance, as indicated by the queues backing up on the dashboards.

We need to make the default machine beefier and/or optimize the config for the default machine
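
One possible starting point for the config side is to derive the worker count from the machine's core count instead of hard-coding it. The sketch below uses the rule of thumb from the gunicorn documentation, (2 x cores) + 1. It is not NEXT's current configuration, and the function name is hypothetical.

```python
import multiprocessing


def suggested_gunicorn_workers(num_cores=None):
    """Return a starting worker count using gunicorn's documented
    rule of thumb: (2 x number of cores) + 1.

    If `num_cores` is not given, use the core count of this machine.
    """
    if num_cores is None:
        num_cores = multiprocessing.cpu_count()
    return 2 * num_cores + 1


if __name__ == "__main__":
    print("suggested gunicorn workers: %d" % suggested_gunicorn_workers())
```

This only gives a first guess. The right number for the default instance would still need to be checked against the queue lengths shown on the dashboards.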

Simplify query_page URLs

There are like 17 query_page/query_page/query/query_page in the query_page url. Find out why and shrink it.

EC2 region and AMI parameter can block startup

I ran a set of commands that according to the readme.md/wiki should work. However, I ran into an issue that took several hours to resolve (described below). Long story short, the DEFAULT_REGION and DEFAULT_AMI defined in ec2/next_ec2.py have to be aligned with the EC2 defaults. The readme/wiki should indicate these changes, perhaps under a "Troubleshooting" wiki page.

export AWS_ACCESS_KEY_ID=<key>
export AWS_SECRET_ACCESS_KEY=<secret>
export KEY_FILE=/Users/scott/Classes/security/SSkey
export KEY_PAIR=SSkey

cd /Users/scott/Desktop/NEXT/ec2
source activate py27
#python next_ec2.py --key-pair=$KEY_PAIR --identity-file=$KEY_FILE createbucket testing2
export AWS_BUCKET_NAME=<bucket>
python next_ec2.py --key-pair=$KEY_PAIR --identity-file=$KEY_FILE launch testing2

This last command produced the following output:

> python next_ec2.py --key-pair=$KEY_PAIR --identity-file=$KEY_FILE launch testing2
ERROR:boto:400 Bad Request
ERROR:boto:<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidKeyPair.NotFound</Code><Message>The key pair 'SSkey' does not exist</Message></Error></Errors><RequestID>907c6818-8cbb-49fa-a912-2a5557cf7b11</RequestID></Response>
Traceback (most recent call last):
  File "next_ec2.py", line 1210, in <module>
    main()
  File "next_ec2.py", line 1202, in main
    real_main()
  File "next_ec2.py", line 1009, in real_main
    (master_nodes, slave_nodes) = launch_cluster(conn, opts, cluster_name)
  File "next_ec2.py", line 519, in launch_cluster
    user_data=user_data_content)
  File "/Users/scott/anaconda/envs/py27/lib/python2.7/site-packages/boto/ec2/image.py", line 329, in run
    tenancy=tenancy, dry_run=dry_run)
  File "/Users/scott/anaconda/envs/py27/lib/python2.7/site-packages/boto/ec2/connection.py", line 973, in run_instances
    verb='POST')
  File "/Users/scott/anaconda/envs/py27/lib/python2.7/site-packages/boto/connection.py", line 1208, in get_object
    raise self.ResponseError(response.status, response.reason, body)
boto.exception.EC2ResponseError: EC2ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidKeyPair.NotFound</Code><Message>The key pair 'SSkey' does not exist</Message></Error></Errors><RequestID>907c6818-8cbb-49fa-a912-2a5557cf7b11</RequestID></Response

This eventually led me to a spark-users mailing list thread, which indicated that Spark only supports the us-east regions. Because next_ec2.py uses a Spark startup script, I checked my default EC2 region, shown in this image:

screen shot 2015-09-20 at 5 19 24 pm

(changing this changes the default keys/etc)

In ec2/next_ec2.py I then defined

DEFAULT_REGION = 'us-east-1'
DEFAULT_AMI = 'ami-d05e75b8'  # Ubuntu Server 14.04 LTS

These values differ from the defaults, and with them I could launch a NEXT EC2 instance.

This issue can also be resolved by changing your default region to Oregon, the default region for the next_ec2.py script. That is, if this issue is encountered, you can just change your region as shown below instead of editing the source:

screen shot 2015-10-01 at 11 59 10 am

dashboards do not reflect true turn around time of query due to gunicorns

Timestamps are only recorded once gunicorn passes the request to Python. If the gunicorn workers get overwhelmed, a response can take a long time to serve because of the slow gunicorn workers, not the backend. In fact, the dashboard will show very fast turnaround times while the actual turnaround times are abysmal because of gunicorn failures.
