
man-group / notebooker

Stars: 843, Watchers: 23, Forks: 79, Size: 2.88 MB

Productionise & schedule your Jupyter Notebooks as easily as you wrote them.

License: GNU Affero General Public License v3.0

Languages: Dockerfile 0.44%, Python 81.48%, Smarty 1.37%, JavaScript 8.17%, CSS 0.39%, HTML 7.00%, SCSS 0.79%, Jupyter Notebook 0.36%
Topics: jupyter-notebook, jupyter, notebooks, productionise, publishing, jupyter-notebooks

notebooker's Introduction

Notebooker

Productionise and schedule your Jupyter Notebooks, just as interactively as you wrote them. Notebooker is a webapp which can execute and parametrise Jupyter Notebooks as soon as they have been committed to git. The results are stored in MongoDB and searchable via the web interface, essentially turning your Jupyter Notebook into a production-style web-based report in a few clicks.


Run a Jupyter notebook as a report with parameters

Screenshot of "Run A Report" dialog

Execute Jupyter notebooks either on the webservice or command line

Screenshot of Executing a notebook

View the output of notebooks as static HTML

Screenshot of some notebook results

All results are accessible from the home page

Screenshot of the Notebooker homepage

Drill down into each template's results

Screenshot of result listings

Getting started

See the documentation at https://notebooker.readthedocs.io/ for installation instructions.

Notebooker has been tested on Linux, Windows 10, and OSX; the webapp has been tested on Google Chrome.

If you want to explore an example right away, you can use docker-compose:

cd docker
docker-compose up

That will expose Notebooker at http://localhost:8080/ with the example templates.

Contributors

Notebooker has been actively maintained at Man Group since late 2018, with the original concept built by Jon Bannister. It would not have been possible without contributions from:

And these fantastic projects:

notebooker's People

Contributors

aflag, ceallen, code0x58, danyaalm, dependabot[bot], devon-dan, jonbannister, marcinapostoluk, mrdanpearce, rs2, samuelkhtu, yiskylee


notebooker's Issues

Configurable link back to Notebook Templates git repo

The URL in the "execute a notebook" sidebar should be configurable, so that we can link back to the repo if users so choose.

If the configuration is called something like GIT_REPO_BASE_URL, then we can also derive the URL for each individual template and link directly back to its source code in GitHub/BitBucket (only if GIT_REPO_BASE_URL has been defined).
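A minimal sketch of how the per-template link could be derived, assuming GIT_REPO_BASE_URL is read from the environment and that a template such as `sample/plot_random` lives at `sample/plot_random.py` relative to the repo root. The `/blob/<branch>/` layout is GitHub's (BitBucket uses `/src/<branch>/`), so `branch`, `path_segment` and `extension` are hypothetical knobs for illustration, not existing Notebooker settings.

```python
import os
from typing import Optional


def template_source_url(
    template_name: str,
    base_url: Optional[str] = None,
    branch: str = "master",        # hypothetical default branch
    path_segment: str = "blob",    # GitHub style; BitBucket would use "src"
    extension: str = ".py",
) -> Optional[str]:
    """Return a link to a template's source, or None if no base URL is configured."""
    if base_url is None:
        # GIT_REPO_BASE_URL is the setting name proposed in the issue.
        base_url = os.environ.get("GIT_REPO_BASE_URL")
    if not base_url:
        # Only link back to the repo when the base URL has been defined.
        return None
    return f"{base_url.rstrip('/')}/{path_segment}/{branch}/{template_name}{extension}"
```

For example, `template_source_url("sample/plot_random", "https://github.com/example-org/templates/")` yields `https://github.com/example-org/templates/blob/master/sample/plot_random.py`, and the sidebar link is simply omitted when the setting is absent.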

Support ipynb files without requiring conversion

It would be really useful to natively support ipynb files in notebooker, without requiring them to be converted to .py files first.

This would reduce the cycle time from scratch notebook to automated report, since you could quickly change a notebook, commit it and run it via Notebooker.
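For context, the conversion step this issue wants to avoid turns each notebook cell into a section of a plain `.py` file. The sketch below does this with the standard library only, emitting jupytext's "percent" format (`# %%` cell markers, with markdown cells commented out under `# %% [markdown]`). It is a simplified illustration under that assumption, not how Notebooker or jupytext actually perform the conversion; jupytext also handles metadata, raw cells and round-tripping.

```python
import json


def ipynb_to_percent_py(ipynb_text: str) -> str:
    """Convert nbformat-v4 notebook JSON into a percent-format Python script."""
    notebook = json.loads(ipynb_text)
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores a cell's source either as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        source = source.rstrip("\n")
        tags = cell.get("metadata", {}).get("tags")
        # Keep tags such as "parameters", which papermill relies on.
        tag_suffix = f" tags={json.dumps(tags)}" if tags else ""
        if cell.get("cell_type") == "code":
            chunks.append(f"# %%{tag_suffix}\n{source}")
        elif cell.get("cell_type") == "markdown":
            commented = "\n".join(
                f"# {line}" if line else "#" for line in source.splitlines()
            )
            chunks.append(f"# %% [markdown]{tag_suffix}\n{commented}")
        # Raw cells are skipped in this sketch.
    return "\n\n".join(chunks) + "\n"
```

Supporting `.ipynb` natively would mean running the notebook JSON directly (papermill already executes `.ipynb` files), so a step like this would no longer be needed.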

Add option to customize email subject

Currently it is only possible to choose the email subject (when a report is sent via email) if the report is run from the command line. The UI lacks an option to customize it. It would be nice to have.

Default from_email address is from a nonexistent domain

This email address belongs to a domain which doesn't exist. If someone responds either automatically or by mistake, a firewall may be triggered. This should be configurable by the user (perhaps as an attribute on the result object) and have a sensible default.
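A minimal sketch of a configurable sender (and, per the previous issue, a configurable subject), using the standard library's `email.message.EmailMessage`. The environment variable `NOTEBOOKER_FROM_EMAIL`, the host-derived fallback address and the default subject format are hypothetical placeholders, not existing Notebooker settings. The precedence is: explicit value from the user, then deployment config, then a default on the operator's own host.

```python
import os
import socket
from email.message import EmailMessage
from typing import Optional


def build_report_email(
    to: str,
    report_name: str,
    html_body: str,
    from_email: Optional[str] = None,
    subject: Optional[str] = None,
) -> EmailMessage:
    """Build a report email; explicit arguments override config and defaults."""
    sender = (
        from_email
        or os.environ.get("NOTEBOOKER_FROM_EMAIL")  # hypothetical setting
        # Hypothetical fallback: an address on the machine's own domain.
        or f"notebooker@{socket.getfqdn()}"
    )
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    # Fall back to a generic subject when the user (UI or CLI) gave none.
    msg["Subject"] = subject or f"Notebooker: {report_name} report completed"
    msg.set_content("This report is best viewed as HTML.")
    msg.add_alternative(html_body, subtype="html")
    return msg
```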

Results page is empty in Windows 10

Notebooker is a great and promising project; however, I could not get it working on Windows.

I had to figure out some specific dependency versions (in particular, pymongo==3.6 and papermill==1.2.1) to get it (almost fully) working in Docker.
I say "almost" because the tests related to scheduling fail, and I had some runtime errors related to git functionality.

On Windows 10 I tried to do the same as what worked in Docker: the same Anaconda version (2020.07, python=3.7), the same MongoDB version (4.4.1), pymongo 3.6 and papermill 1.2.1.

The example notebooks execute fine and the results are stored in the database. But when I try to view them, the web interface shows an empty results table.
I've checked the underlying request:
http://127.0.0.1:11828/core/get_all_available_results?limit=100&report_name=sample/plot_random
and it also returns empty output ([ ]).
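One way to take the browser out of the picture is to call the same endpoint from Python and inspect the raw JSON. This sketch uses only the standard library; the host, port, path and query parameters are copied from the request above, and it assumes the endpoint returns a JSON list, as observed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def results_url(base: str, report_name: str, limit: int = 100) -> str:
    """Build the URL for the get_all_available_results endpoint."""
    # safe="/" keeps report names such as sample/plot_random readable.
    query = urlencode({"limit": limit, "report_name": report_name}, safe="/")
    return f"{base.rstrip('/')}/core/get_all_available_results?{query}"


def fetch_results(base: str, report_name: str, limit: int = 100) -> list:
    """Fetch and decode the result listing; an empty list reproduces the problem."""
    with urlopen(results_url(base, report_name, limit), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the webapp running, `fetch_results("http://127.0.0.1:11828", "sample/plot_random")` returns the same list the results table is built from, which makes it easier to compare the Windows and Docker setups.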

Enable me to run on kubernetes behind reverse proxy

I would like to try out Notebooker deployed on Kubernetes.

For quick exploration purposes I just run the following commands

mkdir -p /home/jovyan/shared/.analytics-workspace/.mongodb
conda install -y -c anaconda mongodb=6.0.2
mongod --dbpath /home/jovyan/shared/.analytics-workspace/.mongodb &
pip install notebooker
python -m ipykernel install --user --name=notebooker_kernel
notebooker-cli --mongo-host localhost:27017 start-webapp --port 11828

inside a single Docker container.

The container runs and deploys fine.

But I cannot deploy to "/", as this path serves other purposes. I need to deploy to /some-subpath. So when I go to

https://domain/some-subpath

I get a 404 Not Found error, and the log shows:

[2022-10-12 11:58:38] "GET /mt-uk-mongodb-notebooker HTTP/1.1" 404 331 0.001099

With other frameworks I deploy, such as Panel, Streamlit, Dash, FastAPI and Flask, I can pass some --prefix option to the server application.

But this does not seem to be available in Notebooker. Could you add it?
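A common workaround for Flask-style WSGI apps, sketched here as an assumption rather than an existing Notebooker option, is a small middleware that strips the public prefix from `PATH_INFO` and moves it into `SCRIPT_NAME` (the two WSGI environ keys defined by PEP 3333). Flask builds URLs, including `url_for("static", ...)`, from `SCRIPT_NAME`, so generated links would then point under the prefix, as long as the templates generate them that way rather than hard-coding `/static/...`. The class name is hypothetical and the prefix mirrors `/some-subpath` from the issue.

```python
from typing import Callable, Iterable


class PrefixMiddleware:
    """Serve a WSGI app under a URL prefix such as /some-subpath."""

    def __init__(self, app: Callable, prefix: str) -> None:
        self.app = app
        self.prefix = "/" + prefix.strip("/")

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")
        if path == self.prefix or path.startswith(self.prefix + "/"):
            # Move the prefix from PATH_INFO into SCRIPT_NAME so the app sees
            # its usual routes and generates URLs under the prefix.
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + self.prefix
            environ["PATH_INFO"] = path[len(self.prefix):] or "/"
            return self.app(environ, start_response)
        # Requests outside the prefix are not ours to serve.
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
```

For a Flask app this would be wired up as `app.wsgi_app = PrefixMiddleware(app.wsgi_app, "/some-subpath")`. Werkzeug's `DispatcherMiddleware` offers an off-the-shelf way to mount an app under a path in a similar manner.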

Clean up shims for old python version

For example, there is code using six to handle Python 2/3 differences, but there is also code that requires Python 3.5+ (e.g. type annotation syntax), and the docs say 3.6+. As Python 2 reached end of life earlier this year, it's probably a good time to clean up the old code and dependencies.
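Much of this clean-up is mechanical: each `six` helper has a Python 3 built-in equivalent. The comments below name the `six` call each line would replace; the function itself is a made-up illustration, not code from the Notebooker repository.

```python
import io


def describe_params(params: dict) -> str:
    """Render report parameters as 'key=value' pairs, using only Python 3 idioms."""
    parts = []
    # six.iteritems(params)                 -> params.items()
    for key, value in params.items():
        # isinstance(value, six.string_types) -> isinstance(value, str)
        if isinstance(value, str):
            value = repr(value)
        # six.text_type(value)              -> str(value)
        parts.append(f"{key}={str(value)}")
    # six.StringIO()                        -> io.StringIO()
    buf = io.StringIO()
    buf.write(", ".join(sorted(parts)))
    return buf.getvalue()
```

Once no module imports `six`, it can be dropped from the package's install requirements.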

Include Dockerfile in CI

This could also simplify things, as the Dockerfile currently runs the tests.

'Last run X minutes ago' compares UTC to naive timestamp

This means they're incorrect unless your local timezone is equal to UTC.

In general, any newly generated report will have a 'last run' time of

babel.dates.format_timedelta(datetime.datetime.utcnow() - datetime.datetime.now())

and will just show your difference from UTC.
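The bug comes from subtracting a naive local timestamp (from `datetime.now()`) from a naive UTC one (from `datetime.utcnow()`). A minimal sketch of a fix, assuming stored run times are naive local timestamps: normalise both sides to timezone-aware UTC before subtracting. `babel.dates.format_timedelta`, as used in the issue, can then format the result; it is omitted here so the example has no dependencies.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_utc(ts: datetime) -> datetime:
    """Return ts as an aware UTC datetime.

    Aware values are converted; naive values are treated as local time
    (as produced by datetime.now()), because astimezone() interprets them that way.
    """
    return ts.astimezone(timezone.utc)


def time_since_last_run(last_run: datetime, now: Optional[datetime] = None) -> timedelta:
    """Elapsed time since last_run, computed with both sides in UTC."""
    current = datetime.now(timezone.utc) if now is None else to_utc(now)
    return current - to_utc(last_run)
```

If the stored values are instead naive UTC (from `utcnow()`), `ts.replace(tzinfo=timezone.utc)` is the correct normalisation. Storing new timestamps as `datetime.now(timezone.utc)` avoids the ambiguity altogether.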

Error installing Notebooker and report hunter not finding updates

I followed the installation steps provided in the Notebooker documentation, but I am having issues with the report hunter not finding any updates. The logs show the following messages:

INFO:notebooker.web.app:Notebooker is now running at http://0.0.0.0:11828
INFO:notebooker.web.report_hunter:Found 0 updates since None.
INFO:notebooker.web.report_hunter:Found 0 updates since 2023-04-19 04:01:03.564799.


I'm not sure what the issue could be, and I would appreciate any guidance on how to troubleshoot and resolve this issue. Is there anything else I can check or try?

Thank you for your help.

Bug: A very long-running check (>1h) will be marked as timed out

However, when the check completes, its result is saved properly. Between T0+1h and the report's completion, the report appears to have failed completely, even though it is still running fine.

We may need to send a heartbeat so that a report which is still running in the background is not incorrectly marked as failed or timed out.
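A minimal sketch of the heartbeat idea: a background thread refreshes a timestamp while the report runs, and the timeout check looks at the last heartbeat rather than the start time. In Notebooker the timestamp would presumably be written to the pending result in MongoDB; here an in-memory dict stands in for it, and the function names, interval and threshold are assumptions.

```python
import threading
import time
from typing import Callable, Dict, Optional


def run_with_heartbeat(job: Callable[[], object], status: Dict[str, float],
                       interval: float = 30.0) -> object:
    """Run job() while a daemon thread updates status['heartbeat'] every interval seconds."""
    stop = threading.Event()

    def beat() -> None:
        # Event.wait returns True once stop is set, which ends the loop.
        while not stop.wait(interval):
            status["heartbeat"] = time.time()

    status["heartbeat"] = time.time()
    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    try:
        return job()
    finally:
        stop.set()
        thread.join()


def is_timed_out(status: Dict[str, float], max_silence: float = 300.0,
                 now: Optional[float] = None) -> bool:
    """Treat a report as failed only if its heartbeat has gone quiet, not by total runtime."""
    now = time.time() if now is None else now
    return now - status.get("heartbeat", 0.0) > max_silence
```

With this design a report running for many hours stays "running" as long as its process is alive, while a crashed worker is still detected within `max_silence` seconds.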

pymongo.errors.DocumentTooLarge: command document too large error

Hi,

I am seeing this error occur when I run a notebook in Notebooker. The notebook runs fine when run locally.

Traceback (most recent call last):
  File "/default-medusa-venv/lib/python3.6/site-packages/notebooker-1!202105050846+n47d39d7-py3.6.egg/notebooker/execute_notebook.py", line 184, in run_report
    result_serializer.save_check_result(result)
  File "/default-medusa-venv/lib/python3.6/site-packages/notebooker-1!202105050846+n47d39d7-py3.6.egg/notebooker/serialization/mongo.py", line 124, in save_check_result
    self._save_to_db(notebook_result)
  File "/default-medusa-venv/lib/python3.6/site-packages/notebooker-1!202105050846+n47d39d7-py3.6.egg/notebooker/serialization/mongo.py", line 73, in _save_to_db
    self._save_raw_to_db(out_data)
  File "/default-medusa-venv/lib/python3.6/site-packages/notebooker-1!202105050846+n47d39d7-py3.6.egg/notebooker/serialization/mongo.py", line 62, in _save_raw_to_db
    self.library.replace_one({"_id": existing["_id"]}, out_data)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/collection.py", line 907, in replace_one
    collation=collation, session=session),
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/collection.py", line 835, in _update_retryable
    _update, session)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/mongo_client.py", line 1099, in _retryable_write
    return self._retry_with_session(retryable, func, s, None)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/mongo_client.py", line 1076, in _retry_with_session
    return func(session, sock_info, retryable)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/collection.py", line 831, in _update
    retryable_write=retryable_write)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/collection.py", line 796, in _update
    retryable_write=retryable_write).copy()
  File "/default-medusa-venv/lib/python3.6/site-packages/man.core-1!202105071906+ndc84b65-py3.6-linux-x86_64.egg/ahl/mongo/decorators.py", line 247, in _wrapped
    raise e
  File "/default-medusa-venv/lib/python3.6/site-packages/man.core-1!202105071906+ndc84b65-py3.6-linux-x86_64.egg/ahl/mongo/decorators.py", line 241, in _wrapped
    return orig_method(self, *args, **kwargs)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/pool.py", line 501, in command
    self._raise_connection_failure(error)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/pool.py", line 649, in _raise_connection_failure
    raise error
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/pool.py", line 496, in command
    collation=collation)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/network.py", line 107, in command
    name, size, max_bson_size + message._COMMAND_OVERHEAD)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/message.py", line 709, in _raise_document_too_large
    raise DocumentTooLarge("command document too large")
pymongo.errors.DocumentTooLarge: command document too large

The notebook result itself may be too large to be stored in MongoDB, which limits a single document to 16 MB. Please address this issue.

Thank you
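MongoDB rejects any single document larger than 16 MiB, which a result with large embedded HTML or images can exceed. The usual remedy is GridFS (pymongo's `gridfs` module), which stores a large payload as a series of small chunk documents. The sketch below only illustrates that idea with the standard library, assuming the large part of the result is its rendered output as bytes; a real implementation would measure the encoded BSON size and store the payload via `gridfs.GridFS.put`, and this is not how Notebooker itself handles the problem.

```python
import json
from typing import Dict, List

# MongoDB's maximum BSON document size is 16 MiB.
MAX_BSON_BYTES = 16 * 1024 * 1024
# GridFS's default chunk size is 255 KiB; reused here as an illustrative value.
CHUNK_BYTES = 255 * 1024


def approx_size(doc: Dict) -> int:
    """Rough size estimate via JSON; BSON encoding differs, so keep a safety margin."""
    return len(json.dumps(doc, default=str).encode("utf-8"))


def needs_chunking(doc: Dict, margin: float = 0.9) -> bool:
    """True if the document is likely too close to MongoDB's size limit."""
    return approx_size(doc) > MAX_BSON_BYTES * margin


def split_payload(payload: bytes, chunk_size: int = CHUNK_BYTES) -> List[Dict]:
    """Split a large payload into ordered chunk documents, GridFS-style."""
    return [
        {"n": i, "data": payload[start:start + chunk_size]}
        for i, start in enumerate(range(0, len(payload), chunk_size))
    ]
```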

Add ability to add custom mongo connection logic

Usually you won't have a plaintext password in an environment variable (I hope), so we need to allow users to specify their own connection methods. In future this should be extendable to other storage mechanisms, e.g. PostgreSQL.
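One shape this could take, sketched under assumptions: the storage layer accepts a user-supplied factory callable that returns a client, so credentials can come from a secrets manager or keyring instead of a plaintext environment variable. The `ConnectionFactory` alias, `default_factory`, `make_storage` and the `FakeClient` stand-in are all hypothetical; with pymongo the factory would typically return a `pymongo.MongoClient`.

```python
from typing import Any, Callable, Optional

# A factory takes no arguments and returns a connected client object.
ConnectionFactory = Callable[[], Any]


class FakeClient:
    """Stand-in for pymongo.MongoClient so the sketch runs without a database."""

    def __init__(self, uri: str) -> None:
        self.uri = uri


def default_factory(host: str, user: str, password: str) -> ConnectionFactory:
    # Mirrors the plaintext behaviour the issue wants to move away from.
    return lambda: FakeClient(f"mongodb://{user}:{password}@{host}")


def make_storage(factory: Optional[ConnectionFactory] = None, **defaults: str) -> Any:
    """Use the caller's factory if given, otherwise fall back to the default one."""
    if factory is None:
        factory = default_factory(**defaults)
    return factory()
```

Because callers only see whatever the factory returns, the same indirection could later sit in front of other back ends such as PostgreSQL.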

Enable me to explore on Jupyterhub behind reverse proxy

I want to try out Notebooker. I don't have access to Python on my laptop for enterprise reasons, but I can work inside JupyterHub.

I run

mkdir -p /home/jovyan/shared/.analytics-workspace/.mongodb
conda install -y -c anaconda mongodb=6.0.2
pip install notebooker
python -m ipykernel install --user --name=notebooker_kernel
mongod --dbpath /home/jovyan/shared/.analytics-workspace/.mongodb
notebooker-cli --mongo-host localhost:27017 start-webapp --port 11828

Both mongodb and notebooker start up successfully.

jupyter-server-proxy is installed, so I would expect to be able to try out the Notebooker UI at

https://domain/namespace/user/user-name/proxy/11828

or

https://domain/namespace/user/user-name/proxy/11828/

But neither loads the UI.

The issue is that all assets are expected to be found at the root /.

They should not be. Either they should be served from a nested path that I can specify via a --prefix flag, or, I believe, they should be referenced relatively as ./static.

Additional Context

I often run Panel successfully from the terminal in my JupyterHub. Panel lets me set a --prefix that is used to point to the static assets.

It's the same for other data app frameworks, for example Streamlit.

Incorrect cron-schedule hint on when it is to run next

It looks like the cron scheduler used in Notebooker is different from regular cron and from the engine used to generate the schedule hints.
Days 1-5 in Notebooker mean Tue-Sat (not Mon-Fri), while the hint resolves them to Mon-Fri.
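The symptom matches a scheduler that numbers weekdays from Monday = 0 (as APScheduler's `CronTrigger` does, and as Python's `date.weekday()` does), whereas standard cron uses 0 (or 7) = Sunday and 1 = Monday. Whether Notebooker's scheduler is configured exactly this way is an assumption. One unambiguous fix is to expand a numeric standard-cron day-of-week field into explicit day names before handing it to the scheduler and the hint generator; the function names below are hypothetical.

```python
from typing import Set

# Standard cron: 0 and 7 are Sunday, 1 is Monday, ..., 6 is Saturday.
CRON_NAMES = {0: "sun", 1: "mon", 2: "tue", 3: "wed", 4: "thu", 5: "fri", 6: "sat"}
# Output order starts on Monday, matching Python's date.weekday() (Monday == 0).
MONDAY_FIRST = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def expand_cron_dow(field: str) -> Set[int]:
    """Expand a numeric standard-cron day-of-week field into day numbers 0-6."""
    days: Set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = 0, 6
        elif "-" in base:
            lo, hi = base.split("-", 1)
            start, end = int(lo), int(hi)
        else:
            start = end = int(base)
        # Fold 7 (Sunday) onto 0 so both spellings mean the same day.
        days.update(d % 7 for d in range(start, end + 1, step))
    return days


def cron_dow_to_names(field: str) -> str:
    """Rewrite the field as explicit day names, which neither numbering can misread."""
    if field == "*":
        return "*"
    names = {CRON_NAMES[d] for d in expand_cron_dow(field)}
    return ",".join(n for n in MONDAY_FIRST if n in names)
```

For example, `cron_dow_to_names("1-5")` gives `mon,tue,wed,thu,fri`, which means Monday to Friday whichever numbering convention reads it.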
