man-group / notebooker

Productionise & schedule your Jupyter Notebooks as easily as you wrote them.

License: GNU Affero General Public License v3.0

Dockerfile 0.44% Python 81.48% Smarty 1.37% JavaScript 8.17% CSS 0.39% HTML 7.00% SCSS 0.79% Jupyter Notebook 0.36%


notebooker's Introduction

Productionise and schedule your Jupyter Notebooks, just as interactively as you wrote them. Notebooker is a webapp which can execute and parametrise Jupyter Notebooks as soon as they have been committed to git. The results are stored in MongoDB and searchable via the web interface, essentially turning your Jupyter Notebook into a production-style web-based report in a few clicks.

Run a Jupyter notebook as a report with parameters

Execute Jupyter notebooks either on the webservice or command line

View the output of notebooks as static HTML

All results are accessible from the home page

Drill down into each template's results

Getting started

See the documentation at https://notebooker.readthedocs.io/ for installation instructions.

Notebooker has been tested on Linux, Windows 10, and OSX; the webapp has been tested on Google Chrome.

If you want to explore an example right away, you can use docker-compose:

cd docker
docker-compose up

That will expose Notebooker at http://localhost:8080/ with the example templates.

Contributors

Notebooker has been actively maintained at Man Group since late 2018, with the original concept built by Jon Bannister. It would not have been possible without contributions from:

And these fantastic projects:


notebooker's Issues

Build in py38

Configurable link back to Notebook Templates git repo

The URL in the "execute a notebook" sidebar should be configurable, so that we can link back to the repo if users so choose.

If the configuration is called something like GIT_REPO_BASE_URL, then we can also extrapolate the URL for the individual templates to link directly back to their source code in GitHub/BitBucket (only if GIT_REPO_BASE_URL has been defined).
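A minimal sketch of how a per-template link could be derived, assuming the setting is named GIT_REPO_BASE_URL as suggested above and that it already points at the browsable root of the templates directory (for example a ".../blob/master/" style URL). The function name and the ".py" suffix are hypothetical and are not taken from the Notebooker codebase.

```python
from typing import Optional


def template_source_url(git_repo_base_url: Optional[str], template_name: str) -> Optional[str]:
    """Return a link to a template's source, or None when no base URL is configured."""
    if not git_repo_base_url:
        # No GIT_REPO_BASE_URL defined: the UI would simply show no link.
        return None
    # Assumption: templates are stored as .py files relative to the base URL.
    return f"{git_repo_base_url.rstrip('/')}/{template_name}.py"
```

Returning None when the setting is missing keeps the link optional, which matches the "only if GIT_REPO_BASE_URL has been defined" condition.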

Delete button does not work for scheduler screen beyond page 1

The fix will be to call addCallbacks() either for all rows or whenever the table of schedules is modified in some way.

/latest-successful URL should use parameters if given

Be able to configure a max timeout for long-running reports

The current limit set in the report_hunter is 60 minutes. There should be an option to extend this.

Rerunning a report from the result screen doesn't hide code input

It seems like the "don't generate code" command to nbconvert doesn't get sent when you ask for a rerun of a report which previously had this selected.

Support ipynb files without requiring conversion

It would be really useful to natively support ipynb files in notebooker, without requiring them to be converted to .py files first.

This would help reduce the cycle time from scratch notebook to automated report, if you could quickly change, commit and run via notebooker.

MongoDB queries should work with sharded libraries

Native cron scheduler doesn't match convention

40 10 * * 1-5 is running Tuesday to Saturday rather than Monday to Friday.
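The issue does not state the cause. One plausible explanation is a mismatch in weekday numbering: crontab treats 0 (or 7) as Sunday, while some Python schedulers, APScheduler's cron trigger among them, treat 0 as Monday, so "1-5" is shifted by one day. A sketch of a workaround under that assumption is to translate numeric weekdays into names before handing the expression to the scheduler. The helper name below is hypothetical.

```python
CRON_DAY_NAMES = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]


def cron_dow_to_names(field: str) -> str:
    """Convert a numeric crontab day-of-week field (0 or 7 = Sunday) to weekday names."""

    def name(token: str) -> str:
        # Leave "*" and already-named days untouched.
        return CRON_DAY_NAMES[int(token) % 7] if token.isdigit() else token

    parts = []
    for item in field.split(","):
        item, _, step = item.partition("/")
        converted = "-".join(name(tok) for tok in item.split("-"))
        parts.append(f"{converted}/{step}" if step else converted)
    return ",".join(parts)


print(cron_dow_to_names("1-5"))  # mon-fri
```

Weekday names are unambiguous in both conventions, so the schedule no longer depends on which day a library considers to be day 0.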

Add option to customize email subject

Currently the email subject (used when a report is sent via email) can only be chosen when the report is run from the command line. The UI lacks an option to customize it. It would be nice to have.

Grey out/disable button when rerun is clicked

Default from_email address is from a nonexistent domain

This email address belongs to a domain which doesn't exist. If someone responds either automatically or by mistake, a firewall may be triggered. This should be configurable by the user (perhaps as an attribute on the result object) and have a sensible default.

Widen results display

The width is too narrow

Do not show hidden directories in PY_TEMPLATE_DIR

For example, .git is shown in the docker-compose setup from #14
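A minimal sketch of a template listing that skips hidden directories, assuming templates are discovered by walking PY_TEMPLATE_DIR on disk. The function name is hypothetical and this is not Notebooker's actual discovery code.

```python
import os
from typing import List


def list_templates(template_dir: str) -> List[str]:
    """Return .py template paths relative to template_dir, skipping hidden entries."""
    templates = []
    for root, dirs, files in os.walk(template_dir):
        # Prune in place so os.walk never descends into .git and similar directories.
        dirs[:] = sorted(d for d in dirs if not d.startswith("."))
        for name in sorted(files):
            if name.endswith(".py") and not name.startswith("."):
                rel = os.path.relpath(os.path.join(root, name), template_dir)
                templates.append(rel.replace(os.sep, "/"))
    return templates
```

Pruning `dirs` in place is what stops the traversal from entering hidden directories at all, rather than filtering their contents afterwards.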

"Email From" not preserved on re-run

On notebook re-run "Email to" is preserved but "Email from" isn't

Add a button which displays the stdout of the job which executed the notebook

Create a view of all report results divided by report name

And perhaps subdivided by parameters

Push directly to pypi from CI

Results page is empty in Windows 10

Notebooker is a great and promising project; however, I could not get it working in Windows.

I had to figure out some specific dependency versions to get it (almost fully) working in Docker
(in particular, pymongo==3.6, papermill==1.2.1).
I say "almost" because tests related to scheduling fail (and I had some runtime errors related to git functionality).

In Windows 10 I tried to do the same as what worked in Docker: the same Anaconda version (2020.07, python=3.7), the same mongodb version (4.4.1), pymongo 3.6 and papermill 1.2.1.

Example notebooks get executed fine and results get stored in the database. But when I want to see them, the web interface shows an empty results table.
I've checked the underlying request:
http://127.0.0.1:11828/core/get_all_available_results?limit=100&report_name=sample/plot_random
it also gives empty output ([ ])

AttributeError: 'Cursor' object has no attribute 'count'

With pymongo==4.0.2
The following line results in AttributeError: 'Cursor' object has no attribute 'count'

notebooker/notebooker/serialization/mongo.py

Line 450 in 3277684

return self._get_raw_results({"report_name": report_name}, {}, 0).count()
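Cursor.count() was removed in PyMongo 4.0; the supported replacement is Collection.count_documents(filter), available since PyMongo 3.7. A minimal sketch of the equivalent count follows. The function name and the way the collection is passed in are hypothetical, and this is not the fix that was applied to mongo.py.

```python
def count_results_for_report(collection, report_name: str) -> int:
    """Count stored results for one report without relying on Cursor.count()."""
    # `collection` is expected to be a pymongo.collection.Collection.
    # count_documents takes the same filter that would be passed to find().
    return collection.count_documents({"report_name": report_name})
```

Because count_documents runs the count on the server, it also avoids building a cursor just to count its results.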

Add ability to manually trigger a scheduled report

Enable me to run on kubernetes behind reverse proxy

I would like to try out notebooker deployed on kubernetes.

For quick exploration purposes I just run the command

mkdir -p /home/jovyan/shared/.analytics-workspace/.mongodb;conda install -y -c anaconda mongodb=6.0.2;mongod --dbpath /home/jovyan/shared/.analytics-workspace/.mongodb &pip install notebooker;python -m ipykernel install --user --name=notebooker_kernel;notebooker-cli --mongo-host localhost:27017 start-webapp --port 11828

inside my one docker container.

The container runs and deploys fine.

But I cannot deploy to "/" as this path serves other purposes. I need to deploy to /some-subpath. So when I go to

https://domain/some-subpath

I get a 404 Not Found:

[2022-10-12 11:58:38] "GET /mt-uk-mongodb-notebooker HTTP/1.1" 404 331 0.001099

With other frameworks I deploy, such as Panel, Streamlit, Dash, FastAPI and Flask, I can specify a --prefix for the server application.

But it seems not available with notebooker? Could you add it?
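Until such an option exists, a common workaround for WSGI applications is a small middleware that serves the app under a prefix. The sketch below uses only the WSGI interface. The class name is hypothetical and no --prefix flag in notebooker-cli is implied. It also assumes the application builds its links from SCRIPT_NAME (as Flask's url_for does); hard-coded absolute asset paths in templates would still break.

```python
class PrefixMiddleware:
    """Serve a WSGI app under a URL prefix such as /some-subpath."""

    def __init__(self, app, prefix: str):
        self.app = app
        self.prefix = "/" + prefix.strip("/")

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path == self.prefix or path.startswith(self.prefix + "/"):
            # Move the prefix from PATH_INFO to SCRIPT_NAME so that routing sees
            # the unprefixed path and URL generation adds the prefix back.
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + self.prefix
            environ["PATH_INFO"] = path[len(self.prefix):] or "/"
            return self.app(environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found: this app is served under " + self.prefix.encode()]
```

With a Flask application object this would typically be wired in as `flask_app.wsgi_app = PrefixMiddleware(flask_app.wsgi_app, "/some-subpath")` before the server starts.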

"Delete all" button on report listing screen

Improve install time from egg

Unzipping the multitude of JS files takes ages. Can we speed this up using e.g. webpack?

Deleting a report should also delete the report on GridFS

Freeze package dependencies

Builds aren't reproducible as the dependencies aren't pinned

Report hunter thread should occasionally delete gridfs entries for reports marked as deleted

Clean up shims for old python version

For example there is code using six to handle 2/3 differences, but there is code that requires 3.5+ (e.g. type annotations syntax) and docs say 3.6+. As Python2 went EOL earlier this year, it's probably good to clean up the old code and dependencies.

Include Dockerfile in CI

This could also simplify things, as the Dockerfile includes the running of tests at the moment.

'Last run X minutes ago' compares UTC to naive timestamp

This means they're incorrect unless your local timezone is equal to UTC.

In general, any newly generated report will have a 'last run' time of

babel.dates.format_timedelta(datetime.datetime.utcnow() - datetime.datetime.now())

and will just show your difference from UTC.
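A minimal sketch of a consistent comparison, assuming stored timestamps are in UTC: make both sides of the subtraction timezone-aware so the result no longer depends on the local timezone. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def time_since(started_at_utc: datetime, now: Optional[datetime] = None) -> timedelta:
    """Elapsed time since a UTC timestamp; `now`, if given, must be timezone-aware."""
    if started_at_utc.tzinfo is None:
        # Assumption: naive timestamps read back from storage represent UTC.
        started_at_utc = started_at_utc.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now - started_at_utc
```

The resulting timedelta can then be passed to babel.dates.format_timedelta as before.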

Add option to pass scheduled cron time to the notebook

Being able to read the scheduled cron time from within the notebook would make notebooker more useful as a tool for generating periodic reports. That time might also need to be preserved if the same report is re-run.

Docker image doesn't have the tex packages needed to render PDF

This can be seen by running the example template and choosing to render a PDF while using the docker-compose setup from #14

Error installing Notebooker and report hunter not finding updates

I followed the installation steps provided in the Notebooker documentation, but I am having issues with the report hunter not finding any updates. The logs show the following messages:

INFO:notebooker.web.app:Notebooker is now running at http://0.0.0.0:11828
INFO:notebooker.web.report_hunter:Found 0 updates since None.
INFO:notebooker.web.report_hunter:Found 0 updates since 2023-04-19 04:01:03.564799.

I'm not sure what the issue could be, and I would appreciate any guidance on how to troubleshoot and resolve this issue. Is there anything else I can check or try?

Thank you for your help.

Webapp configuration documentation is wrong

It should essentially just list out the click command as it is descriptive enough

Different email address for failed and succeeded reports

It might make sense to split the target email address into two:

one for reports that succeeded - these normally go to the target audience
another for reports that failed - which might go to tech/support team
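A minimal sketch of the proposed split, assuming the failure address is optional and falls back to the normal recipients when it is not set. The function and parameter names are hypothetical.

```python
from typing import List, Optional


def pick_recipients(succeeded: bool, success_to: List[str],
                    failure_to: Optional[List[str]] = None) -> List[str]:
    """Choose who receives the report email based on the run's outcome."""
    if succeeded:
        return success_to
    # Fall back to the usual audience when no separate failure address is configured.
    return failure_to or success_to
```

The fallback keeps existing single-address configurations working unchanged.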

Bug: A very long-running check (>1h) will be marked as timed out

However, when the check completes it will save properly. In the time range from T0+1h to report completion, it appears that the report has completely failed, but it is working fine.

We need to potentially send a heartbeat to ensure that it is not improperly marked as having failed/timed out when it is actually running in the background.
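A minimal sketch of the heartbeat idea, assuming the running job periodically records a timezone-aware UTC timestamp and the report hunter checks the age of that timestamp instead of the job's total runtime. The names and the five-minute threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

HEARTBEAT_TIMEOUT = timedelta(minutes=5)  # hypothetical threshold


def is_stalled(last_heartbeat: datetime, now: Optional[datetime] = None,
               timeout: timedelta = HEARTBEAT_TIMEOUT) -> bool:
    """True if a running job has not reported a heartbeat within `timeout`."""
    now = now or datetime.now(timezone.utc)
    return now - last_heartbeat > timeout
```

Under this scheme a report that runs for several hours stays marked as running for as long as its heartbeats keep arriving, and only a job that has gone silent is treated as timed out.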

pymongo.errors.DocumentTooLarge: command document too large error

Hi,

I am seeing this error occur when I run a notebook in Notebooker. The notebook runs fine when run locally.

Traceback (most recent call last):
  File "/default-medusa-venv/lib/python3.6/site-packages/notebooker-1!202105050846+n47d39d7-py3.6.egg/notebooker/execute_notebook.py", line 184, in run_report
    result_serializer.save_check_result(result)
  File "/default-medusa-venv/lib/python3.6/site-packages/notebooker-1!202105050846+n47d39d7-py3.6.egg/notebooker/serialization/mongo.py", line 124, in save_check_result
    self._save_to_db(notebook_result)
  File "/default-medusa-venv/lib/python3.6/site-packages/notebooker-1!202105050846+n47d39d7-py3.6.egg/notebooker/serialization/mongo.py", line 73, in _save_to_db
    self._save_raw_to_db(out_data)
  File "/default-medusa-venv/lib/python3.6/site-packages/notebooker-1!202105050846+n47d39d7-py3.6.egg/notebooker/serialization/mongo.py", line 62, in _save_raw_to_db
    self.library.replace_one({"_id": existing["_id"]}, out_data)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/collection.py", line 907, in replace_one
    collation=collation, session=session),
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/collection.py", line 835, in _update_retryable
    _update, session)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/mongo_client.py", line 1099, in _retryable_write
    return self._retry_with_session(retryable, func, s, None)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/mongo_client.py", line 1076, in _retry_with_session
    return func(session, sock_info, retryable)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/collection.py", line 831, in _update
    retryable_write=retryable_write)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/collection.py", line 796, in _update
    retryable_write=retryable_write).copy()
  File "/default-medusa-venv/lib/python3.6/site-packages/man.core-1!202105071906+ndc84b65-py3.6-linux-x86_64.egg/ahl/mongo/decorators.py", line 247, in _wrapped
    raise e
  File "/default-medusa-venv/lib/python3.6/site-packages/man.core-1!202105071906+ndc84b65-py3.6-linux-x86_64.egg/ahl/mongo/decorators.py", line 241, in _wrapped
    return orig_method(self, *args, **kwargs)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/pool.py", line 501, in command
    self._raise_connection_failure(error)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/pool.py", line 649, in _raise_connection_failure
    raise error
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/pool.py", line 496, in command
    collation=collation)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/network.py", line 107, in command
    name, size, max_bson_size + message._COMMAND_OVERHEAD)
  File "/default-medusa-venv/lib/python3.6/site-packages/pymongo-3.6.0-py3.6-linux-x86_64.egg/pymongo/message.py", line 709, in _raise_document_too_large
    raise DocumentTooLarge("command document too large")
pymongo.errors.DocumentTooLarge: command document too large

The notebook result itself is maybe too large to be stored in mongo. Please address this issue.

Thank you
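For context, MongoDB limits a single BSON document to 16 MiB, so a result document with very large embedded output can exceed it. One possible mitigation, sketched below under assumptions, is to move oversized fields out of the document before saving and store them separately (for example in GridFS). The function name, the per-field threshold and the field names in the usage line are hypothetical.

```python
from typing import Any, Dict, Tuple

# Hypothetical per-field threshold, well under MongoDB's 16 MiB document limit.
MAX_INLINE_BYTES = 1024 * 1024


def split_large_fields(doc: Dict[str, Any],
                       max_bytes: int = MAX_INLINE_BYTES) -> Tuple[Dict[str, Any], Dict[str, bytes]]:
    """Separate oversized str/bytes fields from a result document.

    Returns (small_doc, large_fields). The caller would store large_fields
    elsewhere and keep only small_doc in the collection.
    """
    small, large = {}, {}
    for key, value in doc.items():
        raw = value.encode("utf-8") if isinstance(value, str) else value
        if isinstance(raw, (bytes, bytearray)) and len(raw) > max_bytes:
            large[key] = bytes(raw)
        else:
            small[key] = value
    return small, large


small_doc, large_fields = split_large_fields({"report_name": "a", "raw_html": "x" * 20}, max_bytes=10)
```

Only string and bytes fields are considered here; nested structures would need their own size accounting.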

Add ability to add custom mongo connection logic

Usually you won't have a plaintext password in an environment variable (I hope) so we need to allow users to specify their own connection methods. This in future should be extendable to other storage mechanisms, e.g. postgres

Grouped front page should be case-sensitive

e.g. if you run for Cowsay and cowsay, the capitalised version will take precedence.

Bug: generate_pdf_output and hide_code_output not working from scheduler

The scheduler does not seem to be passing these parameters correctly to the executor for some reason. This needs to be investigated.

nbconvert --to slide support

Support Reveal.js HTML slideshow option of nbconvert such that the output of a scheduled notebook can be a slide deck.

https://nbconvert.readthedocs.io/en/latest/usage.html

Enable me to explore on Jupyterhub behind reverse proxy

I want to try out notebooker. I don't have access to python on my laptop for enterprise reasons. I can work inside jupyterhub.

I run

mkdir -p /home/jovyan/shared/.analytics-workspace/.mongodb
conda install -y -c anaconda mongodb=6.0.2
pip install notebooker;python -m ipykernel install --user --name=notebooker_kernel

mongod --dbpath /home/jovyan/shared/.analytics-workspace/.mongodb

notebooker-cli --mongo-host localhost:27017 start-webapp --port 11828

Both mongodb and notebooker start up successfully.

The jupyter-server-proxy is installed. So I would expect to be able to try out notebooker ui at

https://domain/namespace/user/user-name/proxy/11828

https://domain/namespace/user/user-name/proxy/11828/

But as you can see I cannot.

The issue is that all assets are expected to be found at the root /.

But they should not be. They should either be found at some nested path that I can specify via a --prefix flag, or be referenced relatively as ./static, I believe.

Additional Context

I often run Panel successfully from the terminal in my jupyterhub. Panel enables me to set a --prefix that is used to point to the static assets.

It's the same for other data app frameworks, for example streamlit.

git clone
cd docker
docker-compose up

[Document] Missing Prerequisites & Setup Instruction

Add Prerequisites:

yarn

Setup:

You also need to run npm run-script build; otherwise the schedule page will not work and fails with a 404 error for the missing schedule_bundle.js.
