
Comments (7)

rasbt commented on July 23, 2024

I don't really know much about the formatting recommendations or guidelines for Jupyter notebooks, or whether Jupyter Notebook and Jupyter Lab differ in what gets written to .ipynb files. However, I noticed that in the Jupyter Lab UI, there's a metadata field, which would probably be equivalent to what @bollwyvl mentioned with

{
"metadata": {
  "watermark": {
    "date": "2015-17-06T15:04:35",
    "CPython": "3.4.3",
    "IPython": "3.1.0",
    "compiler": "GCC 4.2.1 (Apple Inc. build 5577)",
    "system" : "Darwin",
    "release" : "14.3.0",
    "machine": "x86_64",
    "processor" : "i386",
    "CPU cores": "4",
    "interpreter": "64bit"
  }
}
}

[Screenshot: 2018-09-24, 10:47:59 am]

In any case, if you or @bollwyvl or someone else would like to implement this (a way to optionally write metadata), I'd be very open to this and be happy to merge it (there was good work in progress over at #7 ).

This could be either via a

  • magic command
  • decorator, or
  • --metadata flag.
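For illustration, here is a minimal sketch of what the optional metadata write could look like. It is not watermark's implementation: `build_watermark` and `write_watermark` are hypothetical helpers, the field names simply follow the JSON example above, and the notebook is an in-memory stand-in for a parsed .ipynb file.

```python
import json
import os
import platform
from datetime import datetime


def build_watermark():
    """Collect fields similar to the JSON example above (hypothetical helper)."""
    return {
        "date": datetime.now().replace(microsecond=0).isoformat(),
        # Key is the running implementation name, e.g. "CPython".
        platform.python_implementation(): platform.python_version(),
        "compiler": platform.python_compiler(),
        "system": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "CPU cores": str(os.cpu_count()),
        "interpreter": platform.architecture()[0],
    }


def write_watermark(notebook, watermark):
    """Return a copy of the notebook dict with metadata.watermark set."""
    updated = dict(notebook)
    metadata = dict(updated.get("metadata", {}))
    metadata["watermark"] = watermark
    updated["metadata"] = metadata
    return updated


# In-memory stand-in for the parsed contents of an .ipynb file.
notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
stamped = write_watermark(notebook, build_watermark())
print(json.dumps(stamped["metadata"]["watermark"], indent=2))
```

Returning a copy rather than mutating the input keeps the write opt-in: the caller decides whether the stamped notebook is ever saved back to disk.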

from watermark.

rasbt commented on July 23, 2024

Thanks for the suggestion, this sounds interesting. API-wise, I would think of an additional (optional) flag that would maybe write the produced output into the meta-tag.

Just wondering, what application and use-case would you have in mind? Right now, for example, I'd use this plugin to conveniently show the time-stamp of the last update to users. Or to show Python versions and packages that were used to create those results. I am just wondering how the "meta" tag could be additionally used to improve reproducibility.


bollwyvl commented on July 23, 2024

Thanks for the response. Yeah, -m is already taken, but something to that
effect.

I think the big win is that metadata in standard formats (ISO, etc.) is more
unambiguously parseable by downstream consumers and UI than inline text.
Instead of writing some regular expressions, one can read
json.load(f)["metadata"]["watermark"]. For example, on nbviewer, we show the
kernel that was used to create the notebook.

So if one has a big stack of documentation notebooks in a repo, one can
check for when they were actually executed, not when they were checked out,
etc.
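A rough sketch of that check, assuming notebooks carry a `metadata.watermark.date` field in ISO 8601 format as in the example at the top of this thread. The helper names and the demo date are made up for illustration.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def read_watermark(path):
    """Return the notebook's metadata.watermark dict, or None if absent."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    return notebook.get("metadata", {}).get("watermark")


def executed_at(path):
    """Parse the ISO 8601 'date' field, if the notebook has a watermark."""
    watermark = read_watermark(path)
    if not watermark or "date" not in watermark:
        return None
    return datetime.fromisoformat(watermark["date"])


# Demo with two throwaway notebooks: one stamped, one without a watermark.
with tempfile.TemporaryDirectory() as tmp:
    stamped = Path(tmp) / "stamped.ipynb"
    plain = Path(tmp) / "plain.ipynb"
    stamped.write_text(
        json.dumps(
            {"cells": [], "metadata": {"watermark": {"date": "2015-06-17T15:04:35"}}}
        ),
        encoding="utf-8",
    )
    plain.write_text(json.dumps({"cells": [], "metadata": {}}), encoding="utf-8")
    results = {p.name: executed_at(p) for p in sorted(Path(tmp).glob("*.ipynb"))}

print(results)
```

No regular expressions are involved: a notebook without a watermark simply maps to None.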

When we get better search, either in JupyterHub or in custom deployments,
metadata fields will just be ready to go as facets. An organization that
has watermark as part of their "standard distribution" could gain a lot of
insight, about a snapshot or over time.

On 23:34, Tue, Sep 1, 2015 Sebastian Raschka wrote:

Thanks for the suggestion, this sounds interesting. API-wise, I would
think of an additional (optional) flag that would maybe write the produced
output into the meta-tag.

Just wondering, what application and use-case would you have in mind?
Right now, for example, I'd use this plugin to conveniently show the
time-stamp of the last update to users. Or to show Python versions and
packages that were used to create those results. I am just wondering how
the "meta" tag could be additionally used to improve reproducibility.




rasbt commented on July 23, 2024

metadata in standard formats (iso, etc) is more
unambiguously parseable by downstream consumers and UI than inline text.

Good point, I agree. In this context, I could also imagine a small optional add-on that writes all current package specifications of the Python environment into the metadata, as in pip freeze > requirements.txt.
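One way such an add-on might gather the package list without shelling out to pip is the standard library's `importlib.metadata`. This is only a sketch: the `freeze` helper and the `packages` key are assumptions, not part of watermark.

```python
from importlib.metadata import distributions


def freeze():
    """Return sorted 'name==version' strings for installed distributions."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip installs with incomplete metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


# Hypothetical layout: store the pins next to the other watermark fields.
metadata = {"watermark": {"packages": freeze()}}
print(len(metadata["watermark"]["packages"]), "packages recorded")
```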

Btw. something like

-s      --save_meta
-g      --generate_meta

seems to be okay! However, I would suggest not using the 1-letter short form here and going with --generate_meta, to make it clear to a "user" of this notebook that the current watermark would change the notebook's metadata in some way upon re-execution.
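As a sketch of the long-form-only flag, here is how it could be declared with the standard library's `argparse`. This stands in for however watermark actually parses its magic arguments, and the `watermark-sketch` program name is made up.

```python
import argparse


def build_parser():
    # allow_abbrev=False rejects prefixes such as --gen, so the flag
    # always appears spelled out in the notebook.
    parser = argparse.ArgumentParser(prog="watermark-sketch", allow_abbrev=False)
    # Long form only: no one-letter alias, so the side effect stays visible.
    parser.add_argument(
        "--generate_meta",
        action="store_true",
        help="also write the watermark into the notebook's metadata",
    )
    return parser


args = build_parser().parse_args(["--generate_meta"])
default_args = build_parser().parse_args([])
print(args.generate_meta, default_args.generate_meta)
```

The flag defaults to off, so existing notebooks keep their current behavior unless the author opts in.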

Would you be interested in implementing such a feature?


bollwyvl commented on July 23, 2024

Sorry I didn't get back to you sooner: traveling!

I'd love to take a whack at this. Hopefully I can get a PoC up quickly.

Addons are great, but likely outside the scope of this particular request!

But, since we're off topic... I highly recommend building them with entry_points vs. namespace tomfoolery or magic module/function names.

In addition to pip, I'd consider being able to serialize the state of:

  • python
    • conda
  • "native" managers:
    • apt
    • dnf / yum
    • brew
  • other vcs
    • hg
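To make the entry_points idea concrete, here is a rough sketch of how add-on collectors for the managers listed above could be discovered. The group name `watermark.collectors` is hypothetical, and each plugin is assumed to expose a zero-argument callable that returns something JSON-serializable.

```python
import sys
from importlib.metadata import entry_points

GROUP = "watermark.collectors"  # hypothetical entry point group name


def discover(group=GROUP):
    """Map collector names to entry points registered under the group."""
    if sys.version_info >= (3, 10):
        found = entry_points(group=group)
    else:
        # Older Pythons return a plain dict keyed by group name.
        found = entry_points().get(group, [])
    return {ep.name: ep for ep in found}


def collect(group=GROUP):
    """Load every registered collector and call it."""
    report = {}
    for name, ep in discover(group).items():
        collector = ep.load()  # imports the plugin's callable
        report[name] = collector()
    return report


# With no plugins installed, this is just an empty dict.
print(collect())
```

A conda or apt collector would then live in its own package and register itself under that group in its packaging metadata, with no special module names required.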


rasbt commented on July 23, 2024

No need to apologize, and I am sorry, too. It was a pretty hectic week. I am currently in the final stage of finishing up my new book, which is coming out in 1-2 weeks, and there is a lot of stuff to be done :).

So, I think writing to the meta-tags as an option would be great. And I will open separate issues for the other suggestions. I like the idea of considering other "managers"/"environments".

Cheers,
Sebastian


betatim commented on July 23, 2024

Worth reheating this discussion? I think it would be cool to have the information inside the metadata of the notebook. Then follow up with a PR for conda-tools/conda-execute#3, which might make the notebook a "shareable unit". Right now, to share notebooks you need to make a repository with a requirements.txt or some such.

