jupyter / nbdime

Tools for diffing and merging of Jupyter notebooks.

Home Page: http://nbdime.readthedocs.io

License: Other

Python 38.70% Jupyter Notebook 18.05% CSS 2.34% JavaScript 0.63% HTML 0.46% TypeScript 39.82%
diff diffing git hg jupyter jupyter-notebook jupyterlab-extension mercurial merge merge-driver mergetool vcs version-control

nbdime's Introduction

Installation | Documentation | Contributing | Development Install | Testing | License | Getting help

nbdime Jupyter Notebook Diff and Merge tools

nbdime provides tools for diffing and merging of Jupyter Notebooks.

  • nbdiff compare notebooks in a terminal-friendly way
  • nbmerge three-way merge of notebooks with automatic conflict resolution
  • nbdiff-web shows you a rich rendered diff of notebooks
  • nbmerge-web gives you a web-based three-way merge tool for notebooks
  • nbshow present a single notebook in a terminal-friendly way

Diffing notebooks in the terminal:

terminal-diff

Merging notebooks in a browser:

web-merge

Installation

Install nbdime with pip:

pip install nbdime

See the installation docs for more installation details and development installation instructions.

Documentation

See the latest documentation at https://nbdime.readthedocs.io.

See also description and discussion in the Jupyter Enhancement Proposal.

Contributing

If you would like to contribute to the project, please read our contributor documentation and the CONTRIBUTING.md.

Development Install

To install a development version of nbdime, you will need npm installed and available on your PATH while installing.

For a development install, enter on the command line:

pip install -e git+https://github.com/jupyter/nbdime#egg=nbdime

See installation documentation for additional detail, particularly related to performing a dev install for working on the browser script code.

Testing

Install the test requirements:

pip install nbdime[test]

To run Python tests locally, enter on the command line: pytest

To run JavaScript tests locally, enter: npm test

Install the codecov browser extension to view test coverage in the source browser on GitHub.

See testing documentation for additional detail.

License

We use a shared copyright model that enables all contributors to maintain the copyright on their contributions.

All code is licensed under the terms of the revised BSD license.

Getting help

We encourage you to ask questions on the mailing list.

nbdime's Issues

nbmerge-web shows nothing

To reproduce:

cd nbdime/tests/files
nbmerge-web -o tmp-merged.ipynb single_cell_nb.ipynb single_cell_nb--changed_source_output_ec.ipynb single_cell_nb--changed_source_output.ipynb

Only the file dialogues and save/close buttons are displayed, but the buttons don't work.

Autoresolve doesn't shift conflict indices

Indices (keys) in the conflict diffs returned from autoresolve must be shifted to reflect edits to the document. Probably best to make autoresolve return (newdocument, localconflicts, remoteconflicts) like merge does instead of (resolvedconflicts, localconflicts, remoteconflicts) as it does now.

Add merge strategy options

For example git merge has options like 'use ours'/'use theirs' to force conflict resolution, and we can add notebook-specific merge options like 'clean all notebook output', 'clean conflicting notebook output', 'join conflicting notebook output', and many more. We should sketch some use cases to figure out which options will be useful and try to keep the set small.

Remove numpy dependency

I used numpy in some places in the sequence diff algorithms, mainly for its convenient slicing features, but this is not strictly necessary. Is this ok, or should it be rewritten?

Move to line-based diff of sources and outputs

The current implementation does a character-based diff of sources, which we don't want. The same applies to some other strings, in particular some of the output fields. However, there are many single-line or single-word strings that we want to treat as atomic values. How do we distinguish which is which?

The easiest way to implement this is to

  1. split lines for all relevant strings in notebooks when loaded
  2. treat all strings as atomic values in the diff algorithm
  3. rejoin lines when done with notebooks

@minrk Can I use split_lines/rejoin_lines from nbformat.v4.rwbase? It looks like they do what I want.
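The three steps above can be sketched roughly as follows. This is only an illustration on plain dicts: the helper names split_source_lines and rejoin_source_lines are hypothetical stand-ins, not the nbformat functions mentioned above, and only the "source" field of each cell is handled.

```python
import copy

def split_source_lines(nb):
    """Return a copy of a notebook-like dict with each cell source split into lines.

    Hypothetical helper: only the "source" field is handled here.
    """
    nb = copy.deepcopy(nb)
    for cell in nb.get("cells", []):
        if isinstance(cell.get("source"), str):
            # keepends=True preserves newlines so that rejoining is lossless
            cell["source"] = cell["source"].splitlines(keepends=True)
    return nb

def rejoin_source_lines(nb):
    """Inverse of split_source_lines: join line lists back into single strings."""
    nb = copy.deepcopy(nb)
    for cell in nb.get("cells", []):
        if isinstance(cell.get("source"), list):
            cell["source"] = "".join(cell["source"])
    return nb

nb = {"cells": [{"cell_type": "code", "source": "a = 1\nb = 2\n"}]}
split = split_source_lines(nb)
# Each line is now one atomic string that a generic list diff can compare.
restored = rejoin_source_lines(split)
```

With sources stored as lists of lines, the diff algorithm can treat every string as an atomic value, and single-word strings elsewhere in the notebook need no special casing.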

Use object to describe merge decisions

Motivated by a need to communicate the merge results/conflicts to the mergetool, I would propose to have the merge algorithm build a structure which defines its decisions. A gist outlining a proposal for such a format can be found here: https://gist.github.com/vidartf/5ff461d224583772ff799732956d8f07

This decision structure could then be applied to the base directly to produce the merged file or sent to the merge tool.

Advantages include:

  • Mergetool gets all the information it needs:
    • Which diffs are in conflict with each other.
    • Can easily show local/remote diffs aligned with merged output.
    • Can show the source of all difference in merged vs. base (i.e. color highlighting to say "this change comes from local, and this change comes from remote", which is good for context).
    • Has the original diffs easily accessible if the user wants to overrule/reset default decisions.
  • In the merge routine, the decision making is separated from how those decisions are applied to the notebook (e.g. inserting conflict markers). This should also help compartmentalize the code for easier maintenance and unit testing.

Input would be very welcome on:

  • Are there any cases where this will break down / cause inefficiencies?
  • Should there be any changes to the proposed format for describing the decisions?

Define a format for representing merge conflicts

The merge operation results in two objects: the merge result and the conflicts.

The merge result is the base document with the non-conflicting parts of `diff(base, local)` and `diff(base, remote)` applied. Thus it is a notebook.

The conflicts should contain the rest of the diffs, possibly combined into a more convenient format.

Automerge fails for two divergent notebooks with the same base

The (minimal) test case:

https://gist.github.com/eco32i/2f9877b8044c28ef8a5ca525342ec73e

With these three notebooks executing the merge command fails:

$ nbmerge test.ipynb test_A.ipynb test_B.ipynb -o test_merged
Traceback (most recent call last):
  File "/home/ilya/.venv/pydata3/bin/nbmerge", line 9, in <module>
    load_entry_point('nbdime', 'console_scripts', 'nbmerge')()
  File "/mnt/data/scipy2016/nbdime/nbdime/nbmergeapp.py", line 109, in main
    r = main_merge(args)
  File "/mnt/data/scipy2016/nbdime/nbdime/nbmergeapp.py", line 43, in main_merge
    m, lc, rc = merge_notebooks(b, l, r, args)
  File "/mnt/data/scipy2016/nbdime/nbdime/merging/notebooks.py", line 54, in merge_notebooks
    autoresolve_notebook_conflicts(merged, local_diffs, remote_diffs, args)
  File "/mnt/data/scipy2016/nbdime/nbdime/merging/notebooks.py", line 36, in autoresolve_notebook_conflicts
    autoresolve(merged, local_diffs, remote_diffs, strategies, "")
  File "/mnt/data/scipy2016/nbdime/nbdime/merging/autoresolve.py", line 346, in autoresolve
    return autoresolve_dicts(merged, local_diff, remote_diff, strategies, path)
  File "/mnt/data/scipy2016/nbdime/nbdime/merging/autoresolve.py", line 317, in autoresolve_dicts
    newvalue, ldi, rdi = autoresolve(value, le.diff, re.diff, strategies, subpath)
  File "/mnt/data/scipy2016/nbdime/nbdime/merging/autoresolve.py", line 348, in autoresolve
    return autoresolve_lists(merged, local_diff, remote_diff, strategies, path)
  File "/mnt/data/scipy2016/nbdime/nbdime/merging/autoresolve.py", line 234, in autoresolve_lists
    newvalue, ldi, rdi = autoresolve(merged[j], le.diff, re.diff, strategies, subpath)
  File "/mnt/data/scipy2016/nbdime/nbdime/merging/autoresolve.py", line 346, in autoresolve
    return autoresolve_dicts(merged, local_diff, remote_diff, strategies, path)
  File "/mnt/data/scipy2016/nbdime/nbdime/merging/autoresolve.py", line 281, in autoresolve_dicts
    assert set(lcd) == set(rcd)
AssertionError

Make diff format more verbose

The tuple notation [action, key, value] is opaque for readers of the source code. Change that into something like {"op":action, "key":key, "value":value}.
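As a rough illustration of the proposed change, a converter from the tuple notation to the dict notation might look like the sketch below. The function name to_verbose and the action strings in the example are assumptions made for illustration; the actual action names used in the codebase are not shown in this issue.

```python
def to_verbose(entry):
    """Convert a [action, key, value] diff entry into a self-describing dict.

    Entries without a value (assumed here to be possible, e.g. for removals)
    simply omit the "value" field.
    """
    action, key, *rest = entry
    verbose = {"op": action, "key": key}
    if rest:
        verbose["value"] = rest[0]
    return verbose

# Hypothetical action names, for illustration only.
replace_entry = to_verbose(["replace", "source", "x = 1"])
remove_entry = to_verbose(["remove", 3])
```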

Implementing an edit distance function for any kind of diffable object

An edit distance function can be used to define approximate equality. The function needs to cover any type of diffable object, including nested dict and list structures.

For comparing strings, one alternative is the python-Levenshtein package:

https://github.com/ztane/python-Levenshtein/

This was used by the nbdiff.org project.

For strings or tuples containing lines of strings, difflib.SequenceMatcher(...).ratio()|quick_ratio() could be used, possibly using the autojunk heuristic to ignore blank lines.

For arbitrary nested structures we will anyway need a custom implementation.
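For the sequence-of-lines case, a minimal sketch using difflib from the standard library could look like this. The function name approx_equal_lines and the threshold value are assumptions; autojunk is switched off here so that the ratio is computed without the popularity heuristic.

```python
from difflib import SequenceMatcher

def approx_equal_lines(a, b, threshold=0.6):
    """Treat two sequences of lines as approximately equal if their
    SequenceMatcher similarity ratio reaches the (assumed) threshold."""
    ratio = SequenceMatcher(None, a, b, autojunk=False).ratio()
    return ratio >= threshold

old = ["x = 1\n", "y = 2\n", "print(x)\n"]
new = ["x = 1\n", "y = 3\n", "print(x)\n"]
unrelated = ["import os\n"]

similar = approx_equal_lines(old, new)          # two of three lines match
different = approx_equal_lines(old, unrelated)  # no lines match
```

A predicate like this only covers flat sequences; nested dict and list structures would still need the custom implementation mentioned above.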

works only in Python 2

I think it would be worth adding to the README that the plugin works only in Python 2.

Make autoresolve accept cell inserts from both sides

While inserting items at the same location in a list is in general suspicious and should be a conflict, for cells we can arguably allow inserting cells at the same location, as well as before and after cells with conflicting edits. This is because a 'cell' is a more 'isolated component' than e.g. source lines. Since the merge algorithm is generic (notebook-independent), this is probably most easily implemented in autoresolve.

Consider adding 'keeprange' ops to diff format for sequences

Having an explicit operation for 'keep base[start:end] unmodified' may simplify several algorithms.

One downside is that it potentially doubles the size of the diff (one keeprange between each diff entry).

Should be introduced consistently throughout the codebase.
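To make the size trade-off concrete, here is a small sketch that fills the gaps of a sequence diff with explicit keeprange entries. The entry layout (dicts with "op", "key", "length") and the op names "addrange" and "removerange" are assumptions for illustration, not a settled format.

```python
def add_keepranges(diff, base_len):
    """Insert explicit 'keeprange' entries between the entries of a sequence diff.

    Assumes `diff` is sorted by key, that "removerange" entries carry a
    "length", and that all other ops consume no items of the base.
    """
    out, pos = [], 0
    for entry in diff:
        key = entry["key"]
        if key > pos:
            # base[pos:key] is untouched by the diff
            out.append({"op": "keeprange", "key": pos, "length": key - pos})
        out.append(entry)
        consumed = entry["length"] if entry["op"] == "removerange" else 0
        pos = key + consumed
    if pos < base_len:
        out.append({"op": "keeprange", "key": pos, "length": base_len - pos})
    return out

diff = [
    {"op": "addrange", "key": 2, "valuelist": ["x"]},
    {"op": "removerange", "key": 4, "length": 1},
]
full = add_keepranges(diff, base_len=6)
```

In this example two diff entries become five, which illustrates the growth in size noted above.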

Support attachment field

Version 4.1 of the notebook format added the attachments field to raw and markdown cells. Nbdime should also support this field.

TODO:

  • Add server side predicates, differs and merge strategy for attachments to be similar to that of output mime bundles.
  • Add display capabilities in diff web view.
  • Add display capabilities in merge web view.

Cover cli tools invocation in test suite

The tools are now refactored into functions such as `main_diff`, which can be invoked from tests but currently are not. These functions write to files, so unit tests may want to factor that out as well.

Script/alias to run mergetool on notebooks

While git tools cannot selectively be run on certain file extensions, the following command is very useful, and we should maybe make a command for it?

git mergetool --tool=nbdimeweb *.ipynb

Which basically leverages the optional file field of the git mergetool command, and the fact that it accepts wildcards.

Suggestions for command name if we implement this?

Format diff in human readable text format

Interesting project! I tested the plugin on two of the sample notebooks you have as tests:

jupyter nbdiff test-data-singlecell-1.ipynb test-data-singlecell-2.ipynb a.json
! cells
    -- 0-1
    ++ 0-1
        [{u'cell_type': u'code',
          u'execution_count': None,
          u'metadata': {u'collapsed': True},
          u'outputs': [],
          u'source': u'def mult(a, b):\n    "This version is debugged."\n    return a * b'}]

Is there an option to pretty-print the diff in a more human-readable format?

Fetch merge test data from other tool(s)

Automatic testing of the integrity of diff is simple: if `patch(a, diff(a, b)) == b` then the diff is correct.
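The round-trip property can be sketched with a toy diff and patch for flat lists. These two functions are stand-ins built on difflib, not nbdime's own implementations, and the tuple layout of the diff entries is an assumption made for this example.

```python
from difflib import SequenceMatcher

def diff(a, b):
    """Toy list diff: (tag, start, end, replacement) entries for changed base ranges."""
    sm = SequenceMatcher(None, a, b, autojunk=False)
    return [(tag, i1, i2, b[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes()
            if tag != "equal"]

def patch(a, d):
    """Apply a toy diff produced by diff() to the base list a."""
    out, pos = [], 0
    for _tag, i1, i2, new in d:
        out.extend(a[pos:i1])  # copy the unchanged run before this entry
        out.extend(new)        # replacement items (empty for a deletion)
        pos = i2               # skip the base items this entry replaces
    out.extend(a[pos:])
    return out

a = ["a", "b", "c"]
b = ["a", "x", "c", "d"]
roundtrip_ok = patch(a, diff(a, b)) == b
```

The same check can be run over any pair of inputs, which makes it a natural fit for generated test cases.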

Automatic testing of the quality of diff is more of an art, as there is no unique solution to the problem. Still, integrity matters most; quality is something we can iterate on.

Automatic testing of the integrity of merge is crucial and hard. Reusing a regression test data set from existing merge tools would be a major time saver and reassuring as to the quality of our final tools.

To do:

  • Investigate what's out there of such data
  • Integrate in our test suite somehow

For example, creating a single-cell notebook with source code from such test data will test the basic line-based merge tools. In addition, we can generate test cases by e.g. splitting test data into multiple cells by a random partitioning of the lines. This is of course in addition to hand-crafted test cases.

git-nbdiffdriver crashes when writing .git/config

After utf-8 fixes in commit f1a348e git-nbdiffdriver crashes with the following error. This happens in a fresh repository with .git/config in pristine state.

$ git-nbdiffdriver config --enable
Traceback (most recent call last):
  File "/Users/pjuhas/arch/anaconda/envs/cmi-dev/bin/git-nbdiffdriver", line 11, in <module>
    load_entry_point('nbdime', 'console_scripts', 'git-nbdiffdriver')()
  File "/Users/pjuhas/programs/github/jupyter/nbdime/nbdime/gitdiffdriver.py", line 115, in main
    opts.config_func(opts.global_)
  File "/Users/pjuhas/programs/github/jupyter/nbdime/nbdime/gitdiffdriver.py", line 57, in enable
    f.write('\n*.ipynb\tdiff=jupyternotebook\n')
TypeError: write() argument 1 must be unicode, not str
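The error message suggests that the file object expects unicode text while the literal being written is a byte string under Python 2. One possible fix, sketched below under that assumption, is to open the file with an explicit encoding and write a unicode literal. The helper name append_attribute_line is hypothetical; the written line is the one from the traceback.

```python
import io
import os
import tempfile

def append_attribute_line(path, line=u"*.ipynb\tdiff=jupyternotebook\n"):
    """Append a line to a text file, always passing unicode to write()."""
    # io.open behaves the same on Python 2 and 3 and requires unicode text.
    with io.open(path, "a", encoding="utf-8") as f:
        f.write(u"\n" + line)

# Small demonstration against a temporary file instead of a real repository.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "attributes")
append_attribute_line(demo_path)
with io.open(demo_path, encoding="utf-8") as f:
    written = f.read()
```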

nbdiff never finishes

Hi folks! Been tracking the work. Looking good!

I have found a pair of reasonably large notebooks (1.7 MB each) for which nbdiff doesn't finish. I've left it running overnight!

!curl -L https://notebooks.anaconda.org/anaconda-cluster/notebook-blaze-impala/download?version=1.0 > before.ipynb
!curl -L https://notebooks.anaconda.org/stephenakearns/notebook-blaze-impala-copy1/download?version=2016.04.29.0856 > after.ipynb
!du -h *.ipynb
!nbdiff before.ipynb after.ipynb

Any thoughts? I haven't dug enough into the implementation, but would do so if need be!

Allow customizing approximate-equality predicates based on json path

The main use case for this is to compare notebook cells based on source only, although it can easily be useful in other cases as well.

One design choice would be, instead of passing around a callable compare(a, b), passing a callable compare(a, b, path) where path is a list of the keys into the json object to reach a and b.

One question is whether this path should include list indices, i.e. for the case:

nb1 = { "cells": [ { "source": "foo", ... },  { "source": "bar", ... } ] }
nb2 = { "cells": [ { "source": "bat", ... } ] }

when comparing "bar" and "bat", should the compare get arguments:

approximately_equal = compare("bar", "bat", ["cells", "source"])

or:

approximately_equal = compare("bar", ["cells", 1, "source"], "bat", ["cells", 0, "source"])

Note that including indices in the path requires passing separate paths, and I don't know what to use that index for so I'm leaning towards dropping it.

Example naive compare function for illustration:

def mycompare(a, b, path):
    if path == ["cells"]:
        return compare_sources(a["source"], b["source"])
    else:
        return a == b

Markdown should be rendered

including mathjax, etc.

While we don't have rich rendered markdown diff, we can still show source diffs there,
but at least unchanged markdown should be rendered.

Add download button to nbmerge-web

If a filename is not provided on the command line, there's no way to get the merge result. This can be really annoying if someone forgets to pass that argument and then does a lot of work on the merge. Add a download button to allow storing the result. This would also be useful when running remotely.

Generate notebook representation of merge result with conflicts included

Just as text merge tools can generate a text file containing the merge result and conflicts inlined in the text file with markers <<< >>> for manual editing in a text editor, we can generate a notebook file with similar markers inside code cells and let the user resolve conflicts in cell sources using standard jupyter notebook editing tools. Editing output cells is not considered interesting.
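A minimal sketch of how such a cell source could be assembled is shown below. The function name conflicted_source and the exact marker text and labels are assumptions modelled on common text merge tools, not a decided format.

```python
def conflicted_source(local_lines, remote_lines):
    """Build a cell source with inline conflict markers around both versions.

    Each input is a list of lines that already end with a newline.
    """
    out = ["<<<<<<< local\n"]
    out.extend(local_lines)
    out.append("=======\n")
    out.extend(remote_lines)
    out.append(">>>>>>> remote\n")
    return "".join(out)

source = conflicted_source(["x = 1\n"], ["x = 2\n"])
```

The resulting string can be placed in the source of a code cell, where the user deletes the markers and the unwanted version in the normal notebook editor.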

Refine and document diff format well

I think the diff format covers basically what it should now but is still considered work in progress. Consider eventual input from JPEP discussion and experience from implementation and refine format, then document it better.

Lots of build dependencies

npm, tornado, ... All I want is some nicer git diffs. Does it really need to have all these dependencies for that, no way to have that optional?

unicode error for notebooks with interactive matplotlib figures

With the diff driver active, git diff HEAD~1 fails for notebooks that contain interactive matplotlib figures, i.e., notebooks that start with %matplotlib notebook

$ git diff HEAD~1

Traceback (most recent call last):
  File "/Users/pjuhas/anaconda/envs/dev/bin/git-nbdiffdriver", line 11, in <module>
    load_entry_point('nbdime', 'console_scripts', 'git-nbdiffdriver')()
  File "/Users/pjuhas/programs/github/jupyter/nbdime/nbdime/gitdiffdriver.py", line 117, in main
    show_diff(opts.a, opts.b)
  File "/Users/pjuhas/programs/github/jupyter/nbdime/nbdime/gitdiffdriver.py", line 75, in show_diff
    nbdiffapp.main([before, after])
  File "/Users/pjuhas/programs/github/jupyter/nbdime/nbdime/nbdiffapp.py", line 119, in main
    r = main_diff(args)
  File "/Users/pjuhas/programs/github/jupyter/nbdime/nbdime/nbdiffapp.py", line 44, in main_diff
    pretty_print_notebook_diff(afn, bfn, a, d)
  File "/Users/pjuhas/programs/github/jupyter/nbdime/nbdime/prettyprint.py", line 331, in pretty_print_notebook_diff
    print("\n".join(p))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 1080: ordinal not in range(128)
fatal: external diff died, stopping at test.ipynb

The interactive plot window from matplotlib (version 1.5.3) adds the following JavaScript code to the notebook:

...
       "        alert('Your browser does not have WebSocket support.' +\n",
       "              'Please try Chrome, Safari or Firefox ≥ 6. ' +\n",
...

where the symbol is the \u2265 Unicode character that makes the diff program fail.
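One way to avoid depending on the terminal's default codec is to encode the output explicitly before writing it. The sketch below assumes UTF-8 is an acceptable output encoding; the helper name write_lines is hypothetical and this is not the fix adopted in the project.

```python
import io
import sys

def write_lines(lines, stream=None, encoding="utf-8"):
    """Join lines and write them as bytes in an explicit encoding.

    Characters the target encoding cannot represent are replaced rather
    than raising UnicodeEncodeError.
    """
    if stream is None:
        # Python 3 exposes the underlying byte stream as sys.stdout.buffer.
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    data = u"\n".join(lines) + u"\n"
    stream.write(data.encode(encoding, errors="replace"))

buf = io.BytesIO()
write_lines([u"Please try Chrome, Safari or Firefox \u2265 6."], stream=buf)
utf8_output = buf.getvalue()
```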

Close buttons on nbdiff/merge-web page

Running nbdiff-web or nbmerge-web on some files currently requires Ctrl+C to shut down the server. It would be nicer to have at least a close button like the one in the mergetool view.

Setup doesn't seem to run npm in webapp

On my setup with a virtualenv with npm installed via 'pip install nodeenv', setup.py doesn't build npm packages in nbdime/webapp and nbdime-web properly.

Make merge produce conflicts on adjacent changes

Currently, merging these three source cells:

base:

def foo(x, y):
    z = x * y
    return z
long line with minor change

local:

def foo(x, y):
    z = x * y
    return z
long line with minor change L

remote:

def foo(x, y):
    z = x * y
    return z
long line with minor change R

results in both diffs agreeing on removing the last line, and each diff inserting a new line, with the end result:

def foo(x, y):
    z = x * y
    return z
long line with minor change L
long line with minor change R

I think this is too forgiving; in cases such as this there should be a conflict. @minrk agree?
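A stricter rule could flag a conflict whenever the two sides change base ranges that overlap or touch. The sketch below illustrates that idea with difflib on the example above; the function names are hypothetical and the rule is deliberately conservative, so it would also flag identical changes made on both sides.

```python
from difflib import SequenceMatcher

def changed_base_ranges(base, other):
    """Return the (start, end) ranges of base lines that `other` changes."""
    sm = SequenceMatcher(None, base, other, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _j1, _j2 in sm.get_opcodes()
            if tag != "equal"]

def has_conflict(base, local, remote):
    """True if any local and remote change ranges overlap or are adjacent."""
    for l1, l2 in changed_base_ranges(base, local):
        for r1, r2 in changed_base_ranges(base, remote):
            if l1 <= r2 and r1 <= l2:  # overlapping or touching ranges
                return True
    return False

base = ["def foo(x, y):\n", "    z = x * y\n", "    return z\n",
        "long line with minor change\n"]
local = base[:3] + ["long line with minor change L\n"]
remote = base[:3] + ["long line with minor change R\n"]
conflict = has_conflict(base, local, remote)
```

Changes that are separated by at least one untouched base line are not flagged by this rule and could still be merged automatically.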

Duplication of output cells in merge gui

The current drag-and-drop output handling in the merge GUI just inserts copies of the output cells, with no way to undo, so you can end up with repeated instances of any output cell you drag twice.

Adding a button to clear conflicting output is one way to solve this. It is also useful for quickly deciding that you don't need this output.

But maybe also add a check to avoid adding an output cell twice to the merged output?

Add configuration framework

There will be a need to pass parameters around: from Python calls, from the command line, and possibly from config files in the user directory or in the parent git repository of the cwd. Are there existing solutions to this in other Jupyter projects that we can reuse and/or copy?

Design pass on diff viewer

The web-based notebook diff viewer is in pretty good shape in terms of presented content, so we are ready to start thinking seriously about doing design passes on the diff viewer.

cc @ellisonbg for ideas about how to approach the design process.
