nteract / testbook

🧪 📗 Unit test your Jupyter Notebooks the right way

Home Page: https://testbook.readthedocs.io

License: BSD 3-Clause "New" or "Revised" License

Python 84.99% Jupyter Notebook 15.01%
jupyter-notebook unit-testing pytest nteract python testbook

testbook's Introduction


testbook

testbook is a unit testing framework extension for testing code in Jupyter Notebooks.

Previous attempts at unit testing notebooks involved writing the tests in the notebook itself. testbook instead lets unit tests run against notebooks from separate test files, effectively treating .ipynb files like .py files.

testbook helps you set up conventional unit tests for your Jupyter Notebooks.

Here is an example of a unit test written using testbook.

Consider the following code cell in a Jupyter Notebook example_notebook.ipynb:

def func(a, b):
    return a + b

You would write a unit test using testbook in a Python file example_test.py as follows:

# example_test.py
from testbook import testbook


@testbook('/path/to/example_notebook.ipynb', execute=True)
def test_func(tb):
    func = tb.get("func")

    assert func(1, 2) == 3

Then pytest can be used to run the test:

pytest example_test.py

Installing testbook

pip install testbook

NOTE: This does not install any kernels for running your notebooks. You'll need to install a kernel the same way you do for running the notebooks normally. Usually this is done with pip install ipykernel.

Alternatively, if you want the dev dependencies together with the IPython kernel, you can install them with:

pip install testbook[dev]

Documentation

See readthedocs for more in-depth details.

Development Guide

Read CONTRIBUTING.md for guidelines on how to set up a local development environment and contribute code changes back to testbook.

testbook's People

Contributors

alonme, bensenberner, boluwatifeh, dav009, fcollonval, lfunderburk, libelinda, loichuder, mseal, nakami, nastra, rohitsanj, ronnie-llamado, timkpaine, willingc


testbook's Issues

execute_cell fails if cell takes longer than 60 seconds to run

We're testing a notebook with a cell that normally takes about 2 minutes to run. However, calling execute_cell on that cell gives us this error:

E           nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 60 seconds.
E           The message was: Cell execution timed out.
E           Here is a preview of the cell contents:
E           -------------------
E           ...

Is there a way we can increase the timeout threshold to more than 60 seconds?

Accept notebook as str, file and nbformat.NotebookNode object

Currently, the testbook api only accepts a notebook path. We should extend it to also accept file objects and nbformat.NotebookNode objects.

@MSeal how do you suggest we go about this?

A particular use case where this would be useful is when a notebook is obtained through a GET request. Currently it would need to be saved first, with the path then passed to testbook. We can avoid unnecessary file I/O by directly accepting file objects.
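
One way to accept all three kinds of input is to normalise them into a notebook dictionary up front. The sketch below uses only the standard library. `load_notebook` is a hypothetical helper, not part of testbook, and a plain `dict` stands in for `nbformat.NotebookNode` (which is a `dict` subclass).

```python
import io
import json
import os


def load_notebook(source):
    """Return notebook content as a dict from a path, a file object, or a dict.

    Hypothetical helper for illustration; not part of the testbook API.
    """
    # Already-parsed notebook (nbformat.NotebookNode subclasses dict).
    if isinstance(source, dict):
        return source
    # File-like object, e.g. io.StringIO wrapping the body of a GET response.
    if hasattr(source, "read"):
        return json.load(source)
    # Otherwise treat it as a filesystem path.
    if isinstance(source, (str, os.PathLike)):
        with open(source, encoding="utf-8") as f:
            return json.load(f)
    raise TypeError(f"unsupported notebook source: {type(source).__name__}")


# A notebook fetched over HTTP can be wrapped in a file object without touching disk.
raw = '{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}'
nb = load_notebook(io.StringIO(raw))
```

Checking for a `read` attribute instead of a concrete file type keeps the helper usable with `io.StringIO`, open file handles, and similar objects.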

Allow using tb['some_object'] or tb.get('some_object') instead of tb.ref('some_object')

It's more Pythonic, easier to remember, and makes it easy to migrate in-notebook tests to a script.

Here's my use case:
When I'm working on a notebook, I like to write tests inline first, and then move them to a test.py file:

# notebook.ipynb

def add(x, y):
  return x + y
 
def test_add():
  assert add(2, 3) == 5

This allows me to test the function instantly without having to leave Jupyter, open a terminal, create a new script, etc., etc. This is especially useful when I'm using Binder, Colab, or Kaggle notebooks, and opening up a terminal or a new script isn't that straightforward.

Later, I move the test to a test.py file:

# test.py

def test_add(tb):
  add = tb.ref('add')
  assert add(2, 3) == 5

This requires an additional step of adding the tb argument and accessing add using tb.ref('add'). This change has to be made for every single test I write and every time I update a test in my notebook.

It'd be nice if I could use tb like a dictionary. Then I can write the tests within my notebook in this fashion:

# notebook.ipynb

def add(x, y):
  return x + y

def test_add(tb):
  add = tb['add'] # or tb.get('add') for safety
  assert add(2, 3) == 5

# A simple in-notebook test runner
def run_test(test):
  return test(globals())

Within the notebook I can run the tests by simply passing globals() as the value for tb and then I can easily migrate the tests to a script without making any code changes.

# test.py

def test_add(tb):
  add = tb['add'] # or tb.get('add') for safety
  assert add(2, 3) == 5

# Look ma, no changes!

Documentation for `inject` function

Something along the lines of the following text, with some examples too.

  • pass in run=False (default is True) to not execute the cell as soon as it is injected
  • pass in pop=True (default is False) to pop the cell off after execution. This is useful when you want to execute the entire notebook (or a range of cells) again without the injected cell.
  • pass in before or after arguments to inject the cells in a particular location in the notebook. Useful when used with run=False.

Note that inject appends a cell and runs it by default.

Originally posted by @rohitsanj in #72 (comment)

How can this be used as part of the github classroom autograder workflow?

I've attempted to use this utility as part of the github autograder workflow.

My attempts result in the error below. Unfortunately, I don't know enough about the environment to determine if the problem is with my GitHub workflow action, with the notebook, with how I'm creating the test file, or how I'm referencing the notebook. Thank you for any assistance you can offer.

When this is present in the test file:

@testbook('simple_lr.ipynb', execute=True)

I get the following error:

  self = <jupyter_client.kernelspec.KernelSpecManager object at 0x7fb372f72518>
  kernel_name = 'python3'
      def get_kernel_spec(self, kernel_name):
          """Returns a :class:`KernelSpec` instance for the given kernel_name.

          Raises :exc:`NoSuchKernel` if the given kernel name is not found.
          """
          if not _is_valid_kernel_name(kernel_name):
              self.log.warning("Kernelspec name %r is invalid: %s", kernel_name,
                               _kernel_name_description)

          resource_dir = self._find_spec_directory(kernel_name.lower())
          if resource_dir is None:
  >           raise NoSuchKernel(kernel_name)
  E           jupyter_client.kernelspec.NoSuchKernel: No such kernel named python3

  /usr/local/lib/python3.6/dist-packages/jupyter_client/kernelspec.py:235: NoSuchKernel
  =========================== short test summary info ============================
  FAILED simple_lr_test.py::test - jupyter_client.kernelspec.NoSuchKernel: No s...
  ========================= 1 failed, 1 passed in 0.65s ==========================

Cell tag lookup

Use of cell tags to identify cells to run and/or test will be more user friendly than using cell index number.

Raise an exception as notebook exception

We want to wrap test failure exceptions as best we can. The idea here would be to capture the exception information from the notebook client and promote it to a proper python exception. Bonus points if we can remap to built-in python exceptions when possible.

Code coverage for notebooks

Along with unit-testing, we should provide a way to compute code coverage for Jupyter Notebooks.

This is likely to be another project in itself.

@MSeal suggests using trace.

Tests fail if there's _ (underscore) defined in the notebook

Is there a way to handle this?

Test:

from testbook import testbook

@testbook('nb.ipynb', execute=True)
def test_foo(tb):
    foo = tb.ref("foo")
    assert foo(2) == 3

Notebook:

_ = 20

def foo(x):
    return x + 1

Result:


    @testbook('nb.ipynb', execute=True)
    def test_foo(tb):
        foo = tb.ref("foo")
>       assert foo(2) == 3
E       AssertionError: assert 20 == 3
E        +  where 20 = <[TypeError('__repr__ returned non-string (type int)') raised in repr()] TestbookObjectReference object at 0x7ff04d735c90>(2)

tb.py:6: AssertionError
======================================================================== short test summary info ========================================================================
FAILED tb.py::test_foo - AssertionError: assert 20 == 3

Injected code snippets

When testing a notebook we'll want to be able to pass snippets of code to run inside the kernel. This will allow users to write functions, or code as text (when cross-language), to prepare or activate code from within the kernel.

def helper():
  print("I ran in the kernel")

@testbook.notebook(path)
def test_foo(notebook):
  notebook.inject(helper)
  notebook.assert_output_text("I ran in the kernel")

This can be achieved by using the inspect module to extract the source code as text and passing it as though it were a cell to execute in the jupyter kernel.

More examples documentation

We could use more examples of how to use testbook in different scenarios. This would have a strong, lasting effect on adoption and reuse of the project over other efforts.

Consider adding support for len and iter

This doesn't work:

data_list = tb.ref('data_list')
assert len(data_list) == 1024

It throws:

TypeError: object of type 'TestbookObjectReference' has no len()

For the same reason, I'm guessing this won't work either:

for x in data_list:
    assert x == 1

I believe the solution is to implement __len__ and __iter__ in TestbookObjectReference to simply invoke the same functions on the underlying object in the notebook.
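
The delegation idea can be sketched with a toy reference class. `ObjectReference` is a hypothetical stand-in that wraps a local object; the real `TestbookObjectReference` would presumably have to evaluate `len` and iteration inside the notebook kernel rather than on a local value.

```python
class ObjectReference:
    """Toy reference that forwards len() and iteration to the object it wraps.

    Hypothetical stand-in for TestbookObjectReference, for illustration only.
    """

    def __init__(self, target):
        self._target = target

    def __len__(self):
        # Delegate len(ref) to the underlying object.
        return len(self._target)

    def __iter__(self):
        # Delegate iteration to the underlying object.
        return iter(self._target)


data_list = ObjectReference([1] * 1024)
assert len(data_list) == 1024
for x in data_list:
    assert x == 1
```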

Conda build

We need a conda-forge feedstock recipe for installing easily in conda environments.

Unit testing tutorial

It would be nice to have a unit testing tutorial in the docs, which can be followed by a notebook unit testing guide.

Inject Code

Add the ability to inject code into a notebook before a particular cell executes. We likely need to support injected text being evaluated as well as Python functions being passed in for injection.

the tb.patch will only allow strings to be returned from mocks

I have a notebook that calls some library function and gets a complex object returned. Since the object ultimately comes from a database call, I want to mock this such that the notebook gets a synthetic object instead. I have tried with the following pattern:

from testbook import testbook
from collections import namedtuple

@testbook('./doc/examples/notebooks/tools/01-CreateFromObjectNo.ipynb')
def test_get_object(tb):
    mock_object = namedtuple("dataobject",["name", "description"])
    mock_return_function = lambda x: mock_object("Some Name", "Some Description")
    
    with tb.patch("library.dataobjects.DataObject.retrieve", mock_return_function):
        tb.execute_cell([2, 4])

The notebook cell 2 handles imports and the notebook cell 4 looks something like:

dataobject = DataObject.retrieve("DO-320")
print(dataobject.name)
print(dataobject.description)

The test then fails with:

AttributeError: 'str' object has no attribute 'name'

Am I missing something in the documentation or can we only return strings with the wrapped patch?

PyTest Integration

Add basic integration to pytest with an empty decorator / fixture wrapper. This should be the basis for making a test framework we can plug into.

Add method to execute entire notebook

Will need to add few methods to TestbookNotebookClient:

  • execute method - could also have a bool argument to __init__ to execute the entire notebook.
  • have an execute_upto_cell arg in execute_cell, which executes up to a particular cell (whether that cell is included or excluded can be controlled by an arg)

Update when docs added

testbook/CONTRIBUTING.md

Lines 67 to 72 in 708c820

TODO: Update when docs added
The documentation is built using the [Sphinx](http://www.sphinx-doc.org/en/master/) engine. To contribute, edit the [RestructuredText (`.rst`)](https://en.wikipedia.org/wiki/ReStructuredText) files in the docs directory to make changes and additions.
Once you are done editing, to generate the documentation, use tox and the following command from the root directory of the repository:


This issue was generated by todo based on a TODO comment in 708c820. It's been assigned to @MSeal because they committed the code.

Document how to use test wrappers

Once we have basic functionality operational, we should add docs describing how to use the capabilities, with simple examples pulled from files in the repo. This is likely a later task relative to basic functionality.

drop usage of eval

# TODO: drop usage of eval
raise eval(e.ename)(e) from None
executed_cells.append(cell)
return executed_cells[0] if len(executed_cells) == 1 else executed_cells


This issue was generated by todo based on a TODO comment in f31b1e9. It's been assigned to @rohitsanj because they committed the code.
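
One possible eval-free approach is to look the exception name up in the `builtins` module and accept it only if it is an exception class. The helper names `resolve_exception` and `reraise`, and the `RuntimeError` fallback, are assumptions made for this sketch and not testbook's actual fix.

```python
import builtins


def resolve_exception(ename):
    """Return the built-in exception class named `ename`, or None if there is none."""
    candidate = getattr(builtins, ename, None)
    # Reject built-ins that are not exception classes (e.g. "print" or "int").
    if isinstance(candidate, type) and issubclass(candidate, BaseException):
        return candidate
    return None


def reraise(ename, message):
    """Raise the matching built-in exception, falling back to RuntimeError."""
    exc_type = resolve_exception(ename) or RuntimeError
    raise exc_type(message) from None
```

Unlike `eval`, an attribute lookup never executes the string, so an unexpected exception name from the kernel resolves to `None` instead of running as code.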

Decorated tests not being picked up by pytest

With a test.py that looks like

from testbook import testbook

@testbook('my_notebook.ipynb')
def test_foo(tb):
    assert False, "foo"

def test_bar():
    assert False, "bar"

running pytest test.py yields

=================================== test session starts ===================================
platform linux -- Python 3.7.9, pytest-6.0.2, py-1.9.0
collected 1 item                                                                          

test.py F                        [100%]

======================================== FAILURES =========================================
________________________________________ test_bar _________________________________________

    def test_bar():
>       assert False, "bar"
E       AssertionError: bar
E       assert False

test.py:9: AssertionError
================================= short test summary info =================================
FAILED test.py::test_bar - AssertionE...
==================================== 1 failed in 0.17s ====================================

Only the non-testbook test is picked up, even though they both are prefixed with test.


On the other hand, when I use a context manager in test.py:

from testbook import testbook

def test_foo():
    with testbook('my_notebook.ipynb') as tb:
        assert False, "foo"

def test_bar():
    assert False, "bar"

then both the tests are picked up.

=================================== test session starts ===================================
platform linux -- Python 3.7.9, pytest-6.0.2, py-1.9.0
collected 2 items                                                                         

test.py FF                       [100%]

======================================== FAILURES =========================================
________________________________________ test_foo _________________________________________

    def test_foo():
        with testbook('my_notebook.ipynb') as tb:
>           assert False, "foo"
E           AssertionError: foo
E           assert False

test.py:7: AssertionError
________________________________________ test_bar _________________________________________

    def test_bar():
>       assert False, "bar"
E       AssertionError: bar
E       assert False

test.py:10: AssertionError
================================= short test summary info =================================
FAILED test.py::test_foo - AssertionE...
FAILED test.py::test_bar - AssertionE...
==================================== 2 failed in 1.20s ====================================

Injected mocks

This will be trickier, but injected python mocks into kernels would be a really helpful wrapper. Likely this would only be supported for python kernels at first.

',

testbook/setup.py

Lines 48 to 53 in 708c820

description='TODO',
author='nteract contributors',
author_email='[email protected]',
license='BSD',
# Note that this is a string of words separated by whitespace, not a list.
keywords='jupyter mapreduce nteract pipeline notebook',


This issue was generated by todo based on a TODO comment in 708c820. It's been assigned to @MSeal because they committed the code.

Improve traceback

A couple of issues with the traceback:

  • too verbose, we'd likely want to hide the underlying nbclient calls
  • errors are printed twice at the end of the traceback

Here is a sample snippet that throws a NameError

In [3]: with testbook('../something.ipynb') as tb:
   ...:     tb.execute_cell(0) # execute the first cell
   ...:     tb.value('foo') # does not exist, will throw NameError
   ...:
---------------------------------------------------------------------------
CellExecutionError                        Traceback (most recent call last)
<ipython-input-3-bc3dbf017b62> in <module>
      1 with testbook('../something.ipynb') as tb:
      2     tb.execute_cell(0)
----> 3     tb.value('foo')
      4

~/testbook/testbook/client.py in value(self, name)
    130         """Extract a JSON-able variable value from notebook kernel"""
    131
--> 132         result = self.inject(name)
    133         if not self._execute_result(result.outputs):
    134             raise ValueError('code provided does not produce execute_result')

~/testbook/testbook/client.py in inject(self, code, args, prerun)
    123
    124         self.nb.cells.append(new_code_cell(lines))
--> 125         cell = self.execute_cell(len(self.nb.cells) - 1)
    126
    127         return TestbookNode(cell)

~/testbook/testbook/client.py in execute_cell(self, cell, **kwargs)
     59         executed_cells = []
     60         for idx in cell_indexes:
---> 61             cell = super().execute_cell(self.nb['cells'][idx], idx, **kwargs)
     62             executed_cells.append(cell)
     63

~/miniconda3/envs/testbook/lib/python3.8/site-packages/nbclient/util.py in wrapped(*args, **kwargs)
     70     """
     71     def wrapped(*args, **kwargs):
---> 72         return just_run(coro(*args, **kwargs))
     73     wrapped.__doc__ = coro.__doc__
     74     return wrapped

~/miniconda3/envs/testbook/lib/python3.8/site-packages/nbclient/util.py in just_run(coro)
     49         nest_asyncio.apply()
     50         check_patch_tornado()
---> 51     return loop.run_until_complete(coro)
     52
     53

~/miniconda3/envs/testbook/lib/python3.8/asyncio/base_events.py in run_until_complete(self, future)
    610             raise RuntimeError('Event loop stopped before Future completed.')
    611
--> 612         return future.result()
    613
    614     def stop(self):

~/miniconda3/envs/testbook/lib/python3.8/site-packages/nbclient/client.py in async_execute_cell(self, cell, cell_index, execution_count, store_history)
    745         if execution_count:
    746             cell['execution_count'] = execution_count
--> 747         self._check_raise_for_error(cell, exec_reply)
    748         self.nb['cells'][cell_index] = cell
    749         return cell

~/miniconda3/envs/testbook/lib/python3.8/site-packages/nbclient/client.py in _check_raise_for_error(self, cell, exec_reply)
    669         if self.force_raise_errors or not cell_allows_errors:
    670             if (exec_reply is not None) and exec_reply['content']['status'] == 'error':
--> 671                 raise CellExecutionError.from_cell_and_msg(cell, exec_reply['content'])
    672
    673     async def async_execute_cell(self, cell, cell_index, execution_count=None, store_history=True):

CellExecutionError: An error occurred while executing the following cell:
------------------
foo
------------------

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-f1d2d2f924e9> in <module>
----> 1 foo

NameError: name 'foo' is not defined
NameError: name 'foo' is not defined

Resolves #6

Documentation updates for 0.2.0

List of things to add to docs before 0.2.0 release:

  • documentation and examples for patch and patch_dict (#45)
  • documentation of error handling and error assertion using testbook (#46)
  • documentation of passing slice/range to execute argument

Share kernel context across multiple tests

We need an option to share kernel context across multiple test functions so that the set up (for a test to be executed) need not be executed multiple times, but just once - which results in faster tests.

UPDATE: Check #17 (comment).

Support multiple cells with the same tag

First off, thanks for this really interesting package.

Here is a suggestion: could the code gather all cells with the provided tags to be executed instead of returning the first tagged cell found?
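
A minimal sketch of gathering every tagged cell, working on the notebook as a plain dictionary. It relies on tags being stored under `metadata.tags` in each cell, as in the notebook format; the function name `cells_with_tag` is hypothetical and not part of testbook.

```python
def cells_with_tag(nb, tag):
    """Return the indexes of every cell whose metadata lists `tag`."""
    return [
        index
        for index, cell in enumerate(nb["cells"])
        if tag in cell.get("metadata", {}).get("tags", [])
    ]


nb = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["setup"]}, "source": "import math"},
        {"cell_type": "code", "metadata": {}, "source": "x = 1"},
        {"cell_type": "code", "metadata": {"tags": ["setup", "slow"]}, "source": "y = 2"},
    ]
}

all_setup = cells_with_tag(nb, "setup")  # every matching cell
first_setup = all_setup[0]               # the first-match behaviour described above
```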

Switch primary branch to `main`

Did this for papermill last year, but just noticed this is still master oriented. Shouldn't impact much of anything here to change it imho.

Add support for cell id targeting

Now that cell ids are part of the 4.5 spec, we should have the execution identifier look for cell_id matches when gathering cells to execute.
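
A small sketch of what an id lookup could look like on a notebook dictionary, where nbformat 4.5 cells carry a top-level `id` field. `find_cell_by_id` is a hypothetical helper, not an existing testbook function.

```python
def find_cell_by_id(nb, cell_id):
    """Return the index of the cell whose `id` equals `cell_id`."""
    for index, cell in enumerate(nb["cells"]):
        # Notebooks older than 4.5 have no ids, hence .get().
        if cell.get("id") == cell_id:
            return index
    raise KeyError(f"no cell with id {cell_id!r}")


nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "cells": [
        {"id": "load-data", "cell_type": "code", "metadata": {}, "source": "data = [1, 2, 3]"},
        {"id": "summarise", "cell_type": "code", "metadata": {}, "source": "total = sum(data)"},
    ],
}

index = find_cell_by_id(nb, "summarise")
```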

Execute a cell decorator

Add the most basic decorator that executes the indicated cell before the unit test runs. This will be used to prep a test case for evaluation.

Document usecases of `resolve`

Hello,

I wanted to test the value of variables defined in my notebook:

foo = tb.ref("my_variable")
assert foo == "foo" # This fails

EDIT: This works in fact. See the below comments for an accurate description of my problem

I failed to do so, as the documentation shows only examples with functions (or did I miss something?).
I had to dig in the code to find the resolve() method to do:

foo = tb.ref("my_variable").resolve()
assert foo == "foo" # This passes

Is this the recommended way of doing it? If so, examples with resolve would be a helpful addition to the documentation! 🙂

Push down or inject assertions into kernel

We would want to inject assertions (push down) into the kernel instead of using a pull-up approach. This would help in two ways:

  • non JSON serializable objects can also be asserted against (as they cannot be pulled up in a straightforward manner)
  • allow for kernel agnostic assertions (current implementation of pull up #26 only works with ipykernel)

Sample implementation by @MSeal

from contextlib import contextmanager

import pytest
from nbclient.exceptions import CellExecutionError


class NotebookTest:
    @contextmanager
    def exception_mapper(self):
        try:
            yield
        except CellExecutionError as e:
            # Extract exception and reraise closest exception type
            if e.exception_class == 'AssertionError':
                raise AssertionError(e.message)

    def assert_in_notebook(self, lhs_var, rhs):
        try:
            with self.exception_mapper():
                if self.kernel_name == 'python':
                    rhs_assign = papermill.translate('_assert_key', rhs, self.kernel_name)
                    code_to_inject = f"""
                    import pytest #???
                    {rhs_assign}
                    assert {lhs_var} == _assert_key
                    """
                else:
                    raise NotImplementedError(f"No json dump available for {self.kernel_name}")
                self.inject_code(code_to_inject)
        except AssertionError:
            # Might not be necessary
            print("Normal pytest message")


class EmbededEqualityReference:
    def __init__(self, nbt, lhs, rhs):
        self.nbt = nbt
        self.lhs = lhs
        self.rhs = rhs

    def assert_in_notebook(self):
        self.nbt.assert_in_notebook(self.lhs, self.rhs)


class EmbededVariableReference:
    # Custom collections.???
    # Compare with Mock objects
    def __init__(self, nbt, lhs):
        self.nbt = nbt
        self.lhs = lhs

    def __eq__(self, rhs):
        # Defer the comparison: it is carried out later inside the notebook
        return EmbededEqualityReference(self.nbt, self.lhs, rhs)

    def resolve(self):
        return fetch_variable(self.lhs, self.nbt.kernel_name)


@pytest.hookimpl(hookwrapper=True)
def pytest_notebooktest_call(item):
    outcome = yield
    try:
        outcome.get_result()
    except AssertionError as e:
        if isinstance(e.args[0], EmbededEqualityReference):
            e.args[0].assert_in_notebook()
        # Continue with test call?
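
The push-down idea can be tried out without a kernel. In this toy sketch a plain dict plays the role of the kernel namespace and `exec` stands in for cell execution. `build_assertion` and `run_in_kernel` are hypothetical names, and using `repr()` to render the right-hand side is a simplification that only suits values with literal reprs.

```python
def build_assertion(lhs_var, rhs):
    """Return source code asserting that `lhs_var` equals `rhs` in the kernel namespace."""
    # repr() is a simplification: it only round-trips values with literal reprs.
    return f"assert {lhs_var} == {rhs!r}"


def run_in_kernel(namespace, code):
    """Stand-in for kernel execution: run `code` against a namespace dict."""
    exec(code, namespace)


kernel_namespace = {}
run_in_kernel(kernel_namespace, "total = sum(range(5))")

# The comparison happens where the variable lives; nothing is pulled up.
run_in_kernel(kernel_namespace, build_assertion("total", 10))
```

Because the assertion runs next to the variable, the value never has to be serialised and sent back to the test process.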
