reviewnb / treon

Easy to use test framework for Jupyter Notebooks

Home Page: https://reviewnb.com

License: MIT License

Languages: Python 72.80%, Makefile 0.22%, Jupyter Notebook 26.97%

Topics: jupyter-notebooks, unittest

treon's Issues

Ability to only run / ignore specific cells

Executing the entire notebook is "good" CI practice, but it's not always feasible or required (some cells can take hours to run). It would be good to offer the ability to run only specific cells, or to exclude specific cells. Which cells to run or ignore can be specified with cell tags.

The whole partial-execution feature could be wrapped in a new treon cmdline flag (e.g. treon --partial). We'd still want the standard mode to execute complete notebooks top to bottom.
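A minimal sketch of the tag-based selection, assuming the standard Jupyter location for cell tags (`metadata.tags` on each cell) and two hypothetical tag names, `treon-skip` and `treon-only`, which treon does not currently define. It works on the notebook as plain nbformat-4 JSON and returns a filtered copy; executing that copy is left to whatever runner is already in use.

```python
import copy
import json

# Hypothetical tag names; treon does not define these.
SKIP_TAG = "treon-skip"
ONLY_TAG = "treon-only"


def select_cells(notebook, skip_tag=SKIP_TAG, only_tag=ONLY_TAG):
    """Return a filtered copy of an nbformat-4 notebook dict.

    If any code cell carries `only_tag`, only those code cells are kept;
    otherwise every code cell except those tagged `skip_tag` is kept.
    Markdown and raw cells are always kept, since they are never executed.
    """
    nb = copy.deepcopy(notebook)  # never mutate the caller's notebook

    def tags(cell):
        return cell.get("metadata", {}).get("tags", [])

    code_cells = [c for c in nb["cells"] if c.get("cell_type") == "code"]
    only_mode = any(only_tag in tags(c) for c in code_cells)

    def keep(cell):
        if cell.get("cell_type") != "code":
            return True
        if only_mode:
            return only_tag in tags(cell)
        return skip_tag not in tags(cell)

    nb["cells"] = [c for c in nb["cells"] if keep(c)]
    return nb


def load_partial(path):
    """Read a .ipynb file as JSON and apply the tag-based selection."""
    with open(path, encoding="utf-8") as f:
        return select_cells(json.load(f))
```

The filtered dict could then be turned into a notebook object (for example with `nbformat.from_dict`) and executed exactly like a full notebook, which leaves the default top-to-bottom mode untouched.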

Linter + CI

During development for #4, I felt like the following essentials might be missing from treon:

.gitignore - for __pycache__
~~Pipfile and Pipfile.lock - easier to get started, because there are already non-standard dependencies like docopt.~~
Document dependency set-up (see "Note about dependencies" in readme)
Pylint - code is in good shape, better keep it that way
basic sanity tests
Travis? - to exercise the above effectively

How do you feel about this list, @amit1rrr?

Treon chokes on notebooks that refer to relative paths

We're using treon for HumanCellAtlas/data-consumer-vignettes, and ran into a problem where notebooks that refer to relative paths can unexpectedly fail when tested with treon.

These notebooks expect the current working directory to be the directory that the notebook resides in. When using treon, this is not likely to be the case, since treon searches recursively for notebooks to test. (In this case, the current working directory is the directory where treon was invoked.)

I've been able to solve this locally by changing the working directory to the directory in which the notebook resides before testing the notebook. That said, this solution only works if treon is limited to a single thread (or only testing one notebook), since the current working directory is shared across all threads.

There are a few approaches to this that I can think of:

Refactor treon to use multiprocessing instead of multithreading. From skimming the source code, this change seems more or less trivial. Using multiprocessing would isolate each notebook's working directory and could also speed up testing of CPU-bound notebooks by circumventing the GIL, at the cost of some process overhead.
Add an option that performs the same directory switching I used above and limits the number of parallel threads to one.
Ignore this problem - our workaround is to handle the directory-switching ourselves, running treon with one notebook at a time, while still achieving parallelism with xargs.

I'm not sure if this is a widespread use case, and I'm happy to take a shot at any of these myself.
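The first and third approaches share one idea: give each notebook its own process, so the working directory is never shared. A minimal sketch of that idea, assuming each notebook is tested by a separate child command (as in the xargs workaround), passes the directory through `subprocess.run(..., cwd=...)` instead of calling `os.chdir` in the parent, so a thread pool stays safe. The default child command is a placeholder that only prints its working directory; `make_command` is a hypothetical hook for supplying the real one.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_in_notebook_dir(notebook_path, make_command=None):
    """Run a child process whose working directory is the notebook's own
    directory and return (exit code, stripped stdout).

    The parent never calls os.chdir, so many of these calls can run at
    once from a thread pool without interfering with each other.
    """
    notebook_path = os.path.abspath(notebook_path)
    notebook_dir = os.path.dirname(notebook_path)
    if make_command is None:
        # Placeholder command: just report the child's working directory.
        command = [sys.executable, "-c", "import os; print(os.getcwd())"]
    else:
        command = make_command(notebook_path)
    result = subprocess.run(command, cwd=notebook_dir,
                            capture_output=True, text=True)
    return result.returncode, result.stdout.strip()


def run_notebooks_isolated(notebook_paths, threads=2, make_command=None):
    """Test several notebooks in parallel, each in its own directory."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda p: run_in_notebook_dir(p, make_command),
                             notebook_paths))
```

For real notebooks the call might look like `run_notebooks_isolated(paths, make_command=lambda nb: ["treon", nb])`. The path is made absolute before the command is built, because a relative path would otherwise be resolved against the child's new working directory.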

Cmdline option for less verbose output (--silent/--quiet)

Currently we print verbose output that shows every test as it runs. It's a good default because:

Users can see the progress (some tests in notebooks can be long running)
Easy to make sure all intended tests have run

But verbose output is not always required. Users would sometimes prefer py.test-style output, which stays silent on success and only becomes loud on failure. It would be good to have a flag (--quiet/--silent) that prints only the bare minimum output.
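A minimal sketch of such a flag, written with argparse to stay self-contained (treon itself parses its command line with docopt). The flag names come from this issue; the report format is only loosely modelled on treon's existing summary line and is an assumption.

```python
import argparse


def build_parser():
    # Hypothetical CLI: only the flag names are taken from the issue.
    parser = argparse.ArgumentParser(prog="treon")
    parser.add_argument("path", nargs="?", default=".")
    parser.add_argument("-q", "--quiet", "--silent", action="store_true",
                        help="only report failures and the final summary")
    return parser


def format_report(results, quiet=False):
    """Build report lines from a list of (notebook_path, succeeded) pairs."""
    lines = []
    for path, ok in results:
        if quiet and ok:
            continue  # stay silent about passing notebooks
        lines.append(f"{path:<24} -- {'PASSED' if ok else 'FAILED'}")
    failed = sum(1 for _, ok in results if not ok)
    lines.append(f"{len(results) - failed} succeeded, {failed} failed, "
                 f"out of {len(results)} notebooks tested.")
    return lines
```

Because argparse derives the destination from the first long option, both `--quiet` and `--silent` set `args.quiet`; the summary line is always printed, so quiet runs still confirm how many notebooks were tested.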

ModuleNotFoundError when using `! pip install <package>`

Hi,

I quickly tested your package today with a notebook called draft-treon.ipynb containing the following cell:

! pip install pandas
import pandas as pd

and ran the command:

pip install treon
treon draft-treon.ipynb

But I got the following error message

Executing treon version 0.1.3
Triggered test for draft-treon.ipynb
ERROR in testing draft-treon.ipynb

An error occurred while executing the following cell:
------------------
! pip install pandas

import pandas as pd
------------------

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-0da7781ded87> in <module>()
      1 get_ipython().system(' pip install pandas')
      2
----> 3 import pandas as pd

ModuleNotFoundError: No module named 'pandas'
ModuleNotFoundError: No module named 'pandas'




-----------------------------------------------------------------------
TEST RESULT
-----------------------------------------------------------------------
draft-treon.ipynb       -- FAILED 
-----------------------------------------------------------------------
0 succeeded, 1 failed, out of 1 notebooks tested.
-----------------------------------------------------------------------

What would be the correct procedure for running the test?

Thanks,
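One common cause, assumed here rather than confirmed in the thread, is that the `pip` found on PATH by `!` belongs to a different Python environment than the kernel treon launches, so pandas is installed somewhere the kernel cannot import from. A sketch of a cell that installs into the kernel's own interpreter via `sys.executable` (recent IPython versions also offer a `%pip install` magic for this case):

```python
import importlib
import importlib.util
import subprocess
import sys


def pip_install_command(package):
    # sys.executable is the interpreter running this kernel, so
    # "python -m pip" installs into the environment it imports from.
    return [sys.executable, "-m", "pip", "install", package]


def ensure_installed(package, module_name=None):
    """Install `package` with the kernel's own pip if `module_name` is
    missing, then import and return the module."""
    module_name = module_name or package
    if importlib.util.find_spec(module_name) is None:
        subprocess.check_call(pip_install_command(package))
        # Let the import system notice the newly installed package.
        importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

In the notebook cell this would replace the two original lines with `pd = ensure_installed("pandas")`. For CI, listing pandas in the environment's requirements before running treon avoids installing at test time altogether.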

Treon stops running with multiple threads

When running treon on Windows 10 with multiple threads, it sometimes stops running because of issues with the underlying jupyter client.

To some extent this is an issue with the jupyter client and/or nbconvert, but treon triggers it by calling nbconvert from multiple threads.

The error message and the jupyter client discussion are in this issue: jupyter/jupyter_client#466

For treon, a workaround would be to use multiple processes instead of threads. IPython does not currently seem to be thread-safe, but this is being worked on (jupyter/nbconvert#936).
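A minimal sketch of the suggested workaround, assuming the per-notebook work is a top-level function (`run_notebook_tests` below is a hypothetical stand-in). `multiprocessing.pool.ThreadPool` and `multiprocessing.Pool` expose the same `map` interface, so switching from threads to processes can be a single choice. With processes, each worker has its own interpreter, so IPython or jupyter_client state that is not thread-safe is no longer shared.

```python
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def run_notebook_tests(path):
    # Hypothetical stand-in for the real per-notebook test run. It must be
    # a top-level function so that a process pool can pickle it.
    return path, path.endswith(".ipynb")


def run_all_notebooks(paths, workers=2, use_processes=False):
    """Run the tests in a thread pool (one shared interpreter) or a
    process pool (one interpreter per worker)."""
    pool_cls = Pool if use_processes else ThreadPool
    with pool_cls(workers) as pool:
        return pool.map(run_notebook_tests, paths)
```

With `use_processes=True` the worker function must be picklable, and on Windows, where workers are started with the spawn method, the call has to sit under an `if __name__ == "__main__":` guard.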

Could you add support for linting/formatting?

I think it would be nice to autoformat the code written in notebooks with black, and possibly lint it with pycodestyle, etc.

Could support for this be added?
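black and pycodestyle operate on Python source files rather than `.ipynb` JSON, so one possible approach (an assumption, not an existing treon feature) is to extract the code cells into a single source string first. Lines beginning with `%` or `!` are commented out, because IPython magics and shell escapes such as `! pip install pandas` are not valid Python.

```python
import json


def notebook_to_lint_source(notebook):
    """Concatenate a notebook's code cells into one Python source string.

    Lines starting with % or ! (IPython magics and shell escapes) are
    commented out. Cell magics such as %%bash are not handled specially:
    only their first line is commented out.
    """
    chunks = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        lines = ["# " + line if line.lstrip().startswith(("%", "!")) else line
                 for line in source.splitlines()]
        # Label each chunk with its cell index to map findings back.
        chunks.append(f"# cell {index}\n" + "\n".join(lines))
    # Separate cells with two blank lines.
    return "\n\n\n".join(chunks) + "\n"


def load_lint_source(path):
    with open(path, encoding="utf-8") as f:
        return notebook_to_lint_source(json.load(f))
```

The result could be written to a temporary `.py` file and checked with `pycodestyle` or `black --check`; reported line numbers would then refer to that generated file, not to the notebook's cells.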

`ValueError: Invalid file descriptor: -1` in asyncio

I often (but not always) see treon print the following error just before exiting:

Exception ignored in: <function BaseEventLoop.__del__ at 0x7fbaf2e4a4c0>
Traceback (most recent call last):
  File "/usr/lib64/python3.8/asyncio/base_events.py", line 656, in __del__
    self.close()
  File "/usr/lib64/python3.8/asyncio/unix_events.py", line 58, in close
    super().close()
  File "/usr/lib64/python3.8/asyncio/selector_events.py", line 92, in close
    self._close_self_pipe()
  File "/usr/lib64/python3.8/asyncio/selector_events.py", line 99, in _close_self_pipe
    self._remove_reader(self._ssock.fileno())
  File "/usr/lib64/python3.8/asyncio/selector_events.py", line 276, in _remove_reader
    key = self._selector.get_key(fd)
  File "/usr/lib64/python3.8/selectors.py", line 190, in get_key
    return mapping[fileobj]
  File "/usr/lib64/python3.8/selectors.py", line 71, in __getitem__
    fd = self._selector._fileobj_lookup(fileobj)
  File "/usr/lib64/python3.8/selectors.py", line 225, in _fileobj_lookup
    return _fileobj_to_fd(fileobj)
  File "/usr/lib64/python3.8/selectors.py", line 42, in _fileobj_to_fd
    raise ValueError("Invalid file descriptor: {}".format(fd))
ValueError: Invalid file descriptor: -1

Treon then goes on to exit with a successful exit code, so it's effectively just a warning, even though it is caused by a ValueError. I've seen this on multiple versions of Python.

I typically invoke treon as treon . --threads 2. I suspect this warning might go away if I don't use threads, but I haven't investigated it yet.

treon RuntimeError: Can only launch a kernel on a local interface

When I run treon I get a RuntimeError telling me that it can only launch a kernel on a local interface.

The error message suggests that something is wrong with my configuration settings, but since I don't see this error when I run ipython or jupyter notebook, I expected not to see it when I run treon either.

Details

Windows 10
Python 3.7
treon master branch

I made a new (conda) virtual environment to install the treon master branch into, with pip install -r requirements-dev.txt
and pip install -e .

Here's the output when I run treon on the example notebooks included with the repository:

(treon-dev) C:\Users\Genevieve\Documents\GitHub\treon>treon
Executing treon version 0.1.3
Recursively scanning C:\Users\Genevieve\Documents\GitHub\treon for notebooks...
Triggered test for C:\Users\Genevieve\Documents\GitHub\treon\tests\resources\basic.ipynb
Triggered test for C:\Users\Genevieve\Documents\GitHub\treon\tests\resources\doctest_failed.ipynb
Triggered test for C:\Users\Genevieve\Documents\GitHub\treon\tests\resources\runtime_error.ipynb
Triggered test for C:\Users\Genevieve\Documents\GitHub\treon\tests\resources\unittest_failed.ipynb
ERROR in testing C:\Users\Genevieve\Documents\GitHub\treon\tests\resources\runtime_error.ipynb

An error occurred while executing the following cell:
------------------
1 / 0
------------------

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-1-bc757c3fda29> in <module>
----> 1 1 / 0

ZeroDivisionError: division by zero
ZeroDivisionError: division by zero




Traceback (most recent call last):
  File "C:\Users\Genevieve\anaconda3\envs\treon-dev\Scripts\treon-script.py", line 33, in <module>
    sys.exit(load_entry_point('treon', 'console_scripts', 'treon')())
  File "c:\users\genevieve\documents\github\treon\treon\treon.py", line 52, in main
    trigger_tasks(tasks, thread_count)
  File "c:\users\genevieve\documents\github\treon\treon\treon.py", line 71, in trigger_tasks
    pool.map(Task.run_tests, tasks)
  File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\multiprocessing\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
  File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "c:\users\genevieve\documents\github\treon\treon\task.py", line 25, in run_tests
    self.is_successful, console_output = execute_notebook(self.file_path)
  File "c:\users\genevieve\documents\github\treon\treon\test_execution.py", line 13, in execute_notebook
    processor.preprocess(notebook, metadata(path))
  File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\nbconvert\preprocessors\execute.py", line 79, in preprocess
    self.execute()
  File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\nbclient\util.py", line 74, in wrapped
    return just_run(coro(*args, **kwargs))
  File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\nbclient\util.py", line 53, in just_run
    return loop.run_until_complete(coro)
  File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\asyncio\base_events.py", line 587, in run_until_complete
    return future.result()
  File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\nbclient\client.py", line 519, in async_execute
    async with self.async_setup_kernel(**kwargs):
  File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\async_generator\_util.py", line 34, in __aenter__
    return await self._agen.asend(None)
  File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\nbclient\client.py", line 477, in async_setup_kernel
    await self.async_start_new_kernel(**kwargs)
  File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\nbclient\client.py", line 389, in async_start_new_kernel
    await ensure_async(self.km.start_kernel(extra_arguments=self.extra_arguments, **kwargs))
  File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\nbclient\util.py", line 85, in ensure_async
    result = await obj
  File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\jupyter_client\manager.py", line 571, in start_kernel
    kernel_cmd, kw = self.pre_start_kernel(**kw)
  File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\jupyter_client\manager.py", line 252, in pre_start_kernel
    "Currently valid addresses are: %s" % (self.ip, local_ips())
RuntimeError: Can only launch a kernel on a local interface. This one is not: 127.0.0.1.Make sure that the '*_address' attributes are configured properly. Currently valid addresses are: ['49.127.90.25', '172.28.70.145', '192.168.56.1', '192.168.1.103', '192.168.137.1', '172.22.0.1', '0.0.0.0', '']

(treon-dev) C:\Users\Genevieve\Documents\GitHub\treon>

Things I've tried:

Calling treon with an --ip= keyword argument, the same way you might if you ran ipython or jupyter directly (perhaps unsurprisingly, this isn't supported by treon)
Generating a jupyter config file with 'jupyter notebook --generate-config' and editing 'c.NotebookApp.ip' and 'c.NotebookApp.allow_origin', as suggested in this Stack Overflow thread

Allow multiple PATHs to be provided

According to treon --help, treon is meant to be invoked as

treon [PATH] [--threads=<number>] [-v] [--exclude=<string>]...

However, it is currently not possible to pass multiple PATHs, unlike many command-line tools (e.g., cat). If multiple PATHs are provided, the help text is displayed and nothing else happens.

It is possible to exclude multiple paths (#1), but currently, the only way to include multiple paths (e.g., to test multiple notebooks by individual filename or to test multiple directories) is to invoke treon multiple times sequentially, once for each. But this means that the full benefits of multithreading are not available.
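A sketch of the requested behaviour, written with argparse to stay self-contained (treon uses docopt, where a repeated argument is written as `PATH...`). Every path, file or directory, is expanded into one de-duplicated list so that all notebooks can share a single worker pool. The substring semantics used for `--exclude` here is an assumption, not necessarily how treon matches excludes.

```python
import argparse
import os


def build_parser():
    # Hypothetical parser mirroring the usage line from `treon --help`.
    parser = argparse.ArgumentParser(prog="treon")
    parser.add_argument("paths", nargs="*", default=["."], metavar="PATH")
    parser.add_argument("--threads", type=int, default=2)
    parser.add_argument("--exclude", action="append", default=[])
    return parser


def collect_notebooks(paths, excludes=()):
    """Expand files and directories into one sorted, de-duplicated list of
    absolute notebook paths."""
    found = set()
    for path in paths:
        if os.path.isdir(path):
            candidates = [os.path.join(root, name)
                          for root, _dirs, files in os.walk(path)
                          for name in files]
        else:
            candidates = [path]
        for candidate in candidates:
            full = os.path.abspath(candidate)
            if not full.endswith(".ipynb") or ".ipynb_checkpoints" in full:
                continue
            # Assumed exclude semantics: plain substring match on the path.
            if any(pattern in full for pattern in excludes):
                continue
            found.add(full)
    return sorted(found)
```

The combined list would then be handed to the existing thread pool in one go, instead of invoking treon once per path.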

Add pytest functionality

Combining Jupyter notebooks with test-driven development feels great, and treon is really helpful for CI pipelines. Would it be possible to support the pytest framework in addition to doctest and unittest?

pytest does not seem to have a drop-in function like unittest.main() that runs the tests in the current module; it requires a filename instead. Still, this would be a nice addition and would remove a lot of the boilerplate the unittest framework needs.

Merge projects?

Any interest in merging projects?
https://github.com/timkpaine/jupyterlab_celltests

Ability to ignore set of notebooks from test suite

Usually we have some rough notebooks and some final notebooks in the same repo. We never want to run tests for the rough notebooks. It would be great to have a .ci_ignore file that specifies a list of notebooks/directories to exclude from the test suite.

The current workaround is to explicitly specify the paths of the notebooks/directories that need to be included in the test suite.
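A minimal sketch of how such a `.ci_ignore` file could be read, assuming a gitignore-like format: one glob pattern per line, `#` comments, and a trailing `/` to mark a whole directory. Both the file name and the format are this issue's proposal, not an existing treon feature.

```python
import fnmatch
import os


def load_ignore_patterns(path=".ci_ignore"):
    """Read one glob pattern per line, skipping blank lines and # comments.
    A missing file means nothing is ignored."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f
                if line.strip() and not line.strip().startswith("#")]


def is_ignored(notebook_path, patterns):
    """True if the relative notebook path matches a glob pattern or lies
    under an ignored directory (a pattern ending in "/")."""
    path = notebook_path.replace(os.sep, "/")
    for pattern in patterns:
        if pattern.endswith("/"):
            if path.startswith(pattern):
                return True
        elif fnmatch.fnmatchcase(path, pattern):
            return True
    return False
```

Patterns are matched against paths relative to the repository root with `/` as the separator; `fnmatchcase` keeps matching case-sensitive on every OS. Note that `*` in `fnmatch` also matches `/`, so `*draft*.ipynb` catches draft notebooks in any subdirectory.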
