reviewnb / treon
Easy to use test framework for Jupyter Notebooks
Home Page: https://reviewnb.com
License: MIT License
Executing the entire notebook is good CI practice, but it is not always feasible or required (some cells can take hours to run). It would be good to offer the ability to run only specific cells or to exclude specific cells. Which cells to run or ignore could be specified with cell tags.
Partial execution could be exposed as a new treon command-line flag (e.g. treon --partial). We'd still want the standard mode to execute complete notebooks top to bottom.
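The tag-based selection proposed above could look roughly like the sketch below. It is not treon code: the function name and the run_tags / skip_tags parameters are hypothetical, and it only assumes that tags live in each cell's metadata under "tags", as they do in the .ipynb JSON format.

```python
def select_cells(notebook, run_tags=None, skip_tags=None):
    """Return the code cells that a partial run would execute.

    `notebook` is a notebook already parsed into a dict (the .ipynb JSON).
    `run_tags` and `skip_tags` are hypothetical option names, not treon flags.
    """
    run_tags = set(run_tags or [])
    skip_tags = set(skip_tags or [])
    selected = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are never executed
        tags = set(cell.get("metadata", {}).get("tags", []))
        if tags & skip_tags:
            continue  # explicitly excluded
        if run_tags and not (tags & run_tags):
            continue  # an include list was given and this cell is not on it
        selected.append(cell)
    return selected


# A tiny in-memory notebook used only for illustration.
notebook = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["ci"]}, "source": "x = 1"},
        {"cell_type": "code", "metadata": {"tags": ["slow"]}, "source": "train()"},
        {"cell_type": "markdown", "metadata": {}, "source": "# Notes"},
        {"cell_type": "code", "metadata": {}, "source": "assert x == 1"},
    ]
}
```

With no tags given, every code cell is selected, which matches the request that the standard mode keeps running complete notebooks.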
During development for #4, I felt like the following essentials might be missing from treon:
- .gitignore, for __pycache__
- Pipfile and Pipfile.lock, to make it easier to get started, because there are already non-standard dependencies like docopt.
How do you feel about this list, @amit1rrr?
We're using treon for HumanCellAtlas/data-consumer-vignettes, and ran into a problem where notebooks that refer to relative paths can unexpectedly fail when tested with treon.
These notebooks expect the current working directory to be the directory that the notebook resides in. When using treon, this is not likely to be the case, since treon searches recursively for notebooks to test. (In this case, the current working directory is the directory where treon was invoked.)
I've been able to solve this locally by changing the working directory to the directory in which the notebook resides before testing the notebook. That said, this solution only works if treon is limited to a single thread (or only testing one notebook), since the current working directory is shared across all threads.
There are a few approaches to this that I can think of:
- Refactor treon to use multiprocessing instead of multithreading. Skimming the source code, it seems like this change would be more or less trivial. Using multiprocessing would have the benefit of working directory isolation, in addition to potential performance improvements for testing CPU-bound notebooks by circumventing the GIL, at the cost of some performance overhead.
- Add an option that performs the same directory-switching that I used above and limits the number of parallel threads to one.
- Ignore this problem. Our workaround is to handle the directory-switching ourselves, running treon with one notebook at a time, while still achieving parallelism with xargs.
I'm not sure if this is a widespread use case, and I'm happy to take a shot at either of these myself.
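The workaround described above (one treon invocation per notebook, each started from the notebook's own directory) can be sketched in Python rather than xargs. This is only an illustration under assumptions: the helper names are hypothetical, and it assumes treon is on PATH and accepts a single notebook file as its PATH argument. Because each notebook runs in its own child process with its own cwd, the threads here never touch the shared working directory.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(notebook):
    # Assumption: `treon <notebook>` tests exactly one notebook.
    return ["treon", notebook.name]


def run_in_notebook_dir(notebook, build=build_command):
    """Run one command in a child process whose cwd is the notebook's directory."""
    notebook = Path(notebook).resolve()
    return subprocess.run(
        build(notebook),
        cwd=notebook.parent,   # per-process working directory, no os.chdir needed
        capture_output=True,
        text=True,
    )


def run_all(notebooks, workers=2, build=build_command):
    """Run the command for every notebook, `workers` child processes at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda nb: run_in_notebook_dir(nb, build), notebooks))
```

A call such as run_all(["a/one.ipynb", "b/two.ipynb"]) returns one CompletedProcess per notebook, and a non-zero returncode on any of them would indicate a failure.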
Currently we print verbose output which shows every test that's running. It's a good default since,
But verbose output is not always required. Users would sometimes prefer pytest-style output, which is silent on success and only becomes loud on failure. It would be good to have a flag (--quiet/--silent) that prints only the bare minimum.
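As a rough sketch of what such a flag might change, the function below builds the report lines and drops the per-notebook lines for passing notebooks when quiet is set. The function name, the quiet parameter, and the "-- PASSED" wording are assumptions for illustration; only the "-- FAILED" and summary line formats are taken from treon output quoted elsewhere on this page.

```python
def format_report(results, quiet=False):
    """Build report lines from (notebook_name, passed, output) tuples.

    `quiet` is a hypothetical option: when set, passing notebooks are not listed.
    """
    lines = []
    for name, passed, output in results:
        if passed:
            if not quiet:
                lines.append(f"{name} -- PASSED")  # wording is an assumption
        else:
            # Failures are always reported, together with their captured output.
            lines.append(f"{name} -- FAILED")
            lines.append(output)
    failed = sum(1 for _, passed, _ in results if not passed)
    succeeded = len(results) - failed
    lines.append(
        f"{succeeded} succeeded, {failed} failed, out of {len(results)} notebooks tested."
    )
    return lines
```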
Hi,
I quickly tested your package today with a notebook called draft-treon.ipynb containing the following cell:
! pip install pandas
import pandas as pd
and ran the command:
pip install treon
treon draft-treon.ipynb
But I got the following error message:
Executing treon version 0.1.3
Triggered test for draft-treon.ipynb
ERROR in testing draft-treon.ipynb
An error occurred while executing the following cell:
------------------
! pip install pandas
import pandas as pd
------------------
---------------------------------------------------------------------------
ModuleNotFoundError
Traceback (most recent call last)
<ipython-input-1-0da7781ded87> in <module>()
1 get_ipython().system(' pip install pandas')
2
----> 3 import pandas as pd
ModuleNotFoundError: No module named 'pandas'
ModuleNotFoundError: No module named 'pandas'
-----------------------------------------------------------------------
TEST RESULT
-----------------------------------------------------------------------
draft-treon.ipynb -- FAILED
-----------------------------------------------------------------------
0 succeeded, 1 failed, out of 1 notebooks tested.
-----------------------------------------------------------------------
What would be the correct procedure to run the test?
Thanks,
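One possible explanation, not confirmed anywhere in this thread, is that the pip found by the shell belongs to a different Python environment than the one the notebook kernel runs in, so the package is installed where the kernel cannot import it. A common way to rule that out is to invoke pip through the interpreter that is executing the cell. The helper names below are hypothetical; the sketch only builds and runs that command.

```python
import subprocess
import sys


def pip_install_command(package):
    """Command that installs `package` into the environment of the running interpreter."""
    # sys.executable is the Python binary executing this code (the kernel, in a notebook).
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    # Raises CalledProcessError if pip exits with a non-zero status.
    subprocess.check_call(pip_install_command(package))
```

In a notebook cell, calling install("pandas") before import pandas would then target the kernel's own environment.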
When running treon on Windows 10 with multiple threads, it sometimes stops running because of issues with the underlying Jupyter client.
To some extent this is an issue with the Jupyter client and/or nbconvert, but treon triggers it by calling nbconvert from multiple threads.
The error message and the discussion of the Jupyter client are in this issue: jupyter/jupyter_client#466
A workaround for treon would be to use multiple processes instead of threads. IPython does not seem to be thread-safe as of now, but this is being worked on (jupyter/nbconvert#936).
I think it would be nice to autoformat the code written in notebooks using black and possibly lint it with pycodestyle, etc.
Could support for this be added?
I often (but not always) witness treon output the following error just before exiting:
Exception ignored in: <function BaseEventLoop.__del__ at 0x7fbaf2e4a4c0>
Traceback (most recent call last):
File "/usr/lib64/python3.8/asyncio/base_events.py", line 656, in __del__
self.close()
File "/usr/lib64/python3.8/asyncio/unix_events.py", line 58, in close
super().close()
File "/usr/lib64/python3.8/asyncio/selector_events.py", line 92, in close
self._close_self_pipe()
File "/usr/lib64/python3.8/asyncio/selector_events.py", line 99, in _close_self_pipe
self._remove_reader(self._ssock.fileno())
File "/usr/lib64/python3.8/asyncio/selector_events.py", line 276, in _remove_reader
key = self._selector.get_key(fd)
File "/usr/lib64/python3.8/selectors.py", line 190, in get_key
return mapping[fileobj]
File "/usr/lib64/python3.8/selectors.py", line 71, in __getitem__
fd = self._selector._fileobj_lookup(fileobj)
File "/usr/lib64/python3.8/selectors.py", line 225, in _fileobj_lookup
return _fileobj_to_fd(fileobj)
File "/usr/lib64/python3.8/selectors.py", line 42, in _fileobj_to_fd
raise ValueError("Invalid file descriptor: {}".format(fd))
ValueError: Invalid file descriptor: -1
Treon then goes on to exit with a successful exit code, so it's effectively just a warning, even though it is caused by a ValueError. I've witnessed this on multiple versions of Python.
I typically invoke treon as treon . --threads 2. I suspect this warning might go away if I don't use threads, but I haven't investigated it yet.
When I run treon, I get a RuntimeError telling me that it can only launch a kernel on a local interface.
The error message suggests to me that something is wrong with my configuration settings, but since I don't see this error when I run ipython or jupyter notebook, I expected not to see it when I run treon either.
Details
I made a new (conda) virtual environment to install the treon master branch into, with pip install -r requirements-dev.txt and pip install -e .
Here's the output when I run treon on the example notebooks included with the repository:
(treon-dev) C:\Users\Genevieve\Documents\GitHub\treon>treon
Executing treon version 0.1.3
Recursively scanning C:\Users\Genevieve\Documents\GitHub\treon for notebooks...
Triggered test for C:\Users\Genevieve\Documents\GitHub\treon\tests\resources\basic.ipynb
Triggered test for C:\Users\Genevieve\Documents\GitHub\treon\tests\resources\doctest_failed.ipynb
Triggered test for C:\Users\Genevieve\Documents\GitHub\treon\tests\resources\runtime_error.ipynb
Triggered test for C:\Users\Genevieve\Documents\GitHub\treon\tests\resources\unittest_failed.ipynb
ERROR in testing C:\Users\Genevieve\Documents\GitHub\treon\tests\resources\runtime_error.ipynb
An error occurred while executing the following cell:
------------------
1 / 0
------------------
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-1-bc757c3fda29> in <module>
----> 1 1 / 0
ZeroDivisionError: division by zero
ZeroDivisionError: division by zero
Traceback (most recent call last):
File "C:\Users\Genevieve\anaconda3\envs\treon-dev\Scripts\treon-script.py", line 33, in <module>
sys.exit(load_entry_point('treon', 'console_scripts', 'treon')())
File "c:\users\genevieve\documents\github\treon\treon\treon.py", line 52, in main
trigger_tasks(tasks, thread_count)
File "c:\users\genevieve\documents\github\treon\treon\treon.py", line 71, in trigger_tasks
pool.map(Task.run_tests, tasks)
File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\multiprocessing\pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\multiprocessing\pool.py", line 657, in get
raise self._value
File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\multiprocessing\pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\multiprocessing\pool.py", line 44, in mapstar
return list(map(*args))
File "c:\users\genevieve\documents\github\treon\treon\task.py", line 25, in run_tests
self.is_successful, console_output = execute_notebook(self.file_path)
File "c:\users\genevieve\documents\github\treon\treon\test_execution.py", line 13, in execute_notebook
processor.preprocess(notebook, metadata(path))
File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\nbconvert\preprocessors\execute.py", line 79, in preprocess
self.execute()
File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\nbclient\util.py", line 74, in wrapped
return just_run(coro(*args, **kwargs))
File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\nbclient\util.py", line 53, in just_run
return loop.run_until_complete(coro)
File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\asyncio\base_events.py", line 587, in run_until_complete
return future.result()
File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\nbclient\client.py", line 519, in async_execute
async with self.async_setup_kernel(**kwargs):
File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\async_generator\_util.py", line 34, in __aenter__
return await self._agen.asend(None)
File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\nbclient\client.py", line 477, in async_setup_kernel
await self.async_start_new_kernel(**kwargs)
File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\nbclient\client.py", line 389, in async_start_new_kernel
await ensure_async(self.km.start_kernel(extra_arguments=self.extra_arguments, **kwargs))
File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\nbclient\util.py", line 85, in ensure_async
result = await obj
File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\jupyter_client\manager.py", line 571, in start_kernel
kernel_cmd, kw = self.pre_start_kernel(**kw)
File "C:\Users\Genevieve\anaconda3\envs\treon-dev\lib\site-packages\jupyter_client\manager.py", line 252, in pre_start_kernel
"Currently valid addresses are: %s" % (self.ip, local_ips())
RuntimeError: Can only launch a kernel on a local interface. This one is not: 127.0.0.1.Make sure that the '*_address' attributes are configured properly. Currently valid addresses are: ['49.127.90.25', '172.28.70.145', '192.168.56.1', '192.168.1.103', '192.168.137.1', '172.22.0.1', '0.0.0.0', '']
(treon-dev) C:\Users\Genevieve\Documents\GitHub\treon>
Things I've tried:
- The --ip= keyword argument, the same way you might if you ran either ipython or jupyter directly (perhaps unsurprisingly, this isn't supported by treon)
According to treon --help, treon is meant to be invoked as
treon [PATH] [--threads=<number>] [-v] [--exclude=<string>]...
However, it is currently not possible to pass multiple PATHs, unlike many command-line tools (e.g., cat). If multiple PATHs are provided, the help text is displayed and nothing else happens.
It is possible to exclude multiple paths (#1), but currently, the only way to include multiple paths (e.g., to test multiple notebooks by individual filename or to test multiple directories) is to invoke treon multiple times sequentially, once for each. But this means that the full benefits of multithreading are not available.
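Supporting several PATH arguments mostly comes down to expanding each one into notebook files and merging the results before the tests are dispatched. The sketch below shows one way to do that with pathlib. It is not treon's implementation, and collect_notebooks is a hypothetical name.

```python
from pathlib import Path


def collect_notebooks(paths):
    """Expand a mix of notebook files and directories into a de-duplicated list."""
    found = []
    seen = set()
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # Directories are scanned recursively, in a stable order.
            candidates = sorted(path.rglob("*.ipynb"))
        elif path.is_file() and path.suffix == ".ipynb":
            candidates = [path]
        else:
            candidates = []  # missing paths and non-notebook files are skipped
        for candidate in candidates:
            resolved = candidate.resolve()
            if resolved not in seen:  # the same notebook may be reachable twice
                seen.add(resolved)
                found.append(resolved)
    return found
```

The combined list could then be handed to a single pool of workers, so all notebooks share the same parallelism instead of running one invocation after another.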
Combining Jupyter notebooks with test-driven development feels great, and treon is really helpful for CI pipelines. Would it be possible to also support the pytest framework, in addition to doctest and unittest?
It seems that pytest does not have a drop-in function like unittest.main(), which executes the current module; it requires a filename instead. Still, this would be a nice addition and would remove a lot of the boilerplate that the unittest framework requires.
Any interest in merging projects?
https://github.com/timkpaine/jupyterlab_celltests
Usually we have some rough notebooks and some final notebooks in the same repo. We never want to run tests for rough notebooks. It would be great to have a .ci_ignore file that can be used to specify a list of notebooks/directories that should be excluded from the test suite.
The current workaround is to explicitly specify the paths of the notebooks/directories that need to be included in the test suite.
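A minimal sketch of how such an ignore file might be interpreted follows. The file format is an assumption (one pattern per line, # for comments, a trailing slash for directories), since the request above does not define one, and both function names are hypothetical.

```python
from fnmatch import fnmatch


def parse_ignore_file(text):
    """Return the patterns listed in the ignore file, skipping blanks and comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(path, patterns):
    """True if a repo-relative notebook path matches any ignore pattern."""
    posix = path.replace("\\", "/")  # normalise Windows separators
    for pattern in patterns:
        pattern = pattern.rstrip("/")
        # Match either the glob itself or anything below an ignored directory.
        if fnmatch(posix, pattern) or posix.startswith(pattern + "/"):
            return True
    return False
```

For example, with the patterns rough/ and *_draft.ipynb, the path rough/a.ipynb is excluded while final/report.ipynb is kept.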