a-slide / pycoqc

pycoQC computes metrics and generates interactive QC plots from the sequencing summary report generated by the Oxford Nanopore Technologies basecallers (Albacore/Guppy)

Home Page: https://a-slide.github.io/pycoQC/

License: GNU General Public License v3.0

Python 95.01% TeX 3.00% Shell 0.54% Jinja 1.46%

jupyter-notebook generates-plots computing-metrics nanopore

pycoqc's Introduction

pycoQC v2.5.2

PycoQC computes metrics and generates interactive QC plots for Oxford Nanopore Technologies sequencing data

PycoQC relies on the sequencing_summary.txt file generated by Albacore and Guppy, but if needed it can also generate a summary file from basecalled fast5 files. The package supports 1D and 1D2 runs generated with MinION, GridION and PromethION devices, basecalled with Albacore 1.2.1+ or Guppy 2.1.3+. PycoQC is written in pure Python 3. Python 2 is not supported. For a quick introduction, see the tutorial by Tim Kahlke available at https://timkahlke.github.io/LongRead_tutorials/QC_P.html

Full documentation is available at https://a-slide.github.io/pycoQC

Gallery


Example HTML reports

Example JSON reports

Disclaimer

Please be aware that pycoQC is a research package that is still under development.

It was tested under Linux Ubuntu 16.04 and in an HPC environment running under Red Hat Enterprise 7.1.

Thank you

Classifiers

Development Status :: 3 - Alpha
Intended Audience :: Science/Research
Topic :: Scientific/Engineering :: Bio-Informatics
License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Programming Language :: Python :: 3

licence

GPLv3 (https://www.gnu.org/licenses/gpl-3.0.en.html)

Authors

Adrien Leger & Tommaso Leonardi / [email protected] / https://adrienleger.com

pycoqc's Issues

import sys addition

I know this is a tutorial and that you are using a dataset designed for it, but it may be worth adding a section to the workflow about using a real dataset. It is an easy addition. I know these manuals are often written for people who understand the code, but for people who are less comfortable with coding it may be useful to include a portion in the readme:
after:
from pycoQC.pycoQC import jprint as print
import sys
example_file_1D = '/path/to/real/sequencing/sequencing_summary.txt'
print(example_file_1D)
/path/to/real/sequencing/sequencing_summary.txt
Then all analyses following this portion will point to real data rather than your preloaded set. I only say this because I have this installed on my work cluster and several people have asked me why they always get the same output for their data. I then refer them to this or meet with them to show them, but having it in the readme would make it more inclusive.
Thanks

Update usage notebook and setup a mybinder instance for live tests

Add a sequence Length over time plot

Tweak HTML layout

Remove side menu = to the top ?
Add sample name (as a cli option)
Add path to summary file ? maybe.

EXP-NBD114 support

Describe the bug
pycoQC does not seem to support the EXP-NBD114 expansion.

To Reproduce
Steps to reproduce the behavior:

Go to '...'
Click on '....'
Scroll down to '....'
See error

Expected behavior
The generation of the pycoQC report in HTML, including the distribution of the reads over the barcodes.

Screenshots

This is the error log.
PARSE DATA FILES
Import raw data from sequencing summary files
3,099,954 reads found in initial file
Import barcode information from barcode summary files
Traceback (most recent call last):
  File "/path/Anaconda3-5.1.0/envs/pycoqc/bin/pycoQC", line 10, in <module>
    sys.exit(main_pycoQC())
  File "/path/Anaconda3-5.1.0/envs/pycoqc/lib/python3.6/site-packages/pycoQC/cli.py", line 169, in main_pycoQC
    title=args.title)
  File "/path/Anaconda3-5.1.0/envs/pycoqc/lib/python3.6/site-packages/pycoQC/cli.py", line 196, in generate_report
    filter_calibration=filter_calibration)
  File "/path/Anaconda3-5.1.0/envs/pycoqc/lib/python3.6/site-packages/pycoQC/pycoQC.py", line 94, in __init__
    raise pycoQCError ("File {} does not contain required barcode information".format(fp))
pycoQC.common.pycoQCError: File ./fastq_demux/barcoding_summary.txt does not contain required barcode information

Desktop (please complete the following information):

OS: Linux 3.10.0-957.10.1.el7.x86_64 x86_64
Browser: N.A.
Version: pycoQC-2

Additional context
The same version of pycoQC is processing EXP-NBD104 barcodes flawlessly.

Is there a lack of compatibility?

Minor improvements of read quality and length distribution plots

For the histograms of read quality and length, an option to put a line at the median value could be useful

For the length histogram, an option to graph length in log scale would help people with really long reads

Output_per_time is wrong if sample is lower than real value

Use a scaling factor or deactivate sampling
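
A minimal sketch of the scaling-factor option, assuming the plot is computed on a random subsample of the reads: values obtained from the sample are multiplied by the ratio of total reads to sampled reads. The function and parameter names are hypothetical and are not part of pycoQC.

```python
def scale_sampled_values(sampled_values, n_total, n_sampled):
    """Rescale per-bin values computed on a random subsample to the full dataset."""
    if n_sampled <= 0:
        raise ValueError("n_sampled must be positive")
    # Each sampled read stands for n_total / n_sampled reads in the full dataset
    factor = n_total / n_sampled
    return [value * factor for value in sampled_values]
```

The rescaled values are estimates only; deactivating sampling remains the exact alternative.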

Add command line interface with a config file that generates an HTML report

Ability to compare multiple runs

Generate simple stats and simplified length / quality plot to compare several runs

The channel activity plot is not working as expected for Promethion data

When loading a PromethION sequencing summary file, there is an issue in how data are collected for the channel activity plot, which ends up storing a massive array in the HTML file

See example of plot below

Although it doesn't prevent pycoQC from working, this should be solved, as the generated file is massive and the plot is mostly empty

Combining several sequencing_summary.txt files

My sequencing was divided into two runs, and I therefore have two summary-files. I tried to combine them by just copy-pasting them together, but pycoQC doesn't count the reads from the second run. Is there a simple way to solve this?
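
One possible cause, not confirmed in this thread, is that pasting two files together leaves the header line of the second file in the middle of the data. A minimal sketch of a merge that keeps a single header is shown below; `merge_summary_files` is a hypothetical helper, not a pycoQC function, and it assumes both files share exactly the same columns.

```python
def merge_summary_files(paths, out_path):
    """Concatenate tab-separated summary files, writing the header line only once.

    Returns the number of data rows written.
    """
    header = None
    n_rows = 0
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                first_line = handle.readline().rstrip("\n")
                if header is None:
                    # The first file defines the header of the merged file
                    header = first_line
                    out.write(header + "\n")
                elif first_line != header:
                    raise ValueError("Header of {} differs from the first file".format(path))
                for line in handle:
                    line = line.rstrip("\n")
                    if line:  # skip blank lines
                        out.write(line + "\n")
                        n_rows += 1
    return n_rows
```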

Option for static image generation in HTML report

Large summary files (e.g. from PromethION) lead to massive pycoQC report files, as the data are self-contained in the HTML file.

One option to dramatically reduce the size would be to have static images instead of dynamic JS plots. This is apparently feasible but not very straightforward, and it requires a package not available through pip: https://plot.ly/python/static-image-export/

@tleonardi could you have a look at this issue when you have a bit of time?

Windows-error with HOME - variable

Describe the bug
When running pycoQC v2.2.2 on Windows I get the following error message:

PARSE CONFIGURATION FILE
Traceback (most recent call last):
  File "c:\users\snorres\appdata\local\continuum\anaconda3\envs\pycoqc\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\snorres\appdata\local\continuum\anaconda3\envs\pycoqc\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\snorres\AppData\Local\Continuum\Anaconda3\envs\pycoQC\Scripts\pycoQC.exe\__main__.py", line 9, in <module>
  File "c:\users\snorres\appdata\local\continuum\anaconda3\envs\pycoqc\lib\site-packages\pycoQC\cli.py", line 169, in main_pycoQC
    title=args.title)
  File "c:\users\snorres\appdata\local\continuum\anaconda3\envs\pycoqc\lib\site-packages\pycoQC\cli.py", line 186, in generate_report
    config_dict = parse_config_file(config)
  File "c:\users\snorres\appdata\local\continuum\anaconda3\envs\pycoqc\lib\site-packages\pycoQC\cli.py", line 280, in parse_config_file
    home=os.environ['HOME']
  File "c:\users\snorres\appdata\local\continuum\anaconda3\envs\pycoqc\lib\os.py", line 725, in __getitem__
    raise KeyError(key) from None
KeyError: 'HOME'

To Reproduce
On Windows:

pycoQC --file sequencing_summary.text

Expected behavior
Run as normal

Desktop (please complete the following information):

Windows 10
Chrome
2.2.2

Additional context
The issue is solved by adding "HOME" as an environment variable with

set HOME=%USERPROFILE%

as described in https://stackoverflow.com/questions/14742064/python-os-environhome-works-on-idle-but-not-in-a-script
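
On the code side, a portable lookup could avoid the `KeyError` altogether. The sketch below is an assumption about how this might be handled, not the fix applied in pycoQC; `get_home_dir` is a hypothetical helper.

```python
import os

def get_home_dir(environ=None):
    """Return the user's home directory without assuming HOME is defined."""
    environ = os.environ if environ is None else environ
    # HOME is usual on Linux/macOS, USERPROFILE on Windows
    for name in ("HOME", "USERPROFILE"):
        value = environ.get(name)
        if value:
            return value
    # Fall back to the standard library's own resolution
    return os.path.expanduser("~")
```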

Compatibility with deepbinner classification file

Is your feature request related to a problem? Please describe.
pycoQC is only compatible with guppy barcoder classification files.

Describe the solution you'd like
The ability to pass a barcode classification file from deepbinner.

Export tables from report as *.tsv.

Is your feature request related to a problem? Please describe.
The results in the tables (made by plotly) are locked inside the *.html file.

Describe the solution you'd like
A button/handle in order to save the tabular results.

Describe alternatives you've considered
Looking into the source.

Additional context
N.A.

Add a distribution of quality scores and read lengths table (as in pycoQC1)

Error when using pycoQC on small files

If the sequencing_summary.txt file only contains a few reads, the binning in __output_over_time_data() fails:

PARSE DATA FILES
GENERATES PLOTS
Traceback (most recent call last):
  File "/home/nanopore/.local/bin/pycoQC", line 11, in <module>
    sys.exit(main())
  File "/home/nanopore/.local/lib/python3.6/site-packages/pycoQC/cli.py", line 108, in main
    verbose_level=args.verbose_level)
  File "/home/nanopore/.local/lib/python3.6/site-packages/pycoQC/cli.py", line 143, in generate_report
    fig = method(**method_args)
  File "/home/nanopore/.local/lib/python3.6/site-packages/pycoQC/pycoQC.py", line 513, in output_over_time
    dd1, ld1 = args=self.__output_over_time_data (all_df, level="reads")
  File "/home/nanopore/.local/lib/python3.6/site-packages/pycoQC/pycoQC.py", line 558, in __output_over_time_data
    t = np.digitize (t, bins=x, right=True)
ValueError: bins must have non-zero length
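
The traceback shows `np.digitize` receiving an empty `bins` array. One defensive approach is sketched below, under the assumption that the bin edges are derived from the range of read start times (how pycoQC builds them is not shown in this report). `bin_times` and `n_bins` are hypothetical names.

```python
import numpy as np

def bin_times(times, n_bins=10):
    """Assign each time value to a bin, tolerating empty or degenerate input."""
    times = np.asarray(times, dtype=float)
    # With fewer than two distinct values no meaningful edges exist:
    # put everything in a single bin instead of calling np.digitize
    if times.size < 2 or times.min() == times.max():
        return np.zeros(times.size, dtype=int)
    edges = np.linspace(times.min(), times.max(), n_bins + 1)
    return np.digitize(times, bins=edges, right=True)
```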

Push on Pypi to make package easily available

Possible new functionality

A useful addition would be to show cumulative yield over time. This could also report how long into the run 1/4, 1/2, 3/4 of the output was generated.
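
The cumulative yield idea can be sketched as follows, assuming each read has a start time and a length in bases. `yield_milestones` is a hypothetical helper; it reports the start time of the read at which each fraction of the total yield is first reached.

```python
def yield_milestones(start_times, lengths, fractions=(0.25, 0.5, 0.75)):
    """Return {fraction: time} giving when each fraction of the total yield is reached."""
    reads = sorted(zip(start_times, lengths))  # order reads by start time
    total = sum(length for _, length in reads)
    milestones = {}
    if total == 0:
        return milestones
    pending = sorted(fractions)
    cumulative = 0
    for time, length in reads:
        cumulative += length
        # A single long read may cross several thresholds at once
        while pending and cumulative >= pending[0] * total:
            milestones[pending.pop(0)] = time
    return milestones
```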

Another possible addition would be to allow temporal ordering of multiple sequencing runs within an experiment. When a sequencing run is stopped or crashes and is then restarted, the runID changes. Currently the analyses that output yield or quality over time put these multiple runs together when they should (ideally) be consecutive. I don't think it is possible to tell the order of the runs solely from the sequencing_summary.txt, but maybe the user could specify the order of the runIDs?
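
A rough sketch of user-specified run ordering, assuming each read carries a run ID and a start time relative to its own run. `chain_runs` is a hypothetical helper, and the duration of a run is approximated by the span between its first and last read start times.

```python
def chain_runs(reads, run_order):
    """Shift read start times so that runs follow each other in the given order.

    reads: iterable of (run_id, start_time) pairs.
    Returns a list of (run_id, shifted_time) pairs.
    """
    by_run = {}
    for run_id, start_time in reads:
        by_run.setdefault(run_id, []).append(start_time)
    offset = 0.0
    shifted = []
    for run_id in run_order:
        times = sorted(by_run.get(run_id, []))
        if not times:
            continue
        first = times[0]
        shifted.extend((run_id, offset + (t - first)) for t in times)
        # The next run starts where this one ended (approximation, see lead-in)
        offset += times[-1] - first
    return shifted
```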

update bioconda recipe to v2.2.4

bioconda/bioconda-recipes#15642

Add method to generate a summary file from fast5 in case it is not available

Add Median and n50 to summary table

JOSS paper draft

Hi @tleonardi,
Can you have a look at the last version of the draft in the paper branch and fill in your funding information ?

Thanks

Trouble installing pycoQC

Describe the bug
Hi I am having trouble installing pycoQC.
I followed the instructions and created a venv running Python 3.5.2.
I installed using Option 1: Installation with pip from pypi.
After the installation was completed and I tried to run pycoQC I got the following error:


Traceback (most recent call last):
  File "/data/laurensl/venv/bin/pycoQC", line 6, in <module>
    from pycoQC.__main__ import main_pycoQC
  File "/data/laurensl/venv/lib/python3.5/site-packages/pycoQC/__main__.py", line 18, in <module>
    from pycoQC.pycoQC import pycoQC
  File "/data/laurensl/venv/lib/python3.5/site-packages/pycoQC/pycoQC.py", line 13, in <module>
    from pycoQC.pycoQC_plot import pycoQC_plot
  File "/data/laurensl/venv/lib/python3.5/site-packages/pycoQC/pycoQC_plot.py", line 179
    height:int=300+(30*self.all_df[groupby].nunique()) if groupby else 300
          ^
SyntaxError: invalid syntax

How can I fix this?
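
The caret points at an annotated assignment (`height:int=...`), a syntax that Python accepts only from version 3.6 onward (PEP 526), so a 3.5.2 interpreter would be expected to fail at import time regardless of the installation method. Recreating the venv with Python 3.6 or later is the likely fix. A small check such as the one below (a hypothetical helper, not part of pycoQC) can confirm which interpreter the venv uses.

```python
import sys

def python_is_recent_enough(minimum=(3, 6), current=None):
    """Return True if the interpreter version is at least `minimum`."""
    current = sys.version_info[:2] if current is None else tuple(current)
    return current >= tuple(minimum)

if not python_is_recent_enough():
    print("Python 3.6 or later appears to be required, found {}.{}".format(*sys.version_info[:2]))
```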

Cannot write seq_summary_fn

Describe the bug
I tried creating a report using a FAST5 folder inside a Docker container.

To Reproduce

1.) Use nfcore/bacass:dev container with pycoQC recipe in it.
2.) Try running Fast5_to_seq_summary -f FAST5 -s testme

Fast5_to_seq_summary -f FAST5 -s testme
Traceback (most recent call last):
  File "/opt/conda/envs/nf-core-bacass-1.1.0dev/bin/Fast5_to_seq_summary", line 12, in <module>
    sys.exit(main_Fast5_to_seq_summary())
  File "/opt/conda/envs/nf-core-bacass-1.1.0dev/lib/python3.6/site-packages/pycoQC/cli.py", line 80, in main_Fast5_to_seq_summary
    verbose_level = args.verbose_level)
  File "/opt/conda/envs/nf-core-bacass-1.1.0dev/lib/python3.6/site-packages/pycoQC/Fast5_to_seq_summary.py", line 114, in __init__
    raise pycoQCError ("Cannot write the indicated seq_summary_fn")
pycoQC.common.pycoQCError: Cannot write the indicated seq_summary_fn

Though the directory is writable from inside the container without issues.

touch test_seq
(base) root@083663bc4c59:/home/apeltzer/bacass_test/work/7b/b16ef3592d07d01f0febf80be61d67# ls
FAST5  barcode01_NB01_Burkholderia.fastq  test_seq

Expected behavior
I expect the file to be written out :-)

Desktop (please complete the following information):

OS: CentOS 7

x log scale for reads_len_quality

Showing the read length vs quality 2D plot with log scale for the x axis would be nice.

Update Sequencing summary files and drop Albacore v1 support

Add alignment information bam file

It would be a nice addition

Publish release 2.2.0 prior to JOSS submission

When submitting to JOSS, update the version number to the v2.2.0 (no alpha) release

Fix links in html template

Some of the links in the HTML report are broken.
It would be nice to link this: https://a-slide.github.io/pycoQC/ and fix broken link

Add Guppy barcoding information to plots

Demultiplexed files from Guppy are concatenated in one large text file: barcoding_summary.txt. pycoQC currently cannot handle this file.

Guppy barcoding summary test file (50,000 random reads):

barcoding_summary_test.txt

Implement Minion / Promethion autodetect for Channel_output

Because CLI breaks if you don't use a modified config file

numpy.dtype size changed

Describe the bug
Upon running the following code an error is reported.

bsub -q bio -o pycoQClog.txt -e pycoQCerror.txt -n 13 pycoQC -f fastq/sequencing_summary.txt -o pycoQC_output.html

path/to/envs/pycoqc/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
path/to/envs/pycoqc/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed
sequencing_summary.txt
, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
path/to/envs/pycoqc/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning:

numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
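
This RuntimeWarning is generally emitted when a compiled extension was built against a different NumPy version than the one installed. It is often harmless, and updating or reinstalling the packages in the environment is the cleaner fix. If the message only needs to be hidden, a filter from the standard `warnings` module can be installed before the affected packages are imported. This is a sketch of that workaround, not something pycoQC is known to do.

```python
import warnings

def silence_numpy_dtype_warning():
    """Ignore the 'numpy.dtype size changed' RuntimeWarning; call before other imports."""
    warnings.filterwarnings(
        "ignore",
        message="numpy.dtype size changed",
        category=RuntimeWarning,
    )
```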

Clarification regarding 1D2 vs 2D run

The introductory information refers to the Albacore output of a "2D" run. However this is the tabular output from 1D2 reads. The previous 2D output tables were quite different. See attached file for 2D column names.
2D_column_headers.txt

Reduce size of PromethION .html files

The PycoQC .html files are quite large (~500 MB) for a few PromethION summaries that I've run. It would be great to have an option to reduce this size.

Add linear scale on read length plots

Having the choice between the 2 scales could be useful to visualise specific datasets

Parse Barcode values when available + plot

Barcode values are indicated in the field barcode_arrangement if Albacore (2.0+?) was called with the option --barcoding

They could be easily parsed and could be used to generate a barcode distribution plot per runid.
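
A minimal sketch of that parsing step, assuming a tab-separated summary file with a `barcode_arrangement` column (named above) and a `run_id` column (an assumption about the file layout). `barcode_counts_per_run` is a hypothetical helper.

```python
import csv
from collections import Counter

def barcode_counts_per_run(handle):
    """Count reads per (run_id, barcode_arrangement) in a tab-separated summary file."""
    counts = Counter()
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        counts[(row["run_id"], row["barcode_arrangement"])] += 1
    return counts
```

The resulting counts can be fed directly to a bar plot, one group of bars per run ID.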

Minor improvements of "over time" plotting functions

Add a cumulative yield over time curve
Take into account the different runs in the kinetics (One graph per run? or order by start time)

Integration with multiQC

Would be nice to have a report text mode only for integration in multiQC

Add fastq support

Is your feature request related to a problem? Please describe.
No.

Describe the solution you'd like
Be able to pass a fastq file as input rather than a sequencing summary file.

Describe alternatives you've considered
Using other tools? But their plots and reports aren't as nice as yours.

Additional context
I think a big use case for this is when I demultiplex a sample: I have a bunch of single fastq files and it would be nice to be able to see the quality metrics for just these.
Additionally, I don't always have the sequencing summary file for a fastq.
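
As an illustration of what fastq input would involve, the sketch below extracts the read ID, length and mean quality from each record. It assumes 4-line records with Phred+33 qualities, and it averages error probabilities rather than Phred scores, which is one common convention. `fastq_read_stats` is a hypothetical helper, not a pycoQC function.

```python
import math

def fastq_read_stats(handle):
    """Yield (read_id, length, mean_quality) for each 4-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        if qual:
            # Convert Phred+33 characters to error probabilities, then average
            probs = [10 ** (-(ord(char) - 33) / 10) for char in qual]
            mean_quality = -10 * math.log10(sum(probs) / len(probs))
        else:
            mean_quality = 0.0
        yield header[1:].split()[0], len(seq), mean_quality
```

Timing and channel information are not available from such records, so only length and quality metrics could be derived this way.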

Dynamic heatmap

Include a dynamic heatmap, similar to Matt Loose flowcellvis
https://github.com/mattloose/flowcellvis

Move to dynamic plotting lib

Such as Bokeh

pycoqc v2.2.4 on conda?

I was wondering if version 2.2.4 was already on conda. It seems to be, but I cannot update my current version:

pycoqc 2.2.3.3 dev_0

summary of read stats per barcode

Hi,

Awesome tool, thanks!

What I would like to see though is the read length stats, N50, quality etc. split out per barcode. Currently this is done per summary file, but an option to recalculate all the read stats per barcode would be great.
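
A minimal sketch of per-barcode statistics, assuming the input is a sequence of (barcode, read length) pairs. `n50` and `stats_per_barcode` are hypothetical helpers; N50 is taken here as the length of the read at which the running total, over reads sorted from longest to shortest, reaches half of the total bases.

```python
from collections import defaultdict
from statistics import median

def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input

def stats_per_barcode(rows):
    """Group (barcode, length) pairs and summarise each group."""
    groups = defaultdict(list)
    for barcode, length in rows:
        groups[barcode].append(length)
    return {
        barcode: {
            "reads": len(lengths),
            "bases": sum(lengths),
            "median": median(lengths),
            "n50": n50(lengths),
        }
        for barcode, lengths in groups.items()
    }
```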

Read length over time does not display the expected data in CLI mode

For some reason the CLI outputs a plot that is different from the one generated through the API.
The API version is correct.

Add table with numbers for barcodes

Suggestions for pycoQC

Just tested pycoQC, here are a few thoughts:

I like the interactive plots, but the labels on two of the plots ended up being cluttered: "Distribution of read length" and "Output over experiment time".
(The "Distribution of read length" plot was cluttered because of the narrow distribution for the poly(A) run, which would probably not be an issue in most cases)

I think you should include an option to generate a new, filtered sequencing_summary.txt file.
I ended up generating these when I was analysing the poly(A) data.

Also, it would make more sense if the parameters for --verbose_level were 2,1,0 (0 = silent).

Split pass/fail/calibration instead of filtering out

After the runid filtering, it would be more interesting to split the df in 3 when pass_filtering and calibration information are available, rather than completely discarding the filtered rows.
This would allow generating overall dataset metrics, including the number of pass/fail/calibration reads per runid.
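
The split can be sketched as below, using plain dictionaries in place of the dataframe. The keys `is_calibration`, `passes_filtering` and `run_id` are assumptions chosen for illustration, and both functions are hypothetical.

```python
def split_reads(rows):
    """Split read records into pass / fail / calibration groups instead of discarding any."""
    groups = {"pass": [], "fail": [], "calibration": []}
    for row in rows:
        if row.get("is_calibration"):
            groups["calibration"].append(row)
        elif row.get("passes_filtering"):
            groups["pass"].append(row)
        else:
            groups["fail"].append(row)
    return groups

def counts_per_run(rows):
    """Return {run_id: {"pass": n, "fail": n, "calibration": n}}."""
    counts = {}
    for label, members in split_reads(rows).items():
        for row in members:
            run_counts = counts.setdefault(
                row["run_id"], {"pass": 0, "fail": 0, "calibration": 0}
            )
            run_counts[label] += 1
    return counts
```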

Get info from the Telemetry json file generated by Albacore ?

Add support for multi-fast5 to Fast5_to_seq_summary

Recent versions of MinKNOW/Guppy use the multi-fast5 format (multiple reads per file).
Fast5_to_seq_summary is not compatible with this format at the moment
