
cojac's People

Contributors

dr-david, dryak, jahnka, kpj, larafuhrmann, mcarrara-bioinfo


cojac's Issues

Paired end 150?

Hi,

I just read your paper introducing this tool; very well done!

I'm getting set up to run COJAC on our own data. The amplicons were generated with the ARTIC 3 method, so we should be set on that front. However, sequencing was paired-end 150 rather than 250. I know that this will create some gaps within reads, but I am still curious to see how the algorithm performs. Are there any settings I will need to adjust to make COJAC run on our data?

Thanks

issue with overlapping amplicons

I ran cojac on wastewater samples that were sequenced using NimagenV3 primer amplicons. It worked well for most variant profiles; however, I encountered this error when using a profile that has a co-occurrence in amplicon 120:
File "/usr/local/bin/cooc-mutbamscan2", line 245, in
table[sample]=scanbam(alnfname, amplicons)
File "/usr/local/bin/cooc-mutbamscan2", line 155, in scanbam
amplicon_iter = alnfile.fetch(rq_chr, rq_b, rq_e)
File "pysam/libcalignmentfile.pyx", line 1091, in pysam.libcalignmentfile.AlignmentFile.fetch
File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (22867) > stop (22846)

and here are the corresponding NimagenV3 insert coordinates:
amplicon_119 [22595-22837]
amplicon_120 [22804-22976]
amplicon_121 [22876-23119]
amplicon_122 [23072-23288]
According to the cooc-mutbamscan script, the query boundaries are defined as +30 from the previous amplicon's end (22837 + 30 = 22867) and -30 from the next amplicon's start (22876 - 30 = 22846), which leads to this error because the start lies after the stop.
Could you implement a fix for it?
Irene Bassano, with whom you were in contact about cojac, used to work with me. Thanks, Hubert
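
The arithmetic described in this report can be reproduced with a small sketch. It is not the cooc-mutbamscan code: the function name `query_window` and the `margin` parameter are hypothetical, and the rule (previous insert end plus 30, next insert start minus 30) is taken from the description above. Returning None for an empty window is just one possible way to avoid passing inverted coordinates on to pysam.

```python
# Insert coordinates (name, start, end) copied from the report above.
INSERTS = [
    ("amplicon_119", 22595, 22837),
    ("amplicon_120", 22804, 22976),
    ("amplicon_121", 22876, 23119),
    ("amplicon_122", 23072, 23288),
]


def query_window(inserts, i, margin=30):
    """Return (start, stop) for insert i, or None if the window is empty.

    Assumed rule, as described in the report: start is the previous
    insert's end plus margin, stop is the next insert's start minus margin.
    """
    start = inserts[i - 1][2] + margin
    stop = inserts[i + 1][1] - margin
    if start > stop:
        # Neighbouring inserts overlap too much: nothing to query here.
        return None
    return start, stop


print(query_window(INSERTS, 1))  # amplicon_120: 22867 > 22846, so None
print(query_window(INSERTS, 2))  # amplicon_121: (23006, 23042)
```

With the NimagenV3 coordinates above, amplicon_120 yields no valid window, while amplicon_121 does.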

add the notebooks

We need to create a notebooks/ folder and upload the Jupyter / RStudio notebooks that were used in the publication.

We should also add documentation describing them.

It's less urgent, but it's a nice-to-have.

yaml module not found

Hello,
I installed cojac using conda within a virtual environment (macOS Monterey):
conda create -n cojac
conda activate cojac
conda install cojac
but am now encountering the following error when testing:
cooc-mutbamscan --help
Traceback (most recent call last):
File "/Users/susanne/opt/anaconda3/envs/cojac/bin/cooc-mutbamscan", line 10, in
import yaml
ModuleNotFoundError: No module named 'yaml'
I tried to install yaml via conda and pip, as well as Homebrew. When I just use python within the cojac environment I can import the module, but I can't seem to resolve the error.
Thank you and any help is appreciated,
Susanne
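
One possible explanation, stated here only as an assumption, is that the cooc-mutbamscan script is started by a different Python interpreter than the one used when typing python inside the environment. The sketch below is a generic diagnostic, not part of cojac; the helper name `module_report` is hypothetical. It reports which interpreter is running and whether a module can be found from it.

```python
import importlib.util
import sys


def module_report(name):
    """Describe whether module `name` is importable from this interpreter."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    print(module_report("yaml"))
```

Running this with the interpreter named on the first line of the cooc-mutbamscan script, and again with the environment's python, would show whether the two differ.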

blank cooc.test.json file

Hi,
I was trying to use cojac for analysing SARS-CoV-2 sequenced wastewater samples.
When I was checking for co-occurrence using a standalone file,
(cojac) opentrons-b10@AirdiOptronsB10 cojac % cooc-mutbamscan -b QIAseqDIRECTSARSCoV2ampliconsfinal.bed -m voc/ -a read_mapping_21BL54986-TO_S3_L001.bam -j cooc-test.json

This message appears:

insertions not supported (yet): +ACT : ACT
insertions not supported (yet): +GAGCCAGAA : GAGCCAGAA
insertions not supported (yet): +AACA : AACA
autodecting reference as MN908947.3

and the cooc-test.json is empty:

[Screenshot attached, 2023-01-11 at 17:00:49]

What do you reckon is the problem here?
Many thanks for the help and for this amazing pipeline!!

cooc-curate TypeError: 'NoneType' object is not iterable

Hello,
I am trying to use cojac for my research and have encountered an issue.
When I run on the command line
cooc-curate voc/omicron_ba2_mutations.yaml (this command is also mentioned as an example in https://github.com/cbg-ethz/cojac#input-data-requirements)
or
cooc-curate voc/omicron_ba2_mutations.yaml voc/omicron_ba1_mutations.yaml voc/delta_mutations.yaml -a amplicons.yaml -m
(where amplicons.yaml is produced by cooc-mutbamscan)

In both cases I get the error

File ".../bin/cooc-curate", line 140, in listalllineages
    return {s['pangoLineage']:s['count'] for s in aggregated(fields='pangoLineage')}
TypeError: 'NoneType' object is not iterable

Can you please help me?
Thanks in advance.

Best wishes,
Polina
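
The traceback shows that `aggregated(fields='pangoLineage')` returned None, and iterating over None raises this TypeError. Why it returned None is not visible in the report. The sketch below only illustrates how such a case could be turned into a clearer message; `list_all_lineages` and the `fetch` parameter are hypothetical stand-ins, not the cooc-curate code.

```python
def list_all_lineages(fetch):
    """Build a {lineage: count} mapping from the rows returned by `fetch`.

    `fetch` is a stand-in for the `aggregated` call in the traceback;
    it is expected to return a list of dicts, or None on failure.
    """
    rows = fetch(fields="pangoLineage")
    if rows is None:
        # Fail with an explicit message instead of "'NoneType' object is not iterable".
        raise RuntimeError("lineage query returned no data")
    return {s["pangoLineage"]: s["count"] for s in rows}


# Made-up rows, for illustration only.
print(list_all_lineages(lambda fields: [{"pangoLineage": "BA.2", "count": 3}]))
```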

updated yamls files

Hi, I am new to Cojac, it's a great tool, thanks!!

  1. I was wondering where I can find an updated yaml file with the list of all the variants? It is for this option:
    -m DIR, --vocdir DIR directory containing the yamls defining the variant of concerns

  2. What is usually a good number to choose for -# COOC, --cooc COOC minimum number of cooccurences to search for?

Thanks a lot!!
Carlotta

TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

Hello,

I am attempting to use cojac for a research project and have created a conda environment (conda version 4.13.0) for this, but when supplying *.bam files to test the package I have encountered an error.

First I will list my conda environment for your reference, in case you notice important versioning differences (note the cojac and pysam entries):

packages in environment at /home/evan/anaconda3/envs/cojac:

Name Version Build Channel

_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
brotli 1.0.9 h5eee18b_7
brotli-bin 1.0.9 h5eee18b_7
brotlipy 0.7.0 py39h27cfd23_1003
bzip2 1.0.8 h7b6447c_0
c-ares 1.18.1 h7f8727e_0
ca-certificates 2022.07.19 h06a4308_0
certifi 2022.6.15 py39h06a4308_0
cffi 1.15.1 py39h74dc2b5_0
charset-normalizer 2.0.4 pyhd3eb1b0_0
cojac 0.2 hdfd78af_0 bioconda
colorama 0.4.5 py39h06a4308_0
cryptography 37.0.1 py39h9ce1e76_0
cycler 0.11.0 pyhd3eb1b0_0
dbus 1.13.18 hb2f20db_0
expat 2.4.8 h27087fc_0 conda-forge
fontconfig 2.14.0 h8e229c2_0 conda-forge
fonttools 4.25.0 pyhd3eb1b0_0
freetype 2.11.0 h70c0345_0
gettext 0.21.0 hf68c758_0
giflib 5.2.1 h7b6447c_0
glib 2.72.1 h6239696_0 conda-forge
glib-tools 2.72.1 h6239696_0 conda-forge
gst-plugins-base 1.14.0 h8213a91_2
gstreamer 1.14.0 h28cd5cc_2
icu 58.2 he6710b0_3
idna 3.3 pyhd3eb1b0_0
jpeg 9e h7f8727e_0
kiwisolver 1.4.2 py39h295c915_0
krb5 1.19.2 hac12032_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
libblas 3.9.0 15_linux64_openblas conda-forge
libbrotlicommon 1.0.9 h5eee18b_7
libbrotlidec 1.0.9 h5eee18b_7
libbrotlienc 1.0.9 h5eee18b_7
libcblas 3.9.0 15_linux64_openblas conda-forge
libclang 10.0.1 default_hb85057a_2
libcurl 7.84.0 h91b91d3_0
libdeflate 1.10 h7f98852_0 conda-forge
libedit 3.1.20210910 h7f8727e_0
libev 4.33 h7f8727e_1
libevent 2.1.12 h8f2d780_0
libffi 3.4.2 h295c915_4
libgcc-ng 12.1.0 h8d9b700_16 conda-forge
libgfortran-ng 11.2.0 h00389a5_1
libgfortran5 11.2.0 h1234567_1
libglib 2.72.1 h2d90d5f_0 conda-forge
libgomp 12.1.0 h8d9b700_16 conda-forge
libiconv 1.16 h7f8727e_2
liblapack 3.9.0 15_linux64_openblas conda-forge
libllvm10 10.0.1 hbcb73fb_5
libnghttp2 1.46.0 hce63b2e_0
libnsl 2.0.0 h7f98852_0 conda-forge
libopenblas 0.3.20 h043d6bf_1
libpng 1.6.37 hbc83047_0
libpq 12.9 h16c4e8d_3
libssh2 1.10.0 h8f2d780_0
libstdcxx-ng 12.1.0 ha89aaad_16 conda-forge
libtiff 4.2.0 h2818925_1
libuuid 2.32.1 h7f98852_1000 conda-forge
libwebp 1.2.2 h55f646e_0
libwebp-base 1.2.2 h7f8727e_0
libxcb 1.15 h7f8727e_0
libxkbcommon 1.0.1 hfa300c1_0
libxml2 2.9.14 h74e7548_0
libxslt 1.1.35 h4e12654_0
libzlib 1.2.12 h166bdaf_2 conda-forge
lz4-c 1.9.3 h295c915_1
matplotlib 3.5.2 py39hf3d152e_1 conda-forge
matplotlib-base 3.5.2 py39h700656a_1 conda-forge
munkres 1.1.4 py_0
ncurses 6.3 h5eee18b_3
nspr 4.33 h295c915_0
nss 3.74 h0370c37_0
numpy 1.23.1 py39hba7629e_0 conda-forge
openssl 1.1.1q h7f8727e_0
packaging 21.3 pyhd3eb1b0_0
pandas 1.4.3 py39h1832856_0 conda-forge
pcre 8.45 h295c915_0
pillow 9.2.0 py39hace64e9_1
pip 22.1.2 py39h06a4308_0
ply 3.11 py39h06a4308_0
pycparser 2.21 pyhd3eb1b0_0
pyopenssl 22.0.0 pyhd3eb1b0_0
pyparsing 3.0.4 pyhd3eb1b0_0
pyqt 5.15.7 py39h6a678d5_1
pyqt5-sip 12.11.0 py39h6a678d5_1
pysam 0.19.1 py39h5030a8b_0 bioconda
pysocks 1.7.1 py39h06a4308_0
python 3.9.13 h9a8a25e_0_cpython conda-forge
python-dateutil 2.8.2 pyhd3eb1b0_0
python_abi 3.9 2_cp39 conda-forge
pytz 2022.1 py39h06a4308_0
pyyaml 6.0 py39hb9d737c_4 conda-forge
qt-main 5.15.2 h327a75a_6
qt-webengine 5.15.9 hd2b0992_4
qtwebkit 5.212 h4eab89a_4
readline 8.1.2 h7f8727e_1
requests 2.28.1 pyhd8ed1ab_0 conda-forge
ruamel.yaml 0.17.4 py39h3811e60_0 conda-forge
ruamel.yaml.clib 0.2.6 py39h7f8727e_0
setuptools 61.2.0 py39h06a4308_0
sip 6.6.2 py39h6a678d5_0
six 1.16.0 pyhd3eb1b0_1
sqlite 3.39.0 h5082296_0
strictyaml 1.6.1 pyhd8ed1ab_0 conda-forge
tk 8.6.12 h1ccaba5_0
toml 0.10.2 pyhd3eb1b0_0
tornado 6.1 py39h27cfd23_0
tqdm 4.64.0 pyhd8ed1ab_0 conda-forge
tzdata 2022a hda174b7_0
urllib3 1.26.11 py39h06a4308_0
wheel 0.37.1 pyhd3eb1b0_0
xz 5.2.5 h7f8727e_1
yaml 0.2.5 h7b6447c_0
zlib 1.2.12 h7f8727e_2
zstd 1.5.2 ha4553b6_0

So firstly, the command I used was:

cooc-mutbamscan -b ../SARS-CoV-2.insert.V4.txt -m ../voc/ -a NIRE-002f80_S267.sorted.bam NIRE-002f81_S268.sorted.bam -j ../out/cooc-test.json

The bams and bam indices are attached as .txt files for your testing.

NIRE-002f81_S268.sorted.bam.bai.txt
NIRE-002f81_S268.sorted.bam.txt
NIRE-002f80_S267.sorted.bam.bai.txt
NIRE-002f80_S267.sorted.bam.txt

I used the SARS-CoV-2.insert.V4.txt as from GitHub (and also tested V3 in case I was wrong about the primer version used).
The voc/ folder is mirrored from GitHub.

The output was:
insertions not supported (yet): +AACA : AACA
insertions not supported (yet): +ACT : ACT
insertions not supported (yet): +GAGCCAGAA : GAGCCAGAA
76_B16173 is identical to 76_ka
77_ga is identical to 77_be
76_de is identical to 76_AY42
92_de is identical to 92_AY42
78_BA1 is identical to 78_BA2
79_BA1 is identical to 79_BA2
82_BA1 is identical to 82_om1
88_BA1 is identical to 88_om1
94_BA1 is identical to 94_BA2
95_BA1 is identical to 95_BA2
autodecting reference as MN908947.3
amplicon_31_om2 9336 9467 {9424: 'G', 9534: 'T'} amplion: 410
Traceback (most recent call last):
File "/home/evan/anaconda3/envs/cojac/bin/cooc-mutbamscan", line 479, in
main()
File "/home/evan/anaconda3/envs/cojac/bin/cooc-mutbamscan", line 451, in main
table[sample]=scanbam(alnfname, amplicons,rq_chr)
File "/home/evan/anaconda3/envs/cojac/bin/cooc-mutbamscan", line 184, in scanbam
amp_results[amp_name] = scanamplicon(amplicon_iter, mut_dict)
File "/home/evan/anaconda3/envs/cojac/bin/cooc-mutbamscan", line 134, in scanamplicon
(t_pos, t_read) = test_read(R, mut_dict)
File "/home/evan/anaconda3/envs/cojac/bin/cooc-mutbamscan", line 82, in test_read
found_site = [p for p,m in mut_dict.items() if read.reference_start <= (p-1) <= (read.reference_end-len(m))]
File "/home/evan/anaconda3/envs/cojac/bin/cooc-mutbamscan", line 82, in
found_site = [p for p,m in mut_dict.items() if read.reference_start <= (p-1) <= (read.reference_end-len(m))]
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

The program execution issue may be resolved by rolling back some packages to previous versions or by using an earlier version of Python, so if you could please advise me on this, I would appreciate it. Alternatively, if you can discern something about my input files that makes cojac unsuitable for analysing them, this would also be useful information.

I also get a similar output when running the V3 primer:
cooc-mutbamscan -b ../nCoV-2019.insert.V3.bed -m ../voc/ -a NIRE-002f80_S267.sorted.bam NIRE-002f81_S268.sorted.bam -j ../out/cooc-test.json
insertions not supported (yet): +AACA : AACA
insertions not supported (yet): +ACT : ACT
insertions not supported (yet): +GAGCCAGAA : GAGCCAGAA
76_B16173 is identical to 76_ka
76_ga is identical to 76_be
76_de is identical to 76_AY42
91_de is identical to 91_AY42
78_BA1 is identical to 78_BA2
93_BA1 is identical to 93_BA2
autodecting reference as MN908947.3
amplicon_31_om2 9275 9472 {9424: 'G', 9534: 'T'} amplion: 460
Traceback (most recent call last):
File "/home/evan/anaconda3/envs/cojac/bin/cooc-mutbamscan", line 479, in
main()
File "/home/evan/anaconda3/envs/cojac/bin/cooc-mutbamscan", line 451, in main
table[sample]=scanbam(alnfname, amplicons,rq_chr)
File "/home/evan/anaconda3/envs/cojac/bin/cooc-mutbamscan", line 184, in scanbam
amp_results[amp_name] = scanamplicon(amplicon_iter, mut_dict)
File "/home/evan/anaconda3/envs/cojac/bin/cooc-mutbamscan", line 134, in scanamplicon
(t_pos, t_read) = test_read(R, mut_dict)
File "/home/evan/anaconda3/envs/cojac/bin/cooc-mutbamscan", line 82, in test_read
found_site = [p for p,m in mut_dict.items() if read.reference_start <= (p-1) <= (read.reference_end-len(m))]
File "/home/evan/anaconda3/envs/cojac/bin/cooc-mutbamscan", line 82, in
found_site = [p for p,m in mut_dict.items() if read.reference_start <= (p-1) <= (read.reference_end-len(m))]
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

All the best,
Evan
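
In pysam, `reference_end` of an aligned segment is None when the read is unmapped or has no CIGAR alignment, which would produce exactly this TypeError in the expression shown in the traceback. Whether that is what happens with these BAM files is an assumption. The sketch below reuses the comparison from line 82 of the traceback with a guard added; `FakeRead` and `covered_sites` are hypothetical names, and `FakeRead` stands in for a pysam read so the example runs without pysam.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeRead:
    """Stand-in for a pysam AlignedSegment (illustration only)."""
    reference_start: int
    reference_end: Optional[int]


def covered_sites(read, mut_dict):
    """Return the 1-based mutation positions that the read fully spans."""
    if read.reference_end is None:
        # No alignment end available: the read cannot cover any site.
        return []
    return [
        p for p, m in mut_dict.items()
        if read.reference_start <= (p - 1) <= (read.reference_end - len(m))
    ]


mut_dict = {9424: "G", 9534: "T"}  # positions taken from the log above
print(covered_sites(FakeRead(9336, 9467), mut_dict))  # [9424]
print(covered_sites(FakeRead(9336, None), mut_dict))  # []
```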

Issue with MutationOccurrenceVisualization notebook

When I run the MutationOccurrenceVisualization notebook in JupyterLab, I get this error:

ValueError                                Traceback (most recent call last)
<ipython-input-19-5431d1030684> in <module>
      5         print(sample, amplicon, amplicon_data)
      6 
----> 7         max_entry = max(amplicon_data['sites'])  # corresponds to all mutations
      8 
      9         if max_entry not in amplicon_data['muts']:

ValueError: max() arg is an empty sequence

I don't know how to solve this issue.
This is my data, which I obtained after running V-pipe and getting the cooc-test.json file.
[Image attachment: issue1]
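
The error means that `amplicon_data['sites']` was empty for at least one amplicon, since `max()` raises ValueError on an empty sequence unless a default is given. The sketch below shows one way to skip such amplicons. The function name `amplicons_with_data` and the example data are made up for illustration, and the real structure of cooc-test.json may differ.

```python
def amplicons_with_data(sample_data):
    """Map each amplicon with at least one site entry to its largest key."""
    kept = {}
    for amplicon, amplicon_data in sample_data.items():
        # default=None avoids ValueError when 'sites' is empty.
        max_entry = max(amplicon_data["sites"], default=None)
        if max_entry is None:
            continue  # no reads covered this amplicon; skip it
        kept[amplicon] = max_entry
    return kept


# Hypothetical data: one amplicon with counts, one without any.
example = {
    "76_om": {"sites": {"1": 120, "2": 95}, "muts": {"1": 4, "2": 1}},
    "77_om": {"sites": {}, "muts": {}},
}
print(amplicons_with_data(example))  # {'76_om': '2'}
```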
