shamir-lab / recycler

This is the codebase for Recycler, described in our manuscript: https://academic.oup.com/bioinformatics/article/33/4/475/2623362, by Roye Rozov, Aya Brown Kav, David Bogumil, Naama Shterzer, Eran Halperin, Itzhak Mizrahi, and Ron Shamir

License: BSD 3-Clause "New" or "Revised" License

Python 49.24% Makefile 17.90% C 32.57% Shell 0.28%
assembly-graph plasmid metagenomes recycler denovo assembly circular-sequences

recycler's Issues

"Node not found" issue

Hi

I'm trying to use Recycler to identify circular elements in a mobilome (plasmids extracted from an environmental sample).

When I run the command: recycle.py -g ../assembly_graph.fastg -k 127 -b reads_pe_primary.sort.bam -i False -o .

After a short time I get this error:

Traceback (most recent call last):
File "/usr/local/bin/recycle.py", line 4, in <module>
__import__('pkg_resources').run_script('recycler==0.62', 'recycle.py')
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 658, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1445, in run_script
exec(script_code, namespace, namespace)
File "/usr/local/lib/python2.7/dist-packages/recycler-0.62-py2.7.egg/EGG-INFO/scripts/recycle.py", line 173, in <module>

File "build/bdist.linux-x86_64/egg/recyclelib/utils.py", line 340, in is_good_cyc
File "build/bdist.linux-x86_64/egg/recyclelib/utils.py", line 324, in get_contigs_of_mates
File "/usr/local/lib/python2.7/dist-packages/networkx/algorithms/shortest_paths/generic.py", line 40, in has_path
nx.shortest_path(G, source, target)
File "/usr/local/lib/python2.7/dist-packages/networkx/algorithms/shortest_paths/generic.py", line 170, in shortest_path
paths = nx.bidirectional_shortest_path(G, source, target)
File "/usr/local/lib/python2.7/dist-packages/networkx/algorithms/shortest_paths/unweighted.py", line 223, in bidirectional_shortest_path
raise nx.NodeNotFound(msg.format(source, target))
networkx.exception.NodeNotFound: Either source EDGE_10343701_length_848_cov_11.642164' or target EDGE_10682581_length_5431_cov_2.680430 is not in G

Thanks in advance for your help :)

recycler output

Hi,

I am trying to use your program. There is no description of the output, so I am not sure whether I am running it correctly. My output is a text file listing tuples of EDGEs, for example:
('EDGE_132_length_60_cov_85', 'EDGE_623_length_203_cov_23.7027').

Is this supposed to be the output? Is there a way to extract the corresponding sequences from the fastg file?

Thanks,
Cheng
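
One possible way to pull the sequences for such a tuple out of the fastg file is sketched below. It is only a sketch under assumptions: it assumes SPAdes-style headers of the form `>NAME:NEIGHBOURS;`, treats a trailing apostrophe as the reverse-complement strand, and returns one sequence per edge without merging the k-mer overlaps between consecutive edges. The function names `read_fastg` and `edge_sequences` and the tiny example graph are hypothetical, not part of Recycler.

```python
def read_fastg(lines):
    """Map each edge name (header text before ':' or ';') to its sequence."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Assumed header shape: ">EDGE_1_length_8_cov_2.0:EDGE_2_...';"
            name = line[1:].rstrip(";").split(":", 1)[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def edge_sequences(path, seqs):
    """Return one sequence per edge name in a path tuple."""
    out = []
    for edge in path:
        if edge in seqs:
            out.append(seqs[edge])
        elif edge.endswith("'") and edge[:-1] in seqs:
            # Only the forward strand is stored: derive the other strand.
            out.append(reverse_complement(seqs[edge[:-1]]))
        else:
            raise KeyError(edge)
    return out


# Made-up two-edge graph, for illustration only.
fastg_lines = """>EDGE_1_length_8_cov_2.0:EDGE_2_length_4_cov_3.0';
AACG
TTAC
>EDGE_2_length_4_cov_3.0;
ACGG
""".splitlines()

seqs = read_fastg(fastg_lines)
picked = edge_sequences(
    ("EDGE_1_length_8_cov_2.0", "EDGE_2_length_4_cov_3.0'"), seqs
)
```

With a real graph, `read_fastg(open("assembly_graph.fastg"))` would build the same dictionary.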

RuntimeError: dictionary changed size during iteration

Good morning Shamir and community,

I am trying to apply Recycler over my SPAdes graph but I keep getting this error:

Traceback (most recent call last):
File "/usr/local/home/lmt243/anaconda3/pkgs/recycler-0.6.2-py27_0/bin/recycle.py", line 70, in <module>
G.remove_nodes_from(nx.isolates(G))
File "/usr/local/home/lmt243/.local/lib/python2.7/site-packages/networkx-2.1-py2.7.egg/networkx/classes/digraph.py", line 555, in remove_nodes_from
for n in nodes:
File "/usr/local/home/lmt243/.local/lib/python2.7/site-packages/networkx-2.1-py2.7.egg/networkx/algorithms/isolate.py", line 94, in <genexpr>
return (n for n, d in G.degree() if d == 0)
File "/usr/local/home/lmt243/.local/lib/python2.7/site-packages/networkx-2.1-py2.7.egg/networkx/classes/reportviews.py", line 368, in __iter__
for n in self._nodes:
RuntimeError: dictionary changed size during iteration

I know this error has been reported and closed before, but I tried the solution described there (downgrading networkx from 2.0 to 1.11) and I still get the error.

System: Ubuntu Linux

numpy 1.14.0
networkx 2.0
pysam 0.11.2.2
nose 1.3.7
BWA 0.7.17
samtools 1.7
SPAdes 3.11.1

Recycler and these associated dependencies were installed through conda.

Thanks a lot in advance, if you need more info, please, drop me a line!

Best,
Juanma

UPDATE: Recycler seems to work after installing the dependencies with conda and cloning the latest version from here, as dpellow suggested. I did not post an update earlier because the script is still running and may have got stuck at some point, but the script launches with no problem.
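
For readers hitting the same traceback: it shows `remove_nodes_from` consuming a lazy generator (from `nx.isolates`) while it deletes nodes from the very dictionary that generator is walking. The sketch below reproduces that pattern with a plain dict standing in for the graph, so it runs without networkx; `isolates` and `remove_nodes` here are hypothetical stand-ins, not the library functions. The usual workaround, and presumably what newer Recycler code does, is to materialise the generator first, e.g. `G.remove_nodes_from(list(nx.isolates(G)))`.

```python
def isolates(adj):
    """Lazily yield nodes with no neighbours (a generator, as in the traceback)."""
    return (node for node, nbrs in adj.items() if not nbrs)


def remove_nodes(adj, nodes):
    """Delete every node in `nodes` from the adjacency dict."""
    for node in nodes:
        del adj[node]


def make_graph():
    # Toy adjacency dict: 'c' and 'd' are isolated.
    return {"a": {"b"}, "b": {"a"}, "c": set(), "d": set()}


# Lazy generator: the dict shrinks while it is still being iterated.
broken = make_graph()
try:
    remove_nodes(broken, isolates(broken))
    lazy_failed = False
except RuntimeError:
    lazy_failed = True

# Materialising the generator with list() first avoids the problem.
fixed = make_graph()
remove_nodes(fixed, list(isolates(fixed)))
```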

ImportError: No module named recyclelib.utils

I recently had to reinstall pysam (using conda) and now Recycler is not working for some reason... Here is what I am trying to run in the terminal:
Recycler/bin/recycle.py -g Downloads/SPAdes/ThesisRecycler/assembly_graph.fastg -k 77 -b Downloads/SPAdes/ThesisRecycler/reads_pe_primary.sort.bam

This is the error I am getting:
Traceback (most recent call last):
File "Recycler/bin/recycle.py", line 4, in <module>
from recyclelib.utils import *
ImportError: No module named recyclelib.utils

I am running MacOSX. I am very new to Python/programming, any help would be appreciated.

make_fasta_from_fastg.py :: incorrect help message

Hello

bigmess:Recycler-0.62/paper > make_fasta_from_fastg.py -h 
usage: make_fasta_from_fastg.py [-h] -g GRAPH

recycle extracts cycles likely to be plasmids from metagenome and genome
assembly graphs

Shouldn't it be something like:
fastg to fasta converter

regards

Samtools sort changed

Hi,

When I use the following command: samtools sort reads_pe_primary.bam reads_pe_primary.sort.bam
I get this error:

[bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files
Usage: samtools sort [options...] [in.bam]
Options:
-l INT Set compression level, from 0 (uncompressed) to 9 (best)
-m INT Set maximum memory per thread; suffix K/M/G recognized [768M]
-n Sort by read name
-t TAG Sort by value of TAG. Uses position as secondary index (or read name if -n is set)
-o FILE Write final output to FILE rather than standard output
-T PREFIX Write temporary files to PREFIX.nnnn.bam
--no-PG do not add a PG line
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
-O, --output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
-@, --threads INT
Number of additional threads to use [0]
--verbosity INT
Set level of verbosity

When I add -o, like this: samtools sort reads_pe_primary.bam -o reads_pe_primary.sort.bam, it works.
My samtools version is 1.15.
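
The two calling conventions reported in these issues can be captured in a small helper. This is a sketch only: `samtools_sort_command` is a hypothetical name, and the caller has to say which convention the installed samtools expects (the `legacy` flag), since the helper does not detect the version.

```python
def samtools_sort_command(in_bam, out_bam, legacy=False):
    """Build the argument list for `samtools sort`.

    legacy=True  -> old style: samtools sort in.bam out_prefix
                    (samtools appends '.bam' to the prefix itself)
    legacy=False -> new style: samtools sort -o out.bam in.bam
    """
    if legacy:
        prefix = out_bam[:-4] if out_bam.endswith(".bam") else out_bam
        return ["samtools", "sort", in_bam, prefix]
    return ["samtools", "sort", "-o", out_bam, in_bam]


new_style = samtools_sort_command(
    "reads_pe_primary.bam", "reads_pe_primary.sort.bam"
)
old_style = samtools_sort_command(
    "reads_pe_primary.bam", "reads_pe_primary.sort.bam", legacy=True
)
```

Either list can be passed to `subprocess.run(cmd, check=True)` to run the sort.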

RuntimeWarning: invalid value encountered in sqrt

This warning occurs for some of my files but not all.

~/anaconda3/lib/python3.6/site-packages/recycler-0.62-py3.6.egg/recyclelib/utils.py:178: RuntimeWarning: invalid value encountered in sqrt

After I issue the recycle.py command, it runs through several "# nodes remain in component" messages and eventually gets stuck somewhere. I've tried letting it run for several hours, but it never progresses.

output sequence fasta

Hello,
I would like to know whether the .cycs.fasta file is meant to contain the sequences of each predicted plasmid for a given genome and, if not, how to obtain them.

I've tried:
python recycle.py -g assembly_graph.fastg -k 77 -b p9.SA7518_S79_R1_001.bam -i True > p9.cycs.fasta

With my current output being:
154.187 2112.15358778 312.39175
================== path, coverage levels when added ====================
6 nodes remain in component

2 nodes remain in component

('EDGE_134_length_79_cov_402.5', 'EDGE_110_length_91_cov_243.929', 'EDGE_61_length_58956_cov_140.228')
before [402.5, 243.929, 140.228]
after [262.10192547616219, 103.5309254761622, 0]
474 nodes remain in component

('EDGE_270_length_49661_cov_147.474', 'EDGE_201_length_95_cov_266.333', 'EDGE_134_length_79_cov_402.5', 'EDGE_109_length_112_cov_164.886', 'EDGE_242_length_92_cov_256.467', 'EDGE_90_length_33867_cov_154.374', 'EDGE_278_length_203_cov_192.841', 'EDGE_181_length_13154_cov_136.653', "EDGE_269_length_103_cov_258.308'")
before [147.474, 266.333, 262.10192547616219, 164.886, 256.467, 154.374, 192.841, 136.653, 258.308]
after [0, 117.66025606000321, 113.42918153616537, 16.213256060003175, 107.79425606000316, 5.7012560600031748, 44.168256060003188, 0, 109.63525606000317]
470 nodes remain in component

470 nodes remain in component

6 nodes remain in component

15 nodes remain in component

15 nodes remain in component

2 nodes remain in component

==================final_paths identities after updates: ================
('EDGE_270_length_49661_cov_147.474', 'EDGE_201_length_95_cov_266.333', 'EDGE_134_length_79_cov_402.5', 'EDGE_109_length_112_cov_164.886', 'EDGE_242_length_92_cov_256.467', 'EDGE_90_length_33867_cov_154.374', 'EDGE_278_length_203_cov_192.841', 'EDGE_181_length_13154_cov_136.653', "EDGE_269_length_103_cov_258.308'")

('EDGE_134_length_79_cov_402.5', 'EDGE_110_length_91_cov_243.929', 'EDGE_61_length_58956_cov_140.228')

('EDGE_88_length_3692_cov_10938.4',)

("EDGE_13_length_2024_cov_1430.94'",)

Thank you,
Emily

Length of circular contigs

Hey

Just a minor issue.

The lengths of the resulting circular contigs do not match the lengths written in their headers. The length in the header is always a little larger than the actual length of the contig. The difference always appears to be a multiple of 86, at least in my case.

I use version v0.7 with a fastg from MEGAHIT

Cheers,
Jakob

AttributeError: Digraph object

Hi @dpellow

I tried running Recycler with the command below and got an error. Please advise.

recycle.py -g assembly_graph.fastg -k 50 -b ".mybam -o mydir

Error message
143.506632 6899.693496211945 258.95512825000003
Traceback (most recent call last):
File "/home/user/.local/bin/recycle.py", line 4, in <module>
__import__('pkg_resources').run_script('recycler==0.62', 'recycle.py')
File "/opt/apps/Python/Python-3.5/lib/python3.5/site-packages/pkg_resources/__init__.py", line 658, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/opt/apps/Python/Python-3.5/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1445, in run_script
exec(script_code, namespace, namespace)
File "/home/user/.local/lib/python3.5/site-packages/recycler-0.62-py3.5.egg/EGG-INFO/scripts/recycle.py", line 100, in <module>
File "/home/user/.local/lib/python3.5/site-packages/recycler-0.62-py3.5.egg/recyclelib/utils.py", line 205, in get_long_self_loops
AttributeError: 'DiGraph' object has no attribute 'nodes_with_selfloops'
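
This error suggests the installed networkx no longer provides `nodes_with_selfloops` as a graph method; as far as I know, newer networkx releases expose it as a module-level function, `nx.nodes_with_selfloops(G)`, instead. A version-independent alternative is sketched below. It relies only on the graph's `adj` mapping, and `TinyDiGraph` is a hypothetical stand-in used so the example runs without networkx installed.

```python
def nodes_with_selfloops(graph):
    """Return the nodes that have an edge to themselves, using only `graph.adj`."""
    return [node for node, nbrs in graph.adj.items() if node in nbrs]


class TinyDiGraph:
    """Hypothetical stand-in exposing an `adj` mapping like a networkx graph."""

    def __init__(self, edges):
        self.adj = {}
        for u, v in edges:
            self.adj.setdefault(u, {})[v] = {}
            self.adj.setdefault(v, {})


g = TinyDiGraph([("a", "a"), ("a", "b"), ("b", "c")])
loops = nodes_with_selfloops(g)
```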

error creating the BAM file

Hi I'm trying to create the BAM file and it gives an error.
I have installed recycler using bioconda.

This is the error:

Danielas-Mac:Recycler niazevedo$ make_fasta_from_fastg.py -g assembly_graph1.fastg
Traceback (most recent call last):
File "/Users/niazevedo/anaconda/bin/make_fasta_from_fastg.py", line 4, in <module>
__import__('pkg_resources').run_script('recycler==0.6', 'make_fasta_from_fastg.py')
File "/Users/niazevedo/anaconda/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 744, in run_script
File "/Users/niazevedo/anaconda/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 1506, in run_script
File "/Users/niazevedo/anaconda/lib/python2.7/site-packages/recycler-0.6-py2.7.egg/EGG-INFO/scripts/make_fasta_from_fastg.py", line 3, in <module>
__requires__ = 'recycler==0.6'
File "/Users/niazevedo/anaconda/bin/recycle.py", line 4, in <module>
__import__('pkg_resources').run_script('recycler==0.6', 'recycle.py')
File "/Users/niazevedo/anaconda/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 744, in run_script
File "/Users/niazevedo/anaconda/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 1506, in run_script
File "/Users/niazevedo/anaconda/lib/python2.7/site-packages/recycler-0.6-py2.7.egg/EGG-INFO/scripts/recycle.py", line 5, in <module>

File "/Users/niazevedo/anaconda/lib/python2.7/site-packages/pysam/__init__.py", line 5, in <module>
from pysam.libchtslib import *
ImportError: dlopen(/Users/niazevedo/anaconda/lib/python2.7/site-packages/pysam/libchtslib.so, 2): Library not loaded: @rpath/libhts.1.dylib
Referenced from: /Users/niazevedo/anaconda/lib/python2.7/site-packages/pysam/libchtslib.so
Reason: image not found

error after running recycle.py

recycle.py -g assembly_graph.fastg -k 55 -b reads_pe_primary.sort.bam
I got the error below:
Traceback (most recent call last):
File "/home/omnah486/miniconda2/bin/recycle.py", line 4, in <module>
__import__('pkg_resources').run_script('recycler==0.6', 'recycle.py')
File "/home/omnah486/miniconda2/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 744, in run_script
File "/home/omnah486/miniconda2/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 1506, in run_script
File "/home/omnah486/miniconda2/lib/python2.7/site-packages/recycler-0.6-py2.7.egg/EGG-INFO/scripts/recycle.py", line 62, in <module>

AttributeError: 'module' object has no attribute 'AlignmentFile'

not an issue: small suggestion for recycler.py

I thought it would be useful (for debugging purposes, for example) to have the recycler.py script print the date and time.
So I added the following to my local recycler.py, on line 194:

print(str(datetime.now().strftime('%Y-%m-%d %H:%M:%S')) + " --> " + str(len(COMP.nodes())) + " nodes remain in component")

with the obvious

from datetime import datetime

Hope this can be useful

Support for assembly graphs in .gfa format

Many assemblers write their assembly graphs as .gfa files.
Are there plans to support this file format in the future? Can you recommend any conversion tools in the meantime?
Thanks for creating this tool!

ZeroDivisionError: float division by zero

Hello. Thanks for introducing such a wonderful tool.
Recently I tried to apply Recycler to sequence data from several isolates, and some of them produced this error. Any idea how I can fix it?
Traceback (most recent call last):
File "Recycler/bin/recycle.py", line 131, in <module>
paths = enum_high_mass_shortest_paths(COMP)
File "recyclelib/utils.py", line 242, in enum_high_mass_shortest_paths
G.add_edge(e[0], e[1], cost = 1./get_spades_base_mass(G, e[0]))
ZeroDivisionError: float division by zero
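
The traceback shows the edge cost being computed as the reciprocal of a node's base mass, which fails when that mass is exactly zero (for instance, presumably, a node reported with zero coverage). One possible local workaround is to clamp the mass to a small positive floor before dividing, as sketched below. `edge_cost` and `min_mass` are hypothetical names, and whether clamping or dropping such nodes is the right behaviour is a question for the maintainers.

```python
def edge_cost(base_mass, min_mass=1e-9):
    """Inverse-mass cost that stays finite when the mass is zero.

    `min_mass` is an arbitrary illustrative floor, not a Recycler parameter.
    """
    return 1.0 / max(base_mass, min_mass)


normal = edge_cost(4.0)       # ordinary node: plain reciprocal
degenerate = edge_cost(0.0)   # zero-mass node: large but finite cost
```

A zero-mass node then gets a very high cost, so shortest-path searches avoid it instead of crashing.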

Fastg file

Congratulations for the script.

I used it to predict plasmids in a new assembly with SPAdes and PlasmidSPAdes and had no problems.

I wonder if I can make the same prediction, for example, with an output of CISA with different assemblies.

make_fasta_from_fastg.py

Hi

The make_fasta_from_fastg.py script raises an error when I apply it to a fastg file produced by MEGAHIT. It asks for a BAM file, although, based on your example of how to create the BAM file, that command should not require one.
image

make_fasta_from_fastg.py -g assembly_graph.fastg

Running it with the BAM file (reads vs contigs used to create fastg) gives this error:
image

Thanks

Ruben

recycler.py doesn't end (metagenome data)

Hi!

I am trying to run Recycler on metagenome data. The run starts well but never ends, and no error is given. I had no problems making the BAM file. The terminal looks like this:
"
("NODE_280594_length_1837_cov_7.6655_ID_561187'", 'NODE_662693_length_2433_cov_6.4467_ID_1325385', "NODE_1079610_length_163_cov_6.5000_ID_2159219'", 'NODE_11346_length_524_cov_7.3073_ID_22691')
before [7.6655, 6.4467, 6.5, 7.3073]
after [0.6738814017310526, 0, 0, 0.3156814017310525]
2019-10-04 12:21:56 4 nodes remain in component
2019-10-04 12:21:57 2 nodes remain in component
2019-10-04 12:21:57 3 nodes remain in component
2019-10-04 12:21:57 2 nodes remain in component
2019-10-04 12:21:57 6 nodes remain in component
2019-10-04 12:21:57 15 nodes remain in component"

It ran for 3 days and never got past this step.

I have used Recycler with plasmid data and it went well.

mapping input contig names to output

Given that I get a contig name like RNODE_1_length_2185_cov_1230.39000, how would I know which contig that is in the original file? I can search the coverage field 1230.39 to find the original contig name NODE_18_length_2240_cov_1230.39_component_1 but I hope there is a better way. Even in the stdout, the contig name is different as EDGE_703_length_2240_cov_1230.39_component_1.
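
Until there is an official mapping, the manual search by coverage described above can at least be automated. The sketch below parses SPAdes-style names and lists every input name whose coverage value equals that of the output name. It assumes the `PREFIX_<id>_length_<n>_cov_<x>` naming seen in this issue; `parse_name` and `candidates_by_coverage` are hypothetical helpers, and coverage is not guaranteed to be unique, which is why a list is returned.

```python
import re

# Assumed naming scheme: PREFIX_<id>_length_<n>_cov_<x>[_anything]
NAME_RE = re.compile(r"^[A-Za-z]+_(\d+)_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def parse_name(name):
    """Return (id, length, coverage) for a SPAdes-style name, or None."""
    match = NAME_RE.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


def candidates_by_coverage(output_name, input_names):
    """List the input names whose coverage equals that of `output_name`."""
    parsed = parse_name(output_name)
    if parsed is None:
        return []
    hits = []
    for name in input_names:
        fields = parse_name(name)
        if fields is not None and fields[2] == parsed[2]:
            hits.append(name)
    return hits


inputs = [
    "NODE_18_length_2240_cov_1230.39_component_1",
    "NODE_19_length_500_cov_7.5_component_1",  # made-up second contig
]
hits = candidates_by_coverage("RNODE_1_length_2185_cov_1230.39000", inputs)
```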

This output is from plasmid spades. Sorry, I don't know if/when I would have the time to do a head to head comparison, regarding my previous issue #8.

Recycler with Bowtie2 and MEGAHIT

Hi,

Not really an issue; I just want to let you know that Recycler worked seamlessly for me when using Bowtie2 (2.3.4.1) to create the BAM file and MEGAHIT (1.1.3) to produce the fastg file.

Thank you for a great tool.

Cheers,
Jakob

Recycle.py: Runtime error

When I try to execute recycle.py, I get the following error:

Traceback (most recent call last):
File "/home/imss/.local/bin/recycle.py", line 4, in <module>
__import__('pkg_resources').run_script('recycler==0.62', 'recycle.py')
File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 738, in run_script
File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 1506, in run_script
File "/home/imss/.local/lib/python2.7/site-packages/recycler-0.62-py2.7.egg/EGG-INFO/scripts/recycle.py", line 83, in <module>

File "/home/imss/.local/lib/python2.7/site-packages/networkx-2.0b2-py2.7.egg/networkx/classes/digraph.py", line 535, in remove_nodes_from
for n in nbunch:
File "/home/imss/.local/lib/python2.7/site-packages/networkx-2.0b2-py2.7.egg/networkx/algorithms/isolate.py", line 94, in <genexpr>
return (n for n, d in G.degree() if d == 0)
File "/home/imss/.local/lib/python2.7/site-packages/networkx-2.0b2-py2.7.egg/networkx/classes/reportviews.py", line 367, in __iter__
for n in self._nodes:
RuntimeError: dictionary changed size during iteration

System: Ubuntu Linux
For what it's worth, I was able to get everything up and running on my personal comp (mac osx), but I'm hoping to use this [unfortunately problematic] computer as it is much faster.

minor typo in 'Preparing the BAM input:'

Using samtools Version: 0.1.19

samtools sort reads_pe_primary.bam > reads_pe_primary.sort.bam

I believe it should be:

samtools sort reads_pe_primary.bam reads_pe_primary.sort

test

I am using Recycler-master.zip with Python 3 and trying to get the tests to run.
I introduced the following patch:
test.txt

Can you provide JJ1886_pe_primary.sort.bam for testing?

Support for GFA graph format?

Hello

Is there any chance of adding support for the GFA graph format? As I understand it, Recycler currently only works with the FASTG format.
Newer tools like Unicycler (basically a SPAdes assembly optimizer for isolates) output only GFA.

Best,
Vadim

No output produced

I have the most recent pull of Recycler installed. The contigs/scaffolds were produced by SPAdes-3.11.1. The k-mer sizes were all 55 or below. Recycler is NOT generating any output. The BAM file was produced as per the tutorial.

Any ideas what the issue could be? I have about 7 contigs >500bp from the SPAdes assembly (the largest is about 1500bp). Simply mapping these contigs to a reference with BLAST gives a really good score and %identity. The edges also overlap. However, there is no output from Recycler.

best assembler for the job?

Hi, I was wondering what assembler the authors recommend before running Recycler. It seems like spades is recommended, but then I was also wondering what parameters you would recommend. Is there some set of parameters that, even though it might be worse for the chromosome, might be better for discovering the plasmid?

Also: Would you also recommend running spades with --plasmid as input for Recycler?

I have a problem when I used recycler

This was the command: python recycle.py -g assembly_graph.fastg -k 55 -b reads_pe_primary.sort.bam
but I got the error below:
================== path, coverage levels when added ====================
Traceback (most recent call last):
File "recycle.py", line 131, in <module>
last_node_count = len(COMP.nodes())
TypeError: object of type 'generator' has no len()
What should be done to solve this problem?
Thanks
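
The error means that, with the installed networkx version, `COMP.nodes()` returned a generator, which has no length. One way to count nodes regardless of what `nodes()` returns is sketched below; `count_nodes` is a hypothetical helper. Calling `COMP.number_of_nodes()` directly is probably simpler, since that method does not depend on the return type of `nodes()`.

```python
def count_nodes(nodes):
    """Count items whether `nodes` is a sized container or a one-shot generator."""
    try:
        return len(nodes)
    except TypeError:
        # Generators have no len(); count by consuming them instead.
        return sum(1 for _ in nodes)


from_list = count_nodes(["n1", "n2"])
from_generator = count_nodes(n for n in ("n1", "n2", "n3"))
```

Note that counting a generator consumes it, so it cannot be iterated again afterwards.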

Empty output files

Hi everyone,

I am running Recycler and I am getting empty outputs. I am not sure what I am doing wrong, because all the steps before running Recycler seem to run fine. I am sure that there is a plasmid in my sequencing data, so I do not understand why the outputs are empty. Could you please send me some tips to troubleshoot this? Thanks!

Laura
