Comments (4)
It's possible this has been fixed in the development version. Try installing pybedtools from the github repository (see http://packages.python.org/pybedtools/main.html#installation-for-developers-or-for-running-tests) to see if that helps.
If it still doesn't work, then could you paste the relevant part of the script here so I can debug?
from pybedtools.
Hi Ryan,
The development version from github is the one I'm using. Below is the relevant code. FYI cl.args is the command line arguments. I think the main problem is the multiple calls to bed-creation and sequence-fetching.
Thanks,
Olga
import random as r
import pybedtools
import itertools
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
all_transcripts = pybedtools.BedTool(cl.args['bed'])
random_id = r.randint(100, 999)
fastq1 = open(str(cl.args['n_reads'])+"reads_"+str(random_id)+"-1.fastq", "wb")
fastq2 = open(str(cl.args['n_reads'])+"reads_"+str(random_id)+"-2.fastq", "wb")
for i in range(cl.args['n_reads']):
fragment_size = 0 # fragment size
txpt = r.choice(all_transcripts) # random transcript
fragment_size = int(r.gauss(cl.args['fragment_size'],
cl.args['fragment_size_var']))
if fragment_size < cl.args['min']:
fragment_size = cl.args['min']
elif fragment_size > cl.args['max']:
fragment_size = cl.args['max']
if fragment_size > txpt.stop-txpt.start:
fragment_size = txpt.stop-txpt.start
# print 'fragment size:', fragment_size
read_start = r.randint(txpt.start, txpt.stop-fragment_size+1)
read_name = ":" + ":".join([txpt.name,
str(read_start),
str(fragment_size),
str(r.random())])
if txpt.strand == '+':
interval1 = " ".join([txpt.chrom,
str(read_start),
str(read_start+cl.args['read1_length']),
txpt.strand])
interval2 = " ".join([txpt.chrom,
str(read_start+fragment_size-cl.args['read2_length']),
str(read_start+fragment_size),
txpt.strand])
else:
interval1 = " ".join([txpt.chrom,
str(read_start+fragment_size-cl.args['read1_length']),
str(read_start+fragment_size),
txpt.strand])
interval2 = " ".join([txpt.chrom,
str(read_start),
str(read_start+cl.args['read2_length']),
txpt.strand])
# print 'making BED'
bed1 = pybedtools.BedTool(interval1, from_string=True)
bed2 = pybedtools.BedTool(interval2, from_string=True)
# print 'Getting sequence'
seq1 = bed1.sequence(cl.args['fasta']).print_sequence().split()
seq2 = bed2.sequence(cl.args['fasta']).print_sequence().split()
# print "making SeqRecord"
rec1 = SeqRecord(Seq(seq1[1]),
id="".join([seq1[0][1:], read_name, "/1"]))
rec2 = SeqRecord(Seq(seq2[1]),
id="".join([seq2[0][1:], read_name, "/2"]))
# print 'Assigning PHRED quality scores'
rec1.letter_annotations["phred_quality"] = \
list(itertools.repeat(40, len(seq1[1])))
rec2.letter_annotations["phred_quality"] = \
list(itertools.repeat(40, len(seq2[1])))
# print 'writing sequences'
SeqIO.write(rec1, fastq1, "fastq")
SeqIO.write(rec2, fastq2, "fastq")
if i > 0 and i % 1000 == 0:
print 'finished generating', i, 'paired-end reads'
fastq1.close()
fastq2.close()
from pybedtools.
Yep, by making 2 BedTools for every iteration, you'll quickly hit the open file limit of your OS. There's a more efficient strategy that I should make more prominent in the docs: you can create a BedTool out of an iterable of Interval objects. Here's a skeleton of an example that uses a generator of random Intervals:
import pybedtools
def make_intervals(n):
for i in range(n):
# Do stuff to get chrom/start/stop/strand for each end
#
# Note that start and stop should be ints and that name and score
# fields must be provided if you want to use strand
# Read 1
yield pybedtools.create_interval_from_list(
[chrom, start1, stop1, 'read_%s/1' % i, '.', strand1])
# Read 2
yield pybedtools.create_interval_from_list(
[chrom, start2, stop2, 'read_%s/2' % i, '.', strand2])
# Create BedTool out of the generator.
bed = pybedtools.BedTool(make_intervals(1000))
# Use name=True to use feature names created in generator
bed_with_seq = bed.sequence(fi='genome.fa', fo='random_reads.fa', name=True)
# Do stuff to create fastq1 and fastq2 filehandles
# Parse the FASTA file, edit SeqRecords, and write out to file
records = SeqIO.parse('random_reads.fa', format='fastq')
for i, record in enumerate(records):
# Do stuff to records -- editing names, adding qualscores, etc, based on
# whether it's read 1 or read 2
if i % 2 == 0: # even, so first end
# do stuff
SeqIO.write(record, fastq1, 'fastq')
else: # odd, so second end
# do stuff
SeqIO.write(record, fastq2, 'fastq')
fastq1.close()
fastq2.close()
from pybedtools.
Thank you for the speedy and thorough response!
from pybedtools.
Related Issues (20)
- pybedtools.bedtool.BedTool.save_seqs leaves open .tmp files
- Support Python 3.10 and 3.11 HOT 1
- "python setup.py bdist_wheel did not run successfully" when pip installing with python v3.11 HOT 8
- to_dataframe() creates 0th row with generic names in nucleotide_content HOT 2
- build failure under python 3.11 HOT 6
- pybedtools intersect error HOT 2
- Cannot create a BedTool object from list of regions that uses np.int64 coordinates
- remove historical py27 support HOT 1
- bedtools intersect reported incorrect interval intersection HOT 3
- Cythonizing files requires `language_level=2` to be set in cythonize() HOT 4
- pybedtools multi_bam_coverage assistance HOT 2
- "fastaFromBed" error HOT 2
- intersect with multiple -b arguments not working with -sorted HOT 1
- Unable to install pybedtools==0.9.1 in Python3.10 HOT 4
- Len modifying the Bedtools after a filter HOT 2
- Has pybedtools considered packaging bedtools? HOT 3
- how to mask gap regions for randomization? HOT 1
- Issue while doing pip install pybedtools HOT 3
- Inconsistent behaviour when using files from `pathlib.PosixPath` with BedTool functions...
- pybedtools.bedtool.Bedtool.sort()
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pybedtools.