Giter Site home page Giter Site logo

Comments (4)

daler avatar daler commented on July 28, 2024

It's possible this has been fixed in the development version. Try installing pybedtools from the github repository (see http://packages.python.org/pybedtools/main.html#installation-for-developers-or-for-running-tests) to see if that helps.

If it still doesn't work, then could you paste the relevant part of the script here so I can debug?

from pybedtools.

olgabot avatar olgabot commented on July 28, 2024

Hi Ryan,
The development version from github is the one I'm using. Below is the relevant code. FYI cl.args is the command line arguments. I think the main problem is the multiple calls to bed-creation and sequence-fetching.
Thanks,
Olga

import random as r
import pybedtools
import itertools
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
        all_transcripts = pybedtools.BedTool(cl.args['bed'])
        random_id = r.randint(100, 999)
        fastq1 = open(str(cl.args['n_reads'])+"reads_"+str(random_id)+"-1.fastq", "wb")
        fastq2 = open(str(cl.args['n_reads'])+"reads_"+str(random_id)+"-2.fastq", "wb")
        for i in range(cl.args['n_reads']):
            fragment_size = 0 # fragment size
            txpt = r.choice(all_transcripts) # random transcript
            fragment_size = int(r.gauss(cl.args['fragment_size'], 
                            cl.args['fragment_size_var']))
            if fragment_size < cl.args['min']:
                fragment_size = cl.args['min']
            elif fragment_size > cl.args['max']:
                fragment_size = cl.args['max']

            if fragment_size > txpt.stop-txpt.start:
                fragment_size = txpt.stop-txpt.start
            # print 'fragment size:', fragment_size
            read_start = r.randint(txpt.start, txpt.stop-fragment_size+1)
            read_name = ":" + ":".join([txpt.name, 
                                    str(read_start), 
                                    str(fragment_size),
                                    str(r.random())])
            if txpt.strand == '+':
                interval1 = " ".join([txpt.chrom, 
                                    str(read_start),
                                    str(read_start+cl.args['read1_length']),
                                    txpt.strand])
                interval2 = " ".join([txpt.chrom, 
                                    str(read_start+fragment_size-cl.args['read2_length']), 
                                    str(read_start+fragment_size),
                                    txpt.strand])
            else:
                interval1 = " ".join([txpt.chrom, 
                                    str(read_start+fragment_size-cl.args['read1_length']),
                                    str(read_start+fragment_size),
                                    txpt.strand])
                interval2 = " ".join([txpt.chrom, 
                                    str(read_start),
                                    str(read_start+cl.args['read2_length']),
                                    txpt.strand])
            # print 'making BED'
            bed1 = pybedtools.BedTool(interval1, from_string=True)
            bed2 = pybedtools.BedTool(interval2, from_string=True)

            # print 'Getting sequence'
            seq1 = bed1.sequence(cl.args['fasta']).print_sequence().split()
            seq2 = bed2.sequence(cl.args['fasta']).print_sequence().split()

            # print "making SeqRecord"
            rec1 = SeqRecord(Seq(seq1[1]), 
                id="".join([seq1[0][1:], read_name, "/1"]))
            rec2 = SeqRecord(Seq(seq2[1]), 
                id="".join([seq2[0][1:], read_name, "/2"]))

            # print 'Assigning PHRED quality scores'
            rec1.letter_annotations["phred_quality"] = \
                list(itertools.repeat(40, len(seq1[1])))
            rec2.letter_annotations["phred_quality"] = \
                list(itertools.repeat(40, len(seq2[1])))

            # print 'writing sequences'
            SeqIO.write(rec1, fastq1, "fastq")
            SeqIO.write(rec2, fastq2, "fastq")

            if i > 0 and i % 1000 == 0:
                print 'finished generating', i, 'paired-end reads'

        fastq1.close()
        fastq2.close()

from pybedtools.

daler avatar daler commented on July 28, 2024

Yep, by making 2 BedTools for every iteration, you'll quickly hit the open file limit of your OS. There's a more efficient strategy that I should make more prominent in the docs: you can create a BedTool out of an iterable of Interval objects. Here's a skeleton of an example that uses a generator of random Intervals:

import pybedtools


def make_intervals(n):
    for i in range(n):
        # Do stuff to get chrom/start/stop/strand for each end
        #
        # Note that start and stop should be ints and that name and score
        # fields must be provided if you want to use strand

        # Read 1
        yield pybedtools.create_interval_from_list(
                [chrom, start1, stop1, 'read_%s/1' % i, '.', strand1])

       # Read 2
        yield pybedtools.create_interval_from_list(
                [chrom, start2, stop2, 'read_%s/2' % i, '.', strand2])

# Create BedTool out of the generator.
bed = pybedtools.BedTool(make_intervals(1000))

# Use name=True to use feature names created in generator
bed_with_seq = bed.sequence(fi='genome.fa', fo='random_reads.fa', name=True)

# Do stuff to create fastq1 and fastq2 filehandles

# Parse the FASTA file, edit SeqRecords, and write out to file
records = SeqIO.parse('random_reads.fa', format='fastq')
for i, record in enumerate(records):
    # Do stuff to records -- editing names, adding qualscores, etc, based on
    # whether it's read 1 or read 2
    if i % 2 == 0:  # even, so first end
        # do stuff
        SeqIO.write(record, fastq1, 'fastq')
    else:  # odd, so second end
        # do stuff
        SeqIO.write(record, fastq2, 'fastq')

fastq1.close()
fastq2.close()

from pybedtools.

olgabot avatar olgabot commented on July 28, 2024

Thank you for the speedy and thorough response!

from pybedtools.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.