Comments (6)

daler commented on July 28, 2024

Hi Wolfgang, thanks for reporting this. Which version are you using? I pasted in your exact commands in doctest_mode and can't reproduce this:

>>> import pybedtools
>>> gtf = pybedtools.BedTool("test.gtf")
>>> gtf_intervals = list(gtf)
>>> gtf_intervals
[Interval(chr5:137924247-137924468), Interval(chr5:137925208-137925388)]
>>> new_bt = pybedtools.BedTool(gtf_intervals)
>>> new_bt
[Interval(chr5:137924247-137924468), Interval(chr5:137925208-137925388)]
>>> new_bt_merged = new_bt.merge()
>>> new_bt_merged
<BedTool(/tmp/pybedtools.kHuwlf.tmp)>
>>> new_bt_merged[0]
Interval(chr5:137924247-137924468)
>>> new_bt_merged[0].start
137924247L

Can you try this short script to see what results you get?

import pybedtools

x = pybedtools.BedTool('test.gtf')
xm = x.merge()                                      # merge the file-based BedTool
iter_xm = pybedtools.BedTool(i for i in x).merge()  # merge an iterator-based BedTool
print(x)
print(xm)
print(x[0].start)
print(xm[0].start)
print(iter_xm[0].start)

I get the following:

chr5    ucsc_refseq exon    137924248   137924468   .   -   .   gene_id "13856"; gene_name "Epo"; transcript_id "NM_007942"; tss_id "tss_07393"; exon_number 5
chr5    ucsc_refseq exon    137925209   137925388   .   -   .   gene_id "13856"; gene_name "Epo"; transcript_id "NM_007942"; tss_id "tss_07393"; exon_number 4

chr5    137924247   137924468
chr5    137925208   137925388

137924247
137924247
137924247
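
For anyone comparing the numbers above: GTF coordinates are 1-based and inclusive, while BED coordinates (and the `start` values printed here) are 0-based and half-open, so the start in the merged output is expected to be one less than the start in the GTF line. A minimal sketch of that conversion in plain Python, without pybedtools; the helper name `gtf_to_bed_coords` is made up for illustration:

```python
# GTF/GFF coordinates are 1-based and inclusive; BED coordinates are
# 0-based and half-open. Only the start value differs between the two.

def gtf_to_bed_coords(gtf_start, gtf_end):
    """Convert a 1-based inclusive (start, end) pair to 0-based half-open."""
    return gtf_start - 1, gtf_end

# First exon from the test.gtf lines shown above.
bed_start, bed_end = gtf_to_bed_coords(137924248, 137924468)
print(bed_start, bed_end)

# Both representations describe the same number of bases.
gtf_length = 137924468 - 137924248 + 1
bed_length = bed_end - bed_start
print(gtf_length, bed_length)
```

A merged start that still equals the GTF start (137924248 rather than 137924247) would mean the 1-based value was carried through without conversion.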

from pybedtools.

wresch commented on July 28, 2024

Hi Ryan,

import pybedtools
pybedtools.version
'0.5.5'


mergeBed -h

Program: mergeBed (v2.12.0)
Author: Aaron Quinlan ([email protected])
Summary: Merges overlapping BED/GFF/VCF entries into a single interval.


output from your short script:
chr5 ucsc_refseq exon 137924248 137924468 . - . gene_id "13856"; gene_name "Epo"; transcript_id "NM_007942"; tss_id "tss_07393"; exon_number 5
chr5 ucsc_refseq exon 137925209 137925388 . - . gene_id "13856"; gene_name "Epo"; transcript_id "NM_007942"; tss_id "tss_07393"; exon_number 4

chr5 137924248 137924468
chr5 137925209 137925388

137924247
137924248
137924248

Odd.

Thanks,
Wolf

daler commented on July 28, 2024

Can you pull the latest pybedtools from GitHub? I actually haven't changed the version number since 0.5.5 despite many changes elsewhere in the code base (though I'm about to release 0.6).

You also might want to upgrade BEDTools to 2.15 if possible. It's working on my end, using the github versions of pybedtools and BEDTools, so it's likely this issue has been fixed -- just not in released versions yet.
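
To check quickly whether an installed BEDTools is older than 2.15, the version can be read from the help banner. The sketch below parses the `Program: mergeBed (v2.12.0)` line quoted earlier in this thread; the function name is hypothetical, and it is an assumption that other BEDTools releases print the banner in the same format:

```python
import re

def parse_bedtools_version(banner):
    """Extract a version tuple such as (2, 12, 0) from a help banner line.

    Returns None if no "(vX.Y.Z)" pattern is found.
    """
    match = re.search(r"\(v(\d+(?:\.\d+)*)\)", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))

# Banner line as reported earlier in this thread.
banner = "Program: mergeBed (v2.12.0)"
installed = parse_bedtools_version(banner)

# Tuples compare element by element, so (2, 12, 0) < (2, 15) is True.
needs_upgrade = installed is not None and installed < (2, 15)
print(installed, needs_upgrade)
```

Comparing integer tuples rather than strings avoids the trap where "2.9" would sort after "2.15".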


wresch commented on July 28, 2024

Actually, just upgrading to BEDTools 2.15 solved it. I had not realized I was behind the curve by that much...

Thanks for the very quick reply!

Wolf

chr5 ucsc_refseq exon 137924248 137924468 . - . gene_id "13856"; gene_name "Epo"; transcript_id "NM_007942"; tss_id "tss_07393"; exon_number 5
chr5 ucsc_refseq exon 137925209 137925388 . - . gene_id "13856"; gene_name "Epo"; transcript_id "NM_007942"; tss_id "tss_07393"; exon_number 4

chr5 137924247 137924468
chr5 137925208 137925388

137924247
137924247
137924247

daler commented on July 28, 2024

Good to hear.

By the way, itertools.groupby and the BedTool.total_coverage method will probably come in handy for what you're doing. I just tested this on a mouse GTF file. It should run a lot faster if you can assume the input file is already sorted by gene name, since the sorted() call can then be skipped.

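import itertools

import pybedtools

# Keep only exon features and save them to a temp file so they can be reused.
ex = pybedtools.BedTool('mm9.gtf.chr18')\
        .filter(lambda x: x[2] == 'exon')\
        .saveas()

def key(x):
    return x['gene_name']

# groupby only groups adjacent items, so sort by gene name first.
exons_by_gene = sorted(ex, key=key)
for gene, exons in itertools.groupby(exons_by_gene, key=key):
    coverage = pybedtools.BedTool(exons).sort().total_coverage()
    print('%s %s' % (gene, coverage))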
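
On the "already sorted by gene name" point: itertools.groupby only groups consecutive items, so skipping the sort is safe only when all exons of a gene are adjacent in the input. Here is a small sketch of that behaviour in plain Python. The records are made-up (gene_name, start, end) tuples, and `covered_bases` is a hypothetical stand-in for what total_coverage reports, not pybedtools' implementation:

```python
import itertools

# Hypothetical (gene_name, start, end) records, 0-based half-open.
exons = [
    ("GeneA", 100, 200),
    ("GeneB", 50, 80),
    ("GeneA", 150, 300),
]

def gene_name(record):
    return record[0]

def covered_bases(intervals):
    """Total bases covered by (start, end) pairs, counting overlaps once."""
    total = 0
    current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from everything seen so far: count it in full.
            total += end - start
            current_end = end
        elif end > current_end:
            # Overlaps or abuts the previous run: count only the new part.
            total += end - current_end
            current_end = end
    return total

def coverage_by_gene(records, presorted=False):
    """Group records by gene and report covered bases per group."""
    if not presorted:
        records = sorted(records, key=gene_name)
    result = []
    for gene, group in itertools.groupby(records, key=gene_name):
        result.append((gene, covered_bases((s, e) for _, s, e in group)))
    return result

with_sort = coverage_by_gene(exons)
without_sort = coverage_by_gene(exons, presorted=True)
print(with_sort)
print(without_sort)
```

With the sort, GeneA comes out as a single group; without it, the unsorted input splits GeneA into two separate groups, which is why the shortcut only applies to input that really is grouped by gene.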


wresch commented on July 28, 2024

That is indeed better than what I was doing.
