Comments (6)
Hi Wolfgang, thanks for reporting this. Which version are you using? I pasted in your exact commands in doctest_mode and can't reproduce this:
>>> In [1]: import pybedtools
>>> In [2]: gtf = pybedtools.BedTool("test.gtf")
>>> In [4]: gtf_intervals = list(gtf)
>>> In [5]: gtf_intervals
[Interval(chr5:137924247-137924468), Interval(chr5:137925208-137925388)]
>>> In [24]: new_bt = pybedtools.BedTool(gtf_intervals)
>>> new_bt
[Interval(chr5:137924247-137924468), Interval(chr5:137925208-137925388)]
>>> In [26]: new_bt_merged = new_bt.merge()
>>> In [27]: new_bt_merged
<BedTool(/tmp/pybedtools.kHuwlf.tmp)>
>>> In [28]: new_bt_merged[0]
Interval(chr5:137924247-137924468)
>>> In [29]: new_bt_merged[0].start
137924247L
Can you try this short script to see what results you get?
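import pybedtools

x = pybedtools.BedTool('test.gtf')
xm = x.merge()
# Same merge, but built from a generator of intervals rather than the file
iter_xm = pybedtools.BedTool(i for i in x).merge()

print(x)
print(xm)
# All three start coordinates should agree
print(x[0].start)
print(xm[0].start)
print(iter_xm[0].start)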
I get the following:
chr5 ucsc_refseq exon 137924248 137924468 . - . gene_id "13856"; gene_name "Epo"; transcript_id "NM_007942"; tss_id "tss_07393"; exon_number 5
chr5 ucsc_refseq exon 137925209 137925388 . - . gene_id "13856"; gene_name "Epo"; transcript_id "NM_007942"; tss_id "tss_07393"; exon_number 4
chr5 137924247 137924468
chr5 137925208 137925388
137924247
137924247
137924247
from pybedtools.
Hi Ryan,
import pybedtools
pybedtools.__version__
'0.5.5'
mergeBed -h
Program: mergeBed (v2.12.0)
Author: Aaron Quinlan ([email protected])
Summary: Merges overlapping BED/GFF/VCF entries into a single interval.
Output from your short script:
chr5 ucsc_refseq exon 137924248 137924468 . - . gene_id "13856"; gene_name "Epo"; transcript_id "NM_007942"; tss_id "tss_07393"; exon_number 5
chr5 ucsc_refseq exon 137925209 137925388 . - . gene_id "13856"; gene_name "Epo"; transcript_id "NM_007942"; tss_id "tss_07393"; exon_number 4
chr5 137924248 137924468
chr5 137925209 137925388
137924247
137924248
137924248
Odd.
Thanks,
Wolf
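For readers following along: the two outputs differ only in the start coordinate of the merged intervals, and by exactly one. GTF coordinates are 1-based and inclusive, while BED output (and `Interval.start`, as the outputs in this thread show) uses a 0-based start, so a correctly merged first interval should begin at 137924247. A minimal sketch of that conversion, where `gtf_to_bed_coords` is a hypothetical helper for illustration and not part of pybedtools:

```python
def gtf_to_bed_coords(gtf_start, gtf_end):
    """Convert 1-based, inclusive GTF coordinates to 0-based, half-open BED ones."""
    # Only the start shifts by one; the end value is numerically unchanged.
    return gtf_start - 1, gtf_end


# First exon from test.gtf as posted in this thread.
bed_start, bed_end = gtf_to_bed_coords(137924248, 137924468)
print(bed_start, bed_end)  # 137924247 137924468
```

A merged start of 137924248 therefore looks like the GTF start passed through without this shift.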
from pybedtools.
Can you pull the latest pybedtools from GitHub? I actually haven't changed the version number since 0.5.5 despite many changes elsewhere in the code base (though I'm about to release 0.6).
You also might want to upgrade BEDTools to 2.15 if possible. It's working on my end, using the github versions of pybedtools and BEDTools, so it's likely this issue has been fixed -- just not in released versions yet.
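The help output posted earlier in the thread reports mergeBed v2.12.0, which is older than the 2.15 suggested here. A small sketch of comparing such a version string against a minimum follows. `parse_tool_version` is a hypothetical helper, not part of pybedtools or BEDTools, and the regular expression assumes the `(vX.Y.Z)` form shown in that help text:

```python
import re


def parse_tool_version(help_text):
    """Extract (major, minor, patch) from text like 'Program: mergeBed (v2.12.0)'."""
    match = re.search(r"\(v(\d+)\.(\d+)(?:\.(\d+))?\)", help_text)
    if match is None:
        raise ValueError("no version string found")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


installed = parse_tool_version("Program: mergeBed (v2.12.0)")
print(installed, installed >= (2, 15, 0))  # (2, 12, 0) False
```

Comparing tuples of integers avoids the pitfall of comparing version strings, where "2.9" would sort after "2.15".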
from pybedtools.
Actually, just upgrading to bedtools 2.15 solved it. I had not realized I was behind the curve by that much...
Thanks for the very quick reply!
Wolf
chr5 ucsc_refseq exon 137924248 137924468 . - . gene_id "13856"; gene_name "Epo"; transcript_id "NM_007942"; tss_id "tss_07393"; exon_number 5
chr5 ucsc_refseq exon 137925209 137925388 . - . gene_id "13856"; gene_name "Epo"; transcript_id "NM_007942"; tss_id "tss_07393"; exon_number 4
chr5 137924247 137924468
chr5 137925208 137925388
137924247
137924247
137924247
from pybedtools.
Good to hear.
By the way, itertools.groupby and the BedTool.total_coverage method will probably come in handy for what you're doing. I just tested the following on a mouse GTF file. It should run a lot faster if you can assume the input file is already sorted by gene name.
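import itertools

import pybedtools

# Keep only exon features and save them to a temp file so they can be re-read
ex = pybedtools.BedTool('mm9.gtf.chr18')\
    .filter(lambda x: x[2] == 'exon')\
    .saveas()

def key(x):
    # GTF attributes can be looked up by name on an Interval
    return x['gene_name']

# groupby only groups adjacent items, so sort by gene name first
exons_by_gene = sorted(ex, key=key)
for gene, exons in itertools.groupby(exons_by_gene, key=key):
    coverage = pybedtools.BedTool(exons).sort().total_coverage()
    print('%s %s' % (gene, coverage))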
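To see what the snippet above computes without pybedtools or a GTF file at hand, here is a stdlib-only sketch of the same group-then-measure pattern. The exon tuples are made up for illustration, and the local `total_coverage` function is an assumption about what `BedTool.total_coverage` reports (bases covered, with overlapping exons counted once), not the library's implementation:

```python
import itertools


def total_coverage(intervals):
    """Sum the bases covered by (start, end) pairs, counting overlaps once."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from the running interval: bank it and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered


# Hypothetical (gene_name, start, end) exons, 0-based half-open, deliberately unsorted.
exons = [
    ("GeneB", 100, 200),
    ("GeneA", 10, 50),
    ("GeneB", 150, 300),
    ("GeneA", 40, 60),
]


def gene_key(exon):
    return exon[0]


coverage_by_gene = {}
# groupby only merges adjacent equal keys, hence the sort before grouping.
for gene, group in itertools.groupby(sorted(exons, key=gene_key), key=gene_key):
    coverage_by_gene[gene] = total_coverage((s, e) for _, s, e in group)

print(coverage_by_gene)
```

If the input is already ordered by gene name, the `sorted` call can be dropped and the records streamed straight into `groupby`, which is where the speedup mentioned above would come from.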
from pybedtools.
That is indeed better than what I was doing.
from pybedtools.