Good point. This is a tricky issue that comes up a lot when one file has multiple features that overlap a single feature in another file, and you then try to make a Venn diagram, which has no category for "multiple hits".
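The reason you run into this issue is that the `BedTool.__add__` method passes the `u=True` argument to `intersectBed` (and `__sub__` uses `v=True` to be symmetrical). If you have nested features (two features in b that overlap a single feature in a), then a + b and b + a report different numbers of features, because `-u` reports each feature of the first file at most once, no matter how many features of the second file it overlaps.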
For example, what should the two-way Venn diagram look like for these files? It's not really defined what should go in the middle (overlap) section of the diagram:
a.bed -------------    --------------------------
b.bed  -----------        -----     ------
Using `u=True`:
$ intersectBed -a a.bed -b b.bed -u | wc -l
2
$ intersectBed -a b.bed -b a.bed -u | wc -l
3
Not using `u=True` results in another issue: the number of features reported for a overlapping with b (3) is greater than the number of features in a (2) in the first place:
$ intersectBed -a a.bed -b b.bed | wc -l
3
$ intersectBed -a b.bed -b a.bed | wc -l
3
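The counts above can be reproduced with made-up coordinates that match the diagram (a1 contains b1; a2 contains b2 and b3). The coordinates and the helpers `count_u` and `count_plain` are hypothetical; the sketch only models how many lines each command prints, assuming half-open intervals on a single chromosome.

```python
# Hypothetical coordinates matching the diagram: a1 contains b1,
# and a2 contains both b2 and b3. Intervals are half-open (start, end).
a_bed = [(0, 13), (17, 43)]
b_bed = [(1, 12), (20, 25), (30, 36)]

def overlaps(x, y):
    """True if half-open intervals x and y share at least one base."""
    return x[0] < y[1] and y[0] < x[1]

def count_u(a, b):
    """Like `intersectBed -u | wc -l`: features of `a` with any hit in `b`."""
    return sum(1 for x in a if any(overlaps(x, y) for y in b))

def count_plain(a, b):
    """Like plain `intersectBed | wc -l`: one line per overlapping pair."""
    return sum(1 for x in a for y in b if overlaps(x, y))

print(count_u(a_bed, b_bed), count_u(b_bed, a_bed))          # 2 3
print(count_plain(a_bed, b_bed), count_plain(b_bed, a_bed))  # 3 3
```

The plain version counts overlapping pairs, which is symmetric, while `-u` counts features of whichever file is passed as `-a`.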
This latter version can make the total for each circle in the Venn diagram very large, with no relation to the original number of features.
Unfortunately, Venn diagrams aren't ideal for overlapping cases like this. In my applications it made sense to use `u=True`. Do you think it makes more sense to use `u=False`?
If you'd like to play around, you can subclass `BedTool` and override its `__add__` method to do what you want:
from pybedtools import BedTool

class MyBedTool(BedTool):
    def __init__(self, *args, **kwargs):
        BedTool.__init__(self, *args, **kwargs)

    def __add__(self, other):
        # Plain intersect (no u=True): one line per overlapping pair
        return self.intersect(other)
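To see what overriding `__add__` changes without needing pybedtools installed, here is a toy stand-in. `Track` and `MyTrack` are hypothetical classes, not part of pybedtools; `Track.__add__` mimics the `u=True` behaviour and `MyTrack.__add__` mimics a plain intersect that returns the overlapping portion of every pair. The intervals are the same made-up coordinates used for the diagram.

```python
# Toy stand-ins for BedTool: `Track` and `MyTrack` are hypothetical classes,
# not part of pybedtools. Intervals are half-open (start, end) tuples.

def overlaps(x, y):
    """True if half-open intervals x and y share at least one base."""
    return x[0] < y[1] and y[0] < x[1]


class Track:
    def __init__(self, intervals):
        self.intervals = list(intervals)

    def __len__(self):
        return len(self.intervals)

    def __add__(self, other):
        # Mimics u=True: each feature of self at most once if it hits other.
        return Track(x for x in self.intervals
                     if any(overlaps(x, y) for y in other.intervals))


class MyTrack(Track):
    def __add__(self, other):
        # Mimics a plain intersect: the overlapping portion of every pair.
        return MyTrack((max(x[0], y[0]), min(x[1], y[1]))
                       for x in self.intervals
                       for y in other.intervals
                       if overlaps(x, y))


a = [(0, 13), (17, 43)]
b = [(1, 12), (20, 25), (30, 36)]
print(len(Track(a) + Track(b)), len(Track(b) + Track(a)))          # 2 3
print(len(MyTrack(a) + MyTrack(b)), len(MyTrack(b) + MyTrack(a)))  # 3 3
```

Because `MyTrack.__add__` returns another `MyTrack`, chained `+` keeps using the overridden behaviour in this toy.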
Any suggestions on how to improve this? I'd imagine others will run into this as well, so I'll at least have to add this explanation to the docs.
I don't see a way around the fact that:
(a + b + c) != (b + c + a)
but we could provide a class method like:
BedTool.intersect_all(*beds, **kwargs)
So, something like the one in that recent post on the bedtools list:
from pybedtools import BedTool

def intersect_all(beds, **kwargs):
    """
    Successively intersect all files in `beds`, passing kwargs
    to BedTool.intersect().

    Note that if kwargs like `u=True` or `v=True` are used, the order
    of the list will determine the final results.
    """
    x = BedTool(beds[0])
    for bed in beds[1:]:
        x = x.intersect(bed, **kwargs)
    return x
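The order dependence noted in the docstring, and the (a + b + c) != (b + c + a) problem, show up in a toy model of the `u=True` behaviour. Python evaluates `a + b + c` as `(a + b) + c`, so with `u=True` the result is always a subset of the first operand's features. This is a pure-Python sketch, not pybedtools: it uses `functools.reduce` in place of the loop in `intersect_all`, and the intervals (including the third file `c`) are hypothetical.

```python
from functools import reduce

def overlaps(x, y):
    """True if half-open intervals x and y share at least one base."""
    return x[0] < y[1] and y[0] < x[1]

def add_u(a, b):
    """Toy `a + b` with u=True semantics: features of `a` that hit `b`."""
    return [x for x in a if any(overlaps(x, y) for y in b)]

a = [(0, 13), (17, 43)]
b = [(1, 12), (20, 25), (30, 36)]
c = [(0, 50)]  # hypothetical third file: one feature covering everything

abc = reduce(add_u, [a, b, c])  # (a + b) + c
bca = reduce(add_u, [b, c, a])  # (b + c) + a
print(len(abc), len(bca))  # 2 3
```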
Won't that have the same problem, i.e., won't it depend on the order of the inputs?
Maybe you first need to do something smart, like cat and then merge to get a base, and then intersect each file successively with that base?
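A rough sketch of that cat/merge/intersect idea, in pure Python rather than pybedtools: pool all features, merge overlapping or book-ended ones (as `bedtools merge` does by default) into base regions, then label each base region with the exact set of files that overlap it. The helper names `merge` and `venn_counts` and the example intervals are hypothetical.

```python
from collections import Counter

def overlaps(x, y):
    """True if half-open intervals x and y share at least one base."""
    return x[0] < y[1] and y[0] < x[1]

def merge(intervals):
    """Toy cat + sort + merge: combine overlapping or book-ended intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def venn_counts(named_files):
    """Count base regions by the exact set of files that overlap each one."""
    base = merge(iv for ivs in named_files.values() for iv in ivs)
    counts = Counter()
    for region in base:
        hits = frozenset(name for name, ivs in named_files.items()
                         if any(overlaps(region, iv) for iv in ivs))
        counts[hits] += 1
    return counts

files = {
    "a": [(0, 13), (17, 43)],
    "b": [(1, 12), (20, 25), (30, 36)],
    "c": [(50, 60)],  # hypothetical third file
}
print(venn_counts(files))
```

Because each base region lands in exactly one Venn category, the categories sum to the number of base regions no matter how the files are ordered. The trade-off is that the diagram then counts merged regions rather than original features, so a region built from several b features counts only once.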
Yep, as noted in the docstring. Really it just passes the buck, leaving the decisions up to the caller rather than hard-coding them in the overridden `__add__`.
But with no kwargs, it should do the plain ol' intersect (without -u), which should be commutative. I'll have to think some more about a cat/merge/intersect strategy.
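A quick check of that claim in a toy model: chaining a plain intersect (overlapping portion of every pair, no `-u`) over all six orderings of three files gives the same intervals, since each reported piece is the common overlap of one feature from each file. The third file `c` and all coordinates are hypothetical, and the model ignores chromosomes, strands, and extra BED columns.

```python
from itertools import permutations

def overlaps(x, y):
    """True if half-open intervals x and y share at least one base."""
    return x[0] < y[1] and y[0] < x[1]

def plain_intersect(a, b):
    """Toy plain intersect: the overlapping portion of every overlapping pair."""
    return [(max(x[0], y[0]), min(x[1], y[1]))
            for x in a for y in b if overlaps(x, y)]

files = {
    "a": [(0, 13), (17, 43)],
    "b": [(1, 12), (20, 25), (30, 36)],
    "c": [(5, 22), (33, 40)],  # hypothetical third file
}

# Chain the intersect in every possible order and collect sorted results.
results = {}
for order in permutations(files):
    x = files[order[0]]
    for name in order[1:]:
        x = plain_intersect(x, files[name])
    results[order] = sorted(x)

print(results[("a", "b", "c")])  # [(5, 12), (20, 22), (33, 36)]
```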