Comments (5)

daler avatar daler commented on July 28, 2024

Good point. This is something tricky that comes up a lot when you have files with multiple features in one that overlap a single feature in another and then try to make a Venn diagram -- which has no category for "multiple hits".

The reason you run into this issue is that the BedTool.__add__ method passes the u=True argument to intersectBed (__subtract__ uses v=True, to be symmetrical). If you have nested features (for example, 2 features in b that overlap a single feature in a), you'll run into this issue.

For example, what should the 2-way Venn diagram look like for these files? It's not really defined what should go in that middle overlap section of the diagram:

a.bed  -------------        --------------------------
b.bed      -----------               -----     ------

Using u=True:

$ intersectBed -a a.bed -b b.bed -u | wc -l
2

$ intersectBed -a b.bed -b a.bed -u | wc -l
3

Not using u=True results in another issue -- the total number of features for a overlapping with b is greater than the number of features in a in the first place:

$ intersectBed -a a.bed -b b.bed | wc -l
3

$ intersectBed -a b.bed -b a.bed | wc -l
3

This latter version can result in the total for each circle in the Venn diagram being huge and having no relation to the original number of features.
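To make the two counting modes concrete, here is a minimal pure-Python sketch that models them on the diagram above. It does not call bedtools; the coordinates are made up to match the picture, and the helper names (`overlaps`, `intersect_u`, `intersect_default`) are hypothetical, not part of pybedtools.

```python
# Intervals are (start, end) pairs on one chromosome, half-open.
# Coordinates are assumptions chosen to mirror the ASCII diagram.

def overlaps(x, y):
    """True if intervals x and y share at least one base."""
    return x[0] < y[1] and y[0] < x[1]

def intersect_u(a, b):
    """Model of `-u`: report each feature of a once if it hits anything in b."""
    return [x for x in a if any(overlaps(x, y) for y in b)]

def intersect_default(a, b):
    """Model of the default mode: one clipped record per overlapping pair."""
    return [(max(x[0], y[0]), min(x[1], y[1]))
            for x in a for y in b if overlaps(x, y)]

a = [(0, 13), (21, 47)]
b = [(4, 15), (30, 35), (40, 46)]

print(len(intersect_u(a, b)))        # 2
print(len(intersect_u(b, a)))        # 3
print(len(intersect_default(a, b)))  # 3
print(len(intersect_default(b, a)))  # 3
```

The counts agree with the intersectBed runs above: with -u the answer depends on which file is -a, and without it a file with 2 features can report 3 overlaps.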

Unfortunately, Venn diagrams aren't ideal for overlapping cases like this. In my applications it made sense to use u=True. Do you think it makes more sense to use u=False?

If you'd like to play around, you can subclass BedTool and override its __add__ method to do what you want:

from pybedtools import BedTool

class MyBedTool(BedTool):
    def __add__(self, other):
        # Plain intersect, without the u=True that BedTool.__add__ passes
        return self.intersect(other)

Any suggestions on how to improve this? I'd imagine others will run into this as well, so I'll at least have to add this explanation to the docs.

from pybedtools.

brentp avatar brentp commented on July 28, 2024

I don't see how to get around the issue that:

(a + b + c) != (b + c + a)

But we could provide a class method like:

BedTool.intersect_all(*beds, **kwargs)

daler avatar daler commented on July 28, 2024

So something like that recent post on the bedtools list:

from pybedtools import BedTool

def intersect_all(beds, **kwargs):
    """
    Successively intersect all files in `beds`, passing kwargs
    to BedTool.intersect().

    Note that if kwargs like `u=True` or `v=True` are used, the order
    of the list will determine the final results.
    """
    x = BedTool(beds[0])
    for bed in beds[1:]:
        x = x.intersect(bed, **kwargs)
    return x
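The order dependence mentioned in the docstring can be illustrated without bedtools. The sketch below is a pure-Python model of successive u=True intersections; the coordinates and the helper names (`overlaps`, `intersect_u`, `intersect_all_u`) are assumptions for illustration only, not pybedtools functions.

```python
from functools import reduce

def overlaps(x, y):
    """True if half-open intervals x and y share at least one base."""
    return x[0] < y[1] and y[0] < x[1]

def intersect_u(a, b):
    """Model of intersect(u=True): keep features of a that hit anything in b."""
    return [x for x in a if any(overlaps(x, y) for y in b)]

def intersect_all_u(beds):
    """Fold intersect_u over the list, left to right."""
    return reduce(intersect_u, beds)

# Made-up coordinates: c overlaps the second feature of a and two features of b.
a = [(0, 13), (21, 47)]
b = [(4, 15), (30, 35), (40, 46)]
c = [(32, 44)]

abc = intersect_all_u([a, b, c])
bca = intersect_all_u([b, c, a])
print(abc)  # [(21, 47)]
print(bca)  # [(30, 35), (40, 46)]
```

The result is always a subset of the first list's features, so reordering the inputs changes both which features come back and how many.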

brentp avatar brentp commented on July 28, 2024

Won't that have the same problem -- it'll depend on the order of the inputs?

Maybe first you need to do something smart like cat, then merge to get a base.
Then intersect each successively with base?
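One possible reading of the cat/merge idea, sketched in pure Python rather than with bedtools: pool every feature, merge the pool into base regions, then label each base region with the set of inputs that touch it. The names `merge` and `venn_counts` and the coordinates are hypothetical, and this is only a sketch of the approach, not an existing pybedtools method.

```python
from collections import Counter

def overlaps(x, y):
    """True if half-open intervals x and y share at least one base."""
    return x[0] < y[1] and y[0] < x[1]

def merge(intervals):
    """Merge overlapping or touching intervals into base regions."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def venn_counts(beds):
    """Count base regions by the set of input names that overlap them."""
    base = merge([iv for ivs in beds.values() for iv in ivs])
    counts = Counter()
    for region in base:
        members = frozenset(name for name, ivs in beds.items()
                            if any(overlaps(region, iv) for iv in ivs))
        counts[members] += 1
    return counts

a = [(0, 13), (21, 47)]
b = [(4, 15), (30, 35), (40, 46)]
c = [(32, 44)]

counts = venn_counts({"a": a, "b": b, "c": c})
print(counts[frozenset("ab")])   # 1
print(counts[frozenset("abc")])  # 1
```

Because each base region is counted exactly once and its label is a set, the counts do not depend on the order of the inputs. The trade-off is that the diagram then counts merged regions rather than original features.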

daler avatar daler commented on July 28, 2024

Yep, as noted in the docstring. Really it just passes the buck, leaving decisions up to the caller rather than hard-coding in the overridden __add__.

But with no kwargs, it should do the plain ol' intersect (without -u), which should be commutative. I'll have to think some more about a cat/merge/intersect strategy.
