check for off-by-1 errors (pybedtools, closed)

daler avatar daler commented on July 28, 2024
check for off-by-1 errors

from pybedtools.

Comments (7)

daler avatar daler commented on July 28, 2024

good idea.

brentp avatar brentp commented on July 28, 2024

All starts in bedFile.h are converted to 0-based (GFF and VCF have 1 subtracted) and ends are unchanged, so everything ends up with BED-like coordinates.
Nowhere in cbedtools.pyx do we account for this, so:

import pybedtools
b = pybedtools.BedTool('pybedtools/test/data/c.gff')
d = next(iter(b))  # first feature; next() works on Python 2 and 3
print(d)
d.start = d.start  # should be a no-op
print(d)

gives:
chr1 ucb gene 465 805 . + . ID=thaliana_1_465_805;match=scaffold_801404.1;rname=thaliana_1_465_805
chr1 ucb gene 464 805 . + . ID=thaliana_1_465_805;match=scaffold_801404.1;rname=thaliana_1_465_805

so the start has changed. Thoughts on how to address this?
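
For reference, the conversion described above can be sketched in plain Python. This is only an illustration of the coordinate arithmetic, not the bedFile.h code; `gff_to_bed_coords` is a hypothetical helper name.

```python
def gff_to_bed_coords(gff_start, gff_end):
    """Convert 1-based inclusive GFF coordinates to 0-based half-open ones."""
    # Only the start shifts: an inclusive 1-based end has the same value
    # as an exclusive 0-based end.
    return gff_start - 1, gff_end


start, end = gff_to_bed_coords(465, 805)
print(start, end)    # 464 805
print(end - start)   # 341, the feature length in bases
```

With this convention, `end - start` gives the feature length regardless of the source format.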

daler avatar daler commented on July 28, 2024

Phew, nice catch.

It seems that the fundamental issue is that there are 2 different 'start' values -- one is in _bed.start and the other is in _bed.fields[idx].

import pybedtools
b = pybedtools.BedTool('pybedtools/test/data/c.gff')
d = next(iter(b))   # first feature; next() works on Python 2 and 3
print(d.fields[3])  # 465
print(d.start)      # 464

Is there a good reason for bedFile.h to subtract 1?

brentp avatar brentp commented on July 28, 2024

> Is there a good reason for bedFile.h to subtract 1?

Yes: it makes things like .length work the same for all file types.
Maybe it's easiest just to have the .start property setter do:

if self.file_type != "bed": start -= 1

will that solve everything?

daler avatar daler commented on July 28, 2024

Looks like it won't...
The start -= 1 will decrement the start position of d again every time you do:

d.start = d.start
d.start = d.start
etc

I just added a test on starts to check this: 50974ad
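
The drift can be reproduced without pybedtools. The sketch below uses a hypothetical `NaiveInterval` class (not the real Interval) whose setter applies the proposed `start -= 1`; the getter returns the stored 0-based value, so each round trip loses 1.

```python
class NaiveInterval:
    """Hypothetical stand-in for an Interval with the proposed setter."""

    def __init__(self, start, file_type):
        self._start = start          # stored 0-based, as bedFile.h provides it
        self.file_type = file_type

    @property
    def start(self):
        return self._start

    @start.setter
    def start(self, value):
        # Proposed fix: treat the incoming value as 1-based for non-BED files.
        if self.file_type != "bed":
            value -= 1
        self._start = value


d = NaiveInterval(464, "gff")
for _ in range(3):
    d.start = d.start   # should be a no-op, but each round trip subtracts 1
print(d.start)          # 461
```

The getter and setter disagree about the coordinate system, which is why the assignment is not idempotent.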

daler avatar daler commented on July 28, 2024

OK, I think this is fixed now. I assumed these conventions:

  • Interval.start always contains the 0-based coord. This makes things like len() internally consistent for all feature types, and leaves bedFile.h alone.
  • Even when setting Interval.start for a GFF feature, the user-provided value is still assumed to be 0-based
  • Interval.fields always contains the "string representation" to make the Interval a valid line for whatever format it represents. So for GFF features, Interval.fields[3] will always contain the 1-based start position
  • For GFF files, when you set .start, Interval.fields[3] is updated to be str(start + 1)

If a user tries to use .fields[3] for something, hopefully the fact that they have to do an extra int() on it will be a reminder that it's different than the already-as-an-int Interval.start.
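
A minimal sketch of these conventions in plain Python, assuming a toy `SketchInterval` class (a hypothetical name; the actual fix is in the commit referenced in this comment and will differ in detail):

```python
class SketchInterval:
    """Toy model of the conventions; not the actual cbedtools.pyx code."""

    def __init__(self, fields, file_type):
        self.fields = list(fields)
        self.file_type = file_type
        # Column holding the start: index 3 for GFF, index 1 for BED.
        self._start_idx = 3 if file_type == "gff" else 1
        # GFF text is 1-based, so it differs from the 0-based start by 1.
        self._offset = 1 if file_type == "gff" else 0
        self._start = int(self.fields[self._start_idx]) - self._offset

    @property
    def start(self):
        return self._start

    @start.setter
    def start(self, value):
        # The value is always taken as 0-based; only the text field is shifted.
        self._start = int(value)
        self.fields[self._start_idx] = str(self._start + self._offset)

    def __str__(self):
        return "\t".join(self.fields)


gff = SketchInterval(
    ["chr1", "ucb", "gene", "465", "805", ".", "+", ".", "ID=thaliana_1_465_805"],
    "gff",
)
gff.start = gff.start              # round trip leaves both values unchanged
print(gff.start, gff.fields[3])    # 464 465
gff.start = 99
print(gff.start, gff.fields[3])    # 99 100
```

Because the getter and setter both speak 0-based coordinates, `d.start = d.start` is a no-op, and the string field is derived from the integer rather than the other way round.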

(see c0163ce for this, which includes BED- and GFF-specific tests)

brentp avatar brentp commented on July 28, 2024

closing as this is tested and documented (thanks @daler).
