Giter Site home page Giter Site logo

Comments (8)

CiaranOMara avatar CiaranOMara commented on June 1, 2024

The coverage method in GenomicFeatures.jl will give you a pileup.

There is a plan to make this easier to query intersection and overlaps with GenomicFeatures v3, which will use the groupname method instead of the inconsistent seqname method.

from xam.jl.

panxiaoguang avatar panxiaoguang commented on June 1, 2024

The coverage method in GenomicFeatures.jl will give you a pileup.

There is a plan to make this easier to query intersection and overlaps with GenomicFeatures v3, which will use the groupname method instead of the inconsistent seqname method.

However, the coverage method can only get the base numbers in some collections of regions. We use the pileup method to get, for example,how many bases are A, G , C, and T ? How many mismatches or indels ? from the first position to the last position .Yes,we need an easy API like BAM Record.

from xam.jl.

CiaranOMara avatar CiaranOMara commented on June 1, 2024

Oh, I assume you are referring to the Pileup format. Would you provide some references to the functionality/features you are requesting, and we'll see if adding them to this package makes sense?

from xam.jl.

panxiaoguang avatar panxiaoguang commented on June 1, 2024

Oh, I assume you are referring to the Pileup format. Would you provide some references to the functionality/features you are requesting, and we'll see if adding them to this package makes sense?

A good example is python class pipeup, PileupColumn and PileupRead(https://pysam.readthedocs.io/en/latest/api.html#pysam.PileupColumn), But they are using cython, I'm not familiar with that.

from xam.jl.

MillironX avatar MillironX commented on June 1, 2024

I agree this would be extremely useful. I don't think we would need to support the pileup format, per se, but provide a pileup iterator. Basically, a pileup iterator on a SAM.Reader or BAM.Reader should return a new pileup object, something like

@kwdef struct Pileup
    chr
    pos
    ref = 'N'
    count = 0 
    reads::Vector = []
    mapquals::Vector = []
    basequals::Vector = []
end #struct

So iterating through a sorted BAM should return a set like

[
    Pileup("chr1", 1),
    Pileup("chr1", 2),
    Pileup("chr1", 3),
    Pileup("chr2", 1),
    Pileup("chr2", 2),
]

from xam.jl.

CiaranOMara avatar CiaranOMara commented on June 1, 2024

I think part of the ideal solution is to further develop GenomicFeatures.jl/tree/pu/v3.0.0's GenomicIntervalCollection and its overlap iterators. Current development of the overlap iterators work with mixed subtypes of AbstractGenomicInterval, but this restriction could be removed to allow GenomicFeatures's iterators and collections to work directly with XAM records.

from xam.jl.

jonathanBieler avatar jonathanBieler commented on June 1, 2024

I think it would be good to have something like this but it would be even better if it could be made generic enough to accodate things like grouping reads pairs into fragments, e.g if you have to overlapping reads :

R1 : ATTGCGC-----
R2 : ---GTGCATTC

A read-wise pileup at position 5 would return [C = 1 , T = 1] while a fragment-wise at position 4 would return [G = 1] while at position 5 [N=1] or [C = 1 if quality of R1 > quality of R2].

Same things for groups of PCR duplicates, possibly using UMIs.

For this I think you need 1) a (possibly) user-defined grouping criteria ("take R1 and R2 together"), and 2) a (possibly) user-defined reducing/consensus method to compute the pileup number for the grouped reads. Then the interface would like something like this :

 Pileup(FragmentWise(), ReduceWithQuality(minimum_phred=5), "chr1", 1)

That said it might be difficult to make it generic and efficient.

from xam.jl.

apraga avatar apraga commented on June 1, 2024

That would be a great addition. Here's how I count reference and alternate allele it at the moment (there may be better ways, i'm a beginner in Julia):

function getBaseRecord(record, pos)
    i = pos - BAM.position(record)
    BAM.sequence(record)[i+1]
end

function getBase(reader, chrom, pos)
    [getBaseRecord(r, pos) for r in eachoverlap(reader, chrom,pos:pos)]
end

function countSNV(ref, alt)
    ref = length(filter(x -> x == ref, bp))
    alt = length(filter(x -> x == alt, bp))
    total = length(bp)
    (ref, alt, total)
end

Usage:

reader = open(BAM.Reader, bam, index=bai)
bp = getBase(reader,chrom, pos)
close(reader)

(ref, alt, total) = countSNV(convert(DNA, "A"), convert(DNA, "G"))
println("ref=$ref, alt=$alt over $total")

from xam.jl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.