Comments (8)
The coverage
method in GenomicFeatures.jl will give you a pileup.
There is a plan to make this easier to query intersection and overlaps with GenomicFeatures v3, which will use the groupname
method instead of the inconsistent seqname
method.
from xam.jl.
The
coverage
method in GenomicFeatures.jl will give you a pileup.There is a plan to make this easier to query intersection and overlaps with GenomicFeatures v3, which will use the
groupname
method instead of the inconsistentseqname
method.
However, the coverage
method can only get the base numbers in some collections of regions. We use the pileup method to get, for example,how many bases are A, G , C, and T ? How many mismatches or indels ? from the first position to the last position .Yes,we need an easy API like BAM Record.
from xam.jl.
Oh, I assume you are referring to the Pileup format. Would you provide some references to the functionality/features you are requesting, and we'll see if adding them to this package makes sense?
from xam.jl.
Oh, I assume you are referring to the Pileup format. Would you provide some references to the functionality/features you are requesting, and we'll see if adding them to this package makes sense?
A good example is python class pipeup
, PileupColumn
and PileupRead
(https://pysam.readthedocs.io/en/latest/api.html#pysam.PileupColumn), But they are using cython, I'm not familiar with that.
from xam.jl.
I agree this would be extremely useful. I don't think we would need to support the pileup format, per se, but provide a pileup iterator. Basically, a pileup iterator on a SAM.Reader
or BAM.Reader
should return a new pileup object, something like
@kwdef struct Pileup
chr
pos
ref = 'N'
count = 0
reads::Vector = []
mapquals::Vector = []
basequals::Vector = []
end #struct
So iterating through a sorted BAM should return a set like
[
Pileup("chr1", 1),
Pileup("chr1", 2),
Pileup("chr1", 3),
Pileup("chr2", 1),
Pileup("chr2", 2),
]
from xam.jl.
I think part of the ideal solution is to further develop GenomicFeatures.jl/tree/pu/v3.0.0
's GenomicIntervalCollection
and its overlap iterators. Current development of the overlap iterators work with mixed subtypes of AbstractGenomicInterval
, but this restriction could be removed to allow GenomicFeatures
's iterators and collections to work directly with XAM records.
from xam.jl.
I think it would be good to have something like this but it would be even better if it could be made generic enough to accodate things like grouping reads pairs into fragments, e.g if you have to overlapping reads :
R1 : ATTGCGC-----
R2 : ---GTGCATTC
A read-wise pileup at position 5 would return [C = 1 , T = 1] while a fragment-wise at position 4 would return [G = 1] while at position 5 [N=1] or [C = 1 if quality of R1 > quality of R2].
Same things for groups of PCR duplicates, possibly using UMIs.
For this I think you need 1) a (possibly) user-defined grouping criteria ("take R1 and R2 together"), and 2) a (possibly) user-defined reducing/consensus method to compute the pileup number for the grouped reads. Then the interface would like something like this :
Pileup(FragmentWise(), ReduceWithQuality(minimum_phred=5), "chr1", 1)
That said it might be difficult to make it generic and efficient.
from xam.jl.
That would be a great addition. Here's how I count reference and alternate allele it at the moment (there may be better ways, i'm a beginner in Julia):
function getBaseRecord(record, pos)
i = pos - BAM.position(record)
BAM.sequence(record)[i+1]
end
function getBase(reader, chrom, pos)
[getBaseRecord(r, pos) for r in eachoverlap(reader, chrom,pos:pos)]
end
function countSNV(ref, alt)
ref = length(filter(x -> x == ref, bp))
alt = length(filter(x -> x == alt, bp))
total = length(bp)
(ref, alt, total)
end
Usage:
reader = open(BAM.Reader, bam, index=bai)
bp = getBase(reader,chrom, pos)
close(reader)
(ref, alt, total) = countSNV(convert(DNA, "A"), convert(DNA, "G"))
println("ref=$ref, alt=$alt over $total")
from xam.jl.
Related Issues (20)
- Migrate from BGZFStreams to CodecBGZF HOT 2
- BoundsError in eachoverlap() HOT 14
- Browsing the documentation HOT 1
- Broken with Automa 0.8.1 HOT 5
- TagBot trigger issue HOT 5
- can you add some functions to BAM.Reader? HOT 2
- Position of unmapped mate
- Install ERROR: Unsatisfiable requirements detected for package BioSequences HOT 2
- Make `seqname` consistent with BioJulia ecosystem
- how to use @threads to use multithreads when reading BAM file HOT 2
- Feature Request: Conversion from `BioAlignments` to `SAM.Record`
- Finding a map between observed bases and positions in the reference genome HOT 4
- eachoverlap running past end of stream
- Add `index!(reader::Reader, path)`
- Switch to Oneflow HOT 3
- Correct way to add records to an array HOT 1
- Add eachoverlap(reader::Reader, refname::String)
- while loop got eof error HOT 1
- [WIP] convert SAM to BAM HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from xam.jl.