Comments (14)
I think my issue was that if a read is unmapped, its rightposition(record)
will become position(record) - 1
, which is smaller than the leftposition. I think the check here https://github.com/BioJulia/XAM.jl/blob/develop/src/bam/overlap.jl#L87 should account for this special case. Probably change it to the following? It worked for me.
if rid < interval[1] || (rid == interval[1] && rightposition(record) < first(interval[2]) && position(record) < first(interval[2]))
and
if rid > interval[1] || (rid == interval[1] && position(record) > last(interval[2]) && rightposition(record) > last(interval[2]))
from xam.jl.
I believe that this is actually an issue of BGZFStreams.jl
which I have tried to understand here: BioJulia/BGZFStreams.jl#22.
As the project's maintainers don't really seem to be responding, I am using my own fork: https://github.com/lysogeny/BGZFStreams.jl
To those of you with this problem does the proposed fix from there solve your issues?
from xam.jl.
Hi Yuifu,
Got the same error on a different dataset. Have you got any feedback?
from xam.jl.
@zhixingfeng Unfortunately, I've received no feedback so far.
from xam.jl.
Unfortunately, this package has one primary author, and that person hasn't been around much. If you have any interest in working on this, you're more than welcome to. I don't work with these files, so I'm afraid I can't be of much help.
from xam.jl.
@yuifu and @zhixingfeng, does the hotfix/issue-31 branch work with your datasets?
julia> add XAM#hotfix/issue-31
from xam.jl.
@CiaranOMara Thank you for your prompt feedback. The hotfix/issue-31 branch failed on my dataset as well as the test dataset of the updated XAM from branch hotfix/issue-31 with the same error message "BoundsError: attempt to access 1-element Array{BGZFStreams.Block,1} at index [2]". I tested on CentOS 7.6 on a AMD64 machine with Julia 1.5.3. Please find the error messages below
(v1.5) pkg> add XAM#hotfix/issue-31
Updating git-repo https://github.com/BioJulia/XAM.jl.git
Resolving package versions...
Updating /sc/arion/projects/LymeMIND2/.julia/packages/environments/v1.5/Project.toml
[d759349c] + XAM v0.2.6 https://github.com/BioJulia/XAM.jl.git#hotfix/issue-31
Updating /sc/arion/projects/LymeMIND2/.julia/packages/environments/v1.5/Manifest.toml
[67c07d97] + Automa v0.8.0
[28d598bf] + BGZFStreams v0.3.0
[00701ae9] + BioAlignments v2.0.0
[37cfa864] + BioCore v2.0.5
[47718e42] + BioGenerics v0.1.0
[7e6ae17a] + BioSequences v2.0.5
[3c28c6f8] + BioSymbols v4.0.4
[e1450e63] + BufferedStreams v1.0.0
[944b1d66] + CodecZlib v0.7.0
[861a8166] + Combinatorics v1.0.2
[899a7d2d] + GenomicFeatures v2.0.4
[1cb3b9ac] + IndexableBitVectors v1.0.0
[4ffb77ac] + Indexes v0.1.1
[524e6230] + IntervalTrees v1.0.0
[860ef19b] + StableRNGs v0.1.2
[3bb67fe8] + TranscodingStreams v0.9.5
[7200193e] + Twiddle v1.1.2
[d759349c] + XAM v0.2.6 https://github.com/BioJulia/XAM.jl.git#hotfix/issue-31
[ddb6d928] + YAML v0.4.6
[9abbd945] + Profile
(v1.5) pkg> test XAM
Testing XAM
Status /tmp/jl_8Opd5y/Project.toml
[67c07d97] Automa v0.8.0
[28d598bf] BGZFStreams v0.3.0
[00701ae9] BioAlignments v2.0.0
[47718e42] BioGenerics v0.1.0
[7e6ae17a] BioSequences v2.0.5
[3372ea36] FormatSpecimens v1.0.1
[899a7d2d] GenomicFeatures v2.0.4
[4ffb77ac] Indexes v0.1.1
[3bb67fe8] TranscodingStreams v0.9.5
[d759349c] XAM v0.2.6 https://github.com/BioJulia/XAM.jl.git#hotfix/issue-31
[de0858da] Printf
[8dfed614] Test
Status /tmp/jl_8Opd5y/Manifest.toml
[56f22d72] Artifacts v1.3.0
[67c07d97] Automa v0.8.0
[28d598bf] BGZFStreams v0.3.0
[00701ae9] BioAlignments v2.0.0
[37cfa864] BioCore v2.0.5
[47718e42] BioGenerics v0.1.0
[7e6ae17a] BioSequences v2.0.5
[3c28c6f8] BioSymbols v4.0.4
[e1450e63] BufferedStreams v1.0.0
[944b1d66] CodecZlib v0.7.0
[861a8166] Combinatorics v1.0.2
[34da2185] Compat v3.25.0
[864edb3b] DataStructures v0.18.9
[3372ea36] FormatSpecimens v1.0.1
[899a7d2d] GenomicFeatures v2.0.4
[1cb3b9ac] IndexableBitVectors v1.0.0
[4ffb77ac] Indexes v0.1.1
[524e6230] IntervalTrees v1.0.0
[692b3bcd] JLLWrappers v1.2.0
[bac558e1] OrderedCollections v1.4.0
[860ef19b] StableRNGs v0.1.2
[3bb67fe8] TranscodingStreams v0.9.5
[7200193e] Twiddle v1.1.2
[d759349c] XAM v0.2.6 https://github.com/BioJulia/XAM.jl.git#hotfix/issue-31
[ddb6d928] YAML v0.4.6
[83775a58] Zlib_jll v1.2.11+18
[2a0f44e3] Base64
[ade2ca70] Dates
[8bb1440f] DelimitedFiles
[8ba89e20] Distributed
[b77e0a4c] InteractiveUtils
[76f85450] LibGit2
[8f399da3] Libdl
[37e2e46d] LinearAlgebra
[56ddb016] Logging
[d6f4376e] Markdown
[a63ad114] Mmap
[44cfe95a] Pkg
[de0858da] Printf
[9abbd945] Profile
[3fa0cd96] REPL
[9a3f8284] Random
[ea8e919c] SHA
[9e88b42a] Serialization
[1a1011a3] SharedArrays
[6462fe0b] Sockets
[2f01184e] SparseArrays
[10745b16] Statistics
[8dfed614] Test
[cf7118a7] UUIDs
[4ec0a83e] Unicode
Test Summary: | Pass Total
SAM | 122 122
BAM: Error During Test at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/XAM/hu7aW/test/test_bam.jl:286
Test threw exception
Expression: count((r->begin
true
end), eachoverlap(reader, "JH584304.1", 51000:51715)) == 1
BoundsError: attempt to access 1-element Array{BGZFStreams.Block,1} at index [2]
Stacktrace:
[1] getindex at ./array.jl:809 [inlined]
[2] virtualoffset(::BGZFStream{IOStream}) at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/BGZFStreams/qApsr/src/bgzfstream.jl:156
[3] iterate(::XAM.BAM.OverlapIterator{IOStream}, ::XAM.BAM.OverlapIteratorState) at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/XAM/hu7aW/src/bam/overlap.jl:65
[4] _simple_count at ./reduce.jl:869 [inlined]
[5] count at ./reduce.jl:864 [inlined]
[6] (::var"#9#21")(::XAM.BAM.Reader{IOStream}) at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/XAM/hu7aW/test/test_bam.jl:286
[7] open(::var"#9#21", ::Type{XAM.BAM.Reader}, ::String; kwargs::Base.Iterators.Pairs{Symbol,String,Tuple{Symbol},NamedTuple{(:index,),Tuple{String}}}) at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/BioGenerics/cCuGr/src/IO.jl:48
[8] top-level scope at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/XAM/hu7aW/test/test_bam.jl:283
[9] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[10] top-level scope at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/XAM/hu7aW/test/test_bam.jl:2
Test Summary: | Pass Error Total
BAM | 173 1 174
AuxData | 11 11
Record | 3 3
Reader | 52 52
Read long CIGARs | 3 3
Round trip | 28 28
In-Place-Reading Pattern | 7 7
Random access | 68 68
ERROR: LoadError: LoadError: Some tests did not pass: 173 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/XAM/hu7aW/test/test_bam.jl:1
in expression starting at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/XAM/hu7aW/test/runtests.jl:27
ERROR: Package XAM errored during testing
from xam.jl.
@CiaranOMara. As @yuifu pointed out, the error might be caused by inquiring the last record in the bam file. Tried to quit eachoverlap loop by checking eof(), it worked to avoid the BoundError bug on my data and @yuifu 's data. See the code below. Please note that the code below only worked for branch XAM#hotfix/issue-31, and failed for the master branch.
using GenomicFeatures
using XAM
using BioAlignments
function test_bam_eachoverlap(path_bam, chrom, leftpos, rightpos)
bam_reader = open(BAM.Reader, path_bam, index = path_bam * ".bai")
count = 0
for record in eachoverlap(bam_reader, chrom, leftpos:rightpos)
count += 1
eof(bam_reader) && break
end
close(bam_reader)
return count
end
test_bam_eachoverlap("SRR7993829_1.100K.forward.bam", "JH584304.1", 51000, 51715)
from xam.jl.
With this dataset, the VirtualOffset
appears to jump to a position before the starting offset (chunk.start
). This means that eof
could be reached and then the next chunk could move the offset to an earlier position. I don't think there is a requirement for chunks to be ordered, is there?
from xam.jl.
Update: I had another chance to continue bisecting the code to track down the error. It seems unlikely that seek does not work in BGZFStreams or @jakobnissen's new CodecBGZF. My current thinking is that the error lies in the offsets produced by the overlapchunks method in Indexes.jl. My reasoning is that each offset should coincide with the beginning of a chunk, from which at least one byte should be available for reading.
from xam.jl.
Hi @CiaranOMara, I encountered this issue as well and I put together a BAM file that has two reads with the same position (one is the other's unmapped mate) for reproducing the issue. I find that the eof
check fix (e.g. with feature/CodecBGZF-issue-31
branch) will actually skip one of the reads. See https://gist.github.com/Marlin-Na/9cb7767dfc87b007500a6fff4a8f5b9a
from xam.jl.
Update:
This issue appears thread-dependent. Certain multiples of threads seem to work for different datasets.
Current workaround:
Run XAM with 1 thread.
from xam.jl.
@yuifu, @lysogeny, @zhixingfeng, and @BioJulia/admin, I believe I've tracked down the cause of this issue.
If you could, please review the current hotfix/issue-31 branch and provide feedback.
julia> add XAM#hotfix/issue-31
As far as I'm aware, the fix addresses all of the relevant symptoms mentioned in this thread. If there is agreement, BioJulia/BGZFStreams.jl#27 is ready for merging upstream.
from xam.jl.
Should be fixed with #50.
from xam.jl.
Related Issues (20)
- Migrate from BGZFStreams to CodecBGZF HOT 2
- Browsing the documentation HOT 1
- Broken with Automa 0.8.1 HOT 5
- TagBot trigger issue HOT 5
- can you add some functions to BAM.Reader? HOT 2
- Position of unmapped mate
- Install ERROR: Unsatisfiable requirements detected for package BioSequences HOT 2
- Make `seqname` consistent with BioJulia ecosystem
- how to use @threads to use multithreads when reading BAM file HOT 2
- Feature Request: Conversion from `BioAlignments` to `SAM.Record`
- Finding a map between observed bases and positions in the reference genome HOT 4
- eachoverlap running past end of stream
- Add `index!(reader::Reader, path)`
- Do you have the plan to add the pileup functions? HOT 8
- Switch to Oneflow HOT 3
- Correct way to add records to an array HOT 1
- Add eachoverlap(reader::Reader, refname::String)
- while loop got eof error HOT 1
- [WIP] convert SAM to BAM HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from xam.jl.