Giter Site home page Giter Site logo

BoundsError in eachoverlap() about xam.jl HOT 14 CLOSED

biojulia avatar biojulia commented on June 15, 2024
BoundsError in eachoverlap()

from xam.jl.

Comments (14)

Marlin-Na avatar Marlin-Na commented on June 15, 2024 1

I think my issue was that if a read is unmapped, its rightposition(record) will become position(record) - 1, which is smaller than the leftposition. I think the check here https://github.com/BioJulia/XAM.jl/blob/develop/src/bam/overlap.jl#L87 should account for this special case. Probably change it to the following? It worked for me.

    if rid < interval[1] || (rid == interval[1] && rightposition(record) < first(interval[2]) &&  position(record) < first(interval[2]))

and

    if rid > interval[1] || (rid == interval[1] && position(record) > last(interval[2]) && rightposition(record) > last(interval[2]))

from xam.jl.

lysogeny avatar lysogeny commented on June 15, 2024 1

I believe that this is actually an issue of BGZFStreams.jl which I have tried to understand here: BioJulia/BGZFStreams.jl#22.
As the project's maintainers don't really seem to be responding, I am using my own fork: https://github.com/lysogeny/BGZFStreams.jl

To those of you with this problem does the proposed fix from there solve your issues?

from xam.jl.

zhixingfeng avatar zhixingfeng commented on June 15, 2024

Hi Yuifu,
Got the same error on a different dataset. Have you got any feedback?

from xam.jl.

yuifu avatar yuifu commented on June 15, 2024

@zhixingfeng Unfortunately, I've received no feedback so far.

from xam.jl.

kescobo avatar kescobo commented on June 15, 2024

Unfortunately, this package has one primary author, and that person hasn't been around much. If you have any interest in working on this, you're more than welcome to. I don't work with these files, so I'm afraid I can't be of much help.

from xam.jl.

CiaranOMara avatar CiaranOMara commented on June 15, 2024

@yuifu and @zhixingfeng, does the hotfix/issue-31 branch work with your datasets?

julia> add XAM#hotfix/issue-31

from xam.jl.

zhixingfeng avatar zhixingfeng commented on June 15, 2024

@CiaranOMara Thank you for your prompt feedback. The hotfix/issue-31 branch failed on my dataset as well as the test dataset of the updated XAM from branch hotfix/issue-31 with the same error message "BoundsError: attempt to access 1-element Array{BGZFStreams.Block,1} at index [2]". I tested on CentOS 7.6 on a AMD64 machine with Julia 1.5.3. Please find the error messages below

(v1.5) pkg> add XAM#hotfix/issue-31
Updating git-repo https://github.com/BioJulia/XAM.jl.git
Resolving package versions...
Updating /sc/arion/projects/LymeMIND2/.julia/packages/environments/v1.5/Project.toml
[d759349c] + XAM v0.2.6 https://github.com/BioJulia/XAM.jl.git#hotfix/issue-31
Updating /sc/arion/projects/LymeMIND2/.julia/packages/environments/v1.5/Manifest.toml
[67c07d97] + Automa v0.8.0
[28d598bf] + BGZFStreams v0.3.0
[00701ae9] + BioAlignments v2.0.0
[37cfa864] + BioCore v2.0.5
[47718e42] + BioGenerics v0.1.0
[7e6ae17a] + BioSequences v2.0.5
[3c28c6f8] + BioSymbols v4.0.4
[e1450e63] + BufferedStreams v1.0.0
[944b1d66] + CodecZlib v0.7.0
[861a8166] + Combinatorics v1.0.2
[899a7d2d] + GenomicFeatures v2.0.4
[1cb3b9ac] + IndexableBitVectors v1.0.0
[4ffb77ac] + Indexes v0.1.1
[524e6230] + IntervalTrees v1.0.0
[860ef19b] + StableRNGs v0.1.2
[3bb67fe8] + TranscodingStreams v0.9.5
[7200193e] + Twiddle v1.1.2
[d759349c] + XAM v0.2.6 https://github.com/BioJulia/XAM.jl.git#hotfix/issue-31
[ddb6d928] + YAML v0.4.6
[9abbd945] + Profile

(v1.5) pkg> test XAM
Testing XAM
Status /tmp/jl_8Opd5y/Project.toml
[67c07d97] Automa v0.8.0
[28d598bf] BGZFStreams v0.3.0
[00701ae9] BioAlignments v2.0.0
[47718e42] BioGenerics v0.1.0
[7e6ae17a] BioSequences v2.0.5
[3372ea36] FormatSpecimens v1.0.1
[899a7d2d] GenomicFeatures v2.0.4
[4ffb77ac] Indexes v0.1.1
[3bb67fe8] TranscodingStreams v0.9.5
[d759349c] XAM v0.2.6 https://github.com/BioJulia/XAM.jl.git#hotfix/issue-31
[de0858da] Printf
[8dfed614] Test
Status /tmp/jl_8Opd5y/Manifest.toml
[56f22d72] Artifacts v1.3.0
[67c07d97] Automa v0.8.0
[28d598bf] BGZFStreams v0.3.0
[00701ae9] BioAlignments v2.0.0
[37cfa864] BioCore v2.0.5
[47718e42] BioGenerics v0.1.0
[7e6ae17a] BioSequences v2.0.5
[3c28c6f8] BioSymbols v4.0.4
[e1450e63] BufferedStreams v1.0.0
[944b1d66] CodecZlib v0.7.0
[861a8166] Combinatorics v1.0.2
[34da2185] Compat v3.25.0
[864edb3b] DataStructures v0.18.9
[3372ea36] FormatSpecimens v1.0.1
[899a7d2d] GenomicFeatures v2.0.4
[1cb3b9ac] IndexableBitVectors v1.0.0
[4ffb77ac] Indexes v0.1.1
[524e6230] IntervalTrees v1.0.0
[692b3bcd] JLLWrappers v1.2.0
[bac558e1] OrderedCollections v1.4.0
[860ef19b] StableRNGs v0.1.2
[3bb67fe8] TranscodingStreams v0.9.5
[7200193e] Twiddle v1.1.2
[d759349c] XAM v0.2.6 https://github.com/BioJulia/XAM.jl.git#hotfix/issue-31
[ddb6d928] YAML v0.4.6
[83775a58] Zlib_jll v1.2.11+18
[2a0f44e3] Base64
[ade2ca70] Dates
[8bb1440f] DelimitedFiles
[8ba89e20] Distributed
[b77e0a4c] InteractiveUtils
[76f85450] LibGit2
[8f399da3] Libdl
[37e2e46d] LinearAlgebra
[56ddb016] Logging
[d6f4376e] Markdown
[a63ad114] Mmap
[44cfe95a] Pkg
[de0858da] Printf
[9abbd945] Profile
[3fa0cd96] REPL
[9a3f8284] Random
[ea8e919c] SHA
[9e88b42a] Serialization
[1a1011a3] SharedArrays
[6462fe0b] Sockets
[2f01184e] SparseArrays
[10745b16] Statistics
[8dfed614] Test
[cf7118a7] UUIDs
[4ec0a83e] Unicode
Test Summary: | Pass Total
SAM | 122 122
BAM: Error During Test at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/XAM/hu7aW/test/test_bam.jl:286
Test threw exception
Expression: count((r->begin
true
end), eachoverlap(reader, "JH584304.1", 51000:51715)) == 1
BoundsError: attempt to access 1-element Array{BGZFStreams.Block,1} at index [2]
Stacktrace:
[1] getindex at ./array.jl:809 [inlined]
[2] virtualoffset(::BGZFStream{IOStream}) at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/BGZFStreams/qApsr/src/bgzfstream.jl:156
[3] iterate(::XAM.BAM.OverlapIterator{IOStream}, ::XAM.BAM.OverlapIteratorState) at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/XAM/hu7aW/src/bam/overlap.jl:65
[4] _simple_count at ./reduce.jl:869 [inlined]
[5] count at ./reduce.jl:864 [inlined]
[6] (::var"#9#21")(::XAM.BAM.Reader{IOStream}) at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/XAM/hu7aW/test/test_bam.jl:286
[7] open(::var"#9#21", ::Type{XAM.BAM.Reader}, ::String; kwargs::Base.Iterators.Pairs{Symbol,String,Tuple{Symbol},NamedTuple{(:index,),Tuple{String}}}) at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/BioGenerics/cCuGr/src/IO.jl:48
[8] top-level scope at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/XAM/hu7aW/test/test_bam.jl:283
[9] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
[10] top-level scope at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/XAM/hu7aW/test/test_bam.jl:2

Test Summary: | Pass Error Total
BAM | 173 1 174
AuxData | 11 11
Record | 3 3
Reader | 52 52
Read long CIGARs | 3 3
Round trip | 28 28
In-Place-Reading Pattern | 7 7
Random access | 68 68
ERROR: LoadError: LoadError: Some tests did not pass: 173 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/XAM/hu7aW/test/test_bam.jl:1
in expression starting at /hpc/users/fengz03/LymeMind2/.julia/packages/packages/XAM/hu7aW/test/runtests.jl:27
ERROR: Package XAM errored during testing

from xam.jl.

zhixingfeng avatar zhixingfeng commented on June 15, 2024

@CiaranOMara. As @yuifu pointed out, the error might be caused by inquiring the last record in the bam file. Tried to quit eachoverlap loop by checking eof(), it worked to avoid the BoundError bug on my data and @yuifu 's data. See the code below. Please note that the code below only worked for branch XAM#hotfix/issue-31, and failed for the master branch.

using GenomicFeatures
using XAM
using BioAlignments

function test_bam_eachoverlap(path_bam, chrom, leftpos, rightpos)
        bam_reader = open(BAM.Reader, path_bam, index = path_bam * ".bai")
        count = 0
	for record in eachoverlap(bam_reader, chrom, leftpos:rightpos)
		count += 1
                eof(bam_reader) && break
	end
	close(bam_reader)
	return count
end

test_bam_eachoverlap("SRR7993829_1.100K.forward.bam", "JH584304.1", 51000, 51715)

from xam.jl.

CiaranOMara avatar CiaranOMara commented on June 15, 2024

With this dataset, the VirtualOffset appears to jump to a position before the starting offset (chunk.start). This means that eof could be reached and then the next chunk could move the offset to an earlier position. I don't think there is a requirement for chunks to be ordered, is there?

from xam.jl.

CiaranOMara avatar CiaranOMara commented on June 15, 2024

Update: I had another chance to continue bisecting the code to track down the error. It seems unlikely that seek does not work in BGZFStreams or @jakobnissen's new CodecBGZF. My current thinking is that the error lies in the offsets produced by the overlapchunks method in Indexes.jl. My reasoning is that each offset should coincide with the beginning of a chunk, from which at least one byte should be available for reading.

from xam.jl.

Marlin-Na avatar Marlin-Na commented on June 15, 2024

Hi @CiaranOMara, I encountered this issue as well and I put together a BAM file that has two reads with the same position (one is the other's unmapped mate) for reproducing the issue. I find that the eof check fix (e.g. with feature/CodecBGZF-issue-31 branch) will actually skip one of the reads. See https://gist.github.com/Marlin-Na/9cb7767dfc87b007500a6fff4a8f5b9a

from xam.jl.

CiaranOMara avatar CiaranOMara commented on June 15, 2024

Update:
This issue appears thread-dependent. Certain multiples of threads seem to work for different datasets.

Current workaround:
Run XAM with 1 thread.

from xam.jl.

CiaranOMara avatar CiaranOMara commented on June 15, 2024

@yuifu, @lysogeny, @zhixingfeng, and @BioJulia/admin, I believe I've tracked down the cause of this issue.

If you could, please review the current hotfix/issue-31 branch and provide feedback.

julia> add XAM#hotfix/issue-31

As far as I'm aware, the fix addresses all of the relevant symptoms mentioned in this thread. If there is agreement, BioJulia/BGZFStreams.jl#27 is ready for merging upstream.

from xam.jl.

CiaranOMara avatar CiaranOMara commented on June 15, 2024

Should be fixed with #50.

from xam.jl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.