Giter Site home page Giter Site logo

Comments (21)

ekg avatar ekg commented on August 19, 2024

The error message should be a little more clear. It would be easy to get it to show us what read didn't align.

My guess is that some of the alignments aren't against chr17. This might lead to a situation in which the graph that we try to subject the read against is empty.

Would you please try to run the same thing, but use the path list option to specify all chromosomes in the reference?

-i, --into-paths FILE   surject into path names listed in FILE (one per line)

You would just do ( seq 22; echo X; echo Y ) >h37.paths and then surject with -i h37.paths.

If this doesn't fail then there is no need for more complex test cases.

from vg.

mlin avatar mlin commented on August 19, 2024

Hmm, seemingly no change:

$ cat h37.paths
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
X
Y
$ time vg/vg surject -d wg.index.k27e11 -t 32 -i h37.paths -b aln.gam > aln.surj.bam
vg: alignment.cpp:462: std::string vg::cigar_against_path(const vg::Alignment&): Assertion `alignment.has_path()' failed.
Aborted

real    0m12.542s
user    0m21.569s
sys     0m2.418s

from vg.

pgrosu avatar pgrosu commented on August 19, 2024

Probably a vg index --dump -d wg.index.k27e11 might be required at this stage to inspect that the database is okay.

from vg.

ekg avatar ekg commented on August 19, 2024

Dumping the db will take a very long time! Thankfully, it looks like a
different kind of problem.

Mike, try running with a single thread and debugging. Log the stderr to a
file. This should show us the last read before failure. If you send that to
me I can try to reproduce it on my end.
On Jun 14, 2015 11:26 PM, "Paul Grosu" [email protected] wrote:

Probably a vg index --dump -d wg.index.k27e11 might be required at this
stage to inspect that the database is okay.


Reply to this email directly or view it on GitHub
#46 (comment).

from vg.

pgrosu avatar pgrosu commented on August 19, 2024

Agreed that it would take a long time, but isn't debugging only enabled for vg map via --debug?

from vg.

ekg avatar ekg commented on August 19, 2024

I think you're right. I completely forgot about that. So I'll need to
propagate that up to the CLI.

So either way I need to patch before Mike runs again.
On Jun 14, 2015 11:39 PM, "Paul Grosu" [email protected] wrote:

Agreed that it would take a long time, but isn't debugging only enabled
for vg map via --debug?


Reply to this email directly or view it on GitHub
#46 (comment).

from vg.

pgrosu avatar pgrosu commented on August 19, 2024

No problemo - probably it might help to have a separate program that implements some routine database inspection features for troubleshooting purposes.

from vg.

ekg avatar ekg commented on August 19, 2024

Probably there is something in rocksdb to help with that. If you add
features like a command line index inspector and your patches pass the
build I will pull them in.
On Jun 14, 2015 11:52 PM, "Paul Grosu" [email protected] wrote:

No problemo - probably it might help to have a separate program that
implements some routine database inspection features for troubleshooting
purposes.


Reply to this email directly or view it on GitHub
#46 (comment).

from vg.

pgrosu avatar pgrosu commented on August 19, 2024

Sounds good - let me give it a shot, since I haven't seen one out there in the wild for rocksdb as a cli.

from vg.

pgrosu avatar pgrosu commented on August 19, 2024

Probably the easiest thing to first check is the expected size, which should be around 90 GB.

Mike, can you run the following command, and tell us what you get as an output:

du -h wg.index.k27e11

Thanks,
~p

from vg.

ekg avatar ekg commented on August 19, 2024

He was able to map all these reads so I doubt it is a problem with the db
itself.
On Jun 15, 2015 12:32 AM, "Paul Grosu" [email protected] wrote:

Probably the easiest thing to first check is the expected size, which
should be around 90 GB.

Mike, can you run the following command, and tell us what you get as an
output:

du -h wg.index.k27e11

Thanks,
~p


Reply to this email directly or view it on GitHub
#46 (comment).

from vg.

pgrosu avatar pgrosu commented on August 19, 2024

Thinking...interesting so he did this as a test to see if he gets back the same thing?

from vg.

mlin avatar mlin commented on August 19, 2024

Sorry I'm slow to look at these suggestions- on vacation currently but will get to it.

from vg.

pgrosu avatar pgrosu commented on August 19, 2024

No worries Mike - hope you have a wonderful vacation!

Erik - Well...I guess I didn't look hard enough. There are a couple tools in the wild after all, and below is the link to one of them:

https://github.com/facebook/rocksdb/wiki/Ldb-Tool

In our case, I performed the following to get the compiled version from within the vg directory, and then copied the compiled version into my local bin directory - there are also a whole bunch of other tools (i.e. cache_bench, db_bench, db_sanity_test, db_repl_stress, db_stress, memtablerep_bench, and table_reader_bench) that get built as well, which I will have to look deeper at:

$ cd vg
$ cd rocksdb
$ make
$ cp ldb ~/bin
$ cp sst_dump ~/bin

Then one can run the commands like this:

$ ldb --db=x.vg.index scan | less
$ sst_dump --file=x.vg.index --command=scan | less

Though the output understandably has lots of structures like this:

from [] to []
Process x.vg.index/000017.sst
Sst file format: block-based
' =>^A^P^B
' =>^A^P^C
' =>
^HCAAATAAG^X^A
' =>
^B^H^A
' =>^B^P^D
' =>^B^P^E
' =>
^AG^X^B
' =>
^B^H^B
...

So I will need to look a little deeper to extract out the structures within the database as well - lots of fun stuff :)

~p

from vg.

mlin avatar mlin commented on August 19, 2024

surject appears to have succeeded following a22af56. I noticed one odd thing in the resulting BAM file, namely the quality score encoding:

$ samtools view aln.surn.bam
C2KC2ACXX_1:6:1101:3304:0/1     6       *       11075625        0       1S99M   *       11075625
        0       TGGGTTGATGCCATGGAAAGGGGCAGTAACTTCCTGATGTTACCATGGCAACAGTAAACTAACATGGCACACTGGTGTCTAATG
GGGGAGGTGCTTCTGC    <84><84><84><88><88><88><88><88><88><88><88><88><88><8B><8B><8B><8B><8B><8B><8B>
<8B><8B><8B><88><8B><8B><84><88><84><88><8B><8B><8B><8B><8B><8B><88><88><8B><8B><8B><88><8B><88><88>
<88><88><88><84><88><8B><8B><8B><88><88><84><88><88><8B><8B><88><88><88><88><88><88><88><88><84><88>
<88><88><88><88><88><88><84><84><84><88><88><84><88><88><88><88><88><88><84><88><88>~<84><84><84>
<84><84><88><88><88>
C2KC2ACXX_1:6:1101:3573:0/1     22      *       11077961        0       100M    *       11077961
        0       AGCAGCAGTGTTTCTGAACAGCTTCAGGAAGAGCTTGCCACTTTCAGGCTCTCACAAATGGAGAGACTTCTTATTAATCTCTTT
CTCTCCACTGCAGGCA    <84><84><84><88><88><88><88><88><88><88><88><88><88><88><88><88><88><88><88><88>
<88><88><84><88><88><88><88><88><8B><8B><88><8B><88><8B><88><88><88><88><88><8B><8B>~<88><88><8B>
<88><88><88><88><88><88><88><8B><8B><8B><8B><8B><88><8B><8B>~<88><84><84><88><88><88><8B><8B><8B>
<8B><88><8B><84><88><88><88><88><88><88>~<88><88><88>~<84><88><88><84><88><84><84><84><84><84><84>
<84><84><84><84>
C2KC2ACXX_1:6:1101:3928:0/1     22      *       11030553        0       1S99M   *       11030553
        0       GGGTAGTCTGAAAGAGCTTGTTCCTCCCCGCCTCTCTCTCTCTCTTGCTCTCTCTCTTGCCATGTAACATTCAGGCTCCTCCTT
CACCTTCCAACATGGT    <84><84><84><88><88><88><88><88><88><88><88><88><88><8B>~<88><88><88><8B><88>
<8B><8B><8B><88><88><8B><8B><88><8B><8B><8B><8B><8B><88><8B><8B><8B><8B><8B><8B><8B><8B><8B><8B><88>
<8B><8B><88><8B><8B><8B><8B><8B><8B><8B><84><88><88><8B><8B><88><88><8B><88><88><88><88><88><88><88>
<84><84><88><84>riiiriyririirrr<84>iiriririyi

This is what appeared on the terminal so I think <88>, <8B> are hex characters...

Out of curiosity what is POS being set to when RNAME is *?

from vg.

mcshane avatar mcshane commented on August 19, 2024

I found this problem the other day too. @ekg put in this fix (e017b30), but requires remaking the gam file.

from vg.

mlin avatar mlin commented on August 19, 2024

Thanks, I'm running that now!

from vg.

mcshane avatar mcshane commented on August 19, 2024

For the RNAME=* problem, if you are seeing that for all the reads, then you might want to check that the path names file (-i option) is correct. Looks like it doesn't throw if the file doesn't exist. Had that problem too.

from vg.

mlin avatar mlin commented on August 19, 2024

I'd provided -p 17 to surject only onto one chromosome (although it looks like vg still does some processing on all the alignments as this was not appreciably faster or smaller output than all the chromosomes). So I think the setting of RNAME is appropriate for the way I'm running it -- but that POS is set to some seemingly arbitrary value for all of them seems curious. Could it be some in-memory data structure is being reused, and vg is outputting a stale/dirty value in that field?

from vg.

mcshane avatar mcshane commented on August 19, 2024

Perhaps. Looks like when I have an unmapped read, POS is set to the value of the last mapped read (in the case I just looked at anyway).

from vg.

mlin avatar mlin commented on August 19, 2024

Thanks I'll file a separate issue about that!

from vg.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.