Comments (21)
The error message should be a little more clear. It would be easy to get it to show us what read didn't align.
My guess is that some of the alignments aren't against chr17. This might lead to a situation in which the graph that we try to subject the read against is empty.
Would you please try to run the same thing, but use the path list option to specify all chromosomes in the reference?
-i, --into-paths FILE surject into path names listed in FILE (one per line)
You would just do ( seq 22; echo X; echo Y ) >h37.paths
and then surject with -i h37.paths
.
If this doesn't fail then there is no need for more complex test cases.
from vg.
Hmm, seemingly no change:
$ cat h37.paths
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
X
Y
$ time vg/vg surject -d wg.index.k27e11 -t 32 -i h37.paths -b aln.gam > aln.surj.bam
vg: alignment.cpp:462: std::string vg::cigar_against_path(const vg::Alignment&): Assertion `alignment.has_path()' failed.
Aborted
real 0m12.542s
user 0m21.569s
sys 0m2.418s
from vg.
Probably a vg index --dump -d wg.index.k27e11
might be required at this stage to inspect that the database is okay.
from vg.
Dumping the db will take a very long time! Thankfully, it looks like a
different kind of problem.
Mike, try running with a single thread and debugging. Log the stderr to a
file. This should show us the last read before failure. If you send that to
me I can try to reproduce it on my end.
On Jun 14, 2015 11:26 PM, "Paul Grosu" [email protected] wrote:
Probably a vg index --dump -d wg.index.k27e11 might be required at this
stage to inspect that the database is okay.—
Reply to this email directly or view it on GitHub
#46 (comment).
from vg.
Agreed that it would take a long time, but isn't debugging only enabled for vg map
via --debug
?
from vg.
I think you're right. I completely forgot about that. So I'll need to
propagate that up to the CLI.
So either way I need to patch before Mike runs again.
On Jun 14, 2015 11:39 PM, "Paul Grosu" [email protected] wrote:
Agreed that it would take a long time, but isn't debugging only enabled
for vg map via --debug?—
Reply to this email directly or view it on GitHub
#46 (comment).
from vg.
No problemo - probably it might help to have a separate program that implements some routine database inspection features for troubleshooting purposes.
from vg.
Probably there is something in rocksdb to help with that. If you add
features like a command line index inspector and your patches pass the
build I will pull them in.
On Jun 14, 2015 11:52 PM, "Paul Grosu" [email protected] wrote:
No problemo - probably it might help to have a separate program that
implements some routine database inspection features for troubleshooting
purposes.—
Reply to this email directly or view it on GitHub
#46 (comment).
from vg.
Sounds good - let me give it a shot, since I haven't seen one out there in the wild for rocksdb as a cli.
from vg.
Probably the easiest thing to first check is the expected size, which should be around 90 GB.
Mike, can you run the following command, and tell us what you get as an output:
du -h wg.index.k27e11
Thanks,
~p
from vg.
He was able to map all these reads so I doubt it is a problem with the db
itself.
On Jun 15, 2015 12:32 AM, "Paul Grosu" [email protected] wrote:
Probably the easiest thing to first check is the expected size, which
should be around 90 GB.Mike, can you run the following command, and tell us what you get as an
output:du -h wg.index.k27e11
Thanks,
~p—
Reply to this email directly or view it on GitHub
#46 (comment).
from vg.
Thinking...interesting so he did this as a test to see if he gets back the same thing?
from vg.
Sorry I'm slow to look at these suggestions- on vacation currently but will get to it.
from vg.
No worries Mike - hope you have a wonderful vacation!
Erik - Well...I guess I didn't look hard enough. There are a couple tools in the wild after all, and below is the link to one of them:
https://github.com/facebook/rocksdb/wiki/Ldb-Tool
In our case, I performed the following to get the compiled version from within the vg
directory, and then copied the compiled version into my local bin
directory - there are also a whole bunch of other tools (i.e. cache_bench
, db_bench
, db_sanity_test
, db_repl_stress
, db_stress
, memtablerep_bench
, and table_reader_bench
) that get built as well, which I will have to look deeper at:
$ cd vg
$ cd rocksdb
$ make
$ cp ldb ~/bin
$ cp sst_dump ~/bin
Then one can run the commands like this:
$ ldb --db=x.vg.index scan | less
$ sst_dump --file=x.vg.index --command=scan | less
Though the output understandably has lots of structures like this:
from [] to []
Process x.vg.index/000017.sst
Sst file format: block-based
' =>^A^P^B
' =>^A^P^C
' =>
^HCAAATAAG^X^A
' =>
^B^H^A
' =>^B^P^D
' =>^B^P^E
' =>
^AG^X^B
' =>
^B^H^B
...
So I will need to look a little deeper to extract out the structures within the database as well - lots of fun stuff :)
~p
from vg.
surject appears to have succeeded following a22af56. I noticed one odd thing in the resulting BAM file, namely the quality score encoding:
$ samtools view aln.surn.bam
C2KC2ACXX_1:6:1101:3304:0/1 6 * 11075625 0 1S99M * 11075625
0 TGGGTTGATGCCATGGAAAGGGGCAGTAACTTCCTGATGTTACCATGGCAACAGTAAACTAACATGGCACACTGGTGTCTAATG
GGGGAGGTGCTTCTGC <84><84><84><88><88><88><88><88><88><88><88><88><88><8B><8B><8B><8B><8B><8B><8B>
<8B><8B><8B><88><8B><8B><84><88><84><88><8B><8B><8B><8B><8B><8B><88><88><8B><8B><8B><88><8B><88><88>
<88><88><88><84><88><8B><8B><8B><88><88><84><88><88><8B><8B><88><88><88><88><88><88><88><88><84><88>
<88><88><88><88><88><88><84><84><84><88><88><84><88><88><88><88><88><88><84><88><88>~<84><84><84>
<84><84><88><88><88>
C2KC2ACXX_1:6:1101:3573:0/1 22 * 11077961 0 100M * 11077961
0 AGCAGCAGTGTTTCTGAACAGCTTCAGGAAGAGCTTGCCACTTTCAGGCTCTCACAAATGGAGAGACTTCTTATTAATCTCTTT
CTCTCCACTGCAGGCA <84><84><84><88><88><88><88><88><88><88><88><88><88><88><88><88><88><88><88><88>
<88><88><84><88><88><88><88><88><8B><8B><88><8B><88><8B><88><88><88><88><88><8B><8B>~<88><88><8B>
<88><88><88><88><88><88><88><8B><8B><8B><8B><8B><88><8B><8B>~<88><84><84><88><88><88><8B><8B><8B>
<8B><88><8B><84><88><88><88><88><88><88>~<88><88><88>~<84><88><88><84><88><84><84><84><84><84><84>
<84><84><84><84>
C2KC2ACXX_1:6:1101:3928:0/1 22 * 11030553 0 1S99M * 11030553
0 GGGTAGTCTGAAAGAGCTTGTTCCTCCCCGCCTCTCTCTCTCTCTTGCTCTCTCTCTTGCCATGTAACATTCAGGCTCCTCCTT
CACCTTCCAACATGGT <84><84><84><88><88><88><88><88><88><88><88><88><88><8B>~<88><88><88><8B><88>
<8B><8B><8B><88><88><8B><8B><88><8B><8B><8B><8B><8B><88><8B><8B><8B><8B><8B><8B><8B><8B><8B><8B><88>
<8B><8B><88><8B><8B><8B><8B><8B><8B><8B><84><88><88><8B><8B><88><88><8B><88><88><88><88><88><88><88>
<84><84><88><84>riiiriyririirrr<84>iiriririyi
This is what appeared on the terminal so I think <88>, <8B> are hex characters...
Out of curiosity what is POS being set to when RNAME is *?
from vg.
I found this problem the other day too. @ekg put in this fix (e017b30), but requires remaking the gam file.
from vg.
Thanks, I'm running that now!
from vg.
For the RNAME=*
problem, if you are seeing that for all the reads, then you might want to check that the path names file (-i
option) is correct. Looks like it doesn't throw if the file doesn't exist. Had that problem too.
from vg.
I'd provided -p 17
to surject only onto one chromosome (although it looks like vg
still does some processing on all the alignments as this was not appreciably faster or smaller output than all the chromosomes). So I think the setting of RNAME
is appropriate for the way I'm running it -- but that POS
is set to some seemingly arbitrary value for all of them seems curious. Could it be some in-memory data structure is being reused, and vg
is outputting a stale/dirty value in that field?
from vg.
Perhaps. Looks like when I have an unmapped read, POS is set to the value of the last mapped read (in the case I just looked at anyway).
from vg.
Thanks I'll file a separate issue about that!
from vg.
Related Issues (20)
- Low-complexity pruning is slow in surjection
- Turn off surjection anchor limit by default
- Bump necessary magic numbers for long-read index types HOT 1
- vg autoindex --workflow mpmap HOT 1
- GAF nodes do not correspond to VCF nodes HOT 3
- Fastq analysis in euka HOT 2
- Protobuf errors mapping HOT 4
- vg augment crashes in combination with giraffe .gam outputs HOT 5
- when i do the code :vg stats -a var1.gam > var1.stats ,show me a bug HOT 2
- vg surject and pack fail on GraphAligner produced .gam HOT 4
- About the usage of vg deconstruct HOT 5
- Right method to annotate genes on pangenome graph HOT 1
- vg call report missing allele HOT 2
- vg call error HOT 2
- How do different methods affect precision? HOT 4
- Giraffe alignment is very slow and produces warning[vg::Watchdog] messages unless rescue is disabled HOT 12
- Giraffe needs to better deal with systems where multiple processes opening the distance index causes slowdowns
- Any option like BWA MEM '-C'?
- vg convert failes while converting gbz to xg with signal 6 error HOT 5
- Release vg v1.60.0
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from vg.