derrickwood / kraken2
The second version of the Kraken taxonomic sequence classification system
License: MIT License
Hi. I have encountered hundreds of unmapped accessions, for example NZ_LS483329, when building a database from virus, bacteria, archaea, fungi, protozoa and human reference genomes, with the est, gb, gss and wgs accession2taxid files. In fact these accessions do exist. Is it possible to implement extra accession mapping through NCBI eutils in lookup_accession_numbers.pl? I have tried that, and it works.
I managed to build the standard DB, and it worked flawlessly - nice work!
The rsync is a great addition to the process.
The DB hash file was ~32GB.
When I ran the first kraken2 test, it took about 30 seconds before it printed anything to the screen, and I wasn't sure if it was working or not (top showed classify running with 40% of 1 CPU only).
Could you have a --verbose option, or just print something to STDERR at the start?
To pacify premature CTRL-C people like me?
Hello,
I've been trying to generate a report with the formatting for --use-mpa-style, and my code is just giving the exact same output as when I don't use the options --report --use-mpa-style. Here is what I am inputting exactly:
kraken2 --db /projects/b1052/krakenDB/ /projects/b1052/Wells_b1042/Morgan/assembled/idba/S_AS1_contig.fa --report --use-mpa-style > /projects/b1052/Wells_b1042/Morgan/Kraken2/Output/S_AS1_Kraken_Output_Report.txt
But the output still looks like this, which is identical to the output without the extra report options specified:
C contig-101_31860 990316 600 990316:566
Can you please let me know why this isn't working?
Similar to the issue here with Kraken v1 (DerrickWood/kraken#114), in Kraken2 downloading and updating the RefSeq databases is not working. I get the following error:
kate@.../software/kraken2-master$ ./kraken2-build --standard --threads 50 --db Aug2018_RefSeq Downloading nucleotide est accession to taxon map...rsync: failed to connect to ftp.ncbi.nlm.nih.gov (165.112.9.229): Connection refused (111) rsync: failed to connect to ftp.ncbi.nlm.nih.gov (2607:f220:41e:250::7): Network is unreachable (101) rsync error: error in socket IO (code 10) at clientserver.c(128) [Receiver=3.1.1]
There is a python workaround using wget instead of rsync ( https://github.com/sejmodha/MiscScripts/blob/master/UpdateKrakenDatabases.py ) but that has some issues too. Any chance you could fix this? Thanks.
First file was ok, but I don't think the second one exists?
This looks like it might be it, without the 'nr'?
ftp://ftp.arb-silva.de//release_132/Exports/taxonomy/tax_slv_ssu_132.acc_taxid
kraken2-build --db silva --threads 36 --special silva
--2018-06-28 14:48:38-- ftp://ftp.arb-silva.de//release_132/Exports/SILVA_132_SSURef_Nr99_tax_silva.fasta.gz
=> ‘SILVA_132_SSURef_Nr99_tax_silva.fasta.gz’
SILVA_132_SSURef_Nr99_ta 100%[=================================>] 229.21M 102KB/s in 36m 10s
2018-06-28 15:24:52 (108 KB/s) - ‘SILVA_132_SSURef_Nr99_tax_silva.fasta.gz’ saved [240343558]
--2018-06-28 15:24:59-- ftp://ftp.arb-silva.de//release_132/Exports/taxonomy/tax_slv_ssu_nr_132.acc_taxid
=> ‘tax_slv_ssu_nr_132.acc_taxid’
==> PASV ... done. ==> RETR tax_slv_ssu_nr_132.acc_taxid ...
No such file ‘tax_slv_ssu_nr_132.acc_taxid’.
Hi,
I finished downloading the "nt" database and I am now stuck at the step of masking low-complexity sequences. The dustmasker script has been running for almost a day. I see that it is not multi-threaded. Is it typical for this step to take such a long time? Would it be possible to replace "dustmasker" with a more efficient tool such as "repeatmasker"?
Ajay.
I noticed on the website for Kraken2 under 'specialist databases':
"Note that these databases may have licensing restrictions regarding their data, and it is your responsibility to ensure you are in compliance with those restrictions; please visit the databases' websites for further details."
This was presumably a reference to the dual-licensing of SILVA, which was not previously available for commercial use without a licence. This restriction is now gone; SILVA is part of ELIXIR and a 'Core Data Resource'.
See DerrickWood/kraken#127 for original submission. Example output line (with initial "C" but taxid 0) is below:
C M00963:162:000000000-AHPVV:1:1101:14752:1397 0 123|123 0:89 |:| 0:15 27592:5 9913:8 0:19 0:22 0:20
@stuber Can you give me some more information to help me track this down? What kind of Kraken 2 database was being run against here? What was the full command (starting with kraken2) that would replicate this output?
Hi there, is it possible to implement kraken-mpa-report/kraken-report-like scripts? When we need both the report and the mpa report, or want to try a different confidence value, currently the only way seems to be to rerun the whole kraken2 classification.
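As a sketch of why a report-only rerun should be possible: the standard per-read output already carries each read's taxid, so per-taxid tallies (the raw material for a report) can be recomputed from it without reclassifying. A hedged Python illustration on toy tab-separated lines in the standard output format:

```python
# Hedged sketch: re-tally classified reads per taxid from the standard
# per-read output (column 1 = C/U flag, column 3 = taxid). Toy data only.
from collections import Counter

def taxid_counts(kraken_output_lines):
    counts = Counter()
    for line in kraken_output_lines:
        status, _read_id, taxid = line.split("\t")[:3]
        if status == "C":            # count only classified reads
            counts[taxid] += 1
    return counts

demo = taxid_counts([
    "C\tread1\t562\t100\t562:66",
    "C\tread2\t562\t100\t562:66",
    "U\tread3\t0\t100\t",
])
```

Building a full clade-rolled-up or mpa-style report from these counts would additionally need the taxonomy tree (names.dmp/nodes.dmp), but none of it needs the k-mer classification step again.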
At the moment sequences without an accession number in the build template either lead to a bailout of kraken2-build, or are simply ignored. I propose an option --default-taxid which, when set, leads to sequences without an accession number being given the specified taxid. This way one can train on sequences that are not in NCBI's taxonomy---at the moment these are ignored and will give no hit. If this gives too much freedom, the default taxid could be set to "unknown organism". This would still be better than learning and recognizing nothing.
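A hedged Python sketch of the proposed --default-taxid behaviour (the accession pattern, taxids, and fallback value are illustrative only, not Kraken 2's actual parsing):

```python
# Hypothetical sketch: headers without a recognizable accession get a
# user-supplied default taxid instead of being dropped or aborting the build.
import re

ACC_RE = re.compile(r"^>(\S+\.\d+)")   # crude accession-with-version pattern

def assign_taxids(headers, acc2taxid, default_taxid=None):
    out = {}
    for h in headers:
        m = ACC_RE.match(h)
        seqid = h[1:].split()[0]
        if m and m.group(1) in acc2taxid:
            out[seqid] = acc2taxid[m.group(1)]
        elif default_taxid is not None:
            out[seqid] = default_taxid   # "unknown organism"-style fallback
    return out

demo = assign_taxids([">NZ_LS483329.1 Klebsiella", ">my_local_contig_1"],
                     {"NZ_LS483329.1": "573"}, default_taxid="12908")
```

With default_taxid=None this degrades to the current "simply ignored" behaviour, so the option would be backward compatible.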
Hi!
I'm running kraken 2 to do the taxonomic classification of some microbiome data (assembly of PE reads) and I'm using the Greengenes database. I want to get the full taxonomic classification for each taxid (all the levels), just like the output of kraken-translate, but I only get one level of taxonomy. I run the following lines:
kraken2-build --db ./Kraken/ggdatabase --special greengenes
(to build the database)
kraken2 --db ./Kraken/ggdatabase exported/dna-sequences.fasta > output_kraken.txt
(to do the classification)
kraken2 --db ./Kraken/ggdatabase --use-names exported/dna-sequences.fasta > output_kraken_names.txt
(To get the scientific names)
Any ideas on how to deal with this would be very much appreciated!
Camila
Hey,
I classified my 16S reads with "kraken2 --db $DB_Silva --threads 5 --output $out/20170519_Silva.kraken --report $out/20170519_Silva.mpareport --use-mpa-style $infile1". Although I got a normal kraken output file, I only obtained an empty mpa-report. The same happened for the other 16S databases (RDP and Greengenes) I tried. With the same command on a custom database that I used for my shotgun samples, I got a normal mpa-report. So I would guess that the 16S databases could be the problem here. I noticed that in the taxonomy of the 16S databases only two files (names.dmp and nodes.dmp) were present, in contrast to the custom database with 22 files in that folder. Do you have an idea what could cause the problem here?
Thank you
Josephine
Dear developer,
The build process has been stuck at the step "Step 1/2: Performing rsync file transfer of requested files" for more than 24 hours. Is this normal? The command is as follows:
./kraken2-build --standard --threads 40 --db /opt/kraken2/database
--2018-07-08 15:35:38-- ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_est.accession2taxid.gz
=> “nucl_est.accession2taxid.gz”
........................................
(The output is omitted)
2018-07-08 17:59:27 (0.00 B/s) - “taxdump.tar.gz” saved [43921242]
Downloaded taxonomy tree data
Uncompressing taxonomy data... done.
Untarring taxonomy tree data... done.
Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
Processed 272 projects (408 sequences, 687.34 Mbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library...done.
Step 1/2: Performing rsync file transfer of requested files
Derrick,
There is a feature we would really like to add to Kraken, but before filing a pull request etc we wanted to discuss the idea.
Basically, as you know, NCBI has many taxonomic errors - either from sloppy submission, lack of knowledge at the time of submission, or just ignorance. There is lots of domain expertise out in the world, especially in public health labs where species ID is our bread and butter.
Our colleague @cgorrie here has curated the whole Klebsiella complex, and is putting together a table as follows:
ACCESSION OLD_TAXID NEW_TAXID
...
I was hoping kraken2-build could support a --patch-taxids <file> option which would remap anything it finds in the above table.
My hope is that other groups, e.g. @happykhan @lskatz @andersgs @schultzm, would have similar knowledge that could be contributed to make Kraken2 databases more reliable.
What are your thoughts?
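To make the idea concrete, here is a hedged Python sketch of the remap step such an option might perform (the accession, taxids, and table rows are made up for illustration; Kraken 2's real build scripts are Perl):

```python
# Hypothetical --patch-taxids remap: rows of "ACCESSION OLD_TAXID NEW_TAXID"
# rewrite a seqid->taxid map, only when the current taxid matches OLD_TAXID
# (so the patch can't silently clobber an already-corrected assignment).
patch_rows = [
    ("LR130543", "573", "2958"),   # made-up Klebsiella reassignment
]

def apply_patch(seqid2taxid, rows):
    patch = {acc: (old, new) for acc, old, new in rows}
    out = {}
    for acc, taxid in seqid2taxid.items():
        entry = patch.get(acc)
        out[acc] = entry[1] if entry and entry[0] == taxid else taxid
    return out

seqid2taxid = {"LR130543": "573", "CP000000": "1280"}
patched = apply_patch(seqid2taxid, patch_rows)
```

Keeping OLD_TAXID in the table, rather than just ACCESSION/NEW_TAXID pairs, also makes the curation auditable: a patch row that no longer matches signals that NCBI itself has changed.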
Some examples:
export KRAKEN2_CMDLINE_OPTIONS="--memory-mapping"
export KRAKEN2_CMDLINE_OPTIONS="--output - --use-names --gzip-compressed"
Not sure about handling clashes with other $KRAKEN2_* variables etc.
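A hedged Python sketch of how a wrapper might honour such a variable (KRAKEN2_CMDLINE_OPTIONS is the proposed name, not an existing Kraken 2 feature):

```python
# Hypothetical sketch: merge KRAKEN2_CMDLINE_OPTIONS into the argument list.
import os
import shlex

def effective_args(argv, env=os.environ):
    # shlex.split handles quoting the same way a shell would.
    extra = shlex.split(env.get("KRAKEN2_CMDLINE_OPTIONS", ""))
    # Env defaults go first so explicit command-line flags can override them.
    return extra + argv

demo = effective_args(
    ["--db", "mydb", "reads.fq"],
    env={"KRAKEN2_CMDLINE_OPTIONS": "--memory-mapping --use-names"},
)
```

Putting the environment options before the real argv is one way to resolve clashes: whichever option-parsing rule says "last wins" then lets the command line override the environment.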
FTP struggles in countries like AU where there are >25 hops to NCBI and ENA.
The way you've implemented rsync for the genome libraries has been awesome and makes it work within hours rather than days.
Can you do the same for the taxonomy files?
It makes it 50x faster for us. Yes fifty.
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
154 KB/sec
rsync --progress rsync://rsync.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz .
7.86 MB/sec
I'm currently trying to build a custom database, but the lookup_accession_numbers.pl script seems to take a very long time to retrieve taxids.
I looked at the lookup_accession_numbers.pl code and it seems the problem is similar to the one I had with the same script in Kraken #94.
Currently you are:
1) Adding each accession number from the lookup list to a Perl hash
2) Reading each *.accession2taxid NCBI file line by line and extracting taxids if the corresponding accession numbers are in the hash built in 1)
Do you think doing the opposite could be faster?
1) Creating a Perl hash based on each *.accession2taxid NCBI file
2) Reading the accession number lookup list line by line and extracting the corresponding taxids from the hash built in 1)
There are currently 674,511,357 entries in the *.accession2taxid NCBI files (downloaded with kraken2-build --taxonomy --db krakendb today), and I guess that the lookup list will always be smaller, right?
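A hedged Python sketch of the two lookup directions (toy data; the real script is Perl and streams the files line by line):

```python
# Toy stand-ins for the real files.
accession2taxid = [            # rows of an *.accession2taxid file
    ("NZ_LS483329", "562"),
    ("NC_000913", "511145"),
    ("XYZ_1", "9606"),
]
lookup_list = ["NZ_LS483329", "XYZ_1"]   # accessions we need taxids for

# Current strategy: hash the (small) lookup list, stream the big mapping file.
def lookup_current(mapping, wanted):
    wanted_set = set(wanted)
    return {acc: taxid for acc, taxid in mapping if acc in wanted_set}

# Proposed strategy: hash the mapping file, stream the lookup list.
def lookup_proposed(mapping, wanted):
    table = dict(mapping)
    return {acc: table[acc] for acc in wanted if acc in table}

current = lookup_current(accession2taxid, lookup_list)
proposed = lookup_proposed(accession2taxid, lookup_list)
```

Both directions give the same answer; the trade-off is that hashing the full mapping file means holding all ~674M entries in memory at once, whereas hashing the lookup list keeps memory proportional to the smaller set, at the cost of scanning the big files.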
I'm trying to build a database with refseq/nt and refseq/env-nt on RHEL 7 on a 24-core machine with 160 GB RAM. I have pulled a fresh copy from the master branch and compiled it according to the README. I have successfully built the standard database, but when I try the nt and env_nt database it fails. I'm going to rerun it and collect memory usage statistics.
$ kraken2-build --db nt_nt-env/ --build --threads 24
Creating sequence ID to taxonomy ID map (step 1)...
Found 175778622/175780306 targets, searched through 681337449 accession IDs, search complete.
lookup_accession_numbers: 1684/175780306 accession numbers remain unmapped, see unmapped.txt in DB directory
Sequence ID to taxonomy ID map complete. [2h19m9.840s]
Estimating required capacity (step 2)...
Estimated capacity requirement: 198587649460 bytes
Capacity estimation complete. [49m13.776s]
Building database files (step 3)...
Taxonomy parsed and converted.
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
xargs: cat: terminated by signal 13
/syn-bio/var/opt/kraken2/build_kraken2_db.sh: line 119: 3554 Done list_sequence_files
3555 Exit 125 | xargs -0 cat
3556 Aborted (core dumped) | build_db -k $KRAKEN2_KMER_LEN -l $KRAKEN2_MINIMIZER_LEN -S $KRAKEN2_SEED_TEMPLATE $KRAKEN2XFLAG -H hash.k2d.tmp -t taxo.k2d.tmp -o opts.k2d.tmp -n taxonomy/ -m $seqid2taxid_map_file -c $required_capacity -p $KRAKEN2_THREAD_CT
My Linux distribution is CentOS release 6.7 (Final).
I compiled kraken2 with gcc-4.9.3:
g++ -fopenmp -Wall -std=c++11 -O3 -DLINEAR_PROBING -c -o mmscanner.o mmscanner.cc
In file included from mmscanner.cc:7:0:
mmscanner.h:34:23: error: 'SIZE_MAX' was not declared in this scope
size_t finish = SIZE_MAX);
^
mmscanner.cc: In constructor 'kraken2::MinimizerScanner::MinimizerScanner(ssize_t, ssize_t, uint64_t, bool, uint64_t)':
mmscanner.cc:32:18: error: 'SIZE_MAX' was not declared in this scope
if (finish_ == SIZE_MAX)
^
mmscanner.cc: In member function 'void kraken2::MinimizerScanner::LoadSequence(std::string&, size_t, size_t)':
mmscanner.cc:43:18: error: 'SIZE_MAX' was not declared in this scope
if (finish_ == SIZE_MAX)
^
make: *** [mmscanner.o] Error 1
Fixed it based on the suggestion in the following post:
https://stackoverflow.com/a/42097570
That led to the next issue:
g++ -fopenmp -Wall -std=c++11 -O3 -DLINEAR_PROBING classify.cc reports.o mmap_file.o compact_hash.o taxonomy.o seqreader.o mmscanner.o omp_hack.o aa_translate.o -o classify
classify.cc: In function 'taxid_t ClassifySequence(kraken2::Sequence&, kraken2::Sequence&, std::ostringstream&, kraken2::KeyValueStore*, kraken2::Taxonomy&, IndexOptions&, Options&, ClassificationStats&, kraken2::MinimizerScanner&, std::vector<long unsigned int>&, taxon_counts_t&, std::vector<std::basic_string<char> >&)':
classify.cc:505:33: error: 'UINT64_MAX' was not declared in this scope
uint64_t last_minimizer = UINT64_MAX;
^
make: *** [classify] Error 1
Fixed it based on the suggestion from the following post:
https://stackoverflow.com/a/3233069
Posting the patch here as a solution for anyone facing this issue:
fix_install_kraken2.patch.txt
Hello,
I have older 1.5 Illumina 16S data. I am wondering if kraken2 can use it?
Also, how do I get the 16S databases? Just format from the fasta? Can I add custom sequences?
Cheers
Rick
Hi,
I had the same/a similar problem as described here in #38 and DerrickWood/kraken#114. However, for me rsync only failed once in a while: the taxonomy download via download_taxonomy.sh mostly ran through. However, the syncing of the genomic DNA files sometimes died. I'm not sure if it is a problem/timeout with the NCBI rsync server or if it's client-side on our cluster.
However, a potential workaround is to add a check around the rsync system call in rsync_from_ncbi.pl to retry syncing in case rsync returns anything but 0:
my $rc = 1;
my $tries = 0;
while ($rc && $tries < 5) {   # bounded, so a dead server can't loop forever
  $tries++;
  print STDOUT "\nTrying rsync (attempt $tries)\n";
  $rc = system("rsync --no-motd --files-from=manifest.txt rsync://ftp.ncbi.nlm.nih.gov/genomes/ .");
}
With a bare while ($rc) this would be a potential infinite loop, hence the attempt counter above. It could also be adjusted to retry only when the error message is Network is unreachable (101).
In the introduction of Kraken 2, you said Kraken 2 differs from Kraken in using minimizers of the k-mers. Could you explain this in some detail to make it easier to understand?
I have a list of 303 accession IDs in protozoa that apparently consist completely of sequences that produce hits from other domains (and so they would give false positives when using protozoa). I would now like a feature where kraken, when building a database, looks up IDs in my blacklist so those accessions are no longer included in any new database build. Does that sound reasonable?
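A hedged Python sketch of the blacklist step such a feature might perform at library-scan time (toy FASTA lines; accession parsing is simplified to the first header token):

```python
# Hypothetical sketch: drop blacklisted accessions while streaming a FASTA
# library, so they never reach the database build.
def filter_fasta(lines, blacklist):
    keep, out = True, []
    for line in lines:
        if line.startswith(">"):
            acc = line[1:].split()[0]     # first header token as accession
            keep = acc not in blacklist   # decision applies until next header
        if keep:
            out.append(line)
    return out

demo = filter_fasta(
    [">ACC1 good", "ACGT", ">ACC2 contaminated", "TTTT", ">ACC3 good", "GGGG"],
    {"ACC2"},
)
```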
Hi Derrick,
I tried classifying my sample reads with kraken database containing only "nt" sequences and found that all my reads were classified at the root. Here is the output from the first few lines of the report file:
100.00% 14552 14535 R 1 root
0.12% 17 0 R1 131567 cellular organisms
0.12% 17 17 D 2 Bacteria
I did the same classification using the "standard database" and I got results similar to those I found in Kraken1:
100.00% 14552 0 R 1 root
100.00% 14552 0 R1 131567 cellular organisms
100.00% 14552 0 D 2 Bacteria
44.38% 6458 0 D1 1783270 FCB group
41.62% 6057 0 P 1224 Proteobacteria
14.00% 2037 7 D1 1783272 Terrabacteria group
I wonder if some k-mers found in certain bacterial species are also annotated in weird categories placed near the root of the taxonomy tree, specifically in the case of the "nt" database?
Kraken has been upgraded to version 2, but could KrakenHLL's approach be used with kraken2 to get kmer_coverage results?
I was doing kraken-build --db XXX --download-taxonomy and the 4th FTP download stalled, so I restarted it, and it started re-downloading everything from scratch (it takes 10 hours to get them from NCBI over FTP to my uni; they throttle FTP).
https://github.com/DerrickWood/kraken2/blob/master/scripts/download_taxonomy.sh#L24-L27
Could you add a wget option to avoid re-downloading, e.g. if [ ! -r $FILE ]; then wget ... ; fi?
Or use wget's --continue:
"Continue getting a partially-downloaded file. This is useful when you want to finish up a download started by a previous instance of Wget, or by another program."
Or could you use ascp if it is installed?
Hi, I have already built the standard database using kraken2-build. How can I add more genomes to the bacterial library in the database?
cp $KRAKEN2_DIR/bin/kraken2{,-build} $HOME/bin
Should -inspect be there too?
Can you please change #!/usr/bin/perl to #!/usr/bin/env perl so it uses the Perl in the PATH rather than forcing the system perl?
I know most people will have system perl, and you only use core modules, but it causes problems for packaging in bioconda and brew.
brew install brewsci/bio/kraken2
There is also a bioconda recipe happening:
bioconda/bioconda-recipes#10502
It prints nothing.
Us poor users need some STDERR comfort.
It did return $? == 0 which is good.
Was the new '%' sign deliberate?
(It causes parsing failures in some downstream tools)
v1
86.44 476324 476324 U 0 unclassified
13.56 74700 1274 - 1 root
13.33 73425 244 - 131567 cellular organisms
13.27 73116 2776 D 2 Bacteria
v2
41.51% 228715 228715 U 0 unclassified
58.49% 322309 82 R 1 root
58.31% 321318 405 R1 131567 cellular organisms
57.91% 319096 4873 D 2 Bacteria
Hi, Dear Kraken Team.
I downloaded the four libraries I'm interested in: archaea, bacteria, fungi and viral.
My question is:
Can I join the four libraries to build a unique database or do I have to build each separately?
I want to:
$ kraken2-build --build --threads 56 --db 'this_folder_has_four_libraries_joined/'
Thanks for your help.
Maryo.
For Kraken 1 "classic" I asked if collapsing taxids to species level would save space (like Centrifuge?). The answer was no. Is this still true for Kraken 2?
If not, is there any existing code to collapse a --report to S level?
Hi @DerrickWood
I'm trying to build a custom database and I have a lot of sequences to add, so I tried to use the kraken2-build --add-to-library command in parallel. I used the command from the MANUAL with the -P option for xargs, but it seems to produce some errors. Here is the command I used:
find ${GENOMESDIR} -name *'.fna' -print0 | xargs -I{} -0 -n1 -P${CPU} kraken2-build --add-to-library {} --db ${DBNAME} &>> log.add_to_library
Where:
${GENOMESDIR} is the path to the directory containing your FASTA files
${CPU} is the number of processors you want to use
${DBNAME} is the name of your Kraken2 database
The problem seems to come from the add_to_library.sh script. When run in parallel, the cp and rm commands at the end do not always work, because the temp_map.txt file does not exist. I obtain error messages like:
cat: ${DBNAME}/kraken2/library/added/temp_map.txt: No such file or directory
and rm: cannot remove ‘${DBNAME}/kraken2/library/added/temp_map.txt’
It also seems to mess up the resulting prelim_map.txt file.
Dear Developer,
Kraken2 gives the results of classified sequences and taxonomic IDs. A report including aggregate counts/clade is also generated. But how can I extract the classified reads of a specific taxonomic group for subsequent assembly? Is there any script included in kraken2 that can be used?
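The standard per-read output contains enough information to do this by hand. A hedged Python sketch (toy tab-separated lines; note that extracting a whole clade would also require the taxids of all descendant nodes, e.g. from the report or nodes.dmp):

```python
# Hedged sketch: collect read IDs assigned to given taxids from the standard
# per-read output, e.g. to pull those reads out of the FASTQ for assembly.
def reads_for_taxids(kraken_output_lines, taxids):
    ids = []
    for line in kraken_output_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "C" and fields[2] in taxids:
            ids.append(fields[1])   # fields: flag, read ID, taxid, length, ...
    return ids

demo = reads_for_taxids(
    ["C\tread1\t562\t150\t562:100",
     "C\tread2\t1280\t150\t1280:100",
     "U\tread3\t0\t150\t"],
    {"562"},
)
```

The returned IDs can then be used to filter the original FASTQ with any standard read-extraction tool.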
See issue #5 and the accompanying fix - other scripts also use wget and need to be protected the same way.
Can you directly link to the markdown manual in docs/ so we can read it on GitHub?
Hi,
I started using Kraken2 and I tried to classify some reads and separate classified and unclassified reads with the --classified-out and --unclassified-out options. Here is the command I launched:
kraken2 --db ${KRAKEN2_DB} reads.fastq --output reads.classification --classified-out reads.classified.fastq --unclassified-out reads.unclassified.fastq
And I obtained the following files:
reads.classified.fastq.fq
reads.unclassified.fastq.fq
It added a .fq extension even though I gave filenames with a .fastq extension.
In Standard Kraken Output Format, the line:
For example, "562:15 561:4 A:31 0:1 562:3" would indicate that:
should be:
For example, "562:13 561:4 A:31 0:1 562:3" would indicate that:
The HTML file has the same typo, since it was generated from the same Markdown file.
Given a file having both thousands of sequences with accession numbers and one sequence without an accession number, kraken2-build --add-to-library will refuse to add the file. This is not necessary, especially because scan_fasta_file.pl takes a --lenient switch that causes it to ignore sequences without accession numbers. All it takes is to add that switch to the invocation of scan_fasta_file.pl in add_to_library.sh:
diff --git a/scripts/add_to_library.sh b/scripts/add_to_library.sh
index e4987e0..876d79b 100755
--- a/scripts/add_to_library.sh
+++ b/scripts/add_to_library.sh
@@ -25,7 +25,7 @@ fi
add_dir="$LIBRARY_DIR/added"
mkdir -p "$add_dir"
prelim_map=$(cp_into_tempfile.pl -t "prelim_map_XXXXXXXXXX" -d "$add_dir" -s txt /dev/null)
-scan_fasta_file.pl "$1" > "$prelim_map"
+scan_fasta_file.pl --lenient "$1" > "$prelim_map"
filename=$(cp_into_tempfile.pl -t "XXXXXXXXXX" -d "$add_dir" -s fna "$1")
The protozoa) case is missing from the download script:
https://github.com/DerrickWood/kraken2/blob/master/scripts/download_genomic_library.sh
kraken2-build --db mydb --download-library protozoa
Unknown library type "protozoa"
Usage: kraken2-build [task option] [options]
Task options (exactly one must be selected):
--download-taxonomy Download NCBI taxonomic information
--download-library TYPE Download partial library
(TYPE = one of "archaea", "bacteria", "plasmid",
"viral", "human", "fungi", "plant", "protozoa",
"nr", "nt", "env_nr", "env_nt", "UniVec",
"UniVec_Core")
As far as I can tell, kraken2 isn't available from conda, although kraken1 is. I'm mainly interested in a conda submission because my research group uses conda for installing and managing (almost) all of our bioinformatics software, and I would prefer to use the latest version of kraken. Are you planning (or currently working on) submitting to conda?
Thanks.
Hi
I wonder how to calculate the relative abundance from the report-mpa-style table to compare different samples.
Any ideas?
hu
thanks
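One hedged approach, sketched in Python: pick a single rank from the mpa-style table and normalize its counts, so samples become comparable as fractions. The lineage strings and counts below are illustrative, and the sketch assumes tab-separated lineage/count lines:

```python
# Hedged sketch: relative abundance at one rank (here species, "s__" prefix)
# from an mpa-style table. Normalizing within a single rank avoids mixing
# clade-level counts that would otherwise be double-counted.
def relative_abundance(mpa_lines, rank_prefix="s__"):
    counts = {}
    for line in mpa_lines:
        lineage, count = line.split("\t")
        if lineage.split("|")[-1].startswith(rank_prefix):
            counts[lineage] = int(count)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

demo = relative_abundance([
    "d__Bacteria\t100",
    "d__Bacteria|g__Escherichia|s__Escherichia_coli\t60",
    "d__Bacteria|g__Staphylococcus|s__Staphylococcus_aureus\t40",
])
```

Whether to normalize by total reads (including unclassified) or by the rank total is a judgment call; the rank-total version above answers "what fraction of classified species-level reads is each species".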
I haven't taken the time to track down the source, but I have noticed that the annotated fastq files that are output by '--classified-out' have incorrect taxids.
For example, I have a certain taxid in the kraken2 output file and report file, and regex-ing the classified-out I can't even find that taxid. If I go find the read name in the output file and pull it by name out of the classified-out fastq, the read is what I think it is, but the "taxid|XXXX" annotation is wrong. I can't see a logical reason for the taxid that is printed. However, the number printed in the fastq is consistently wrong, so taxid 9606 is represented as 19742 every time.
Hi,
This could be related to #8
When applying the change suggested in the 16S_silva_installation.sh script, the kraken2-build command works.
bin/kraken2-build --special silva --db $PWD/silva --threads $(nproc)
However, when I run kraken2-inspect with the database that was created in the previous step, the only named node I get is root. All other nodes are not named.
bin/kraken2-inspect --db /db/silva/ | head -100
# Database options: nucleotide db, k = 35, l = 31
# Spaced mask = 11111111111111111111111111111111111111110011001100110011001100
# Toggle mask = 1110001101111110001010001100010000100111000110110101101000101101
# Total taxonomy nodes: 13528
# Table size: 26477341
# Table capacity: 37889828
100.00% 26477341 13956 R 1 root
73.25% 19395500 1121078 R1 3
18.61% 4928033 76036 R2 2375
9.73% 2575341 146135 R3 3303
2.42% 640458 33886 R4 26341
1.43% 377766 42528 R5 26352
0.16% 43095 43095 R6 26368
0.07% 18838 18838 R6 26414
0.06% 16298 16298 R6 26384
0.06% 16233 16233 R6 26471
0.06% 15816 15816 R6 26355
0.06% 14829 14829 R6 26361
0.04% 11854 11854 R6 26358
Consequently (though this may be an unrelated issue), when I try to classify sequences with kraken2 I get results with taxids but not the names of taxa, e.g.:
bin/kraken2 --db /db/silva --report $PWD/report.txt --classified-out $PWD/clas.txt --output $PWD/out.txt --use-names $PWD/00A00044_Ext2_Rep2_S172_L001_R1_001.fastq.gz
118679 sequences (35.72 Mbp) processed in 3.534s (2014.7 Kseq/m, 606.43 Mbp/m).
118140 sequences classified (99.55%)
539 sequences unclassified (0.45%)
kmavrommatis@ip-10-112-17-141: Tue Aug 21 22:09:02 microbiome (0)
$more out.txt
C M04141:136:000000000-B96YR:1:1101:22984:1662 918 301 912:1 3:3 912:1 3:36 913:5 918:1 3:1 912:3 918:9 912:5 3:40 918:2 913:5 918:19 912:5 918:2 3:3 918:3 3:49 913:2 912:5 3:1 912:1 918:25 913:1 3:2 918:2 3:1 0:5 3:1 918:5 3:2 913:7 918:1 0:3 913:5 0:5
C M04141:136:000000000-B96YR:1:1101:19609:1681 1863 301 3:43 1863:27 3:17 1863:5 3:15 1863:28 3:1 1863:8 3:64 1863:11 3:38 1:5 0:5
C M04141:136:000000000-B96YR:1:1101:14361:1737 1863 301 3:26 1863:1 3:16 1863:44 3:4 1863:50 3:5 1:1 3:42 1:1 3:72 1:2 0:3
C M04141:136:000000000-B96YR:1:1101:22479:1741 918 301 3:51 913:5 3:1 918:3 912:5 3:36 918:34 3:3 918:8 3:54 918:2 912:5 918:16 3:10 918:5 3:1 918:5 3:2 913:5 0:6 913:5 1780:2 0:3
C M04141:136:000000000-B96YR:1:1101:9321:1752 24660 301 3:131 913:10 3:6 913:4 3:23 913:27 912:1 24660:22 913:3 3:35 0:5
C M04141:136:000000000-B96YR:1:1101:11659:1762 11328 301 3:59 1863:1 3:5 1863:16 3:10 1863:2 3:4 1863:10 1987:5 1863:4 1987:15 3:83 1863:5 1987:3 1863:7 1672:3 1863:9 3:5 1:7 3:1 1:3 11328:5 3:2 0:3
C M04141:136:000000000-B96YR:1:1101:9249:1764 1987 301 3:59 1863:1 3:5 1863:16 3:10 1863:2 3:4 1863:10 1987:5 1863:4 1987:15 1672:5 3:3 1672:2 3:105 1:7 0:1 1:3 0:10
C M04141:136:000000000-B96YR:1:1101:10307:1811 1863 301 3:46 1863:36 3:9 1863:55 1:1 3:42 1:1 3:63 0:1 3:3 0:5 2588:2 0:3
For comparison, when the same commands are run on a kraken2 database created from refseq all results include the taxa names.
Any advice?
Thanks