I'm classifying representative sequences of quality-controlled, clustered 16S reads with the command:
java -jar AlignmentTools.jar pairwise-knn query.fq db.fa
The db file is an unaligned, prokaryote-only subset of RDP 11.4, clustered at 99% identity (with some sequence-length thresholds applied).
Is this a sensible way to assign taxonomy to my representative sequences?
In the output, I see lines like:
@650A9:00200:00424 1 + 155 1.000 0 34 34 0 83 S004055894 Listeria monocytogenes; CA5 Lineage=Root;rootrank;Bacteria;domain;Firmicutes;phylum;Bacilli;class;Bacillales;order;Listeriaceae;family;Listeria;genus
As far as I can tell, the columns are QID KNEIGHBOURS STRAND SCORE %ID QSTART QEND QEND QSTART SSTART SID. Is this the correct interpretation? And why are the QSTART and QEND values displayed twice?
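To sanity-check that reading, here is a minimal Python sketch that splits one output line into the columns I guessed above, plus the free-text description and the `Lineage=` string. The column names are only my hypothesis, not anything taken from the AlignmentTools documentation. I'm also assuming that the first 11 fields never contain spaces, that everything after them is the subject description, and that the lineage alternates `name;rank`. Splitting on any whitespace covers both tabs and spaces, since I haven't checked which one the tool actually emits.

```python
from dataclasses import dataclass
from typing import Dict, List

# Column names follow the interpretation guessed in the question; they are
# NOT confirmed against the AlignmentTools documentation.
HYPOTHESISED_COLUMNS: List[str] = [
    "QID", "KNEIGHBOURS", "STRAND", "SCORE", "PCT_ID",
    "QSTART", "QEND", "QEND_2", "QSTART_2", "SSTART", "SID",
]


@dataclass
class KnnHit:
    fields: Dict[str, str]   # raw column values keyed by hypothesised name
    description: str         # free-text description of the subject sequence
    lineage: Dict[str, str]  # rank -> taxon name, e.g. "genus" -> "Listeria"


def parse_lineage(lineage: str) -> Dict[str, str]:
    """Turn 'Root;rootrank;Bacteria;domain;...' into {'rootrank': 'Root', ...}."""
    parts = [p for p in lineage.strip().split(";") if p]
    # Assumed layout: names and ranks alternate as name;rank;name;rank;...
    return {rank: name for name, rank in zip(parts[0::2], parts[1::2])}


def parse_knn_line(line: str) -> KnnHit:
    n = len(HYPOTHESISED_COLUMNS)
    # The first n tokens are single-word columns; the remainder is the
    # subject description, which may itself contain spaces.
    tokens = line.rstrip("\n").split(maxsplit=n)
    if len(tokens) < n:
        raise ValueError(f"expected at least {n} columns, got {len(tokens)}")
    fields = dict(zip(HYPOTHESISED_COLUMNS, tokens[:n]))
    rest = tokens[n] if len(tokens) > n else ""
    # Everything before 'Lineage=' is the description, everything after it the lineage.
    description, _, lineage = rest.partition("Lineage=")
    return KnnHit(fields, description.strip(), parse_lineage(lineage))


if __name__ == "__main__":
    example = ("@650A9:00200:00424 1 + 155 1.000 0 34 34 0 83 S004055894 "
               "Listeria monocytogenes; CA5 Lineage=Root;rootrank;Bacteria;domain;"
               "Firmicutes;phylum;Bacilli;class;Bacillales;order;Listeriaceae;family;"
               "Listeria;genus")
    hit = parse_knn_line(example)
    print(hit.fields["SID"], hit.lineage["genus"])
```

Running this over the whole output makes it easy to check whether the two coordinate pairs (QSTART/QEND and QEND_2/QSTART_2) always mirror each other or ever differ, which might hint at what the second pair actually means.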