Comments (19)

WilliamZekaiWang commented on August 19, 2024

I will look further into this.

While testing this program I didn't get to test it much on IN, which could explain the many errors.

from sierra-local.

WilliamZekaiWang commented on August 19, 2024

Hi, I couldn't replicate your issue.

I've tried running an RT sequence with ambiguous nucleotides on HIVdb version 8.8 and HIVdb version 9.4 and didn't get any error messages.

Could you try using HIVdb version 9.4 through the submodule to see if this issue still occurs?

ArtPoon commented on August 19, 2024

No follow-up from user, closing.

erick-dorlass commented on August 19, 2024

Hello,
Sorry for taking so long to follow up.
I am now using HIVdb version 9.4 in a pipeline for detecting low-frequency mutations that lead to ARV resistance. In this pipeline, 3 consensus sequences are generated; one of them contains low-frequency mutations and commonly has IUPAC nucleotides.
This consensus sequence generates the error first mentioned, even with version 9.4, but it works fine in sierra-web. This is the consensus sequence:

>consensus
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCCAGAGCCAACAGCC
CCACCAGCAGAGAGCTTCAGGTTCGGAGAGGAGATAACCCCCTCTCMGAAGCAGGAACAG
AAGGAAGAGGGAAYATACCCTCCTTTAGCTTCCCTCAAATCACTCTTTGGCAACGACCMG
TARTCACARTAAAAGTAGRGGGACAGCTAAGGGAAGCTTTATTAGAYACAGGAGCAGATG
ATACAGTATTMGAAGACRTAGATTTACCAGGAAAMTGGAAGCCAAAAATAATAGGGGGAR
TTGGAGGTTTTSTCARAGTAAARCAGTATGATAACRTACWCATAGAAATTTGTGGRCAYA
ARGTTACAGGTACAGTAGTAGTAGGACCTACACCTRYSAAYATAATTGGAAGAAATCTGT
TGACTCAGCTGGGTTGCACTTTAAATTTTCCCATWAGTCCTATTGAAACTGTACCAGTAA
AATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAAAA
TAAAAGCATTAACAGAAATATGTACAGAAATGGAAAAAGAAGGGAAAATTTCAAAAATTG
GGCCTGAAAATCCATAYAATACTCCARTATTTGCCATAAAGAAAAARRACAGTWCTAGAT
GGAGAAAATTAGTAGATTTCAGRGAACTTAATAAAAGAACTCAAGATTTTTGGGAGGTTC
AATTAGGAATACCGCATCCTGCGGGATTAMAAAAGAAAAAATCAGTAACAGTACTGGATG
TGGGTGATGCATATTTTTCAGTTCCCTTAGATGAAGAYTTCAGGAAGTATACTGCATTTA
CCATACCTAGTACMAATAATGAGACACCRGGGATTAGATATCAGTACAATGTGCTTCCAC
AAGGATGGAAAGGATCACCAGCAATATTCCAAAGCAGCATGACAAAAATCTTAGAGCCCT
TTAGAAAACAAAACCCGGACATAGTGATYTATCAATAYGTGGATGATTTGTATGTAGGAT
CTGATYTAGAAATAGGGCAGCATAGAACAAAAWWRAGGAACTGAGACARCATCTRTTGAC
GTGGGGATTGACCACACCAGATMAAAAACACCAGAAAGAACMTCCATTTCTTTGGATGGG
KTATGAACTCCATCCTGATAAATGGACAGTACAGCCTATAGAGCTGCCAGAAAAGGAYAG
CTGGACTGTCAATGACATACAGAAGTTAGTGGGAAAATTRAATTGGGCAAGTCAGATTTA
CCCAGGGATTAARGTAAAGCAATTATGTAGACTCCTTARGGGAACCAAGGCRCTAACAGA
AGTAGTACCMCTAACAAAAGARGCAGAGTTAGAACTGGCAGAAAACAGGGAAATTCTAAA
GGAACCAGTACATGGAGTGTATTATGACCCATCAAAAGACTTNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTCGGGATY
AGAAGTAAAYATAGTRACAGACTCACAGTATGCATTAGGAATCATTCAAGCACAACCAGA
TAAGAGTGAATCAGAGATAGTCAATCAAATAATAGAGCAATTAATAACAAAGGAAAAGGT
CTACCTGTCATGGGTACCAGCACACAAAGGAATTGGAGGAAATGAACAAGTAGATAAATT
AGTCAGTACTGGAATCAGAAAAGTACTATTTTTAGATGGAATAGATAAAGCCCAAGAAGA
ACATGAGAAGTATCACAGTAATTGGAGGGCAATGGCCAGTGATTTTAACCTGCCACCTGT
GGTAGCAAAAGAAATAGTAGCCAGCTGTGATAAATGTCAGCTAAAAGGAGAAGCCATGCA
TGGACAAGTAGACTGTAGTCCAGGAATATGGCAACTAGATTGTACACATTTAGAAGGAAA
AGTTATCCTGGTAGCAGTCCATGTAGCTAGTGGATACCTAGAAGCAGAAGTTATCCCAGC
AGAAACAGGACAAGAAACAGCCTACTTCATACTAAAGTTAGCAGGAAGATGGCCAGCAAA
GCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNAGGAARRWKGGCCAGCAAAGCTACTCTGGAAAGGTGAAGGGGCAGTAGTC
ATACAAGACAATAGTGAAATAAAGGTAGTACCAAGAAGAAAAGCAAAGATCATTAGGGAT
TATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAGGTAGACAGAATGAGGATTAACNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
N

PS: The consensus sequence is generated by applying variants present in a VCF file to a reference sequence (HXB2). This sequence in particular was obtained by amplicon sequencing of specific genome targets, hence the many N's, which correspond to regions with no coverage.

I would still appreciate help with this.
Thank you

WilliamZekaiWang commented on August 19, 2024

Hi Erick,

No worries about the time.

I ran this sequence through the new alignment program we are swapping to (listed in #71) and it successfully aligned your sequence.

This issue should get fixed in the incoming PR.

erick-dorlass commented on August 19, 2024

Thank you for such a quick answer.
I'll wait for the incoming PR.

erick-dorlass commented on August 19, 2024

Hello,

I have tried the current version in main, which contains the new alignment program, and indeed I don't get the error anymore. However, the total scores of some ARVs are wrong, with mutations called that are not there due to lack of coverage.
For example, the mutation N348I in RT doesn't exist: the sequence has an N at this position, which is confirmed in the list of mutations in RT, but the mutation is still counted in the score for the drug NVP. Many more of these can be found along the genome, especially in the IN gene.

WilliamZekaiWang commented on August 19, 2024

There are incorrectly labelled APOBEC mutations, which I have found and am working on a fix for. This is likely due to an outdated version of the apobec_drm.json file.

Regarding incorrectly identifying mutations, I compared sierralocal, running on postalign, with sierrapy, and the output position and AA matched. Could you elaborate on where you are comparing the outputs so I can have a deeper look?

erick-dorlass commented on August 19, 2024

Hello,

Thank you for the investigation. I'll wait for the fix on the apobec_drm.json file.

To ensure that we are using the same sequence for these tests, I'll repost it here:

>consensus
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCCAGAGCCAACAGCC
CCACCAGCAGAGAGCTTCAGGTTCGGAGAGGAGATAACCCCCTCTCMGAAGCAGGAACAG
AAGGAAGAGGGAAYATACCCTCCTTTAGCTTCCCTCAAATCACTCTTTGGCAACGACCMG
TARTCACARTAAAAGTAGRGGGACAGCTAAGGGAAGCTTTATTAGAYACAGGAGCAGATG
ATACAGTATTMGAAGACRTAGATTTACCAGGAAAMTGGAAGCCAAAAATAATAGGGGGAR
TTGGAGGTTTTSTCARAGTAAARCAGTATGATAACRTACWCATAGAAATTTGTGGRCAYA
ARGTTACAGGTACAGTAGTAGTAGGACCTACACCTRYSAAYATAATTGGAAGAAATCTGT
TGACTCAGCTGGGTTGCACTTTAAATTTTCCCATWAGTCCTATTGAAACTGTACCAGTAA
AATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAAAA
TAAAAGCATTAACAGAAATATGTACAGAAATGGAAAAAGAAGGGAAAATTTCAAAAATTG
GGCCTGAAAATCCATAYAATACTCCARTATTTGCCATAAAGAAAAARRACAGTWCTAGAT
GGAGAAAATTAGTAGATTTCAGRGAACTTAATAAAAGAACTCAAGATTTTTGGGAGGTTC
AATTAGGAATACCGCATCCTGCGGGATTAMAAAAGAAAAAATCAGTAACAGTACTGGATG
TGGGTGATGCATATTTTTCAGTTCCCTTAGATGAAGAYTTCAGGAAGTATACTGCATTTA
CCATACCTAGTACMAATAATGAGACACCRGGGATTAGATATCAGTACAATGTGCTTCCAC
AAGGATGGAAAGGATCACCAGCAATATTCCAAAGCAGCATGACAAAAATCTTAGAGCCCT
TTAGAAAACAAAACCCGGACATAGTGATYTATCAATAYGTGGATGATTTGTATGTAGGAT
CTGATYTAGAAATAGGGCAGCATAGAACAAAAWWRAGGAACTGAGACARCATCTRTTGAC
GTGGGGATTGACCACACCAGATMAAAAACACCAGAAAGAACMTCCATTTCTTTGGATGGG
KTATGAACTCCATCCTGATAAATGGACAGTACAGCCTATAGAGCTGCCAGAAAAGGAYAG
CTGGACTGTCAATGACATACAGAAGTTAGTGGGAAAATTRAATTGGGCAAGTCAGATTTA
CCCAGGGATTAARGTAAAGCAATTATGTAGACTCCTTARGGGAACCAAGGCRCTAACAGA
AGTAGTACCMCTAACAAAAGARGCAGAGTTAGAACTGGCAGAAAACAGGGAAATTCTAAA
GGAACCAGTACATGGAGTGTATTATGACCCATCAAAAGACNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTCGGGATY
AGAAGTAAAYATAGTRACAGACTCACAGTATGCATTAGGAATCATTCAAGCACAACCAGA
TAAGAGTGAATCAGAGATAGTCAATCAAATAATAGAGCAATTAATAACAAAGGAAAAGGT
CTACCTGTCATGGGTACCAGCACACAAAGGAATTGGAGGAAATGAACAAGTAGATAAATT
AGTCAGTACTGGAATCAGAAAAGTACTATTTTTAGATGGAATAGATAAAGCCCAAGAAGA
ACATGAGAAGTATCACAGTAATTGGAGGGCAATGGCCAGTGATTTTAACCTGCCACCTGT
GGTAGCAAAAGAAATAGTAGCCAGCTGTGATAAATGTCAGCTAAAAGGAGAAGCCATGCA
TGGACAAGTAGACTGTAGTCCAGGAATATGGCAACTAGATTGTACACATTTAGAAGGAAA
AGTTATCCTGGTAGCAGTCCATGTAGCTAGTGGATACCTAGAAGCAGAAGTTATCCCAGC
AGAAACAGGACAAGAAACAGCCTACTTCATACTAAAGTTAGCAGGAAGATGGCCAGCAAA
GCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNAGGAARRWKGGCCAGCAAAGCTACTCTGGAAAGGTGAAGGGGCAGTAGTC
ATACAAGACAATAGTGAAATAAAGGTAGTACCAAGAAGAAAAGCAAAGATCATTAGGGAT
TATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAGGTAGACAGAATGAGGATTAACac
atggaaaagtttagtaaaataccatatgcatgtttcaaagaaagccaaaagatggtttta
tagacctcactttgaaagcatgcatccaagagtaagttcagaagtacacatcccactaga
ggaagctaaattagtaataacaacatattggggtctgcatacaggaNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNaggtgtgaatatcaagcaggacataN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
N

To generate this consensus sequence, the obtained reads were mapped to the HXB2 reference, variants were called (AF >= 0.15), and the consensus was generated with bcftools. We used "N" for regions without coverage (this is amplicon-based sequencing). IUPAC codes were used for positions with 2 or more haplotypes. The first sequence that I posted in this issue was generated by mapping to a different HIV-1 sequence (the GenBank RefSeq NC_001802.1), so there may be some differences.
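
For readers who want to reason about how such a consensus comes to contain both IUPAC codes and N's, here is a minimal Python sketch of the per-site rule described above. It is not the bcftools implementation; the function name `consensus_base`, the input layout (a dict of allele frequencies plus a read depth), and the fallback to "N" when no allele passes the threshold are all assumptions made for illustration.

```python
# IUPAC ambiguity codes keyed by the set of nucleotides they represent.
IUPAC_BY_SET = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus_base(allele_freqs, depth, min_af=0.15):
    """Return the consensus character for one site (hypothetical helper).

    allele_freqs maps each observed nucleotide to its frequency.
    Sites with no coverage become "N"; sites with two or more alleles
    at or above min_af become the matching IUPAC code.
    """
    if depth == 0:
        return "N"  # no coverage at this site
    kept = frozenset(b for b, f in allele_freqs.items() if f >= min_af)
    # Assumption: if no allele reaches the threshold, report "N".
    return IUPAC_BY_SET.get(kept, "N")


print(consensus_base({"A": 0.8, "G": 0.2}, depth=100))    # two alleles kept
print(consensus_base({"A": 0.95, "G": 0.05}, depth=100))  # minor allele dropped
print(consensus_base({}, depth=0))                        # uncovered site
```

Note that in this sketch an "N" from missing coverage is indistinguishable from an "N" produced by four co-occurring alleles, so a downstream tool cannot tell the two apart from the sequence alone.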

I'm using sierra-web-service for output comparison. The analysis output of sierra-web-service is attached.
analysis-reports-consensus.zip

The sierra-local output JSON file, running on postalign, is also attached.
consensus_resistance.json.zip

ArtPoon commented on August 19, 2024

@WilliamZekaiWang can you update us about this for next week?

WilliamZekaiWang commented on August 19, 2024

I've looked deeper into this issue and I think the problem could be partially on our end. Post-align isn't the issue; it provides the correct data.

The issue seems to lie in the XML files for HIVDB, in our case HIVDB_9.4.xml, that we use to get comments about specific drug-resistant mutations. Updating this file with the updater script didn't fix the issue.

The incorrectly reported drug resistance mutations are all listed as an amino acid mutated to X, which comes from the nucleotides being NNN. I'm pretty sure we match the new amino acid and position to see if we get a hit in HIVDB_9.4.xml. However, since X is ambiguous, sierra-web might ignore these cases. Making sierra-local skip reporting potential drug-resistant mutations that include X doesn't work.
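
To make the NNN-to-X behaviour concrete, here is a small Python sketch of translating a codon that may contain IUPAC ambiguity codes. It is an illustration only, not code from sierra-local or sierra-web: `translate_codon` and `is_scorable` are hypothetical names, and the rule "skip anything containing X when scoring" is the conjecture from this thread, not confirmed behaviour of either tool.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Nucleotides represented by each IUPAC code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon):
    """Return every amino acid compatible with the codon, or "X" for NNN."""
    codon = codon.upper()
    if codon == "NNN":
        return "X"  # no information at this position
    # Expand each ambiguity code and translate every resolved codon.
    resolved = product(*(IUPAC[b] for b in codon))
    return "".join(sorted({CODON_TABLE["".join(c)] for c in resolved}))


def is_scorable(amino_acids):
    """Assumed rule: a position reported as X contributes no drug score."""
    return "X" not in amino_acids


for codon in ("ATG", "RTA", "NNN"):
    aas = translate_codon(codon)
    print(codon, aas, is_scorable(aas))
```

A partially ambiguous codon such as RTA yields a mixture (here I and V) that can still be matched against scoring rules, whereas NNN carries no information and is the case the thread proposes to ignore.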

Another error I found while looking around is with how sierra-web displays the specific mutation in drug-resistant mutations. Below are the collated drug-resistant mutations for a specific gene.

        "PR": [
            "V82VAIMT",

compared to sierra-local

        "PR": [
            "V82A",
            "V82T"

The actual mutation is V82X, with X being one of AIMTV. We do correctly report this in sierra-local. However, the HIVDB_9.4.xml file only holds information on V82A and V82T drug resistance. Hence, under partialScores for drugScores, we only state V82A and V82T rather than VAIMT, which conflicts with sierra-web's output.
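
The difference between the two outputs can be sketched in Python as follows. This is a simplified illustration, not the sierra-local code: the function name `scored_submutations` is hypothetical, and the set of scored amino acids stands in for whatever rules HIVDB_9.4.xml actually defines at the position.

```python
import re


def scored_submutations(mutation, scored_aas):
    """Split a mixture such as "V82VAIMT" into the single mutations
    that have a scoring rule (hypothetical helper).

    scored_aas is the set of amino acids with a rule at this position;
    here it is supplied by hand as an assumption.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z]+)", mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation string: {mutation!r}")
    ref, pos, aas = match.groups()
    # Drop the reference residue and anything without a scoring rule.
    return [f"{ref}{pos}{aa}" for aa in aas if aa != ref and aa in scored_aas]


# sierra-web style label versus the sierra-local style list from this thread.
print(scored_submutations("V82VAIMT", {"A", "T"}))
```

With only A and T treated as scored, the mixture V82VAIMT reduces to V82A and V82T, which is the list sierra-local reports, while sierra-web keeps the full mixture label.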

ArtPoon commented on August 19, 2024

@WilliamZekaiWang can you please generate a few more test cases by modifying a published sequence and running it through HIVdb's web service to capture the JSONs as expected outputs? Then we can close this out.

WilliamZekaiWang commented on August 19, 2024

There are some issues with the script, but it dealt with the user's sequence fine and reported the drug-resistant scores correctly.

However, there are incorrect drug scores in one of the mutated sequences, where I just took the sample RT file and replaced parts of the sequence with N. As of right now, I'm not sure what is causing this, and I'm still looking into it.

ArtPoon commented on August 19, 2024

@WilliamZekaiWang can you please summarize the divergence and labels for test sequences (these are artificial HIV sequences) in this issue?

WilliamZekaiWang commented on August 19, 2024

I have uploaded the test file, tests\ambiguous_seq_results.json, which has N inserted at various positions on a random RT sequence (taken from the sequence in the RT.fa file that is in the readme).

Briefly:

  • RT1_frontN has the first 661 nucleotides replaced with N
  • RT1_endN has the last 192 nucleotides replaced with N
  • RT1_middleN has 354 nucleotides in the middle replaced with N
  • the rest of the sequences are non-mutated RT sequences, included to check whether my script broke any of the original functions
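
Test sequences like the three listed above could be produced with a helper along these lines. This is a sketch under assumptions, not the script used for tests\ambiguous_seq_results.json: `mask_with_n` and `make_test_cases` are hypothetical names, and the middle block is centred here because the thread does not say where the 354 masked nucleotides start.

```python
def mask_with_n(seq, start, length):
    """Replace `length` nucleotides of seq, beginning at `start`, with N."""
    if start < 0 or length < 0 or start + length > len(seq):
        raise ValueError("masked region falls outside the sequence")
    return seq[:start] + "N" * length + seq[start + length:]


def make_test_cases(name, seq, front=661, end=192, middle=354):
    """Build front-, end- and middle-masked variants of one sequence."""
    mid_start = (len(seq) - middle) // 2  # assumption: centred block
    return {
        f"{name}_frontN": mask_with_n(seq, 0, front),
        f"{name}_endN": mask_with_n(seq, len(seq) - end, end),
        f"{name}_middleN": mask_with_n(seq, mid_start, middle),
    }


# Placeholder sequence for demonstration; a real RT sequence would be used.
demo = "ACGT" * 300
for label, masked in make_test_cases("RT1", demo).items():
    print(label, len(masked), masked.count("N"))
```

Masking keeps the sequence length unchanged, so the alignment coordinates of the unmasked regions stay the same across the three variants.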

Comparing the scores of the drugs to sierrapy, RT_endN has an incorrect drug resistance report at H221Y, where sierrapy had nothing.

RT_endN also has incorrect drug scores for the following drugs:

            "DOR": 10.0,
            "EFV": 10.0,
            "ETR": 10.0,
            "NVP": 15.0,
            "RPV": 15.0

where sierrapy reported 0 for all these drugs.
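
A comparison like the one above can be automated with a small Python helper. This is a sketch, not the script pushed to the repository: `diff_drug_scores` is a hypothetical name, and it assumes each tool's scores have already been reduced to a flat mapping from drug name to total score.

```python
def diff_drug_scores(local, reference, tol=1e-9):
    """Return {drug: (local_score, reference_score)} for every drug whose
    scores differ, or that is missing from one of the two mappings."""
    diffs = {}
    for drug in sorted(set(local) | set(reference)):
        a = local.get(drug)
        b = reference.get(drug)
        if a is None or b is None or abs(a - b) > tol:
            diffs[drug] = (a, b)
    return diffs


# Scores quoted in this thread for the end-masked sequence.
sierra_local = {"DOR": 10.0, "EFV": 10.0, "ETR": 10.0, "NVP": 15.0, "RPV": 15.0}
sierrapy = {"DOR": 0.0, "EFV": 0.0, "ETR": 0.0, "NVP": 0.0, "RPV": 0.0}

for drug, (ours, theirs) in diff_drug_scores(sierra_local, sierrapy).items():
    print(f"{drug}: sierra-local={ours} sierrapy={theirs}")
```

Reporting missing drugs as differences, instead of silently skipping them, helps catch cases where one tool omits a drug class entirely.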

ArtPoon commented on August 19, 2024

We're going to have to dig into the sierrapy code to check how they are handling positions without coverage (missing data). I had thought that they would take the conservative approach (assume the worst case, i.e., resistance mutations are present).

WilliamZekaiWang commented on August 19, 2024

I didn't have much luck digging up the code from Stanford's repository.

I made scripts to find the differences and errors and pushed them to branch iss72. I also think I found the error.

The above extra mutations were all caused by a singular mutation that we assign as H221NHDY but sierrapy assigns as H221X. It seems like it's safe to assume that any ambiguous mutations labeled with X are ignored in their drug score calculation.

Unfortunately, this likely means that either there's further post-alignment processing after post-align, or we are using a different version than sierrapy.

That being said, it seems like this issue isn't actually due to ambiguous nucleotides. Sierrapy denotes this mutation as a deletion, with the codon AT- missing its last nucleotide. This might mean the fix I made, ignoring NNN, is sufficient to solve the user's and related issues?

ArtPoon commented on August 19, 2024

If there is really a single nucleotide deletion in the sequence, then the correct handling is either to accommodate the frameshift (first nucleotide of next codon becomes the third nucleotide of this codon, and so on) OR we mark the sequence as defective (which might be accomplished with X?). But the point is that if it is a real deletion then we should not be handling it as missing data, i.e., an ambiguous nucleotide.
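
The distinction drawn here, between a real deletion and missing data, can be sketched as a codon classifier in Python. The category labels and the function name `classify_codon` are hypothetical; neither sierra-local nor sierrapy is confirmed to use this scheme.

```python
def classify_codon(codon):
    """Classify one aligned codon (hypothetical categories).

    "-" marks an alignment gap; IUPAC letters other than ACGT mark
    ambiguity, and NNN marks a position with no coverage.
    """
    codon = codon.upper()
    if len(codon) != 3:
        raise ValueError("a codon must have exactly three characters")
    gaps = codon.count("-")
    if gaps == 3:
        return "deletion"      # whole codon deleted, frame preserved
    if gaps > 0:
        return "frameshift"    # partial deletion, e.g. AT-
    if codon == "NNN":
        return "missing"       # no data; not evidence of a mutation
    if any(base not in "ACGT" for base in codon):
        return "ambiguous"     # mixture of resolvable codons
    return "resolved"


for codon in ("AT-", "NNN", "---", "RTA", "ATG"):
    print(codon, classify_codon(codon))
```

Checking for gaps before checking for ambiguity means a codon such as AT- is never treated as missing data, which matches the point that a real deletion should either shift the frame or mark the sequence as defective.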

ArtPoon commented on August 19, 2024

This situation is created by a fake test case. Let's treat this as an edge case and close.
