Comments (5)
Reminds me of this issue nanoporetech/medaka#351, and the BAM looks very similar
from sars-cov-2-sequenzdaten_aus_deutschland.
As far as I remember, it's not only at S:del69/70 but occurred for other deletions, too (but 69/70 is the most prominent, common and problematic one).
When analyzed by nextclade, the problematic sequences result in (3×n-1) dashes and an extra N
(@corneliusroemer: wrote you a question about that on Twitter in July, asking whether that's normal)
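A minimal sketch of how the pattern described above could be searched for in an aligned sequence: runs of (3×n-1) dashes with an N right next to them. The function names and the toy sequences are made up for illustration, and the assumption that the extra N sits directly before or after the gap is mine, not something the thread confirms.

```python
import re

def gap_runs(aligned_seq):
    """Return (start, dash_count, has_adjacent_n) for every run of '-'."""
    runs = []
    for m in re.finditer(r"-+", aligned_seq):
        before = aligned_seq[m.start() - 1] if m.start() > 0 else ""
        after = aligned_seq[m.end()] if m.end() < len(aligned_seq) else ""
        has_adjacent_n = "N" in (before.upper(), after.upper())
        runs.append((m.start(), len(m.group()), has_adjacent_n))
    return runs

def suspicious_runs(aligned_seq):
    """Keep gap runs of length 3n-1 that touch an N (assumed artefact pattern)."""
    return [r for r in gap_runs(aligned_seq) if r[1] % 3 == 2 and r[2]]

# Toy aligned sequences, not real data
clean = "ATGGCT------TCTGGG"  # 6 dashes: in-frame deletion
odd = "ATGGCT-----NTCTGGG"    # 5 dashes plus N: the pattern in question

print(suspicious_runs(clean))  # []
print(suspicious_runs(odd))    # [(6, 5, True)]
```

Positions are 0-based string offsets in the aligned row, not genome coordinates.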
P.S.
If it's still the same as 2 months ago:
When you look at the unaligned/original FASTA sequence, it seems that, instead of a single '-' at deletions to indicate a gap of undetermined length, the erroneous sequence submissions always have an 'N'
Back in January, it was also S:69/70
"When you look at the unaligned/original FASTA sequence, it seems that, instead of a single '-' at deletions to indicate a gap of undetermined length, the erroneous sequence submissions always have an 'N'"
medaka variant calls a 5 nt instead of a 6 nt deletion. If there is an 'N' for the missing deleted position, it means that the position is masked afterwards due to low coverage 🤔
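To make that hypothesis concrete, here is a toy sketch of how a 5 nt deletion call plus one masked position would render as five dashes and an N against the reference, compared with the true 6 nt deletion. The reference string, coordinates, and the helper `aligned_row` are invented for illustration; how medaka and the masking step actually interact is not shown in this thread.

```python
def aligned_row(ref, del_start, del_len, masked):
    """Render a consensus against ref: '-' if deleted, 'N' if masked, else the ref base."""
    row = []
    for i, base in enumerate(ref):
        if del_start <= i < del_start + del_len:
            row.append("-")
        elif i in masked:
            row.append("N")
        else:
            row.append(base)
    return "".join(row)

ref = "ATGGCTATACATGTCTCTGGG"  # toy reference, not the real S gene

true_row = aligned_row(ref, 6, 6, set())   # real 6 nt deletion
called_row = aligned_row(ref, 6, 5, {11})  # 5 nt call, 6th position masked

print(true_row)    # ATGGCT------GTCTCTGGG
print(called_row)  # ATGGCT-----NGTCTCTGGG

# A deletion keeps the reading frame only if its length is a multiple of 3
print(true_row.count("-") % 3 == 0)    # True
print(called_row.count("-") % 3 == 0)  # False
```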
Thanks for pointing to the medaka issue @MarieLataretu
So I guess the lab is neither using the suggested workaround ("solved with sup (super-acc) basecalling and respective medaka model") nor fixing the issue manually. Such a frameshift in S is totally unviable.
If you drop the sequences in nextclade.org you will see the issue immediately. Weird that GISAID allowed these frameshifted sequences through - I thought they check for frameshifts.
I see - I probably shouldn't have opened this issue here as the submission didn't go through RKI? Or am I wrong?
I didn't have time to look at the frameshift sequences (and metadata) in DESH.
There is only one sample sequenced at RKI with this frameshift (at least since the last frameshift wave at the beginning of 2022). However, we use the sup model and the frameshift still appears.
A workaround is to use the nanopolish mode instead of medaka in the ARTIC workflow.