Comments (3)
Yes, 100% feel you. Currently they aren't included as they don't pass the threshold for "at least 2% of cases in at least 40% of the 2-week periods tracked by CoVariants since the end of 2020." This is to try and ensure that the sequencing is a representative sample of the country - if sequencing numbers are too low, my concern is that they shouldn't really be translated over to cases.
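To make the quoted threshold concrete, here is a minimal sketch of how such a rule could be evaluated for one country. It is not the CoVariants code: the function name, the argument layout (one sequence count and one case count per 2-week period), and the handling of periods with zero cases are all assumptions made for illustration.

```python
def passes_current_threshold(sequences, cases, min_share=0.02, min_period_fraction=0.40):
    """Hypothetical check: sequences cover >= min_share of cases
    in >= min_period_fraction of the tracked periods."""
    if len(sequences) != len(cases):
        raise ValueError("need one sequence count and one case count per period")
    if not sequences:
        return False
    good_periods = 0
    for n_seq, n_cases in zip(sequences, cases):
        # Assumption: a period with no reported cases cannot meet a share-of-cases bar.
        if n_cases > 0 and n_seq / n_cases >= min_share:
            good_periods += 1
    return good_periods / len(sequences) >= min_period_fraction


# Made-up numbers, for illustration only.
small_country = passes_current_threshold([30, 10, 50, 5, 40], [1000] * 5)
large_country = passes_current_threshold([5000] * 5, [1_000_000] * 5)
```

With these invented inputs, the small country passes (3 of 5 periods reach a 2% share) while the large one does not, despite contributing far more sequences per period.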
Do you know how far off India might be from that?
from covariants.
I think your criteria are suboptimal. To ensure statistical robustness, it's more important to have large counts - rather than a large proportion of cases.
It doesn't really matter if it's 1 in 1000 or 1 in 50. For India, your current criteria mean a country with good robustness drops out, while a country that reports a low number of cases because of a lack of PCR testing gets included even if it sequences only 5 a month.
I'd propose criteria:
Show all countries, but only those periods that match either of: at least 50 sequences in the period, or sequences covering at least 1% of reported cases.
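A rough sketch of the per-period filter proposed here, assuming the same per-period lists of sequence and case counts. The function and parameter names are hypothetical, and the defaults simply restate the 50-sequence and 1% figures from the proposal.

```python
def periods_to_show(sequences, cases, min_sequences=50, min_share=0.01):
    """Hypothetical filter: return the indices of periods that meet either
    the absolute-count bar or the share-of-cases bar."""
    shown = []
    for i, (n_seq, n_cases) in enumerate(zip(sequences, cases)):
        enough_count = n_seq >= min_sequences
        # Assumption: the share test is skipped when no cases were reported.
        enough_share = n_cases > 0 and n_seq / n_cases >= min_share
        if enough_count or enough_share:
            shown.append(i)
    return shown


# Made-up numbers, for illustration only.
visible = periods_to_show([5000, 0, 5, 60], [1_000_000, 800_000, 200, 100_000])
```

Note that under an "either" rule a period with 5 sequences out of 200 cases is still shown, because it clears the share bar; requiring the count bar alone would exclude it.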
I'd worry more about lack of absolute count of sequences than a lack of share. Sure it may be geographically skewed, but it's better than nothing.
I also don't quite understand why you're showing all the data even if coverage may be bad in 60% of time intervals.
I'd only show time intervals in which I trust the resolution. That may mean not showing what happened in between, but that's ok, since without sequencing we don't know what happened there. See my criteria above: they should make major sequencing countries show data for all periods, countries with less sequencing show data only for the periods where they sequenced (that's good), and small countries may also not show in some periods if they didn't sequence much because of not many cases (e.g. China, Iceland, Australia etc.).
Right now the criteria seem to produce less-than-ideal results:
- Show data in countries where almost nothing was sequenced (Iceland, Australia)
- Don't show data in countries where we have reliable estimates of proportion simply because they don't sequence a huge share. 2% is a very large share. If you want a lower bar, I'd make it 1 in 1000 or 1 in 10000. But ideally just counts. After all, noise is to a large extent a function of the number of sequences, not of the share of cases that were sequenced.
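The point about noise can be illustrated with the usual binomial standard error of an estimated proportion, sqrt(p(1-p)/n), which depends on the number of sequences n and not on the total number of cases. This is a sketch under the assumption that sequences are a simple random sample of cases, which is exactly what geographic skew would violate; the function name is hypothetical.

```python
import math


def proportion_standard_error(p, n):
    """Binomial standard error of a variant proportion p estimated from n sequences.
    Assumes simple random sampling and n much smaller than the number of cases."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return math.sqrt(p * (1.0 - p) / n)


# The same variant share estimated from 5, 50 and 5000 sequences:
# the uncertainty shrinks with n regardless of how many cases were reported.
errors = {n: proportion_standard_error(0.5, n) for n in (5, 50, 5000)}
```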
Yes, the criteria aren't perfect, and there are trade-offs in taking the good with the bad. Generally I try to err on the side of not showing things that may be misleading, but it's far from a perfect balance.
Unfortunately, to change this I'd need to get a good sense of how it's working and what is and isn't included. The scripts that currently control or help with this are the script that generates these plots and the script that lets one explore different thresholds (you first have to run the first script with very low thresholds to include every, or almost every, country).
I'm happy to take PRs, but on my own development front, apart from exploring threshold tweaks to the existing code, I'm afraid this is pretty low-priority at the moment - CoV always unfortunately comes last! (And even within CoV, this isn't at the top!) But happy to evaluate PRs & ideas.
Also, on this point:
"Sure it may be geographically skewed, but it's better than nothing."
I'm not sure I agree. I think it can be very misleading if we think all the sequences are coming from one place, yet plot them as if they represent the whole country. Though this is more complicated to avoid, I would ideally want to avoid it.