

emmahodcroft commented on June 2, 2024

Yes, 100% feel you. Currently they aren't included as they don't pass the threshold for "at least 2% of cases in at least 40% of the 2-week periods tracked by CoVariants since the end of 2020." This is to try and ensure that the sequencing is a representative sample of the country - if sequencing numbers are too low, my concern is that they shouldn't really be translated over to cases.
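To make the current rule concrete, here is a minimal sketch of the inclusion check as described above: a country qualifies if it sequenced at least 2% of its cases in at least 40% of the tracked 2-week periods. The function name and the input layout (one sequence count and one case count per period) are hypothetical and are not taken from the CoVariants scripts.

```python
from typing import Sequence


def passes_inclusion_threshold(
    sequences: Sequence[int],
    cases: Sequence[int],
    min_share: float = 0.02,
    min_fraction_of_periods: float = 0.40,
) -> bool:
    """Hypothetical sketch of the share-based rule.

    Returns True if sequences / cases >= min_share in at least
    min_fraction_of_periods of the periods (one entry per 2-week period).
    """
    if len(sequences) != len(cases):
        raise ValueError("sequences and cases need one entry per period")
    if not cases:
        return False
    # Count periods where the sequenced share of cases reaches the threshold;
    # periods with zero reported cases never qualify.
    qualifying = sum(
        1
        for seqs, n_cases in zip(sequences, cases)
        if n_cases > 0 and seqs / n_cases >= min_share
    )
    return qualifying / len(cases) >= min_fraction_of_periods
```

With these made-up numbers, a country sequencing 2,000 of 1,000,000 cases in every period (0.2%) fails, while one sequencing 30 of 1,000 cases in two of five periods passes, which is the asymmetry discussed in the replies below.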

Do you know how far off India might be from that?

from covariants.

corneliusroemer commented on June 2, 2024

I think your criteria are suboptimal. For statistical robustness, it's more important to have large counts than a large proportion of cases.

It doesn't really matter whether it's 1 in 1000 or 1 in 50. As India shows, your current criteria mean that a country with good robustness drops out, while a country that reports a low number of cases because of a lack of PCR testing gets included even if it sequences only 5 a month.

I'd propose these criteria: show all countries, but only those periods that meet either of the following: at least 50 sequences in the period, or sequences covering at least 1% of reported cases.
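A minimal sketch of this proposal, assuming per-period sequence and case counts are available: each period is flagged independently, and untrusted periods are blanked out rather than the whole country being dropped. The function names are hypothetical, and the thresholds are the 50-sequence / 1% values suggested here.

```python
from typing import List, Optional, Sequence


def trusted_periods(
    sequences: Sequence[int],
    cases: Sequence[int],
    min_sequences: int = 50,
    min_case_share: float = 0.01,
) -> List[bool]:
    """Hypothetical per-period rule: True if a period has at least
    min_sequences sequences OR sequenced at least min_case_share of cases."""
    if len(sequences) != len(cases):
        raise ValueError("sequences and cases need one entry per period")
    flags = []
    for seqs, n_cases in zip(sequences, cases):
        enough_count = seqs >= min_sequences
        enough_share = n_cases > 0 and seqs / n_cases >= min_case_share
        flags.append(enough_count or enough_share)
    return flags


def mask_untrusted(
    proportions: Sequence[float], flags: Sequence[bool]
) -> List[Optional[float]]:
    """Replace variant proportions in untrusted periods with None (not plotted)."""
    return [p if ok else None for p, ok in zip(proportions, flags)]
```

For example, with made-up counts of 2,000, 1,500 and 40 sequences against 400,000, 300,000 and 10,000 cases, the first two periods are kept on count alone and the third (40 sequences, 0.4% of cases) is hidden.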

I'd worry more about a lack of absolute sequence counts than a lack of share. Sure, it may be geographically skewed, but it's better than nothing.

I also don't quite understand why you're showing all the data even if coverage may be bad in 60% of time intervals.

I'd only show time intervals in which I trust the resolution. That may mean not showing what happened in between, but that's OK: without sequencing, we don't know what happened there. With the criteria I proposed above, major sequencing countries should show data for all periods, some countries with less sequencing may show data only for the periods where they sequenced (that's good), and small countries may also drop out in some periods if they didn't sequence much because they didn't have many cases (e.g. China, Iceland, Australia).

Right now the criteria seem to produce less-than-ideal results:

  • Show data in countries where almost nothing was sequenced (Iceland, Australia)
  • Don't show data in countries where we have reliable estimates of proportion simply because they don't sequence a huge share. 2% is a very large share. If you want a lower bar, I'd make it 1 in 1000 or 1 in 10000. But ideally just counts. After all, noise is to a large extent a function of the number of sequences, not the proportion of all sequenced cases.
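The last point can be checked with the standard binomial standard error of a proportion. The sketch below assumes sequences are a simple random sample of cases (which, as discussed further down, geographic skew can violate) and applies the finite population correction when the case count is known. The function name is hypothetical.

```python
import math
from typing import Optional


def proportion_standard_error(
    p: float, n_sequences: int, n_cases: Optional[int] = None
) -> float:
    """Standard error of a variant proportion p estimated from n_sequences.

    Assumes simple random sampling of cases. If n_cases (the population) is
    given, multiplies by the finite population correction sqrt((N - n) / (N - 1)).
    """
    if n_sequences <= 0:
        raise ValueError("need at least one sequence")
    se = math.sqrt(p * (1 - p) / n_sequences)
    if n_cases is not None:
        if n_cases <= 1 or n_sequences > n_cases:
            raise ValueError("need n_cases > 1 and n_sequences <= n_cases")
        se *= math.sqrt((n_cases - n_sequences) / (n_cases - 1))
    return se
```

With p = 0.5 and 1,000 sequences, the standard error is about 0.0158 whether those sequences are 1 in 50 or 1 in 1,000 of cases (the correction changes it by roughly 1% at most), whereas 5 sequences give about 0.22. Under this simple model, the sequence count, not the sampled share, drives the noise.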


emmahodcroft commented on June 2, 2024

Yes, the criteria aren't perfect, and there are trade-offs in taking the good with the bad. Generally I try to err on the side of not showing things that may be misleading, but it's far from a perfect balance.

Unfortunately, to change this I'd need to get a good sense of how it's working and of what's included and what's not. The scripts that currently control this are the script that generates these plots and the script that lets one explore different thresholds (for the latter, you first have to run the plotting script with very low thresholds so that every, or almost every, country is included).
I'm happy to take PRs, but on my own development front, apart from exploring threshold tweaks to the existing code, I'm afraid this is pretty low-priority at the moment - CoV always unfortunately comes last! (And even within CoV, this isn't at the top!) But happy to evaluate PRs & ideas.
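Exploring thresholds essentially means re-running the inclusion rule over a grid of parameter values and comparing which countries survive. The sketch below is a hypothetical, self-contained illustration of that idea; it does not reproduce the actual CoVariants exploration script, and the data layout (country name mapped to per-period sequence and case counts) is an assumption.

```python
from itertools import product
from typing import Dict, Iterable, Sequence, Set, Tuple

# Hypothetical layout: country -> (sequences per period, cases per period)
CountryData = Dict[str, Tuple[Sequence[int], Sequence[int]]]


def fraction_of_periods_above(
    sequences: Sequence[int], cases: Sequence[int], min_share: float
) -> float:
    """Fraction of periods in which sequences / cases >= min_share."""
    if not cases:
        return 0.0
    hits = sum(
        1 for s, c in zip(sequences, cases) if c > 0 and s / c >= min_share
    )
    return hits / len(cases)


def sweep_thresholds(
    data: CountryData,
    shares: Iterable[float],
    period_fractions: Iterable[float],
) -> Dict[Tuple[float, float], Set[str]]:
    """For every (share, period fraction) pair, return the included countries."""
    fractions = list(period_fractions)
    results: Dict[Tuple[float, float], Set[str]] = {}
    for share, frac in product(shares, fractions):
        results[(share, frac)] = {
            country
            for country, (seqs, cases) in data.items()
            if fraction_of_periods_above(seqs, cases, share) >= frac
        }
    return results
```

Lowering the share threshold from 2% to 0.1% in this toy setup brings a heavily sequenced, high-case country back in, which makes it easy to see how sensitive the included set is to each parameter.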

Also, on this point:

"Sure it may be geographically skewed, but it's better than nothing."

I'm not sure I agree. I think it can be very misleading if we think all the sequences are coming from one place, yet plot them as if they represent the whole country. Though this is more complicated to avoid, I would ideally want to avoid it.

