
Comments (8)

airallergy commented on August 15, 2024

In terms of cases by region, if you look at the Wales historical data CSV file provided on this dashboard, England and Wales are actually on the same page (both use Specimen Date), unlike Scotland, as I mentioned above.

What England and Wales are doing on their dashboards is the first option you mentioned (shifting the latest specimen date to match the latest reporting date of Scotland), which I think is the most realistic way to check the latest breakdown of cases. The justification is that this approach gives the latest available breakdown of cases. It also makes sense given that some cases have unknown regions.

But do bear in mind that this approach only makes sense for data that are both 1. the latest and 2. cumulative. If other types, either historical or daily, are involved, I see no sensible way to achieve consistency across all nations, unless they unify the publishing standard.

from covid-19-uk-data.

airallergy commented on August 15, 2024

That is because among all the data of a range of specimen dates the gov is receiving every day, only a fairly small number come from yesterday. No expert here, but I guess this means very few tests can have results in merely one day. This is one of the reasons for the daily revision, so the data of 27/04/2020 would be more reasonable if you look at it on 29/04/2020 than on 28/04/2020. Btw, the revision can affect data of over a month ago, not for every region though.

However, I am not quite sure about tom's current daily update process. I mentioned it in #41, but it seems this revision issue hasn't been fully addressed, judging by your attached data above. The way I deal with this is to simply overwrite the historical data for England and Wales with the new data on a daily basis.
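
As a rough illustration of the overwrite approach described above, here is a minimal Python sketch. It assumes rows shaped like the lines of covid-19-cases-uk.csv quoted elsewhere in this thread (date, country, area code, area name, cumulative cases); the function name `overwrite_history` is hypothetical and this is not necessarily how the repo's update script works.

```python
def overwrite_history(existing, latest):
    """Merge two lists of rows, letting `latest` win for any (date, area code) it contains.

    Each row is assumed to be (date, country, area_code, area_name, total_cases).
    """
    merged = {(row[0], row[2]): row for row in existing}
    # Rows from the newest download replace older values for the same key.
    merged.update({(row[0], row[2]): row for row in latest})
    return [merged[key] for key in sorted(merged)]


# Birmingham figures as they appear in two different states of the file in this thread.
old_rows = [
    ("2020-04-27", "England", "E08000025", "Birmingham", 2782),
    ("2020-04-28", "England", "E08000025", "Birmingham", 2782),
]
new_rows = [
    ("2020-04-27", "England", "E08000025", "Birmingham", 2799),
    ("2020-04-28", "England", "E08000025", "Birmingham", 2801),
    ("2020-04-29", "England", "E08000025", "Birmingham", 2801),
]

revised = overwrite_history(old_rows, new_rows)
for row in revised:
    print(row)
```

Keying on (date, area code) means revised values silently replace older ones, so any earlier snapshot should be archived separately if the revision history matters.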


tomwhite commented on August 15, 2024

I'm getting the data from the cases CSV file from the PHE dashboard (https://coronavirus.data.gov.uk/). The latest data in there is for 2020-04-26, so that's what appears in this repo. Before I was just getting the latest data and assigning it to the current date, which was not necessarily the correct thing to do.

Interestingly, the deaths CSV file does have data for 2020-04-27.


airallergy commented on August 15, 2024

@timday There are now divergences between the date types used by different statistics from varied data sources.

As to the positive case numbers of England and Wales, the terminology is Specimen Date, which indicates the date of the first positive specimen of any tested individual in the lab. In contrast, other data, such as the death figures mentioned by @tomwhite, use Reporting Date or similar, indicating the date on which the government published the data after receiving them in batches from the lab; each batch contains results from many different specimen dates.

In this sense, there is apparently an inconsistency between the corresponding dates of the latest available data, unless they unify them some day. However, this inconsistency might be mitigated depending on what type of data you are looking at. For example, if the cumulative figures, either in total or in breakdown, concern you, the latest specimen date of the cumulative cases in England is essentially the same thing as the latest reporting date of those in Scotland, as far as I understand.


timday commented on August 15, 2024

Hmmm.... thanks, interesting.

For the purposes of looking at cases by region across England, Scotland and Wales (and using only days for which data is available for all nations) I'm now wondering whether the "best"/"most realistic" thing to do is either:

  • Add one day to all the England dates, to bring them up in-line with Wales and Scotland.

or

  • Ignore the last day of Wales and Scotland (until England "catches up" the next day).

Not at all clear to me which is more "correct".

It's only the covid-19-cases-uk.csv file I'm looking at, not deaths.
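
To make the two options concrete, here is a small Python sketch. The cumulative totals below are made up purely for illustration, and the helper names `shift_dates` and `common_dates` are hypothetical; only the idea of shifting England by one day versus keeping only the shared dates comes from the discussion above.

```python
from datetime import date, timedelta


def shift_dates(series, days=1):
    """Return a copy of a {date: value} series with every date moved forward by `days`."""
    return {day + timedelta(days=days): value for day, value in series.items()}


def common_dates(*series):
    """Return the sorted dates that are present in every series."""
    shared = set(series[0])
    for other in series[1:]:
        shared &= set(other)
    return sorted(shared)


# Made-up cumulative totals: England (specimen date) lags Scotland (reporting date) by a day.
england = {date(2020, 4, 26): 100, date(2020, 4, 27): 110}
scotland = {date(2020, 4, 27): 50, date(2020, 4, 28): 55}

# Option 1: add one day to the England dates, then compare like for like.
option_1 = common_dates(shift_dates(england), scotland)

# Option 2: keep only the dates every nation already has, ignoring the newest day elsewhere.
option_2 = common_dates(england, scotland)

print(option_1)
print(option_2)
```

Option 1 keeps the newest figures from every nation at the cost of relabelling England's dates; option 2 keeps the original labels but drops the newest day.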


timday commented on August 15, 2024

Another thing I notice from charting the England cases data:

Compare the output from
grep 2020-04-27 data/covid-19-cases-uk.csv | grep England | head

2020-04-27,England,E09000003,Barnet,1176
2020-04-27,England,E08000016,Barnsley,607
2020-04-27,England,E09000004,Bexley,597
2020-04-27,England,E08000025,Birmingham,2782
2020-04-27,England,E06000009,Blackpool,413
2020-04-27,England,E08000032,Bradford,796
2020-04-27,England,E09000005,Brent,1330
2020-04-27,England,E06000023,"Bristol, City of",591
2020-04-27,England,E09000006,Bromley,1027
2020-04-27,England,E08000002,Bury,434

with
grep 2020-04-28 data/covid-19-cases-uk.csv | grep England | head

2020-04-28,England,E09000002,Barking and Dagenham,448
2020-04-28,England,E09000003,Barnet,1176
2020-04-28,England,E08000016,Barnsley,608
2020-04-28,England,E06000022,Bath and North East Somerset,203
2020-04-28,England,E06000055,Bedford,424
2020-04-28,England,E09000004,Bexley,597
2020-04-28,England,E08000025,Birmingham,2782
2020-04-28,England,E06000008,Blackburn with Darwen,301
2020-04-28,England,E06000009,Blackpool,413
2020-04-28,England,E08000001,Bolton,732

the places listed in both haven't changed at all (or have increased by just 1, in Barnsley's case). This seems most unlikely given the general rate of increase previously, and it looks more like the data from the 27th has simply been "reused" on the 28th.

There's also something new going on with some regions becoming more "gappy" (e.g. Isle of Wight); I'm sure I'd have noticed that before, as it results in gaps appearing in some of my charts which were continuous lines before.
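
The same comparison can be done programmatically rather than by eye. The sketch below is a hedged example, not part of the repo: it assumes the column order seen in the grep output above (date, country, area code, area name, cumulative cases), uses a handful of the rows quoted above as inline sample data, and the function name `changes_between` is hypothetical.

```python
import csv
import io

# A few of the rows quoted above, for areas that appear under both dates.
SAMPLE = """\
2020-04-27,England,E09000003,Barnet,1176
2020-04-27,England,E08000016,Barnsley,607
2020-04-27,England,E09000004,Bexley,597
2020-04-27,England,E08000025,Birmingham,2782
2020-04-27,England,E06000009,Blackpool,413
2020-04-28,England,E09000003,Barnet,1176
2020-04-28,England,E08000016,Barnsley,608
2020-04-28,England,E09000004,Bexley,597
2020-04-28,England,E08000025,Birmingham,2782
2020-04-28,England,E06000009,Blackpool,413
"""


def changes_between(rows, earlier, later):
    """Map area name -> change in cumulative cases, for areas present on both dates."""
    by_date = {earlier: {}, later: {}}
    for day, _country, code, area, total in rows:
        if day in by_date:
            by_date[day][code] = (area, int(total))
    return {
        by_date[later][code][0]: by_date[later][code][1] - total
        for code, (_area, total) in by_date[earlier].items()
        if code in by_date[later]
    }


rows = list(csv.reader(io.StringIO(SAMPLE)))
diffs = changes_between(rows, "2020-04-27", "2020-04-28")
unchanged = sorted(area for area, diff in diffs.items() if diff == 0)
print(diffs)
print(unchanged)
```

Matching on area code rather than name avoids problems with quoted names such as "Bristol, City of", and areas missing on either date are skipped rather than treated as zero.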


timday commented on August 15, 2024

Just looking at today's update.
The pattern of the last 2 days' numbers often being the same (I haven't actually done a comprehensive survey) continues, e.g.
grep E08000025 data/covid-19-cases-uk.csv | tail

2020-04-20,England,E08000025,Birmingham,2494
2020-04-21,England,E08000025,Birmingham,2558
2020-04-22,England,E08000025,Birmingham,2621
2020-04-23,England,E08000025,Birmingham,2674
2020-04-24,England,E08000025,Birmingham,2719
2020-04-25,England,E08000025,Birmingham,2757
2020-04-26,England,E08000025,Birmingham,2789
2020-04-27,England,E08000025,Birmingham,2799
2020-04-28,England,E08000025,Birmingham,2801
2020-04-29,England,E08000025,Birmingham,2801

but comparing with the numbers in my previous comment, it can be seen the numbers for the 27th and 28th have been bumped up from 2782 (both) to 2799 and 2801.
So, yes, presumably each region's case-count curve can be thought of as converging on some "true" number as the data trickles in over time. But for the last day given, it seems nothing has arrived yet and the number given is just the previous day's.
It does perhaps make charts look a bit misleading though... always looking like they've just reached the point of flattening off. Makes me wonder if I should just ditch that last day's datapoint, but it's already one day behind Scotland and Wales.
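
For what it's worth, the flattening is easy to see from the day-on-day increments. The Python sketch below uses the Birmingham figures quoted above; the helper names `daily_increments` and `drop_trailing_repeat` are hypothetical, and dropping the final point is only one possible way to handle it.

```python
def daily_increments(cumulative):
    """Return the day-on-day differences of a list of cumulative totals."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def drop_trailing_repeat(points):
    """Drop the final (date, total) point if it merely repeats the previous total."""
    if len(points) >= 2 and points[-1][1] == points[-2][1]:
        return points[:-1]
    return points


# Birmingham (E08000025) cumulative cases, as quoted above.
birmingham = [
    ("2020-04-20", 2494),
    ("2020-04-21", 2558),
    ("2020-04-22", 2621),
    ("2020-04-23", 2674),
    ("2020-04-24", 2719),
    ("2020-04-25", 2757),
    ("2020-04-26", 2789),
    ("2020-04-27", 2799),
    ("2020-04-28", 2801),
    ("2020-04-29", 2801),
]

increments = daily_increments([total for _day, total in birmingham])
trimmed = drop_trailing_repeat(birmingham)
print(increments)
print(trimmed[-1])
```

Note that the increments for the 27th and 28th are also small, so trimming only the exact repeat at the end does not remove the effect of later revisions on the days just before it.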


airallergy commented on August 15, 2024

Yes, these latest data could be misleading; ditching the last day's data could help reveal the true trend, in a sense.

May I suggest another method if you want to keep the England and Wales historical data consistent with Scotland: concatenate the latest total numbers from each daily file, i.e. the 29/04/2020 cumulative data published on 30/04/2020, the 28/04/2020 data published on 29/04/2020, etc. Through this you can transform specimen-date data into reporting-date data. tom has archived all the old CSV files, which makes it quite easy to do so.
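
A minimal Python sketch of this concatenation idea follows. It assumes each archived daily file has already been loaded into a dict of specimen date to cumulative total for one area, keyed by its publication date; the publication dates used here are assumptions for illustration, the totals are the Birmingham figures quoted earlier in the thread, and the function name `reporting_series` is hypothetical.

```python
def reporting_series(snapshots):
    """Build a reporting-date series from daily snapshots.

    `snapshots` maps publication date -> {specimen date: cumulative total}.
    For each snapshot, the total for its latest specimen date is taken and
    labelled with the publication date. Dates are ISO strings, so they sort
    chronologically.
    """
    series = {}
    for published, by_specimen_date in sorted(snapshots.items()):
        latest_specimen_date = max(by_specimen_date)
        series[published] = by_specimen_date[latest_specimen_date]
    return series


# Two snapshots of the Birmingham totals (publication dates are assumed).
snapshots = {
    "2020-04-29": {"2020-04-27": 2782, "2020-04-28": 2782},
    "2020-04-30": {"2020-04-27": 2799, "2020-04-28": 2801, "2020-04-29": 2801},
}

by_reporting_date = reporting_series(snapshots)
print(by_reporting_date)
```

Because each value is frozen at the time of publication, the resulting series is never revised afterwards, which is what makes it comparable to a reporting-date series.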

