tomwhite / covid-19-uk-data

Coronavirus (COVID-19) UK Historical Data

Home Page: http://tom-e-white.com/covid-19-uk-data/

License: The Unlicense

HTML 99.49% Python 0.50% Shell 0.01%
covid-19 coronavirus data dataset daily-counts confirmed-cases historical-data deaths england scotland

covid-19-uk-data's Introduction

COVID-19 UK Historical Data

⚠️ Update: 1 August 2020. This repository is deprecated and is no longer updated. Users are encouraged to move to official upstream data sources which are listed below ⚠️

Data on numbers of tests, confirmed cases, and deaths for coronavirus (COVID-19) in the UK is published by the government, but it is fragmented and not always provided in consistent or machine-friendly formats. Also, in many cases only the latest numbers are available so it's not possible to look at changes over time.

This site collates the historical data and provides it in an easily consumable format (CSV), in both wide and tidy data forms.

Ideally the data publishers will start doing this so this site becomes redundant.

Data files

The following CSV files are available (note they are no longer updated):

  • data/covid-19-cases-uk.csv: daily counts of confirmed cases for (upper tier) local authorities in England, health boards in Scotland and Wales, and local government district for Northern Ireland.
    • Note that prior to 18 March 2020 Wales data was broken down by local authority, not health board, and prior to 27 March 2020 there were no breakdowns by area for Northern Ireland.
  • data/covid-19-totals-uk.csv: daily counts of tests, confirmed cases, deaths for the whole of the UK
  • data/covid-19-totals-england.csv: daily counts of tests, confirmed cases, deaths for England
  • data/covid-19-totals-northern-ireland.csv: daily counts of tests, confirmed cases, deaths for Northern Ireland
  • data/covid-19-totals-scotland.csv: daily counts of tests, confirmed cases, deaths for Scotland
  • data/covid-19-totals-wales.csv: daily counts of tests, confirmed cases, deaths for Wales
  • data/covid-19-indicators-uk.csv: daily counts of tests, confirmed cases, deaths for the whole of the UK and individual countries in the UK (England, Scotland, Wales, Northern Ireland). This is a tidy-data version of covid-19-totals-*.csv combined into one file.
  • data/daily/*.csv: daily counts, with a separate file for each date and country.
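The relationship between the wide files (covid-19-totals-*.csv) and the tidy file (covid-19-indicators-uk.csv) can be sketched as follows. This is a minimal illustration, not the repository's actual conversion code: the column names Tests, ConfirmedCases, Deaths and Value are assumptions, and the sample numbers are made up.

```python
import csv
import io

def wide_to_tidy(wide_csv_text, country):
    """Turn wide rows (one column per indicator) into tidy rows (one row per value)."""
    tidy = []
    for row in csv.DictReader(io.StringIO(wide_csv_text)):
        date = row.pop("Date")
        for indicator, value in row.items():
            if value == "":  # leave gaps out of the tidy output
                continue
            tidy.append({"Date": date, "Country": country,
                         "Indicator": indicator, "Value": int(value)})
    return tidy

# Made-up sample numbers, not real data; the header is an assumed layout.
sample = "Date,Tests,ConfirmedCases,Deaths\n2020-03-01,100,5,0\n2020-03-02,150,8,\n"
rows = wide_to_tidy(sample, "Wales")
for r in rows:
    print(r)
```

The tidy form has one row per date, country and indicator, which tends to be easier to filter and join; the wide form is easier to read by eye.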

Interpreting the numbers (more information on this DHSC/PHE page, and the PHE dashboard about page)

  • "Tests" are the number of people tested, not the number of samples tested.
  • "Confirmed cases" are the number of people with a positive test.
  • "Deaths" are hospital deaths, so they don't include deaths of people with COVID-19 who died at home for example. (Although this changed in England on 29 April 2020.)

Note that the totals for the UK don't necessarily equal the sum of the totals of the four nations (England, Scotland, Wales, Northern Ireland), due to differences in date reported.
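One way to see how far the UK total is from the sum of the four nations is to compare them date by date. The sketch below assumes tidy rows shaped like the indicators file (with an assumed Value field) and uses made-up numbers; the helper name uk_vs_nations is hypothetical.

```python
from collections import defaultdict

NATIONS = {"England", "Scotland", "Wales", "Northern Ireland"}

def uk_vs_nations(tidy_rows, indicator):
    """Return {date: UK total minus sum of nations} for dates where all five values exist."""
    by_date = defaultdict(dict)
    for r in tidy_rows:
        if r["Indicator"] == indicator:
            by_date[r["Date"]][r["Country"]] = r["Value"]
    diffs = {}
    for date, values in sorted(by_date.items()):
        if "UK" in values and NATIONS.issubset(values):
            diffs[date] = values["UK"] - sum(values[n] for n in NATIONS)
    return diffs

def _row(date, country, value):
    return {"Date": date, "Country": country, "Indicator": "Deaths", "Value": value}

# Made-up numbers for illustration only.
sample_rows = [
    _row("2020-04-01", "UK", 100),
    _row("2020-04-01", "England", 70),
    _row("2020-04-01", "Scotland", 12),
    _row("2020-04-01", "Wales", 10),
    _row("2020-04-01", "Northern Ireland", 5),
    _row("2020-04-02", "UK", 120),       # Wales missing on this date, so it is skipped
    _row("2020-04-02", "England", 85),
    _row("2020-04-02", "Scotland", 14),
    _row("2020-04-02", "Northern Ireland", 6),
]
diffs = uk_vs_nations(sample_rows, "Deaths")
print(diffs)
```

A non-zero difference is not necessarily an error; as noted above, it can simply reflect different reporting dates.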

You can use these files without reading the rest of this document.

There is an experimental Datasette instance hosting the data. This is useful for running simple SQL on the data, or exporting in JSON format.

News

  • 1 August 2020. Retired this repo. See discussion here.
  • 2 July 2020. PHE started including Pillar 2 data in England confirmed case numbers. This data is now being included in this repository.
  • 1 July 2020. England UTLA confirmed case data is no longer being included since it doesn't have Pillar 2 tests, which make up the vast majority of tests.
  • 1 July 2020. NI data is no longer being included since the (undocumented) backend API changed again, and the NI Department of Health does not provide a machine-readable alternative. (See 2 June 2020 entry below.)
  • 30 June 2020. With the new Leicester lockdown, media attention around the lack of Pillar 2 data in England has increased. I have added a prominent warning to the top of this README.
  • 2 June 2020. I received a reply from the NI Department of Health to my enquiry about making machine-readable downloads available. For this reason I may stop collating NI data in this repository, since the JSON API the code uses is undocumented and changes from time to time. See #63.

Mr White

Thank you for your query. Currently, the information on which the dashboard statistics are based is being drawn from live systems and the data is continually being revised. This means that we do not at this time feel it would be appropriate to provide data that is still volatile and is subject to both revision and change.

Regards

Information and Analysis Directorate

  • 28 May 2020. DHSC is now providing a timeseries of testing data, linked to from this DHSC/PHE page.
  • 23 May 2020. DHSC is no longer reporting the number of people tested (daily or cumulative) in Pillar 2, hence it is not possible to give an overall total.
  • 12 May 2020. The PHW dashboard data download link is no longer static - it changes every day, and there is no easy way to retrieve it, since it is dynamically generated in Tableau.
  • 1 May 2020. The NI Department of Health dashboard has been re-instated.
  • 28 April 2020. The NI Department of Health is no longer reporting the number of people tested, just the number of tests.
  • 21 April 2020. The PHA NI dashboard was suspended since it was reporting incorrect data. Test and total confirmed case numbers are being announced on Twitter by @healthdpt. Area breakdowns are no longer being provided.
  • 21 April 2020. The PHW dashboard now has a link to download the data in XLSX format. The URL is dynamically generated however, so it's still not easy to automate the download.
  • 20 April 2020. The PHE dashboard now has stable URLs for its CSV downloads.
  • 18 April 2020. PHA NI launched a dashboard to replace the daily surveillance reports.
  • 15 April 2020. A new dashboard for UK and England was launched, replacing the ArcGIS one. As a part of this change the XLSX/CSV files for daily indicators, and case counts by region and UTLA (in England) are no longer being produced. They have been replaced by CSV files, or - for programmatic access - a JSON feed.
  • 14 April 2020. No per-area case numbers produced for NI, even though it is a weekday (Tuesday). Yesterday was a bank holiday, and no case numbers were produced either.
  • 9 April 2020. The reporting period for case numbers in Wales changed. "For operational reasons, we are moving the point at which we count new cases of Novel Coronavirus (Covid-19) back from 7pm to 1pm. Case numbers on Thursday [9 April] will therefore be lower than usual, and will return to normal on Friday [10 April]."
  • 8 April 2020. Scotland started publishing numbers for people in hospital and intensive care, by health board. They also started reporting numbers that were less than 5 as "*".
  • 6 April 2020. Wales published a new interactive dashboard, which gives data for confirmed cases, and testing episodes, broken down by local authority and health board. There is historical data too. Unfortunately there is currently no way of exporting the raw data from the dashboard.
  • 2 April 2020. Scotland reported a more timely process for counting deaths.
  • 29 March 2020. There's a new spreadsheet that includes historical data for the dashboard. This includes cases (by country, English UTLA, English NHS region), deaths (by country), and recovered patients (although this isn't being updated at the time of writing).
  • 27 March 2020. UK daily indicators now include number of deaths for UK, England, Scotland, Wales, and Northern Ireland.
  • 26 March 2020. Northern Ireland's Public Health Agency (PHA) started publishing confirmed cases by Local Government District (LGD) on weekdays.
  • 25 March 2020. The reporting period for number of deaths changed. Previously it was for the 24 hour period starting and ending at 9am. The new period starts and ends at 5pm, and is reported the following afternoon at 2pm. (So the number of deaths reported on 25 March (cumulative total 463) represents the period 9am to 5pm on 24 March.) The testing and case numbers continue to be the 9am period.
  • 24 March 2020. Northern Ireland's Public Health Agency (PHA) started producing a Daily COVID-19 Surveillance Bulletin in PDF form. It contains test numbers (also broken down by Health and Social Care Trust), and case numbers but only on a choropleth map (and broken down by age and gender).
  • 21 March 2020. PHW is back to health board (not LA) breakdowns again, this time it looks permanent.
  • 20 March 2020. PHW is providing LA area breakdowns again, after not doing so for two days.
  • 18 March 2020. PHW is no longer providing LA area breakdowns. "Novel Coronavirus (COVID-19) is now circulating in every part of Wales. For this reason, we will not be reporting cases by local authority area from today. From tomorrow, we will update daily at 12 noon the case numbers by health board of residence."

Data sources

The following sources may include more data than described here. This summary includes only Tests, Confirmed cases and Deaths.

UK

  • Source: UK testing time series (CSV)
    • Tests: number of people tested (Pillar 1 only) by day in UK, England, Scotland, NI; (Pillar 1 and 2) Wales
    • Confirmed cases: number of confirmed cases (Pillar 1 and 2) by day in UK, England, Scotland, Wales, NI
  • Source: UK daily deaths time series (CSV)
    • Deaths: number of deaths by day in UK
  • Source: UK dashboard deaths (CSV) (JSON)
    • Deaths: number of deaths by day in UK, England, Scotland, Wales, NI
  • Charts available on the PHE dashboard
  • Twitter updates: @DHSCgovuk

England

  • Source: UK dashboard cases (CSV) (JSON)
    • Confirmed cases: number of confirmed cases (Pillar 1 and 2) by day in England, regions, UTLAs, LTLAs
  • Charts available on the PHE dashboard
  • Twitter updates: @PHE_uk

Scotland

  • Source: Trends in daily COVID-19 data (XLSX) (CSVs)
    • Tests: number of people tested (Pillar 1, and Pillar 2 since 15 June) by day in Scotland (CSV)
    • Confirmed cases: number of confirmed cases (Pillar 1, and Pillar 2 since 15 June) by day in Scotland (CSV)
    • Deaths: number of deaths by day in Scotland (CSV)
  • Source: COVID-19 data by NHS Board (XLSX) (CSV)
    • Confirmed cases: number of confirmed cases (Pillar 1, and Pillar 2 since 15 June) by day by health board
  • See also statistics.gov.scot
  • Charts available on the PHS dashboard
  • Twitter updates: @scotgov

Wales

  • Source: Data download (XLSX)
    • Tests: number of people tested (Pillars 1 and 2) by day by local authority
    • Confirmed cases: number of confirmed cases (Pillars 1 and 2) by day by local authority
    • Deaths: number of deaths by day in Wales; number of cumulative deaths by health board
  • More information and charts available on the PHW dashboard
  • Twitter updates: @PublicHealthW

Northern Ireland

  • Source: No machine-readable dataset available
  • Charts available on the Department of Health dashboard
    • Includes number of people tested and confirmed cases for Pillar 1, and Pillar 2 since 24 June.
  • Twitter updates: @healthdpt

Local Authority and Health Board metadata

Related projects/datasets

Wishlist

Here are my suggestions for how to improve the data being published by public bodies.

The short version: publish everything in CSV format, and include historical data!

  • Public Health Agency, Northern Ireland: Provide a machine readable version of the historical data on the dashboard.

The reporting systems have changed a lot since the outbreak began, and overall they have improved, both in the amount of information being published, and the ease of access of machine-readable datasets. (Public Health Scotland provides all their data in XLSX and CSV format, including historical data. Public Health Wales provides an XLSX spreadsheet with historical data.)

Tools

There are command line tools for downloading, parsing, and processing the data. They rely on Python 3.

To install the tools, create a virtual environment, activate it, then install the required packages:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Daily workflow

A sqlite DB is now used to store and aggregate intermediate data. The CSV files remain the point of record.

The crawl tool checks whether the resource (web page, data file) has already been downloaded. If it hasn't, the tool downloads it, provided it is available for the specified date (today); if it isn't available, the tool exits. Once the resource is downloaded, the tool extracts the relevant information from it and updates the sqlite database. This means that you can just keep running crawl until it finds new updates.

The convert_sqlite_to_csvs tool will extract the data from sqlite and update the CSV files.
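The sqlite-to-CSV step can be pictured as a pivot query over the indicators table. The following is only a sketch of the idea, not the actual convert_sqlite_to_csvs code: the table name and key columns (Date, Country, Indicator) follow the csvs-to-sqlite commands in this document, while the Value column, the indicator names and the sample numbers are assumptions.

```python
import csv
import io
import sqlite3

def totals_csv(conn, country):
    """Pivot tidy indicator rows for one country into wide CSV text."""
    cur = conn.execute(
        """
        SELECT Date,
               MAX(CASE WHEN Indicator = 'Tests' THEN Value END),
               MAX(CASE WHEN Indicator = 'ConfirmedCases' THEN Value END),
               MAX(CASE WHEN Indicator = 'Deaths' THEN Value END)
        FROM indicators
        WHERE Country = ?
        GROUP BY Date
        ORDER BY Date
        """,
        (country,),
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Date", "Tests", "ConfirmedCases", "Deaths"])
    writer.writerows(cur)  # missing indicators come back as NULL and are written as empty cells
    return out.getvalue()

# In-memory database with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE indicators (Date TEXT, Country TEXT, Indicator TEXT, Value INTEGER, "
    "PRIMARY KEY (Date, Country, Indicator))"
)
conn.executemany(
    "INSERT INTO indicators VALUES (?, ?, ?, ?)",
    [
        ("2020-03-01", "Wales", "Tests", 100),
        ("2020-03-01", "Wales", "ConfirmedCases", 5),
        ("2020-03-02", "Wales", "Tests", 150),
        ("2020-03-01", "Scotland", "Tests", 999),
    ],
)
wales_csv = totals_csv(conn, "Wales")
print(wales_csv)
```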

The update tool runs crawl then convert_sqlite_to_csvs, and prompts interactively to ask whether you want to commit the changes to git.

There is also a crawl_all tool (and corresponding update_all tool) that uses machine-readable sources to update all historical data for that source. This is not available for all sources yet.

./tools/update_all.sh phw
./tools/update_all.sh phs
./tools/update.sh NI
./tools/update.sh UK
./tools/update_all.sh phe

The equivalent done manually (just for Wales):

DATE=$(date +'%Y-%m-%d')
./tools/crawl.py $DATE Wales
./tools/convert_sqlite_to_csvs.py
git add data/; git commit -am "Update for $DATE for Wales"

NI updates are being done manually since there are currently no machine-readable sources.

# edit covid-19-totals-northern-ireland.csv and add tests/cases/deaths
./tools/convert_totals_to_indicators.py
csvs-to-sqlite --replace-tables -t indicators -pk Date -pk Country -pk Indicator data/covid-19-indicators-uk.csv data/covid-19-uk.db
./tools/convert_sqlite_to_csvs.py
git commit -a # "Update for xxx for NI from https://twitter.com/healthdpt"

Updates are not always made at a consistent time of day, so the following command can be run continuously in a terminal to check for updates every 10 minutes. The -b option makes it beep if there is a new update.

watch -n 600 -b ./tools/crawl.py

Check data consistency

./tools/check_indicators.py
./tools/check_totals.py

Manual overrides

Sometimes it's necessary to fix data by hand. In this case the following tools are useful:

Repopulate the sqlite database from the CSV files:

rm data/covid-19-uk.db
csvs-to-sqlite --replace-tables -t indicators -pk Date -pk Country -pk Indicator data/covid-19-indicators-uk.csv data/covid-19-uk.db
csvs-to-sqlite --replace-tables -t cases -pk Date -pk Country -pk AreaCode -pk Area data/covid-19-cases-uk.csv data/covid-19-uk.db
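For readers without csvs-to-sqlite installed, roughly the same repopulation of the cases table can be sketched with Python's standard library. This is an assumption-laden illustration rather than a drop-in replacement: the TotalCases column name is assumed, the helper load_cases is hypothetical, and counts are stored as text because the file is known to contain values such as "1,449" and NaN.

```python
import csv
import io
import sqlite3

def load_cases(conn, csv_text):
    """Load cases rows, keyed on (Date, Country, AreaCode, Area); reloading is idempotent."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cases (
               Date TEXT, Country TEXT, AreaCode TEXT, Area TEXT, TotalCases TEXT,
               PRIMARY KEY (Date, Country, AreaCode, Area))"""
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT OR REPLACE INTO cases "
        "VALUES (:Date, :Country, :AreaCode, :Area, :TotalCases)",
        reader,
    )
    conn.commit()

# The header is an assumed layout; the two data rows are quoted from an issue further down.
cases_sample = """Date,Country,AreaCode,Area,TotalCases
2020-04-11,Scotland,S08000031,Greater Glasgow and Clyde,1387
2020-04-12,Scotland,S08000031,Greater Glasgow and Clyde,"1,449"
"""
cases_conn = sqlite3.connect(":memory:")
load_cases(cases_conn, cases_sample)
load_cases(cases_conn, cases_sample)  # second load replaces rows rather than duplicating them
row_count = cases_conn.execute("SELECT COUNT(*) FROM cases").fetchone()[0]
print(row_count)
```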

covid-19-uk-data's People

Contributors

actions-user, desholmes, j4mie, ryanunitas, tomwhite


covid-19-uk-data's Issues

Data Processing Tools

I'm doing some stats analysis for my local Lib Dems and have been following the stats on Arcgis for some time, I'm also a software developer in the engineering and scientific community but I've been furloughed so I've time on my hands.

I picked up the new Covid 19 stats today from the new PHE webpage (disappointing) but the data is 2 days out of date and completely unstructured. I started writing a tool to restructure it this afternoon and then found your Git (to see if anyone else was using the dataset). Would my tool be useful to the community? It's written using LabVIEW so you'll need to install a free Runtime to use it.

Missing Data

Hey, thanks for putting this together - it's really useful to have the historical data, not just the current day's figures.

I noticed a couple of discrepancies in the data:

  • The daily stats for Barking and Dagenham are missing for the last couple of days.
  • data/covid-19-cases-uk.csv is missing the data for English areas for 21st March

For the first of these, I noticed that Barking is the first Area on the csv file downloaded from arcgis.com so wondered if there is something stripping the first data row? I don't have the data for 22/3, but today's value is 42 if that helps.

What is areaCode?

Hi
thanks for the work.

I'm wondering what the areaCode displayed on https://coronavirus.data.gov.uk/#countries is?

Example:

"areaCode": "E06000049", "areaType": "Upper tier local authority", "areaName": "Cheshire East",

I mean literally, how would we get the longitude and latitude of that place based on that source?

Thanks a lot

Crawl data from new PHE dashboard

The new PHE dashboard at https://coronavirus.data.gov.uk/ has data for

  • UK, England, Scotland, Wales, NI cases
  • UK, England, Scotland, Wales, NI deaths
  • English region cases
  • English UTLA cases

Plus historical data for all of these.

There are CSV links, but the page is dynamically generated HTML. So it might be easier to pull data from the underlying JSON feed, see https://github.com/PublicHealthEngland/coronavirus-dashboard/blob/master/src/hooks/useLoadData.js.

Also discussed here: https://twitter.com/DavidBeavan/status/1250454570070933504

Totals mismatch between UK and countries breakdown

Hello,

I've been reusing your great work on UK data within my dashboard here: https://boogheta.github.io/coronavirus-countries/#country=UK

While adding Tests data since you completed it across all countries, I encountered something that looks to me like an error but I might read things wrong:
When looking at confirmed cases for whole UK https://github.com/tomwhite/covid-19-uk-data/blob/master/data/covid-19-totals-uk.csv and for just England https://github.com/tomwhite/covid-19-uk-data/blob/master/data/covid-19-totals-england.csv, there appears to be greater values for just England than for the whole UK until April 10th.

I realised it because I'm completing England figures by doing UK - Wales - Scotland - Eire whenever an England figure is missing since others are all complete but maybe I'm misunderstanding something?

Number of tests by pillars

Hi, thank you very much for the amazing work! I was wondering if it is possible to add a breakdown of the total number of tests (and confirmed cases as well) by the testing "pillars" strategy. Right now, the government webpage https://www.gov.uk/guidance/coronavirus-covid-19-information-for-the-public reports that daily, but I don't know how far back in time this was true, and if it is possible to retrieve that info from the past. Also, I don't know if that kind of data is released at the England/Wales/Scotland/NI level.

Do you think this is possible, or do you know of any source collecting this kind of data?

Thanks!

Norway and Sweden data...

Slightly off topic unless you have any interest in extending your project...

My brother lives in Norway so I was curious, and found this, which has a nice map for anyone interested.
https://github.com/jalgroy/covid19-norway-map

Interestingly there is a nice map on wikipedia for Sweden
https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_Sweden

source?:

https://www.arcgis.com/sharing/rest/content/items/b5e7488e117749c19881cce45db13f7e/data

Statistikdatum,Totalt_antal_fall,Blekinge,Dalarna,Gotland,Gävleborg,Halland,Jämtland_Härjedalen,Jönköping,Kalmar,Kronoberg,Norrbotten,Skåne,Stockholm,Sörmland,Uppsala,Värmland,Västerbotten,Västernorrland,Västmanland,Västra_Götaland,Örebro,Östergötland
2/4/20,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/5/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/6/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/7/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/8/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/9/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/10/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/11/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/12/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/13/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/14/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/15/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/16/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/17/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/18/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/19/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/20/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/21/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/22/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/23/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/24/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/25/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2/26/20,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
2/27/20,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
2/28/20,8,0,0,0,0,0,0,1,0,0,0,0,2,0,2,0,0,0,0,3,0,0
2/29/20,3,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,2,0,0
3/1/20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3/2/20,5,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,3,0,0
3/3/20,13,0,0,0,0,0,0,0,0,0,0,1,10,0,0,0,0,0,0,2,0,0
3/4/20,29,0,0,0,0,0,0,0,0,0,0,7,21,0,0,0,0,0,0,1,0,0
3/5/20,25,0,0,0,0,0,0,0,0,0,0,0,22,0,2,0,0,0,0,1,0,0
3/6/20,60,0,0,0,2,0,0,0,0,0,0,8,36,0,1,11,0,0,0,1,1,0
3/7/20,33,0,0,0,0,0,0,0,0,0,0,5,21,0,1,0,0,0,0,5,1,0
3/8/20,46,0,0,0,0,1,0,2,0,0,0,0,29,0,1,0,0,0,0,11,2,0
3/9/20,101,0,0,0,0,4,0,6,0,0,1,3,64,1,0,7,0,0,0,15,0,0
3/10/20,98,1,0,0,0,1,0,0,1,1,0,34,26,0,4,3,8,6,0,13,0,0
3/11/20,196,6,1,2,1,16,3,16,2,7,4,37,32,6,4,2,0,0,0,57,0,0
3/12/20,151,2,3,0,2,9,5,7,2,2,0,32,42,3,11,4,1,3,0,19,3,1
3/13/20,152,0,1,0,1,9,3,4,0,3,1,42,31,6,10,3,0,2,5,19,2,10
3/14/20,71,0,0,0,1,0,3,0,2,1,0,25,18,1,4,1,3,0,0,5,0,7
3/15/20,69,1,0,0,0,2,7,4,0,1,0,4,17,4,0,1,1,0,0,18,0,9
3/16/20,83,0,0,0,2,1,1,2,0,0,0,3,34,12,2,2,2,1,7,6,0,8
3/17/20,120,1,3,1,4,3,0,1,0,1,1,6,35,5,5,1,4,1,6,13,16,13
3/18/20,145,1,2,1,2,2,4,2,1,1,0,8,58,0,17,1,3,1,0,10,3,28
3/19/20,143,0,2,1,0,2,1,2,1,1,1,2,66,5,5,0,2,0,1,14,9,28
3/20/20,180,0,5,0,5,5,3,3,1,0,5,5,84,4,1,2,5,0,2,23,5,22
3/21/20,134,0,0,0,4,3,14,4,1,0,3,5,71,6,2,1,0,0,0,8,0,12
3/22/20,117,0,5,0,0,1,9,1,1,0,3,3,59,11,5,1,1,0,0,4,0,13
3/23/20,182,0,9,0,3,4,0,2,3,0,5,7,99,2,8,2,0,2,3,9,6,18
3/24/20,230,0,9,0,5,0,4,5,1,2,6,5,105,14,11,3,3,2,4,10,11,30
3/25/20,314,3,13,1,7,7,2,7,2,1,5,13,154,37,15,0,4,2,3,19,8,11
3/26/20,286,0,8,4,5,9,7,9,6,2,3,7,132,16,12,0,3,3,5,20,6,29
3/27/20,366,2,15,1,9,3,3,15,5,4,4,10,176,26,20,1,2,2,11,18,6,33
3/28/20,301,0,6,0,12,6,8,9,1,2,4,2,147,9,7,3,5,2,3,25,8,42
3/29/20,281,4,10,0,11,2,2,8,1,4,2,3,150,4,11,1,1,9,0,15,3,40
3/30/20,413,0,9,0,10,5,2,15,3,1,5,5,169,60,21,1,7,6,23,27,17,27
3/31/20,475,1,23,1,14,7,1,17,2,5,6,7,209,49,15,0,8,13,10,29,11,47
4/1/20,486,5,19,0,30,4,0,13,5,1,5,8,206,49,24,2,5,2,11,29,5,63
4/2/20,554,3,6,0,17,9,4,32,5,7,8,8,217,34,28,0,12,1,18,47,28,70
4/3/20,601,1,20,2,16,12,2,29,2,2,6,24,246,59,38,1,17,3,27,46,21,27
4/4/20,357,4,18,0,12,2,2,15,1,2,3,12,129,17,11,1,17,6,14,30,3,58
4/5/20,341,1,7,0,7,3,0,12,3,2,2,6,172,27,9,1,9,2,7,30,0,41
4/6/20,390,0,16,0,12,10,4,10,5,3,10,7,130,18,31,2,5,4,18,54,12,39
4/7/20,739,1,28,0,16,13,5,23,4,14,17,24,243,42,37,7,14,10,46,64,74,57
4/8/20,656,2,28,1,17,9,8,19,2,7,12,15,271,33,29,1,12,5,23,68,37,57
4/9/20,605,0,30,1,18,11,8,21,1,5,5,11,229,37,27,4,12,4,24,100,4,53
4/10/20,123,0,0,0,0,0,0,0,0,0,0,0,96,0,0,0,0,0,2,8,0,17

02/05/2020 testing number typo

In today's update for covid-19-tests-uk.csv, 2020-05-02,UK,105937,1129907,63,667,825946, aka row 100, has a redundant comma in 63667. It might be a glitch during the data parsing process.

Couple of area codes wrong in covid-19-cases-uk.csv

Noticed a couple of oddities in the file in the course of processing it. Not looked at the details of how you're pulling the data at all, so I've no idea if this is an "upstream" issue you can do nothing about (and even if it was, whether your tools should be aiming to clean such things up or just present things "as is"). I'm looking at the 2020-03-22 update of covid-19-cases-uk.csv.

Both Orkney and Lothian have area code S08000024. But Orkney should be S08000025.

2020-03-02,Scotland,S08000024,Lothian,0
2020-03-02,Scotland,S08000024,Orkney,0

Both Tayside and Western Isles have area code S08000030. But Western Isles should be S08000028.

2020-03-01,Scotland,S08000030,Tayside,0
2020-03-01,Scotland,S08000030,Western Isles,0

I note there are only 5 records for Orkney and Western Isles (2020-03-01 - 2020-03-05) c.f the full 22 for Tayside and Lothian.

Easily enough fixed-up when importing the data but thought it worth bringing to your attention.
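A check for this kind of problem can be sketched in a few lines: group area names by area code and report any code that maps to more than one name. The header row below is an assumed layout (in particular the TotalCases name); the data rows are taken from this document.

```python
import csv
import io
from collections import defaultdict

def shared_area_codes(csv_text):
    """Return {area_code: sorted area names} for codes used by more than one area."""
    names_by_code = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        names_by_code[row["AreaCode"]].add(row["Area"])
    return {code: sorted(names)
            for code, names in names_by_code.items() if len(names) > 1}

codes_sample = """Date,Country,AreaCode,Area,TotalCases
2020-03-02,Scotland,S08000024,Lothian,0
2020-03-02,Scotland,S08000024,Orkney,0
2020-03-01,Scotland,S08000030,Tayside,0
2020-03-01,Scotland,S08000030,Western Isles,0
2020-04-03,Scotland,S08000031,Greater Glasgow and Clyde,779
"""
clashes = shared_area_codes(codes_sample)
for code, names in clashes.items():
    print(code, names)
```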

Only new death data in Covid-19-totals-uk.csv

Since the change by UK Gov so that reported death numbers now include more than just those for confirmed cases in hospital, it looks like you are missing the tests and confirmed cases numbers in this file.

Spike in cases for mid April

Hi,
Firstly, thanks for all your hard work on this, it's a valuable resource.

Whilst visualising the data I noticed a large spike for cases around April 11th - there seem to be in excess of 8,000 new cases on that day. This doesn't appear to be reflected in any of the other datasets that I have seen, such as Wikipedia or BBC for example. Comparing the data to Wikipedia's (which they also say they have taken from the government's website) there seem to be small anomalies around this time and then a large jump for April 11th. Is this an error or is there another explanation - the new cases for each country don't seem to add up to anything like this number either?
Many thanks,
Tony.

England cases breakdown

The England cases breakdown are one-day shifted (e.g. the data of 16/04/2020 are actually those of 15/04/2020).

Also the breakdown data from the new dashboard can date back to day 1 of outbreak, which seems to be constantly revised on a daily basis. And the breakdown of Hackney and City of London is provided throughout the whole time series. It seems reasonable to replace the England breakdown with these new data, instead of attaching daily ones to the existing database.

Potential wrong data import

For 26.03.2020 there seems to be a problem on data import, since Swansea Bay and Powys data are switched. Please review the import process...
Also, by chance, any idea if someone has created a population dataset or lookup for the reported areas that I can use?
I would like to use your data in my data visualization, if I can match the data with proper population figures:
https://covid19viz.github.io

unplotted cases

Tom currently this is unplotted data.

I am not sure what to do about

  • "Resident outside Wales"
  • "To be confirmed"
  • "Unknown"
  • "Awaiting confirmation"
    any suggestions, do you think it would be worth having a Clean separate csv that filters out complex unattributable data?

I need to add some new locations to my map, but I don't yet have an outline for them..
Derry and Strabane
North Down and Ards

Can you provide more details on "12 Mar, England 491 ill" & "20 Mar, England 3384 ill", are these summaries or data in additional to regions. If they are summaries they don't belong in the file?

"To be confirmed" and "Awaiting Confirmation" should be merged to a single name, need to be more consistent than gov?

Perhaps best to filter out the ugly data from the main source and provide a supplemental source. Obviously I can filter out problem data but I think it would be better to remove them from the main source so aspects like change of name can be focused on.

I was thinking of 3D render with simple extrusion per day using my trilateral2 webgl library do you think that would be useful?

Could we perhaps create a team between interested parties to release an app to gather more data direct from public, I could definitely create mobile phone graphics applications with haxe that would run on android, iphone ... with some 3D with assistance on mobile integration... all works in theory but apple especially I don't have phone to test, I don't have funds for servers, buying domain names paying for apple dev licences, and promotion etc.. ?

If you're free at the weekend we could have a group Skype or similar tech meetup with interested parties; the gov data is very limited.

- unplotted
5 Mar, awaiting clarification 8 ill,
7 Mar, awaiting clarification 14 ill,
8 Mar, Awaiting confirmation 20 ill,
9 Mar, Awaiting confirmation 26 ill,
10 Mar, Awaiting confirmation 15 ill,
12 Mar, England 491 ill,
17 Mar, Resident outside Wales 2 ill,
20 Mar, England 3384 ill,
20 Mar, To be confirmed 1 ill,
20 Mar, Resident outside Wales 2 ill,
21 Mar, To be confirmed 1 ill,
21 Mar, Resident outside Wales 2 ill,
22 Mar, To be confirmed 1 ill,
22 Mar, Resident outside Wales 3 ill,
23 Mar, To be confirmed 1 ill,
23 Mar, Resident outside Wales 3 ill,
24 Mar, To be confirmed 2 ill,
24 Mar, Resident outside Wales 3 ill,
25 Mar, To be confirmed 4 ill,
25 Mar, Resident outside Wales 4 ill,
26 Mar, Unknown 7 ill,
26 Mar, Derry and Strabane 8 ill,
26 Mar, North Down and Ards 22 ill,
26 Mar, Unknown 7 ill,
26 Mar, Resident outside Wales 5 ill,
26 Mar, To be confirmed 7 ill,
27 Mar, Derry and Strabane 9 ill,
27 Mar, North Down and Ards 25 ill,
27 Mar, Unknown 7 ill,
27 Mar, Resident outside Wales 6 ill,
27 Mar, To be confirmed 13 ill,
28 Mar, Resident outside Wales 6 ill,
28 Mar, To be confirmed 18 ill,
29 Mar, Resident outside Wales 10 ill,
29 Mar, To be confirmed 22 ill,
30 Mar, Derry and Strabane 23 ill,
30 Mar, North Down and Ards 52 ill,
30 Mar, Unknown 22 ill,
30 Mar, Resident outside Wales 12 ill,
30 Mar, To be confirmed 24 ill,
31 Mar, Derry and Strabane 24 ill,
31 Mar, North Down and Ards 61 ill,
31 Mar, Unknown 29 ill,
31 Mar, Resident outside Wales 13 ill,

Publish case numbers for Wales by Local Authority or Health Board?

The new PHW dashboard at https://public.tableau.com/profile/public.health.wales.health.protection#!/vizhome/RapidCOVID-19virology-Public/Headlinesummary breaks down case numbers by local authority again.

To summarize:

  • Before 21 March, case numbers were by LA.
  • From 21 March to 5 April, case numbers were by HB.
  • From 6 April, case numbers were by LA.

Unfortunately we don't have LA case numbers for the period 21 March to 5 April (unless someone knows a source).

I'm thinking of publishing LA case numbers again. Consumers of the data can then either

  • Aggregate at the HB level to show numbers for the whole time period with no gaps.
  • Use the finer-grained LA number, but with a two week gap (from 21 March to 5 April). Depending on the application this may be OK - e.g. show gaps in the visualization.
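The first option, aggregating at the HB level, could look something like the sketch below. The mapping is deliberately partial and only for illustration (a real one would need to cover every Welsh local authority), the helper name is hypothetical, and the case numbers are made up.

```python
from collections import defaultdict

# Partial, illustrative mapping only; not a complete LA-to-HB lookup.
LA_TO_HB = {
    "Cardiff": "Cardiff and Vale",
    "Vale of Glamorgan": "Cardiff and Vale",
    "Swansea": "Swansea Bay",
    "Neath Port Talbot": "Swansea Bay",
}

def aggregate_to_health_board(la_rows):
    """Sum (date, local authority, cases) rows into {(date, health board): cases}."""
    totals = defaultdict(int)
    for date, la, cases in la_rows:
        totals[(date, LA_TO_HB[la])] += cases
    return dict(totals)

# Made-up case numbers, not real data.
la_sample = [
    ("2020-04-06", "Cardiff", 10),
    ("2020-04-06", "Vale of Glamorgan", 4),
    ("2020-04-06", "Swansea", 7),
    ("2020-04-06", "Neath Port Talbot", 3),
]
hb_totals = aggregate_to_health_board(la_sample)
print(hb_totals)
```

An unmapped local authority raises a KeyError here, which is arguably preferable to silently dropping its cases.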

Thoughts?

Stability of CSV files

Hi @tomwhite , first of all thanks so much for this -- awesome work! I am now using your CSV files to add region-level data to my Open COVID-19 project.

I am currently pulling the CSV files from the GitHub Raw cache (like this one) and I wanted to understand how stable these files are, whether you are likely to change the name, location, format, etc. In case of any changes, how can I be notified so I can update my scripts appropriately?

Odd Glasgow numeric formatting on 2020-04-12

Note the last record:

$ grep Glasgow data/covid-19-cases-uk.csv | tail
2020-04-03,Scotland,S08000031,Greater Glasgow and Clyde,779
2020-04-04,Scotland,S08000031,Greater Glasgow and Clyde,851
2020-04-05,Scotland,S08000031,Greater Glasgow and Clyde,931
2020-04-06,Scotland,S08000031,Greater Glasgow and Clyde,984
2020-04-07,Scotland,S08000031,Greater Glasgow and Clyde,1094
2020-04-08,Scotland,S08000031,Greater Glasgow and Clyde,1166
2020-04-09,Scotland,S08000031,Greater Glasgow and Clyde,1251
2020-04-10,Scotland,S08000031,Greater Glasgow and Clyde,1314
2020-04-11,Scotland,S08000031,Greater Glasgow and Clyde,1387
2020-04-12,Scotland,S08000031,Greater Glasgow and Clyde,"1,449"

Seems a bit odd to suddenly switch the way numbers are formatted. Again, nothing that can't be dealt with along with the 1 to 4 and NaN type things, and it's perfectly well-formed CSV... but it does seem a bit odd when that region has never done the comma thousands numeric formatting thing before (and none of the other regions with counts over 1000 do either, I think).
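For anyone consuming the file, a minimal sketch of tolerating both formats, using the last two rows shown above. The helper name `parse_count` is hypothetical; the quoted field is unwrapped by the standard `csv` module, so only the comma needs stripping.

```python
import csv
import io


def parse_count(value):
    """Parse a count that may contain thousands separators, e.g. '1,449'."""
    return int(value.replace(",", ""))


sample = (
    '2020-04-11,Scotland,S08000031,Greater Glasgow and Clyde,1387\n'
    '2020-04-12,Scotland,S08000031,Greater Glasgow and Clyde,"1,449"\n'
)
# The count is the fifth column in the rows shown above.
counts = [parse_count(row[4]) for row in csv.reader(io.StringIO(sample))]
```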

Oddball NaN entry for 2020-04-08,Scotland,S08000028,Western Isles

Not sure whether it's just being true to the upstream or something else going wrong somewhere. Don't think I've seen any NaNs before in this stuff. Different enough from the other "to be confirmed" / "unknown location" type things to be worth remarking on anyway:

$ grep S08000028 data/covid-19-cases-uk.csv 
2020-03-01,Scotland,S08000028,Western Isles,0
2020-03-02,Scotland,S08000028,Western Isles,0
2020-03-03,Scotland,S08000028,Western Isles,0
2020-03-04,Scotland,S08000028,Western Isles,0
2020-03-05,Scotland,S08000028,Western Isles,0
2020-04-01,Scotland,S08000028,Western Isles,3
2020-04-02,Scotland,S08000028,Western Isles,3
2020-04-03,Scotland,S08000028,Western Isles,3
2020-04-04,Scotland,S08000028,Western Isles,3
2020-04-05,Scotland,S08000028,Western Isles,4
2020-04-06,Scotland,S08000028,Western Isles,4
2020-04-07,Scotland,S08000028,Western Isles,4
2020-04-08,Scotland,S08000028,Western Isles,NaN

Oh just noticed there's another one too

$ grep S08000025 data/covid-19-cases-uk.csv 
2020-03-01,Scotland,S08000025,Orkney,0
2020-03-02,Scotland,S08000025,Orkney,0
2020-03-03,Scotland,S08000025,Orkney,0
2020-03-04,Scotland,S08000025,Orkney,0
2020-03-05,Scotland,S08000025,Orkney,0
2020-04-01,Scotland,S08000025,Orkney,2
2020-04-02,Scotland,S08000025,Orkney,2
2020-04-03,Scotland,S08000025,Orkney,2
2020-04-04,Scotland,S08000025,Orkney,4
2020-04-05,Scotland,S08000025,Orkney,4
2020-04-06,Scotland,S08000025,Orkney,4
2020-04-07,Scotland,S08000025,Orkney,4
2020-04-08,Scotland,S08000025,Orkney,NaN
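A quick way to list every such entry, sketched under the assumption that each row has the five columns shown above. The function name `missing_entries` is made up for this example.

```python
import csv
import io
import math


def missing_entries(csv_text):
    """Return (date, area_code) pairs whose count column is NaN."""
    missing = []
    for date, _country, code, _area, value in csv.reader(io.StringIO(csv_text)):
        try:
            is_nan = math.isnan(float(value))
        except ValueError:
            # Other non-numeric values (e.g. '1 to 4') are not NaN.
            is_nan = False
        if is_nan:
            missing.append((date, code))
    return missing


sample = (
    "2020-04-07,Scotland,S08000028,Western Isles,4\n"
    "2020-04-08,Scotland,S08000028,Western Isles,NaN\n"
    "2020-04-08,Scotland,S08000025,Orkney,NaN\n"
)
missing = missing_entries(sample)
```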

Dataset; covid-19-uk-data/data/covid-19-cases-uk.csv has non numerical values in some of the total cases field

@tomwhite - Tom, is there any chance you can change the '1 to 4' values in the Total Cases field? Your dataset is the best one I've found (I've been grabbing the NHS England data from the ArcGIS dashboard, but it doesn't include Wales and Scotland), so I would like to reference yours instead. I work for Qlik and we are helping out where we can - our software allows for a huge amount of analysis of the data and your set is perfect. I'm happy to share any additional mapping fields and data I'm working on, and the dashboard.
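Until that changes, one possible workaround on the consumer side is to read each value as a pair of bounds. This is only a sketch: the name `count_bounds` is hypothetical, and it assumes the range values always have the form "N to M".

```python
import re

# Matches range values such as '1 to 4'.
RANGE_RE = re.compile(r"^(\d+) to (\d+)$")


def count_bounds(value):
    """Return (low, high) bounds for a count such as '3' or '1 to 4'."""
    match = RANGE_RE.match(value.strip())
    if match:
        return int(match.group(1)), int(match.group(2))
    n = int(value)
    return n, n


bounds = [count_bounds(v) for v in ["1 to 4", "2", "3"]]
```

A tool that needs a single number can then pick the lower bound, the upper bound, or a midpoint, depending on how conservative it wants to be.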

England is not the UK

It’s great that you’re doing this, but I notice that the links from the README for England are to the UK csv, and the England csv is pretty sparse - you could generate it automatically from UK minus other countries if you don’t have any other source... ?
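The subtraction could be sketched as follows, assuming each nation's totals are available as a dict keyed by date. The function name and all figures are made up for illustration, and the result is only meaningful if the four sources count on the same basis and cut-off time.

```python
def derive_england(uk, others):
    """Estimate England per date as the UK total minus the other nations.

    uk: {date: total}; others: list of {date: total}, one per other nation.
    Dates missing from any nation are skipped rather than guessed.
    """
    england = {}
    for day, uk_total in uk.items():
        if all(day in nation for nation in others):
            england[day] = uk_total - sum(nation[day] for nation in others)
    return england


# Made-up figures, not real data.
uk = {"2020-04-07": 100, "2020-04-08": 120}
scotland = {"2020-04-07": 10, "2020-04-08": 12}
wales = {"2020-04-07": 8}
northern_ireland = {"2020-04-07": 4, "2020-04-08": 5}
england = derive_england(uk, [scotland, wales, northern_ireland])
```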

Area name not constant for area code E09000012 and E06000052

Minor point easily dealt with, and I'm not sure where it's being introduced, but there's a couple of areas where the names are a bit unstable earlier on.

$ grep E09000012 data/covid-19-cases-uk.csv 
2020-03-05,England,E09000012,Hackney,0
2020-03-07,England,E09000012,Hackney and City of London,2
2020-03-08,England,E09000012,Hackney,2
2020-03-09,England,E09000012,Hackney and City of London,2
2020-03-10,England,E09000012,Hackney and City of London,3

(and then always "Hackney and City of London")

$ grep E06000052 data/covid-19-cases-uk.csv 
2020-03-05,England,E06000052,Cornwall,1 to 4
2020-03-07,England,E06000052,Cornwall and Isles of Scilly,2
2020-03-08,England,E06000052,Cornwall,3
2020-03-09,England,E06000052,Cornwall and Isles of Scilly,4
2020-03-10,England,E06000052,Cornwall and Isles of Scilly,4

(and then always "Cornwall and Isles of Scilly")

Don't know if it reflects the upstream sources not actually having data for the "City of London" and "Isles of Scilly" that day or not.
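One way to deal with it downstream is to key everything on the area code and take the name from the most recent row. A minimal sketch, assuming the five-column rows shown above and ISO dates (which sort correctly as strings); `canonical_names` is a made-up helper name.

```python
import csv
import io


def canonical_names(csv_text):
    """Map each area code to the name used on its most recent date."""
    latest = {}
    for date, _country, code, area, _value in csv.reader(io.StringIO(csv_text)):
        if code not in latest or date >= latest[code][0]:
            latest[code] = (date, area)
    return {code: area for code, (_date, area) in latest.items()}


sample = (
    "2020-03-05,England,E09000012,Hackney,0\n"
    "2020-03-07,England,E09000012,Hackney and City of London,2\n"
    "2020-03-08,England,E09000012,Hackney,2\n"
    "2020-03-09,England,E09000012,Hackney and City of London,2\n"
    "2020-03-10,England,E09000012,Hackney and City of London,3\n"
)
names = canonical_names(sample)
```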

Lower-upper tier local authority area codes for Scotland?

Thanks for making this excellent resource available.

When trying to merge it with other more fine-grained regional data, it quickly becomes apparent that I need to get to grips with the upper and lower tier regional authority codes (apologies if my terminology is off... I'm quite new to this).

There is a promising table at https://geoportal.statistics.gov.uk/datasets/lower-tier-local-authority-to-upper-tier-local-authority-december-2017-lookup-in-england-and-wales ... but that only covers England and Wales and I am having no luck finding anything similar for Scotland.

Anyone any idea where to find such a resource?

(For example covid-19-cases-uk.csv here has S08000024,Lothian but other more fine-grained data at lower-tier level will have records for all of S12000010,East Lothian & S12000040,West Lothian & S12000019,Midlothian & S12000036,City of Edinburgh instead. Easy enough to resolve this for England and Wales using the ONS geoportal data mentioned above but Scotland a problem until I can find something similar).
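In the absence of a published lookup, a hand-built table would do the job. The sketch below contains only the Lothian codes from the example above; the names `COUNCIL_TO_BOARD` and `roll_up` and the sample values are made up.

```python
from collections import defaultdict

# Only the Lothian entries named in the example above; other boards omitted.
COUNCIL_TO_BOARD = {
    "S12000010": "S08000024",  # East Lothian -> Lothian
    "S12000040": "S08000024",  # West Lothian -> Lothian
    "S12000019": "S08000024",  # Midlothian -> Lothian
    "S12000036": "S08000024",  # City of Edinburgh -> Lothian
}


def roll_up(council_values):
    """Sum per-council values into per-health-board totals."""
    totals = defaultdict(int)
    for code, value in council_values.items():
        totals[COUNCIL_TO_BOARD[code]] += value
    return dict(totals)


# Made-up values for some other lower-tier dataset.
board_totals = roll_up(
    {"S12000010": 3, "S12000040": 5, "S12000019": 2, "S12000036": 20}
)
```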

Thank you!

Thank you all for the work that has gone into compiling the government data and making it easily machine readable. It is proving invaluable for contingency planning at a local level and comparing the outbreak here in the UK to those elsewhere. I just wanted to let you know that by contributing you are providing a valuable service to your country and it is really appreciated. I'm new to GitHub but have been able to pull the data automatically into a basic Google Sheets file for local level planning. That is the kind of user you are helping immensely. Keep up the good work!

No Data for Indicators for England for 08/04/2020

Just to let you know the English figures haven't come in for yesterday - also there is some discrepancy with overall English figures, I'll have a look as I think I know what it is, but need to check first :-)

Tests & Tested

Hi Tom, I was trying to locate and compare the historical data for the number of tests issued (Tests) against the number of Tested samples. Yesterday's figures were 81,611 Tests with 54,575 Tested.
I can see the 54,575 in the CSV file, but I can't find a Tests column. Is it there?

I have experimented with rendering your csv data.

I have experimented with rendering your CSV data with canvas. If you wish to experiment with the code, the README below has setup details, a link to the current online test, and a simple visual:

https://github.com/nanjizal/covid19/blob/master/README.md

Obviously more interesting renders could be created, but I first had to find lat/long conversion code (which I ported), test and fix my low-level CSV parser, and find lat/long data for the E-number areas; related links are commented within the code. If you want a direct link to your repo data in the README, or are interested in further collaboration, please let me know. With more effort I can also render in WebGL with some depth, or with SVG. Since I use Haxe I can also create mobile/desktop versions using OpenFL, Kha or Heaps, time permitting.

Use today's date by default

You say in your TODO that you want to add a default date.
Why not add an argument parser that defaults to today's date, instead of using sys.argv[i]?
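Something along these lines might work, using only the standard library. The `--date` option name and the `parse_args` wrapper are suggestions for illustration, not the repository's existing interface.

```python
import argparse
from datetime import date


def parse_args(argv=None):
    """Parse command-line options; --date falls back to today's date."""
    parser = argparse.ArgumentParser(
        description="Hypothetical entry point, not the repository's script."
    )
    parser.add_argument(
        "--date",
        type=date.fromisoformat,  # rejects anything that is not YYYY-MM-DD
        default=date.today(),
        help="date to process as YYYY-MM-DD (default: today)",
    )
    return parser.parse_args(argv)


args = parse_args(["--date", "2020-04-08"])
```

Calling `parse_args([])` gives `args.date` equal to today's date, so the script can be run without arguments.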
