sledilnik / data

Collecting and organising COVID-19 data for Slovenia as they come in from various sources

Home Page: https://covid-19.sledilnik.org/en/data

License: GNU Affero General Public License v3.0

Shell 0.19% Python 2.26% Perl 0.15% HTML 97.40%
covid19-data slovenia slovenija covid19-tracker covid-19-tracker covid-data-project covid-dataset covid-19 covid19

data's Introduction

Slovenia COVID-19 Data Collection - Sledilnik.org

Workflows: Vaccination update, Vaccination OPSI update, Lab tests update, EPISARI OPSI data update, Sewage OPSI data update

Disabled/obsolete workflows: GSheets update, OstaniZdrav update, Schools update, Sewage update

Visualized at COVID-19 Sledilnik Home Page

Collecting and organising data as they come in from various sources.

This repository is for organising our collaboration better: wikis, issues etc.

Python 3.8+ is required to run scripts in this repo.

Vaccination update depends on py-cepimose with a specific subset of Tests for Sledilnik.org

How to run scripts


In this folder run:

  1. python3 -m venv venv or virtualenv -p python3.8 venv
  2. source venv/bin/activate
  3. pip install -r requirements.txt
  4. export COVID_DATA_PATH=<the location of the COVID-DATA folder>
  5. python update.py or python transform/nijz_daily.py or python transform/nijz_weekly.py...
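Step 4 suggests that the scripts locate their input files through the COVID_DATA_PATH environment variable. A minimal sketch of how a script might resolve and validate that location is shown here; the helper name `covid_data_path` is hypothetical and not taken from this repo, whose scripts may read the variable differently.

```python
import os
from pathlib import Path


def covid_data_path(env=None):
    """Return the COVID-DATA folder as a Path, failing early with a clear message.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    value = env.get("COVID_DATA_PATH")
    if not value:
        raise RuntimeError("COVID_DATA_PATH is not set (see step 4)")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"COVID_DATA_PATH does not point to a folder: {path}")
    return path
```

Failing at startup with an explicit message is usually easier to diagnose than a later "file not found" raised from deep inside a transform.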

Updating data

Most GitHub workflows are scheduled to run periodically and can also be triggered manually on the Actions page.

Changelog

2020-04-28

  • stats.csv: rename cases.active.todate to cases.active (issue #11)

2020-04-26

  • stats.csv: added tests.regular.* and tests.ns-apr20.* to separate tests for National Survey April 2020
  • stats.csv: changed tests.positive.* to report positive actual tests (slightly higher than cases.confirmed.*)

2020-04-25

  • dict-municipality.csv: fixed region for Gornja Radgona (was lj - is ms now)

2020-04-20

  • dict-age-groups.csv: age groups with population (total, male, female)

2020-04-18

  • dict-retirement_homes.csv: added tax-id for each retirement home

data's People

Contributors

dependabot[bot], igzebedze, ikolar, jurijbajzelj, kesma01, lukarenko, majazaloznik, mihamarkic, mkadunc, mojca, nusaja, overlordtm, romunov, stefanb, tainn

data's Issues

EPI: XLS to GDocs

On 15 October, NIJZ switched from a DOC report to an XLS report.

We currently enter the data manually into GDocs, into the following tables:

  • Podatki: total number of confirmed cases (tb1)
  • EPI: by age group - confirmed infected (tb4), deceased (tb6)
  • Kraji: confirmed infected by municipality (tb2)
  • Umrli:Kraji: deceased by municipality (tb6)

A script could add these fields to GDocs and copy over the additional formulas.

For more information: @lukarenko, @kesma01 or @matejmeglic

NIJZ XLS: age-confirmed.csv, age-deceased.csv

From the NIJZ XLS, process the age-group data from Tabela 4 into a new age-confirmed.csv with the following columns:

age.female.0-4.todate | age.female.5-14.todate | age.female.15-24.todate | age.female.25-34.todate | age.female.35-44.todate | age.female.45-54.todate | age.female.55-64.todate | age.female.65-74.todate | age.female.75-84.todate | age.female.85+.todate | age.female.todate 

age.male.0-4.todate | age.male.5-14.todate | age.male.15-24.todate | age.male.25-34.todate | age.male.35-44.todate | age.male.45-54.todate | age.male.55-64.todate | age.male.65-74.todate | age.male.75-84.todate | age.male.85+.todate | age.male.todate

age.unknown.0-4.todate | age.unknown.5-14.todate | age.unknown.15-24.todate | age.unknown.25-34.todate | age.unknown.35-44.todate | age.unknown.45-54.todate | age.unknown.55-64.todate | age.unknown.65-74.todate | age.unknown.75-84.todate | age.unknown.85+.todate | age.unknown.todate

age.0-4.todate | age.5-14.todate | age.15-24.todate | age.25-34.todate | age.35-44.todate | age.45-54.todate | age.55-64.todate | age.65-74.todate | age.75-84.todate | age.85+.todate | age.todate 

After processing, we should also add a copied row to age-deceased.csv (similar to what we do for deceased-region.csv in transform/region.py), so that both CSV files end on the same day.
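The column layout listed above is regular enough to generate rather than type by hand. The following is only a sketch: the names `AGE_GROUPS`, `SEXES` and `age_columns` are illustrative and do not come from the repo.

```python
AGE_GROUPS = ["0-4", "5-14", "15-24", "25-34", "35-44",
              "45-54", "55-64", "65-74", "75-84", "85+"]
SEXES = ["female", "male", "unknown"]


def age_columns():
    """Build the age.* column names in the order listed in the issue."""
    columns = []
    for sex in SEXES:
        columns += [f"age.{sex}.{group}.todate" for group in AGE_GROUPS]
        columns.append(f"age.{sex}.todate")  # per-sex total
    columns += [f"age.{group}.todate" for group in AGE_GROUPS]
    columns.append("age.todate")  # overall total
    return columns
```

Generating the header in one place keeps age-confirmed.csv and age-deceased.csv aligned if an age group is ever added or renamed.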

Windows: nijz_daily.py fails on Windows

@AuroraBode is running nijz_daily.py on Windows. The script fails with a KeyError on ajdovščina, so an encoding issue is the likely cause.

INFO:C:\Users\Delo\Documents\GitHub\data\transform\nijz_daily.py:SOURCE_FILE: C:\Users\Delo\Documents\COVID-DATA\EPI\dnevni_prikazi20210225.xlsx
Traceback (most recent call last):
  File "C:\Users\Delo\Documents\GitHub\data\transform\nijz_daily.py", line 62, in <module>
    df = df.rename(mapper=get_municipality_header, axis='columns')  # transform of municipality names
  File "C:\Users\Delo\Documents\GitHub\data\venv\lib\site-packages\pandas\util\_decorators.py", line 309, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Delo\Documents\GitHub\data\venv\lib\site-packages\pandas\core\frame.py", line 4300, in rename
    return super().rename(
  File "C:\Users\Delo\Documents\GitHub\data\venv\lib\site-packages\pandas\core\generic.py", line 947, in rename
    new_index = ax._transform_index(f, level)
  File "C:\Users\Delo\Documents\GitHub\data\venv\lib\site-packages\pandas\core\indexes\base.py", line 4836, in _transform_index
    items = [func(x) for x in self]
  File "C:\Users\Delo\Documents\GitHub\data\venv\lib\site-packages\pandas\core\indexes\base.py", line 4836, in <listcomp>
    items = [func(x) for x in self]
  File "C:\Users\Delo\Documents\GitHub\data\transform\nijz_daily.py", line 52, in get_municipality_header
    region = municipalities[m]['region']
KeyError: 'ajdovščina'
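If the suspicion is right, one plausible cause is that the municipality dictionary is opened without an explicit encoding, so Python falls back to the locale's code page on Windows. The sketch below shows the general fix under that assumption. The function name `load_municipalities` and the `name` column are hypothetical; only the `region` field appears in the traceback.

```python
import csv


def load_municipalities(path):
    """Read a municipality dictionary CSV into {lowercase name: row}.

    encoding="utf-8" is passed explicitly: without it, open() uses the
    locale's preferred encoding, which on Windows is typically a legacy
    code page rather than UTF-8.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return {row["name"].lower(): row for row in csv.DictReader(f)}


# The same UTF-8 bytes decoded with the Central European Windows code page
# no longer match the key used for the lookup.
mojibake = "ajdovščina".encode("utf-8").decode("cp1250")
print(mojibake == "ajdovščina")  # False
```

A key that was silently mis-decoded this way would produce exactly this kind of KeyError for the first municipality name containing a non-ASCII letter.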

nijz_daily.py: vaccination.csv support

We should merge Tabela 2 from the NIJZ report (1st and 2nd dose) with the daily delivered data from the Vaccination GSheet.

This should replace the manual copying of daily vaccination data from XLS to GSheet.

My suggestion is to:

  • move delivered data to E:Delivered
  • pull E:Delivered from GSheet + Tabela 2 (daily vaccination data)
  • calculate .todate and .used.todate fields
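A .todate field is a running total of the corresponding daily value. A minimal sketch of that calculation follows; the function name `add_todate` and the column names used with it are placeholders, and the exact definition of .used.todate is not given in this issue, so it is not modelled here.

```python
from itertools import accumulate


def add_todate(rows, daily_key, todate_key):
    """Return copies of rows with a running total of daily_key under todate_key.

    Rows must already be sorted by date; missing or empty daily values
    count as 0.
    """
    daily = [int(row.get(daily_key) or 0) for row in rows]
    return [{**row, todate_key: total}
            for row, total in zip(rows, accumulate(daily))]
```

For example, `add_todate(rows, "delivered", "delivered.todate")` would add a cumulative column next to a hypothetical daily `delivered` column.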

Daily NIJZ to tests-cases.csv (daily update from morning partial update)

This is a continuation of #60. It happens around 13:00-14:00, when the daily NIJZ report is available.
As described there, the morning processing adds tests.* and placeholders for cases.* in stats.csv.

When we get the NIJZ daily report, we can update tests-cases.csv as follows:

  • copy Tabela 1 column N to cases.confirmed.todate and recalculate the cases.* formulas
  • use Tabela 5 column B to calculate the to-date numbers for cases.rh.occupant.confirmed.todate and recalculate the cases.* formulas

test_list_xlsx: one file is missing

def test_list_xlsx():
    actual = list_xlsx(dir=DATA_DIR)
    expected = [
        'health_centers_tests/data/HOS/Bolnišnice COVID 12052020.xlsx',
        'health_centers_tests/data/HOS/2020-04/Bolnišnice COVID 30042020.xlsx'
    ]  # this should be length of 3
    for a, e in zip(actual, expected):
        assert a.endswith(e)
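Note that zip stops at the shorter of its two inputs, so the loop above cannot detect a missing or extra file by itself. A minimal sketch of a stricter comparison is given below; the helper name `assert_paths_match` is hypothetical, and it assumes both lists are in the same order, as the original test does.

```python
def assert_paths_match(actual, expected):
    """Fail if the lists differ in length or any path has the wrong suffix."""
    assert len(actual) == len(expected), (
        f"expected {len(expected)} files, found {len(actual)}: {actual}"
    )
    for a, e in zip(actual, expected):
        assert a.endswith(e), f"{a!r} does not end with {e!r}"
```

With the length check in place, a test that lists two expected files while three exist on disk fails immediately instead of passing silently.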

Tests to stats.csv morning update

Every morning around 9:00 we get only two numbers from NIJZ:

  • tests.regular.performed
  • tests.regular.positive

This data is entered into the Tests GSheet, where we keep historical per-lab test data. The GSheet is exported to lab-tests.csv via update.py.

This data is also used to fill stats.csv via the legacy Podatki GSheet.

We should introduce a new tests-cases.csv made of:

  • old legacy data from stats-legacy.csv (which should be a static file from now on)
  • newly created data from lab-tests.csv (tests.regular.performed, tests.regular.positive); all other fields are calculated
  • cases.confirmed.todate, simply calculated as the previous day's value + tests.regular.positive; all other fields are calculated
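The last bullet describes a recurrence seeded with the final value from the legacy file. A minimal sketch, assuming date-sorted rows that carry a `date` column (the function name `extend_confirmed` and the row layout are illustrative, not taken from the repo):

```python
def extend_confirmed(last_legacy_total, lab_rows):
    """Continue cases.confirmed.todate from the last value in the legacy file.

    lab_rows: date-sorted dicts with "date" and "tests.regular.positive".
    Returns new rows; the input rows are not modified.
    """
    total = last_legacy_total
    out = []
    for row in lab_rows:
        total += int(row["tests.regular.positive"])
        out.append({**row, "cases.confirmed.todate": total})
    return out
```

Because each value depends on the previous day, a correction to one day's tests.regular.positive requires recomputing all later rows.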

CSV rename

  • region-confirmed.csv (change name from regions-cases.csv)

  • region-active.csv (change name from active-regions.csv)

  • region-deceased (as is)

  • region-cases.csv = join all three region tables (not municipalities cases as is (?))

  • municipality-confirmed.csv (currently regions.csv)

  • municipality-active.csv (currently inside municipality.csv)

  • municipality-deceased-legacy.csv (currently inside municipality.csv, no changes after 12.12)

  • municipality-cases.csv = join all three municipality tables (like now, but censor deceased after 12.12)

HOS: XLS to GDocs

We receive the HOS report daily in XLS format.
Since 1 October it has been collected through an application, so the XLS format has become stable because there are no more manual entries.

The data is currently entered manually into the GDocs HOS table and indirectly fills the Pacienti table, which is used for the export to patients.csv.

Idea:

  • daily HOS XLS processing that adds a new row to the HOS table (possibly also a new row with formulas in Pacienti)
  • someone still has to review the result, because of discrepancies in admitted/discharged counts and to reconcile it with the ICU report/table

More information about HOS: @lukarenko or Maja Založnik

Implement support for OurWorldInData

The data can be fetched from their GitHub repository: https://github.com/owid/covid-19-data/tree/master/public/data - it is available in both CSV and JSON formats.

The existing use cases we need covered (@joahim please verify this):

  1. Get data for all dates for a specific list of countries.
  2. Get data starting from a specific date, for a specific list of countries.
  3. Get data starting from a specific date, for all countries.

Countries are specified by their ISO codes.

Data columns we currently need:

  • "date"
  • "iso_code"
  • "new_cases"
  • "new_cases_per_million"
  • "total_cases"
  • "total_cases_per_million"
  • "total_deaths"
  • "total_deaths_per_million"

The format of the request should be something simple; there is no need for a generic query mechanism. I would suggest offering the following parameters (whether in the query string or input JSON, I don't know):

  1. List of country ISO codes. If not specified, data for all countries is returned.
  2. Starting and ending dates (both are optional).
  3. Output format (requested by @joahim): JSON or CSV.

Also, please leave an option for the null properties in JSON to be skipped (not right now, since our parser currently doesn't support it - I think).
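The three use cases and the optional null-skipping can be covered by one small filter. The sketch below works on CSV text that has already been downloaded and assumes dates in YYYY-MM-DD form; the function names `select_rows` and `to_json` are illustrative and not part of any existing API.

```python
import csv
import io
import json
from datetime import date

COLUMNS = ["date", "iso_code", "new_cases", "new_cases_per_million",
           "total_cases", "total_cases_per_million",
           "total_deaths", "total_deaths_per_million"]


def select_rows(csv_text, iso_codes=None, start=None, end=None):
    """Filter CSV text by country ISO codes and an inclusive date range.

    iso_codes=None means all countries; start and end are optional dates.
    Empty cells are returned as None, other values stay strings.
    """
    wanted = set(iso_codes) if iso_codes else None
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = date.fromisoformat(row["date"])
        if wanted is not None and row["iso_code"] not in wanted:
            continue
        if start is not None and day < start:
            continue
        if end is not None and day > end:
            continue
        rows.append({name: row.get(name) or None for name in COLUMNS})
    return rows


def to_json(rows, skip_nulls=False):
    """Serialise rows; optionally drop properties whose value is None."""
    if skip_nulls:
        rows = [{k: v for k, v in row.items() if v is not None} for row in rows]
    return json.dumps(rows)
```

Calling `select_rows(text, ["SVN", "AUT"])` covers use case 1, adding `start=date(2020, 3, 1)` covers use case 2, and omitting `iso_codes` covers use case 3.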

cc: @joahim, @MihaMarkic, @lukarenko, @stefanb

stats.csv: rename cases.active.todate to cases.active

We have now added estimated recovered, closed (recovered+deceased) and active cases.

  • cases.confirmed.todate
  • cases.recovered.todate = cases.confirmed.todate(today) - cases.confirmed.todate(-21days)
  • cases.closed.todate = cases.recovered.todate + state.deceased.todate
  • cases.active = cases.confirmed.todate - cases.closed.todate

As cases.active is the current (today's) state, we should remove the incorrect .todate suffix.

Note: as these fields were not used much before, we do not expect any breakage.

Automation of daily stats.csv processing with NIJZ daily report

stats.csv is currently exported directly from the old GSheet.

We update stats.csv three times during the day as the data becomes available:

1. LAB data (9:00 update)
The data is entered in Tests GSheet and exported to lab-tests.csv with update.py script

After the export, we need to add the previous day's data to stats.csv (the data is added to an existing row):

  • tests data: tests.* columns D-M (N-R are no longer used and are left empty)
  • active cases: cases.* columns S-AC (S value (active cases) is calculated as S(day-1)+Positive(day))

2. HOS + ICU + deaths (10:30 update)
Patients data is collected in Patients GSheet and is exported to patients.csv via update.py script.

After the export, we need to add the current day's data to stats.csv (a new row is added):

  • patients data: state.* columns AL-AS

3. EPI data (final update - around 13:30)
The new NIJZ XLS report is computer-generated, so it can be parsed automatically and converted to individual CSV files.

The following data needs to be extracted:

Once we have all of the above data, we can merge these CSV files into stats.csv:

  • region.* from new regions-cases.csv
  • age.* from new age-confirmed.csv
  • deceased.* from new age-deceased.csv
  • cases.* from new cases.csv
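The merge amounts to joining the per-topic tables on their date. A minimal sketch, assuming every CSV has a `date` column in YYYY-MM-DD form (so string order equals chronological order); the function name `merge_by_date` is hypothetical.

```python
def merge_by_date(*tables):
    """Merge lists of row dicts on their "date" key.

    Each table contributes its columns to the row of the same date; a date
    that is missing from a table simply lacks that table's columns.
    """
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row["date"], {"date": row["date"]}).update(row)
    return [merged[day] for day in sorted(merged)]
```

Rows read with csv.DictReader from the four files listed above could be passed in directly, and the result written back out with csv.DictWriter.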

NIJZ XLS: regions-cases.csv

From the NIJZ XLS, process the regions data from Tabela 3 into a new regions-cases.csv with the following columns:

region.lj.todate
region.ce.todate
region.mb.todate
region.ms.todate
region.kr.todate
region.nm.todate
region.za.todate
region.sg.todate
region.po.todate
region.ng.todate
region.kp.todate
region.kk.todate
region.foreign.todate
region.unknown.todate
region.todate

stats-weekly.csv automated processing with NIJZ weekly report

NIJZ publishes two weekly reports on Monday:

We should automate processing of XLSX into stats-weekly.csv.

Currently we have GSheet for weekly data.

Existing data can be processed from:

week | date | date.to

  • obvious

week.confirmed | week.rhoccupant

  • umrli - Tabela 1

week.investigated

  • okuzeni - Tabela 1 (Skupaj)

week.healthcare

  • okuzeni - Tabela 4

week.src.import | week.src.import-related | week.src.local | week.src.unknown

  • okuzeni - Tabela 1

week.loc.family | week.loc.work | week.loc.school | week.loc.hospital | week.loc.otherhc | week.loc.rh | week.loc.prison | week.loc.transport | week.loc.shop | week.loc.restaurant | week.loc.sport | week.loc.gathering_private | week.loc.gathering_organized | week.loc.other | week.loc.unknown

  • okuzeni - Tabela 2

week.from.<country>

  • okuzeni - Tabela 3 - just needs to be rotated
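Rotating here means transposing the table so that each week becomes a row and each country a column. The sketch below assumes Tabela 3 has one row per country and one column per week, which is a guess about its layout; the function name `rotate_country_table` and the lowercase form used for `<country>` are likewise illustrative.

```python
def rotate_country_table(header, rows):
    """Turn per-country rows into per-week rows.

    header: ["country", <week label>, <week label>, ...]
    rows:   [[<country>, <count>, <count>, ...], ...]
    Returns one dict per week with week.from.<country> keys.
    """
    weeks = header[1:]
    # zip(*...) transposes the counts: one tuple per week, one item per country
    per_week = list(zip(*[row[1:] for row in rows]))
    names = [f"week.from.{row[0].lower()}" for row in rows]
    return [{"week": week, **dict(zip(names, values))}
            for week, values in zip(weeks, per_week)]
```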

Additionally, we can add further data:

week.deceased | week.deceased.rhoccupant

  • umrli - Tabela 2
