dkobak / excess-mortality

Excess mortality during COVID-19 pandemic

License: GNU General Public License v3.0



Excess mortality during the COVID-19 pandemic

Publication: Karlinsky & Kobak, 2021, Tracking excess mortality across countries during the COVID-19 pandemic with the World Mortality Dataset. eLife 10:e69336. https://elifesciences.org/articles/69336.

See the elife2021 folder for reproducible analysis from the paper. The figures shown below have been continuously updated since publication.

NOTE: We will not provide any excess mortality estimates starting from 2024. Our excess mortality calculations are based on linear extrapolation of 2015--2019 trends, and this becomes more and more tenuous as the years go by. The repository will keep being updated together with the World Mortality Dataset but will only show excess until the end of 2023.

Analysis code: all-countries.ipynb (can be run in Colab).

The data are sourced from the World Mortality Dataset. Excess mortality is computed relative to the baseline obtained using linear extrapolation of the 2015–19 trend (different baselines for 2020, 2021, 2022, 2023). In each subplot in the figure below, gray lines are 2015–19, black line is baseline for 2020, red line is 2020, blue line is 2021, orange line is 2022, green line is 2023. Countries are sorted by the total excess mortality as % of the 2020 baseline.

Red number: excess mortality starting from the first officially reported Covid-19 death.
Gray number: excess mortality as a % of the annual (2020) baseline deaths.
Black number: excess mortality per 100,000 population.
Blue number: ratio to the daily reported Covid-19 deaths over the same period (sourced from WHO).
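
The baseline and the four numbers above can be illustrated with a minimal Python sketch. It assumes a balanced panel (every week or month observed in each reference year) and fits one intercept per period plus a single shared yearly slope by least squares. The function names (`fit_baseline`, `summarize`), the `first_period` argument, and the toy numbers are hypothetical; the actual implementation in all-countries.ipynb may differ in its details.

```python
from statistics import mean


def fit_baseline(history, target_year):
    """Extrapolate a linear yearly trend to target_year.

    history maps year -> list of death counts per period (week or month);
    every year must have the same number of periods (balanced panel).
    """
    years = sorted(history)
    n_periods = len(history[years[0]])
    year_mean = mean(years)
    # Seasonal profile: mean deaths in each period over the reference years.
    period_mean = [mean(history[y][t] for y in years) for t in range(n_periods)]
    # Shared slope (deaths per period per year) from within-period deviations.
    num = sum((y - year_mean) * (history[y][t] - period_mean[t])
              for y in years for t in range(n_periods))
    den = n_periods * sum((y - year_mean) ** 2 for y in years)
    beta = num / den
    return [period_mean[t] + beta * (target_year - year_mean)
            for t in range(n_periods)]


def summarize(observed, baseline, first_period, population, reported_covid):
    """Compute the four numbers, summing excess from first_period onwards."""
    excess = sum(o - b for o, b in
                 zip(observed[first_period:], baseline[first_period:]))
    return {
        "excess": excess,                                        # red number
        "pct_of_annual_baseline": 100 * excess / sum(baseline),  # gray number
        "per_100k": 1e5 * excess / population,                   # black number
        "ratio_to_reported": excess / reported_covid,            # blue number
    }


# Toy data (hypothetical): four periods per year, rising by 10 deaths a year.
history = {2015 + i: [100 + 10 * i, 80 + 10 * i, 90 + 10 * i, 110 + 10 * i]
           for i in range(5)}
baseline_2020 = fit_baseline(history, 2020)
stats = summarize([150, 160, 170, 160], baseline_2020, first_period=1,
                  population=1_000_000, reported_covid=40)
print(baseline_2020, stats)
```

Note that the gray number divides by the full-year baseline even when the excess is summed over a shorter or longer period, which matches the "annual (2020) baseline" wording above.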

The same data but now represented as the number of deaths per 1000 people per year, and with countries sorted accordingly:

Top-10 countries in the World Mortality Dataset according to different metrics (only countries with over 500,000 population are shown):

See full table in CSV: excess-mortality.csv. Compare with: FT, NYT, The Economist, WSJ.

Tracking of excess mortality and official Covid deaths:

Note: On March 1, 2023, I removed the European 2020 heatwave correction that we applied in the eLife paper. There have been multiple heatwaves since then, and we cannot correct for all of them systematically, so this correction has been dropped.

Extrapolation until today

Daily reported Covid-19 mortality and estimated excess mortality across the countries with the most reported Covid-19 deaths. Note that in this figure the excess mortality in all countries is FORECASTED using the undercount coefficient and the LATEST daily reported number of deaths.
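
A minimal sketch of this kind of forecast, assuming the undercount coefficient is the ratio of excess deaths to reported Covid-19 deaths over the period covered by the mortality data, and that this ratio stays constant afterwards. The function name and the numbers are hypothetical.

```python
def forecast_excess(excess_to_date, reported_to_date, reported_latest):
    """Scale the latest reported Covid-19 death count by the undercount coefficient.

    excess_to_date and reported_to_date must cover the same period
    (up to the last date with all-cause mortality data).
    """
    undercount = excess_to_date / reported_to_date
    return undercount * reported_latest


# Hypothetical country: 6000 excess vs 4000 reported deaths so far,
# and 5000 reported deaths in the latest daily update.
print(forecast_excess(6000, 4000, 5000))  # 7500.0
```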

Excess mortality in Europe by NUTS regions

The code is in the europe-nuts.ipynb notebook. The weekly data are sourced from Eurostat, and excess mortality is computed using the WMD model (i.e. accounting for the yearly trend and extrapolating it forward; total excess is the sum from week 10 of 2020 onwards). I use NUTS3 regions in most places. NUTS2 regions are used in Belgium, the Netherlands, and the UK, because NUTS3 regions there appear too small on this map (London is shown as NUTS1), and also in Serbia, because I am not sure the NUTS3 data are reliable.

Caveat: UK data in Eurostat are only available for 2020, meaning that excess mortality in 2021 is not taken into account.

The same figure by P-scores:

Excess mortality in Russia

The code for my February 2021 paper in Significance, "Excess mortality reveals Covid's true toll in Russia", is available in the significance2021 folder, together with the frozen data and the final figures.

The up-to-date data can be found in the russian-data folder. Code: russia.ipynb. I am using monthly data by date of death for all years up to and including 2023. The data by date of death were provided by Rosstat upon my request. (Note that the dataset includes deaths with known year but unknown month of death; I redistributed those proportionally to the deaths with known month of death.) Thanks to Alexey Raksha for helpful discussions.
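
The proportional redistribution of deaths with unknown month can be sketched as follows. This is a minimal illustration with hypothetical names and numbers, not the code from russia.ipynb.

```python
def redistribute_unknown(monthly_deaths, unknown_deaths):
    """Spread deaths with unknown month over months in proportion to known deaths."""
    known_total = sum(monthly_deaths)
    return [m + unknown_deaths * m / known_total for m in monthly_deaths]


# Hypothetical year with two months of known data and 40 deaths of unknown month.
adjusted = redistribute_unknown([100, 300], 40)
print(adjusted)  # [110.0, 330.0]
```

The yearly total is preserved: the adjusted months sum to the known deaths plus the unknown ones.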

Note: this analysis follows the same principle as the analysis above: baselines are obtained using linear extrapolation from 2015--19, and baselines for 2020, 21, 22, 23, are all different.

Note: I am not planning to update this analysis with 2024+ data.

Note that ~10 thousand excess deaths in July 2020 in the Ural region and West Siberia were due to the heat wave (see also below).

Animation (English):

Animation (Russian):

Map (English):

Map (Russian):

Country as a whole:

Yearly deaths:

Back in 2019, Rosstat made a forecast until 2035 (xls). Upper/lower/middle forecasts are shown with dashed lines. The actual number in 2019 was 1,798,307. The actual number in 2020 was 2,138,586 (forecast: 1.7890 mln; 1.7413--1.8304). The actual number in 2021 was 2,441,594 (forecast: 1.7877 mln; 1.7158--1.8481).

Detailed statistics in regions with the most excess deaths:

Evolution of the undercount coefficient:

Seasonal variation:

Detailed history of monthly deaths:

Weekly data

Excess based on weekly data (by date of death) from http://mortality.org (only 2020 data are available so far; note that these weekly data do not include Crimea and do not include deaths with unknown week of death):

Note the bump in weeks 28--29: that is the effect of the heatwave (e.g. in Ufa it was very hot from 10th to 20th of July, precisely these two weeks: http://www.pogodaiklimat.ru/monitor.php?id=28722&month=7&year=2020). It contributed around 4 thousand excess deaths per week, i.e. around 8 thousand in total, on top of the excess Covid-related mortality.

The same split by gender and narrow age groups (using "input" data to STMF shared at http://mortality.org). Here I use linear trend based only on 2018-19 for linear extrapolation:

And as summary across age groups:


excess-mortality's Issues

Strange "Excess as % of annual baseline"

Hi, it seems that the excess percentage in excess-mortality.csv uses yearly mortality as its base, while the other figures (Covid deaths and excess deaths) are counted from the very beginning of the epidemic.

For instance, according to excess-mortality.csv, Israel on 12/09/21 has 7416 Covid deaths (that is correct) and 6514 total excess deaths, yielding a 0.88 undercount ratio. But the excess is given as 13.9%, as if the baseline were 46863, a number that corresponds to one year, not to more than 1.5 years.

The same goes for Sweden: it looks like the baseline for 1.5 years is 91762, but that is Sweden's regular yearly mortality.

Or did I miss something?

Anyway, it would be nice to maintain the data separately for 2020 (already done), for 2021, and from the very beginning.
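
The arithmetic in this issue can be checked with a few lines of Python. The three input numbers are the ones quoted above; the 1.5-year period length and the even accrual of baseline deaths are assumptions made only for illustration. The README defines the gray number as a percentage of the annual (2020) baseline, which is consistent with the 13.9% figure.

```python
covid_deaths = 7416       # reported Covid deaths quoted in the issue
excess_deaths = 6514      # total excess deaths quoted in the issue
annual_baseline = 46863   # one-year baseline implied by the 13.9% figure

undercount_ratio = excess_deaths / covid_deaths
pct_of_annual = 100 * excess_deaths / annual_baseline

# Assumption: the period is roughly 1.5 years and baseline deaths accrue evenly.
period_years = 1.5
pct_of_period = 100 * excess_deaths / (annual_baseline * period_years)

print(round(undercount_ratio, 2))  # 0.88
print(round(pct_of_annual, 1))     # 13.9
print(round(pct_of_period, 1))     # 9.3
```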

Monthly data by date of death for Russia

@dkobak Are you planning to update the monthly data by date of death for the regions of Russia? I noticed it has already been corrected for the country, but not for the administrative areas.

place 'placeholder' chunks in if() switch

I suggest putting 'placeholder' chunks inside an if() switch, like

if (1):
    country='Sweden'
    X = allcountries[country][0]
    baseline = allcountries[country][1]
    ...

That is quicker and easier (for me at least) than (un)commenting all the relevant lines, or even changing the cell type in Jupyter (code, markdown, raw) to (de)activate the chunk as needed.

Normalized mortality per age group

For the picture to be realistic, you need to take into account the age groups and their mortality rates in previous years, as well as the different demographics of countries. Otherwise you get the wrong picture when certain age groups of the population suddenly show spikes.

https://www.cebm.net/covid-19/excess-mortality-across-countries-in-2020/

Example for Sweden:

Need a population profile adjusted version for excess deaths

Hi, I greatly appreciate your efforts; they were very useful in the first days of the pandemic for understanding its full impact. However, I have noticed that people are now getting confused about excess deaths for countries whose population profile has changed significantly in recent years. For example, Japan has far more people (> 40% more) aged 70 to 80 than it did during the reference range.

This is now distorting the conversation, and prominent influencers are using it to spread misunderstanding. For example, you are reporting a high rate of excess deaths in Japan, but I believe that, adjusted for population profile changes, the death rate is in fact lower than expected.

Maybe you're already adjusting for this, but I think you are not?

I believe the right technique for the expected deaths would be to multiply it by the change in population in each age group since the middle of the reference range. For example, if Japan has 40% more people from age 70 to 80 and has 30% more deaths in that age range than in the reference range, then in fact they have a lower than expected death rate.

Maybe I did the math wrong, but I think that in the four years since this project started, most countries have seen population aging significant enough to need adjustment.

Cheers!
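
The adjustment proposed in this issue can be sketched in Python as follows. It assumes age-specific death rates stay at their reference-period level and scales expected deaths by the population change in each age group. This is the issue's suggestion, not the method used in this repository, and all names and numbers are hypothetical.

```python
def adjusted_expected_deaths(ref_deaths, ref_population, cur_population):
    """Scale reference-period deaths by the population change in each age group."""
    return {group: ref_deaths[group] * cur_population[group] / ref_population[group]
            for group in ref_deaths}


# Hypothetical age group matching the example in the issue.
ref_deaths = {"70-79": 1000}
ref_population = {"70-79": 10_000_000}
cur_population = {"70-79": 14_000_000}  # 40% more people than in the reference range
observed = {"70-79": 1300}              # 30% more deaths than in the reference range

expected = adjusted_expected_deaths(ref_deaths, ref_population, cur_population)
ratio = observed["70-79"] / expected["70-79"]
print(expected["70-79"], round(ratio, 3))  # 1400.0 0.929
```

A ratio below 1 means fewer deaths than the population-adjusted expectation, even though the raw death count rose by 30%.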

Country image is sorted by excess percent

The country image is sorted by excess percent, not by name, while the CSV is sorted by name. This makes it hard to find a specific country in the image.

Notebook loading error

hi,

trying the 'run in browser' link from https://github.com/dkobak/excess-mortality failed with this notice:

Notebook loading error
There was an error loading this notebook. Ensure that the file is accessible and try again.

An invalid or illegal string was specified
https://github.com/dkobak/excess-mortality/blob/main/all-countries.ipynb

An invalid or illegal string was specified
GA@https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20210128-085606-RC00_354297656:1310:69
d/<@https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20210128-085606-RC00_354297656:2199:97
Fa@https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20210128-085606-RC00_354297656:19:336
Da.prototype.next_@https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20210128-085606-RC00_354297656:17:503
Ia/this.next@https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20210128-085606-RC00_354297656:20:206
f@https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20210128-085606-RC00_354297656:62:101

all-countries.ipynb: got HTTPError 'forbidden' for df_official

I'm running the code in a local instance of Jupyter and at some point got an error for df_official. I see there is now a hack in the code in chunk 3, but my fix was simply

df_official = pd.read_csv('https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.csv?raw=true')

It seems OWID wants to move data traffic onto GitHub. I just checked this URL against the original one from OWID, and they are in sync. Indeed, http://covid.ourworldindata.org now redirects to the GitHub account above, so the old original URL should also just work without the hack.

Moscow and St. Petersburg mortality data reliability

In the paper you state:

Moscow and St. Petersburg, two regions with arguably the most reliable reporting of Covid‐19 mortality.

However, the data for both of those cities exhibit the following peculiarity:

On 2020-07-28, 23 deaths were reported in St. Petersburg. The figure was the same on 2020-07-29.
On 2020-08-13, 11 deaths were reported in Moscow. The next day’s number was the same.
Since then, in both cities the figure never repeated on two consecutive days.

This is not characteristic of the other regions. SPb’s 192-day streak and Moscow’s 176 are followed by a very distant third, Nizhegorodskaya oblast where the figures haven’t repeated for 34 days. The median is 2 days.

A back-of-the-envelope calculation: assume that a particular day’s figure could have been anything within the range spanned by the data reported within ±3 days of that date (distributed uniformly). What is the probability that it never matches the previous day’s value? This probability is 5×10⁻⁴ for SPb and 5×10⁻¹² for Moscow. The latter includes 60 consecutive days, 2020-11-15 through 2021-01-13, where all the data fell into the [70, 77] interval yet never repeated (the narrowness is in itself uncharacteristic of a random process) and never exceeded the previous maximum, the 2020-05-30 value of 78.
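
One possible reading of the streak count and of this back-of-the-envelope calculation, as a minimal Python sketch. It assumes each day's value is drawn uniformly from the integer range spanned by the reported values within ±3 days, so the chance of matching the previous day is one over the size of that range. The function names and the toy series are hypothetical, and the issue author's exact procedure may differ.

```python
def current_no_repeat_streak(values):
    """Number of most recent day-to-day transitions without a repeated value."""
    streak = 0
    for prev, cur in zip(reversed(values[:-1]), reversed(values[1:])):
        if cur == prev:
            break
        streak += 1
    return streak


def prob_never_repeats(values, half_window=3):
    """Probability that no day equals the previous day under the uniform model."""
    prob = 1.0
    for i in range(1, len(values)):
        # Values reported within +/- half_window days of day i (includes day i-1).
        window = values[max(0, i - half_window): i + half_window + 1]
        n_outcomes = max(window) - min(window) + 1
        # One of the n_outcomes equally likely values equals yesterday's figure.
        prob *= 1 - 1 / n_outcomes
    return prob


# Toy daily death counts (hypothetical), oldest first.
print(current_no_repeat_streak([5, 5, 6, 7, 8]))  # 3
print(prob_never_repeats([1, 2, 1, 2]))           # 0.125
```

Applied to a long series confined to a narrow interval such as [70, 77], each day contributes a factor of at most 7/8, so the product shrinks quickly with the length of the streak.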

A histogram of the Moscow data is below.

Perhaps the statement needs revision?

 0  3 |||
 1  3 |||
 2  3 |||
 3  1 |
 4  0 
 5  2 ||
 6  0 
 7  3 |||
 8  2 ||
 9  3 |||
10  9 |||||||||
11 15 |||||||||||||||
12 18 ||||||||||||||||||
13 12 ||||||||||||
14 14 ||||||||||||||
15  3 |||
16  3 |||
17  2 ||
18  1 |
19  1 |
20  2 ||
21  1 |
22  1 |
23  2 ||
24  5 |||||
25  3 |||
26  1 |
27  4 ||||
28  6 ||||||
29  4 ||||
30  1 |
31  2 ||
32  3 |||
33  1 |
34  4 ||||
35  4 ||||
36  0 
37  3 |||
38  1 |
39  2 ||
40  0 
41  3 |||
42  0 
43  0 
44  2 ||
45  0 
46  0 
47  1 |
48  2 ||
49  3 |||
50  2 ||
51  3 |||
52  5 |||||
53  4 ||||
54  2 ||
55  4 ||||
56  3 |||
57  2 ||
58  4 ||||
59  2 ||
60  1 |
61  4 ||||
62  3 |||
63  4 ||||
64  2 ||
65  3 |||
66  3 |||
67  5 |||||
68  7 |||||||
69  5 |||||
70  3 |||
71 11 |||||||||||
72  9 |||||||||
73 11 |||||||||||
74 13 |||||||||||||
75 14 ||||||||||||||
76 15 |||||||||||||||
77 10 ||||||||||
78  1 |
79  1 |
80  0 
81  1 |
82  0 
83  0 
84  1 |

incomplete all-countries.png caption

In my local copy I adjusted the caption according to code / preprint.
Use or adjust it as you see fit.

--- figtext     2021-02-22 13:15:30.275003847 +0100
+++ figtextmp   2021-02-23 12:05:00.018288814 +0100
@@ -2 +2 @@
-'Data: World Mortality Dataset, github.com/akarlinsky/world_mortality. '
+'Data: World Mortality Dataset, github.com/akarlinsky/world_mortality, github.com/datasets/, github.com/owid/ . '
@@ -4,5 +4,6 @@
-'Excess mortality is computed relative to the baseline extrapolated from 2015–19. '
-'Red number: excess mortality starting from the first officially reported covid19 death.\n'
-'Gray: as a % of baseline yearly deaths. '
-'Black: per 100,000 population. '
-'Blue: ratio to the daily reported covid19 deaths over the same period. '
+'Excess mortality is computed relative to the baseline extrapolated from 2015–2019. '
+'Lines: black: baseline, gray: 2015-2019, red: 2020, magenta: 2021\n'
+'Numbers: red: estimated excess mortality starting from the first officially reported covid19 deaths up to last available official mortality data,\n'
+'gray: as a % of baseline yearly deaths. black: per 100,000 population, '
+'blue: ratio to the daily reported covid19 deaths over the same period.\n'
+'(*) less war / heatwave excess deaths.\n'

Be cautious, though, about defining the blue ratio as an undercount in general, although there are some glaring cases. For Italy, for example, it is documented (also in your ref. Beaney, 2020) that ca. 1/3 of excess deaths are non-Covid-19, due to missed care caused by fear of contagion, lockdown measures, or overburdened care facilities:

in latest news https://www.adnkronos.com/covid-30mila-morti-per-altre-malattie-trascurate_2difOiSOAXHlHbkOCeX1H5
an earlier (1st wave) analysis with geo/age structure considerations https://bmcpublichealth.biomedcentral.com/articles/10.1186/s12889-020-09335-8
latest IT national stats report https://www.istat.it/it/archivio/252168 (sorry, seems EN version not available)

And for the USA, for example, "for 6% of the deaths, COVID-19 was the only cause mentioned" among the Covid-19 counts, while for the rest it is a matter of choosing the most (often more politically than medically) convenient label https://www.cdc.gov/nchs/nvss/vsrr/covid_weekly/index.htm#Comorbidities, with a number of collateral damages such as an increase in drug abuse https://jamanetwork.com/journals/jama/fullarticle/2776212 and psychological issues, e.g. https://www.bmj.com/content/371/bmj.m4352.short

readme: > 50K should be 50M?

In the readme:

Top-10 countries in the World Mortality Dataset according to different metrics (only countries with over 50,000 population are shown):

guess it should be 50,000,000

No data for some countries

There are no "Excess per 100k" and "Excess as % of annual baseline" values for some countries, for instance Finland, Norway, and others.

License clarification

What's the license for code in this repo?