March: Month of a Pandemic
— a spreadsheet of death and recovery
- Quick link to the March spreadsheet for those with no patience (you know who you are).
- COVID-19 in April, including the instructions for maintaining the spreadsheets.
What’s in this spreadsheet?
The Wikipedia COVID-19 data organised for easy comparison and use.
The daily sheet
Daily numbers for each country or territory with COVID-19 cases:
- Country or territory, with links to each territory’s pandemic page.
- Population, Confirmed cases, Deaths and Recoveries (source for all: Wikipedia, snapshot taken between 10:00 and 10:30 GMT). Active cases: Confirmed minus Deaths minus Recoveries.
- Percentage of cases recovered, dead, and active
- [Crude] Case Fatality Rate (CFR).
- Confirmed cases, active cases, and deaths per million population
- Change since the previous day (confirmed, deaths, recoveries, active), as absolute numbers and as percentage growth
- Each country’s percentage of total (world) numbers (population, confirmed, deaths, recoveries, active)
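The derived columns above are simple arithmetic on the four Wikipedia numbers. Here is a minimal Python sketch of them. It is not the spreadsheet's actual formulas. It assumes a hypothetical DayRow record per territory and takes the crude CFR to be deaths divided by confirmed cases.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class DayRow:
    """One territory's raw numbers for one day (hypothetical structure)."""
    country: str
    population: int
    confirmed: int
    deaths: int
    recovered: int


def pct(part: float, whole: float) -> float:
    """Percentage, returning 0.0 instead of dividing by zero."""
    return 100.0 * part / whole if whole else 0.0


def per_million(count: int, population: int) -> float:
    """Count per million population (population assumed to be > 0)."""
    return count * 1_000_000 / population


def derived(row: DayRow) -> Dict[str, float]:
    """Derived columns for one territory on one day."""
    active = row.confirmed - row.deaths - row.recovered
    return {
        "active": active,
        "pct_recovered": pct(row.recovered, row.confirmed),
        "pct_died": pct(row.deaths, row.confirmed),
        "pct_active": pct(active, row.confirmed),
        # Assumption: crude CFR = deaths / confirmed cases, in percent
        # (identical to pct_died under this definition).
        "crude_cfr": pct(row.deaths, row.confirmed),
        "confirmed_per_million": per_million(row.confirmed, row.population),
        "active_per_million": per_million(active, row.population),
        "deaths_per_million": per_million(row.deaths, row.population),
    }


def change(today: int, yesterday: int) -> Tuple[int, Optional[float]]:
    """Change since the previous day: absolute number and percentage growth.
    Growth is undefined (None) when yesterday's value was zero."""
    diff = today - yesterday
    return diff, (100.0 * diff / yesterday if yesterday else None)


def share_of_world(value: int, world_total: int) -> float:
    """A territory's percentage of the world total for one column."""
    return pct(value, world_total)
```

Percentage growth is left undefined when the previous day's value is zero, because a territory's first case has no meaningful growth rate.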
For CFR, IFR (infection fatality rate), and other measures of how deadly a disease is, I recommend Our World in Data’s pandemic page. Note that this is a pandemic in progress; these numbers can be misleading if used carelessly.
Other sheets
- The title sheet: Similar information to this document.
- Growth sheets: While the daily sheets show the daily situation for all the numbers, the growth sheets focus on how a single number develops over the whole month (so far, percentage growth in Confirmed Cases and Deaths).
- Delta sheets: Eventually, the difference between any two days in March that you pick. Currently, just a few predetermined dates.
- World daily stats: All the numbers for the whole month for one special territory: this planet.
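The growth, delta, and world sheets can be sketched the same way. This sketch assumes values keyed by ISO date strings. It also assumes the world total is simply the sum over all territories, which the real sheet may not do. All function names are hypothetical.

```python
from typing import Dict, List, Optional


def daily_growth(series: List[int]) -> List[Optional[float]]:
    """Day-over-day percentage growth for one territory's month of values,
    as in a growth sheet column. The first day, and any day following a zero,
    has no defined growth (None)."""
    growth: List[Optional[float]] = [None]
    for prev, cur in zip(series, series[1:]):
        growth.append(100.0 * (cur - prev) / prev if prev else None)
    return growth


def delta(by_date: Dict[str, int], start: str, end: str) -> int:
    """Difference in one value between two chosen days, as in a delta sheet."""
    return by_date[end] - by_date[start]


def world_daily(sheets: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum one column over all territories for each day ("this planet").
    Assumption: the world figure is the plain sum of the territory figures."""
    return {day: sum(by_country.values()) for day, by_country in sheets.items()}
```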
How can I use it?
This is a data source: Wikipedia-collated data in a somewhat machine-readable form, usable by anyone who can read data from a spreadsheet.
The data could be converted into a properly machine-readable form, but I haven’t done it (yet?). If you think it should be done, feel free to do it yourself.
The spreadsheet does not have pretty graphs or treatises on exponential growth, but you can use these data for those, or any other, purposes.
Licencing
Wikipedia content is licensed under Creative Commons Attribution-ShareAlike, as is this spreadsheet.
In layman’s terms: Use it as you like, but attribute those who contributed and make your own work similarly available. If you want to use or expand this spreadsheet but can’t make the result available to others, contact me.
In real terms: See the Creative Commons page
Attribution can be to this page, my Twitter account (@jaxroam), or both.
How to maintain and extend it
Maintenance is described in the April version of this document.
Changes between March and April sheets
Or, what I (and Wikipedia) learned making this sheet.
Order: The March sheet follows the country order of the 6 March Wikipedia page. That template orders countries by total confirmed cases, and the order changes from day to day as the pandemic progresses: China was on top, then Italy rose to the top within a day, then the US. For a spreadsheet it is convenient to keep the order fixed, e.g. Sweden is row 16 throughout the whole spreadsheet. We could add a daily rank column, but I don’t see the use case.
As a result, the March order is “by number of cases on 6 March; for countries that had none then, in the chronological order in which they got their first case(s)”. Not very convenient. The April sheet will use alphabetical order.
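The following sketch expresses both orderings as sorts. The input dictionaries (cases on 6 March and date of first case) are hypothetical. Ties keep their input order, which may not match how Wikipedia broke ties.

```python
from datetime import date
from typing import Dict, Iterable, List


def march_order(cases_6_march: Dict[str, int], first_case: Dict[str, date]) -> List[str]:
    """Countries with cases on 6 March, most cases first; then countries that
    had none on 6 March, in the order they reported their first case."""
    had_cases = [c for c, n in cases_6_march.items() if n > 0]
    had_cases.sort(key=lambda c: cases_6_march[c], reverse=True)
    later = [c for c in first_case if cases_6_march.get(c, 0) == 0]
    later.sort(key=lambda c: first_case[c])
    return had_cases + later


def april_order(countries: Iterable[str]) -> List[str]:
    """Alphabetical order, ignoring case."""
    return sorted(countries, key=str.casefold)
```

The March key needs two inputs and a special case for late arrivals. The alphabetical key needs only the name, so new territories simply slot into place.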
Ships: The Diamond Princess was historically important for the COVID-19 outbreak, but the nationality of ship passengers is complicated. From April on, the rule will be: no more ships.
Importing from it
Spreadsheets are not an optimal way to store data. The methods for importing from spreadsheets are well known, and you can find them elsewhere.
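One common route is to export a sheet as CSV and read it with Python's standard csv module. The column headers below and the treatment of blank cells as zero are assumptions to adjust to the real sheet. In Wikipedia data, a blank cell may mean "unknown" rather than zero.

```python
import csv
import io
from typing import Dict, List


def to_int(cell: str) -> int:
    """Parse a numeric cell: thousands separators are removed, and blank cells
    are read as 0 (a simplification; blank may really mean unknown)."""
    cleaned = cell.replace(",", "").strip()
    return int(cleaned) if cleaned else 0


def load_daily_sheet(csv_text: str) -> List[Dict[str, object]]:
    """Read one daily sheet exported as CSV. The headers used here
    (Country, Population, Confirmed, Deaths, Recovered) are assumptions."""
    rows: List[Dict[str, object]] = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "country": rec["Country"].strip(),
            "population": to_int(rec["Population"]),
            "confirmed": to_int(rec["Confirmed"]),
            "deaths": to_int(rec["Deaths"]),
            "recovered": to_int(rec["Recovered"]),
        })
    return rows
```

To read a file, open it with open(path, newline="", encoding="utf-8") and pass f.read() to load_daily_sheet. The path of the exported sheet is up to you.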
Extending it
There are more source data than confirmed/dead/recovered, and at levels other than the country. How many are hospitalised? How many are tested, at hospitals or at large? What about public policies (e.g. for social distancing)? How many hospital beds per person?
You can import the spreadsheet (as above), or copy or link to it, and add the other data you are interested in.
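Adding another data set usually means joining on the country name. Here is a minimal sketch. It assumes rows that are dictionaries with a "country" key and extra data given as a name-to-value dictionary, both hypothetical.

```python
from typing import Dict, List, Tuple


def join_by_country(
    rows: List[Dict[str, object]], extra: Dict[str, float], column: str
) -> Tuple[List[Dict[str, object]], List[str]]:
    """Add one extra per-country value (e.g. tests or hospital beds) to each row.
    Names are matched case-insensitively after trimming whitespace; rows with no
    match get None and are reported so they can be fixed by hand."""
    lookup = {name.strip().casefold(): value for name, value in extra.items()}
    joined: List[Dict[str, object]] = []
    unmatched: List[str] = []
    for row in rows:
        key = str(row["country"]).strip().casefold()
        if key not in lookup:
            unmatched.append(str(row["country"]))
        # Build a new dict so the original rows are left unchanged
        joined.append({**row, column: lookup.get(key)})
    return joined, unmatched
```

Reporting unmatched names matters because sources often spell country names differently. A case-insensitive match will not catch alternative names for the same country.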
The noise, the noise
Crowd-edited data are fine, but never perfect. The underlying sources (national statistics) are unreliable and inconsistent, some more than others.
Each territory’s Wikipedia page has a “COVID-19 cases in <territory>” inset with daily data. This ought to be consistent with the template this spreadsheet uses as its data source.
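One way to spot noise is to compare the template numbers with a territory's inset for the same day. Another is to flag numbers that cannot all be right, such as deaths plus recoveries exceeding confirmed cases. Here is a sketch using hypothetical data structures.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: territory -> {"confirmed": ..., "deaths": ..., "recovered": ...}
Counts = Dict[str, Dict[str, int]]


def mismatches(template: Counts, inset: Counts) -> List[Tuple[str, str, int, int]]:
    """List (territory, field, template value, inset value) wherever the two
    sources disagree for the same day. Territories or fields missing from the
    inset data are skipped rather than reported."""
    found = []
    for territory in sorted(template):
        other = inset.get(territory)
        if other is None:
            continue
        for field in sorted(template[territory]):
            if field in other and other[field] != template[territory][field]:
                found.append((territory, field, template[territory][field], other[field]))
    return found


def implausible(counts: Counts) -> List[str]:
    """Territories whose deaths plus recoveries exceed confirmed cases,
    i.e. whose active cases would come out negative."""
    return sorted(
        t for t, c in counts.items()
        if c.get("deaths", 0) + c.get("recovered", 0) > c.get("confirmed", 0)
    )
```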
Background
At the beginning of March I wanted to test a hypothesis: that death is the best arbiter of how widespread an epidemic is.
To do that I needed to collect data over time, and Wikipedia provided the data I wanted to use, namely crowd-tested, up-to-date country data on confirmed cases, deaths, and recoveries.
Wikipedia didn’t provide any of the derived values, like current death rate, case fatality rate, active cases, cases per million population, or growth rates, and it still doesn’t. So I copied the data into a spreadsheet every day (Wikipedia maintains an edit history, so that is easy to do).