Data Resources for SARS-CoV-2

Cases, Deaths, and Testing

"The Challenges of Using Real-Time Epidemiological Data in a Public Health Crisis"
https://medium.com/pew-research-center-decoded/the-challenges-of-using-real-time-epidemiological-data-in-a-public-health-crisis-c7a6c2e9c950
The Pew Research Center has been conducting extensive research into, and analysis of, public attitudes about and responses to the pandemic. In this post, analysts from Pew talk about working with data on COVID-19 cases from different sources, how results from those sources sometimes differ, and how different definitions and collection methods can contribute to those differences. The post also discusses how Pew makes use of aggregated data on case counts in its analyses of public opinion about COVID-19.

Casos de coronavírus (Covid-19) nos municípios do Amazonas (Brasil)
https://dataverse.harvard.edu/dataverse/covid19-amazonia
Municipal-level data for cities in the Brazilian state of Amazonas. See https://blog.brasil.io/2020/03/23/dados-coronavirus-por-municipio-mais-atualizados/ and https://brasil.io/dataset/covid19/caso/. Note that the sites are all in Portuguese.

China Data Lab
https://dataverse.harvard.edu/dataverse/cdl_dataverse
The China Data Lab has focused on research related to the outbreak in China, but the site has added data from the U.S. and data on global policy measures in response to outbreaks. The non-China-related data are not always updated on a regular basis.

Coronavirus (COVID-19) in Prisons in the United States, April - June 2020
https://doi.org/10.3886/E119901V1
"This is a collection of publicly reported data relevant to the COVID-19 pandemic scraped from state and federal prisons in the United States. Data are collected each night from every state and federal correctional agency’s site that has data available ... The data primarily cover the number of people incarcerated in these facilities who have tested positive, negative, recovered, and have died from COVID-19. Many - but not all - states also provide this information for staff members."

COVID Tracking Project
https://covidtracking.com/
The COVID Tracking Project provides state-level data for tests conducted and test results. In recognition of the diverging effects of the virus across races, the Project has also begun providing breakdowns of data by race/ethnicity and scoring states on whether they provide such breakdowns in their reporting of data. Note that the Project is no longer collecting new data, although its historical data are still available.

COVID Data Tracker
https://covid.cdc.gov/covid-data-tracker/
The COVID Data Tracker is the CDC's newest portal for national, state, and county-level data on cases/deaths, vaccinations, demographic trends, and health-care contexts related to the pandemic.

COVID-19 Case Surveillance Public Use Data
https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf
These data from the CDC consist of "patient-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and states." The data are de-identified and "include demographic characteristics, exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and comorbidities." Note that these data are missing many reported cases due to incomplete (or no) reporting from individual states.

COVID-19 Community Profile Reports
https://beta.healthdata.gov/Health/COVID-19-Community-Profile-Report/gqxm-d9w9
The Department of Health and Human Services has begun distributing community profile reports produced by the White House COVID-19 Task Force. The profiles include data on cases, deaths, testing, socio-economic and demographic characteristics, and codes for whether a given area is a pandemic hotspot. The data are for states, regions, metropolitan areas, and counties and are available either as .pdf files or as Excel workbooks.

COVID-19 Estimated Patient Impact and Hospital Capacity by State
https://healthdata.gov/dataset/covid-19-estimated-patient-impact-and-hospital-capacity-state
The Department of Health and Human Services provides data for estimates of hospital utilization in terms of inpatient beds, inpatient beds occupied by COVID-19 patients, and ICU beds. The state-level estimates are aggregated from facility-level data.

COVID-19 Reported Patient Impact and Hospital Capacity by Facility
https://healthdata.gov/dataset/covid-19-reported-patient-impact-and-hospital-capacity-facility
The Department of Health and Human Services provides weekly facility-level data on hospital utilization in terms of hospital beds, ICU beds, staffing, admissions, and confirmed COVID-19 cases. There are variables for state, county, and ZIP code of facilities. See https://github.com/CareSet/COVID_Hospital_PUF for an FAQ about the data. See https://carlsonschool.umn.edu/mili-misrc-covid19-tracking-project for county-level data taken from the facility-level data.

COVID-19 Superspreading Events Database
https://medium.com/@codecodekoen/covid-19-superspreading-events-database-4c0a7aa2342b
This database is an attempt to code data on superspreader events that have exerted a disproportionate effect on growth in COVID-19 cases. The events are coded by location, date, number of people infected, and type of event. Note the author's caveats about limitations of the data. See https://covid19settings.blogspot.com/p/about.html for the project site for the data.

covid-rt
https://dataverse.harvard.edu/dataverse/covid-rt
The Centre for the Mathematical Modelling of Infectious Diseases (CMMID) at the London School of Hygiene & Tropical Medicine produces estimates of R0/reproduction-number values for COVID-19. There are both national estimates and, for select countries, sub-national estimates. See https://cmmid.github.io/topics/covid19/ for additional pandemic-related work by the Centre.

Development Data Lab COVID India
http://www.devdatalab.org/covid
This site from the Development Data Lab focuses on India and "includes estimates of hospital and clinic doctor and bed capacity (district level, and soon subdistrict), CFR predictions based on variation in local population age distribution (subdistrict level), urbanization rates and population density (subdistrict level and lower), as well as deaths and infections at the highest resolution possible." See https://github.com/devdatalab/covid for additional information about the data.

Johns Hopkins University Center for Systems Science and Engineering: Novel Coronavirus (COVID-19) Cases
https://github.com/CSSEGISandData/COVID-19
Johns Hopkins has been making available datasets collected as part of its much-touted/much-referenced site for tracking new cases, fatalities, and recoveries. There are time-series datasets for daily data for cases and deaths and individual daily datasets with additional variables for incident rates, case fatality rates, test results, and/or hospitalizations where such data are available.
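For readers who want to work with the time-series files, the following is a minimal sketch of converting running totals into daily new counts. It assumes a wide layout (four identifying columns followed by one column per date) and that each date column holds a cumulative count; check the repository's documentation before relying on either assumption. The sample rows and the function name daily_new_counts are made up for illustration and are not taken from the repository.

```python
import csv
import io

# Made-up sample in the wide layout assumed here: four identifying
# columns, then one column per date holding a cumulative count.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20
,Exampleland,0.0,0.0,0,2,5,5
"""


def daily_new_counts(csv_text, id_columns=4):
    """Return {region: [(date, new_count), ...]} from cumulative wide-format rows.

    id_columns is the number of leading non-date columns (an assumption
    about the file layout, not something the file itself declares).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    dates = header[id_columns:]
    result = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        province, country = row[0], row[1]
        region = f"{country} / {province}" if province else country
        previous = 0
        new_counts = []
        for date, value in zip(dates, row[id_columns:]):
            total = int(float(value))
            # The difference can be negative when a source revises its
            # totals downward; it is left as-is rather than clamped.
            new_counts.append((date, total - previous))
            previous = total
        result[region] = new_counts
    return result


if __name__ == "__main__":
    for region, series in daily_new_counts(SAMPLE).items():
        print(region, series)
```

The first date's "new" count is simply that day's running total, since the sketch has no earlier value to subtract.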

Johns Hopkins University Center for Systems Science and Engineering: Unified COVID-19 Dataset
https://github.com/CSSEGISandData/COVID-19_Unified-Dataset
Johns Hopkins' Unified COVID-19 Dataset combines available data for cases, deaths, testing, and hospitalizations from multiple sources with breakdowns by age and gender. Note that the available indicators, demographic breakdowns, and level of geographic detail all vary across countries.

New York Times: An Ongoing Repository of Data on Coronavirus Cases and Deaths in the U.S.
https://github.com/nytimes/covid-19-data
The New York Times has been making available county-level data for its maps of the spread of the disease in the United States. The NYT's GitHub repository also includes data for cases at colleges and universities, for excess deaths, and for usage of face masks.

Our World in Data: Coronavirus Pandemic (COVID-19) Statistics and Research
https://ourworldindata.org/coronavirus
Our World in Data has compiled international data from various sources as well as studies making use of such data. The data cover topics such as cases, deaths, tests, hospitalizations, vaccinations, mortality risk, excess mortality, and policy responses. The data in their entirety are available for download from GitHub.

University of Maryland COVID-19 Impact Analysis Platform
https://data.covid.umd.edu/
This project provides visualizations of state- and county-level data for the U.S. on a variety of metrics in four categories: Mobility and Social Distancing, COVID and Health, Economic Impact, and Vulnerable Population. The data are a mix of publicly available data and estimates calculated by the project. See https://data.covid.umd.edu/about/index.html for a complete list of available indicators and for how to request access to the data for those indicators. See "Replication Data for: Quantifying Human Mobility Behavior Changes During the COVID-19 Outbreak in the United States" for a replication dataset making use of some data from this project.