Lost in Time, Like Tears in Rain: Access to At-Risk Public Data

Archives and Mirror Sites and Search Tools

Data Rescue Tracker
https://www.datarescueproject.org/data-rescue-tracker/
The Data Rescue Project maintains a searchable spreadsheet that tracks data rescue efforts for specific data resources and datasets. You can use it to see whether a dataset of interest is being rescued and where the rescued data can be accessed.

How to Find Climate Data and Science the Trump Administration Doesn't Want You to See
https://envirodatagov.org/how-to-find-climate-data-and-science-the-trump-administration-doesnt-want-you-to-see/
The Environmental Data and Governance Initiative has a blog with posts on matters related to environmental policy and data, including access to data and tools. This particular post discusses trends in availability of environmental data and provides suggestions for where to look for archived copies of federal data and websites.

Find a Dataset by Name
https://libguides.gwu.edu/data-preservation/find-data-name
The George Washington University Libraries have produced a very helpful guide for locating data that have been taken down from federal sites. See in particular the A-Z Dataset List to search for individual datasets and where they can be located.

Find Lost Data
https://findlostdata.org/
The Center for Health Data Science at Boston University's School of Public Health has created a tool to search for and locate data that have been removed from federal websites. See https://findlostdata.org/db for the list of archives and sites across which the tool searches.



CDC Datasets Uploaded Before January 28th, 2025
https://archive.org/details/20250128-cdc-datasets
The Internet Archive is hosting an archive of "all CDC datasets uploaded to https://data.cdc.gov/browse before January 28th, 2025," with the caveats that it includes only data that were publicly available and that much of it consists of tables summarized from larger surveys and collections. Note that the underlying data from recurring health surveys may not be available here.

Data.Gov Archive
https://source.coop/repositories/harvard-lil/gov-data
Harvard Law School Library's Library Innovation Lab has released an archive of https://data.gov/. The archive is regularly updated. See https://lil.law.harvard.edu/blog/2025/02/06/announcing-data-gov-archive/ for more information about this specific project, and see https://lil.law.harvard.edu/blog/2025/01/30/preserving-public-u-s-federal-data/ for more about the Data Vault Project to preserve publicly funded open data available from the federal government.

Publicdata: An Archive of Public Data Sets
https://git.lsit.ucsb.edu/publicdata
UC Santa Barbara's Letters & Science IT has captured the contents of websites from agencies such as the National Institutes of Health, the Social Security Administration, the Centers for Medicare & Medicaid Services, the Agency for Healthcare Research and Quality, the Substance Abuse and Mental Health Services Administration, and many others. Note that this archive captures websites, which do not always include the actual datasets.

RestoredCDC.Org
https://restoredcdc.org/www.cdc.gov/
The RestoredCDC.Org project is using copies of CDC website pages that were archived prior to January 20, 2025 to rebuild the CDC's website as it was constructed and organized at that time. The project is a longer-term undertaking, as its members work to re-create the navigation of the site and the links that connect pages across the site. The site will include data, contextual information, and functionality, within the limits of what was archived prior to January 20, 2025. If you wish to contribute to this project, see https://aboutus.restoredcdc.org/mission.