Research Guides: Resources for Text and Data Mining: Freely Available Resources

Freely Available / Open Access Resources

Resource	Description	Help/Guidelines	Examples
BYU Corpora	Textual corpora, focusing on languages and dialects.		Recent shifts with three nonfinite verbal complements in English: data from the 100-million-word Time corpus (1920s–2000s)
Caselaw Access Project	Covers 6.4 million cases that represent 360 years of U.S. legal history.	CAP API	California Wordclouds
Chronicling America	Historic American newspapers.	Bulk Access	Using Big Data to Ask Big Questions cites the following teaching and research projects: America’s Public Bible; American Lynching: Uncovering a Cultural Narrative; Historical Agricultural News; Chronicling Hoosier; USNewsMap.com; Digital APUSH
Digital Public Library of America	Photographs, books, maps, news footage, oral histories, personal letters, museum objects, artwork, government documents, etc. from libraries, archives and museums.	API Key Request	StackLife DPLA
Europeana Labs	Cultural heritage items from museums and galleries across Europe.	APIs; Curated datasets	Europeana Project List. Examples include: Meath Heritage; Automated Image Analysis with IIIF; Linked Heritage
Folger Shakespeare Library	Shakespeare's plays, sonnets, and poems.
Google Books	Books from 1800-2000.	APIs; Info page	Quantitative analysis of culture using millions of digitized books; The Two Poverty Enlightenments: Historical Insights from Digitized Books Spanning Three Centuries
HathiTrust Digital Library	Printed material predominantly published prior to 1923.
Internet Archive Open Library	Books, texts and other digital material.	Data Dumps
New York Times Developer Network	Provides access to ten public APIs: Archive, Article Search, Books, Community, Geographic, Most Popular, Semantic, Times Newswire, TimesTags, and Top Stories.	API Key Request Form; Terms of Use
Project Gutenberg	Books in various languages.	Terms of Use
Text Creation Partnership	Early English Books Online, Eighteenth Century Collections Online and Evans Early American Imprints.
University Datasets	Michigan State University Datasets (19th century Sunday School Books, Historic American Cookbooks, farming journals), University of Pennsylvania (books), University of Oxford Text Archive (literary and linguistic texts)
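Several of the resources listed here (DPLA, Europeana, Google Books, the New York Times) are accessed through web APIs that return JSON. The following is a minimal sketch for DPLA, assuming the DPLA API v2 `items` endpoint at `https://api.dp.la/v2/items`, its `q`, `api_key`, `page_size`, and `page` parameters, and a response whose `docs` records carry titles under `sourceResource.title`. The function names and the `DPLA_API_KEY` environment variable are hypothetical; check DPLA's current API documentation before relying on these details.

```python
"""Minimal sketch of a DPLA API v2 item search (assumes you already have an API key)."""
import json
import os
import urllib.parse
import urllib.request

# Assumed DPLA API v2 endpoint for searching items.
DPLA_ITEMS_URL = "https://api.dp.la/v2/items"


def build_dpla_url(query: str, api_key: str, page_size: int = 10, page: int = 1) -> str:
    """Return a search URL for the DPLA items endpoint (hypothetical helper)."""
    params = {"q": query, "api_key": api_key, "page_size": page_size, "page": page}
    return f"{DPLA_ITEMS_URL}?{urllib.parse.urlencode(params)}"


def extract_titles(response: dict) -> list[str]:
    """Pull the first title of each record from a parsed DPLA JSON response."""
    titles = []
    for doc in response.get("docs", []):
        title = doc.get("sourceResource", {}).get("title")
        # A title may arrive as a single string or as a list of strings.
        if isinstance(title, list):
            title = title[0] if title else None
        if title:
            titles.append(title)
    return titles


def search_dpla(query: str, api_key: str, page_size: int = 10) -> list[str]:
    """Fetch one page of results and return record titles (requires network access)."""
    with urllib.request.urlopen(build_dpla_url(query, api_key, page_size)) as resp:
        return extract_titles(json.load(resp))


if __name__ == "__main__":
    # DPLA_API_KEY is a hypothetical variable name; nothing is fetched if it is unset.
    key = os.environ.get("DPLA_API_KEY")
    if key:
        for title in search_dpla("lighthouses", key, page_size=5):
            print(title)
```

Keeping URL construction and JSON parsing separate from the network call makes it easy to page through results by incrementing `page`, and to test the parsing step against saved responses without hitting the API.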
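The Google Books entry points to its APIs. A minimal sketch, assuming the public Books API v1 `volumes` endpoint (`https://www.googleapis.com/books/v1/volumes`) with its `q`, `maxResults`, and `startIndex` parameters, is shown here. This endpoint returns bibliographic metadata such as titles and publication dates rather than bulk full text, so it is better suited to assembling a list of volumes than to mining their contents. The function names are hypothetical.

```python
"""Minimal sketch of a Google Books API v1 volume search (metadata only)."""
import json
import urllib.parse
import urllib.request

# Assumed public Books API v1 endpoint for searching volumes.
GOOGLE_BOOKS_VOLUMES_URL = "https://www.googleapis.com/books/v1/volumes"


def build_books_url(query: str, max_results: int = 10, start_index: int = 0) -> str:
    """Return a volume-search URL; the API caps maxResults at 40 per request."""
    params = {"q": query, "maxResults": max_results, "startIndex": start_index}
    return f"{GOOGLE_BOOKS_VOLUMES_URL}?{urllib.parse.urlencode(params)}"


def extract_volumes(response: dict) -> list[tuple[str, str]]:
    """Return (title, publishedDate) pairs from a parsed volumes response."""
    volumes = []
    # "items" is absent when a query matches nothing, so default to an empty list.
    for item in response.get("items", []):
        info = item.get("volumeInfo", {})
        volumes.append((info.get("title", ""), info.get("publishedDate", "")))
    return volumes


def search_books(query: str, max_results: int = 10) -> list[tuple[str, str]]:
    """Fetch one page of results (requires network access)."""
    with urllib.request.urlopen(build_books_url(query, max_results)) as resp:
        return extract_volumes(json.load(resp))
```

For example, `search_books("intitle:cookbook", max_results=5)` uses the API's `intitle:` search keyword; larger result sets are retrieved by advancing `start_index` in steps of `max_results`.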