Vendor | Description | Help/Guidelines | Examples |
---|---|---|---|
BYU Corpora | Textual corpora, focusing on languages and dialects. | | Recent shifts with three nonfinite verbal complements in English: data from the 100-million-word Time corpus (1920s–2000s) |
Caselaw Access Project | Covers 6.4 million cases that represent 360 years of U.S. legal history. | CAP API | California Wordclouds |
Chronicling America | Historic newspapers. | Bulk Access | Using Big Data to Ask Big Questions, which cites several teaching and research projects |
Digital Public Library of America | Photographs, books, maps, news footage, oral histories, personal letters, museum objects, artwork, government documents, etc. from libraries, archives and museums. | StackLife DPLA | |
Europeana Labs | Cultural heritage items from museums and galleries across Europe. | | Europeana Project List |
Folger Shakespeare Library | Shakespeare's plays, sonnets and poems. | | |
Google Books | Books from 1800–2000. | | Quantitative analysis of culture using millions of digitized books; The Two Poverty Enlightenments: Historical Insights from Digitized Books Spanning Three Centuries |
HathiTrust Digital Library | Printed material predominantly published before 1923. | | |
Internet Archive Open Library | Books, texts and other digital material. | Data Dumps | |
New York Times Developer Network | Provides access to ten public APIs: Archive, Article Search, Books, Community, Geographic, Most Popular, Semantic, Times Newswire, TimesTags, and Top Stories. | ||
Project Gutenberg | Books in various languages. | Terms of Use | |
Text Creation Partnership | Early English Books Online, Eighteenth Century Collections Online and Evans Early American Imprints. | ||
University Datasets | Michigan State University Datasets (19th-century Sunday School Books, Historic American Cookbooks, farming journals), University of Pennsylvania (books), University of Oxford Text Archive (literary and linguistic texts) | | |
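The New York Times row above lists an Article Search API among its public APIs. The following is a minimal sketch, not an official client, of how a request to it might be assembled and its response read with the Python standard library. It assumes the `articlesearch.json` endpoint, the `q`, `page`, `begin_date`, `end_date` (YYYYMMDD) and `api-key` query parameters, and a response that nests results under `response` → `docs`; verify all of these against the current NYT developer documentation. The helper names `build_article_search_url`, `fetch_json` and `extract_headlines` are hypothetical, and `YOUR_API_KEY` is a placeholder.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint; check the NYT Developer Network docs before relying on it.
ARTICLE_SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_article_search_url(query, api_key, begin_date=None, end_date=None, page=0):
    """Return a request URL for the (assumed) Article Search parameters.

    begin_date and end_date are optional strings in YYYYMMDD form.
    """
    params = {"q": query, "page": page, "api-key": api_key}
    if begin_date:
        params["begin_date"] = begin_date
    if end_date:
        params["end_date"] = end_date
    # urlencode percent-encodes values and turns spaces into "+".
    return ARTICLE_SEARCH_URL + "?" + urlencode(params)


def fetch_json(url, timeout=30):
    """Download a URL and decode its body as JSON (performs a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def extract_headlines(payload):
    """Return (pub_date, headline) pairs from a decoded response.

    Assumes the shape {"response": {"docs": [{"pub_date": ..., "headline": {"main": ...}}]}};
    missing fields fall back to empty values instead of raising.
    """
    docs = payload.get("response", {}).get("docs", [])
    return [(d.get("pub_date", ""), d.get("headline", {}).get("main", "")) for d in docs]


if __name__ == "__main__":
    url = build_article_search_url("suffrage", "YOUR_API_KEY", "19200101", "19201231")
    print(url)
    # Hand-written placeholder illustrating the assumed shape; not real API output.
    sample = {"response": {"docs": [{"pub_date": "placeholder-date",
                                     "headline": {"main": "Placeholder headline"}}]}}
    print(extract_headlines(sample))
```

Keeping URL construction and response parsing separate from the network call makes each step testable offline; to run a real query, pass the built URL to `fetch_json` with a valid key, and page through results by incrementing `page`.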