
Resources for Text and Data Mining

Guide to text mining resources available through Emory Libraries and the Emory Center for Digital Scholarship.

Things to keep in mind

Appropriate Use of Purchased or Licensed Resources

Most of the library's electronic resources are governed by license agreements that limit use to the Emory community or to individuals who are physically present at Emory University Library facilities.

  • Each user is responsible for ensuring that he or she uses these products solely for noncommercial, educational, scholarly, or research purposes.
  • Systematic downloading, distribution of content to unauthorized users, and indefinite retention of substantial portions of information are strictly prohibited.
  • The use of software such as scripts, agents, or robots is generally prohibited and may result in loss of access to these resources for the entire Emory community.

Adapted from Yale's Resources for Text Mining Guide


Purchased and Out-of-the-Box Environments

HathiTrust Research Center

The HathiTrust Research Center (HTRC) facilitates non-profit and educational uses of the HathiTrust Digital Library by enabling computational analysis of works from its collection. Note: this service is no longer supported by HathiTrust.

ProQuest Text and Data Mining Tool

ProQuest TDM (Text and Data Mining) Studio lets you create and analyze datasets from ProQuest content. See the updated spreadsheet (August 2025) in the sidebar for the content available for analysis. ProQuest's TDM help guides are also useful. Note that this resource does not allow full-scale downloading of raw content for use outside the tool; please contact the library to ask about this possibility. As of August 2025, ProQuest TDM Studio includes AI tools for large-scale text analysis. Factiva is a major exception: its content is not included in the available datasets.

Gale Digital Scholar Lab

A tool from Gale Engage for working at large scale with Gale's primary source collections. As with ProQuest TDM Studio, this resource does not allow full-scale downloading of raw content for use outside the tool; please contact the library to ask about this possibility.

Paper Machines

Paper Machines is an open-source extension for Zotero, a program that lets users create bibliographies and build large text corpora in an online database. Paper Machines enables Zotero to perform text analysis and visualization for researchers across a variety of disciplines in the humanities and social sciences.