Resources for Text and Data Mining

Guide to text mining resources available through Emory Libraries and the Emory Center for Digital Scholarship.

Things to keep in mind

Appropriate Use of Purchased or Licensed Resources

Most of the library's electronic resources are governed by license agreements that limit use to the Emory community or to individuals who are physically present at Emory University Library facilities.

  • Each user is responsible for ensuring that he or she uses these products solely for noncommercial, educational, scholarly or research use.
  • Systematic downloading, distribution of content to non-authorized users or indefinite retention of substantial portions of information is strictly prohibited. 
  • The use of software such as scripts, agents, or robots is generally prohibited and may result in loss of access to these resources for the entire Emory community.

Adapted from Yale's Resources for Text Mining Guide

See the table below for information about databases that allow text and data mining (TDM). If the resource you want to use is not listed, please contact your subject librarian.

Purchased and Out-of-the-Box Environments

HathiTrust Research Center

The HathiTrust Research Center (HTRC) facilitates non-profit and educational uses of the HathiTrust Digital Library by enabling computational analysis of works from its collection. 

ProQuest Text and Data Mining Tool

ProQuest TDM (Text and Data Mining) Studio allows you to create and analyze datasets from ProQuest content. See the spreadsheet to the side for the content available for analysis. The TDM help guides are a useful place to start.

Gale Digital Scholar Lab

A tool from Gale (a Cengage company) for working at large scale with its primary source collections.

Paper Machines

Paper Machines is an open-source extension for Zotero, a program that allows users to create bibliographies and build large text corpora in an online database. Paper Machines adds text analysis and visualization to Zotero and is aimed at researchers across a variety of disciplines in the humanities and social sciences.