Resources for Text and Data Mining

Guide to text mining resources available through Emory Libraries and the Emory Center for Digital Scholarship.

Why can't I get my data?

Reasons and examples:

  • Emory has a license to provide access to the database that contains your data, but the license prohibits TDM or allows it at a prohibitive price. Example: Access World News.
  • Emory has a license to provide access to the database that contains your data, but the owner of the database charges substantial fees for the right to TDM. Example: Factiva.
  • Emory does not have a license to provide access to the database that contains your data. Example: Vogue Archive.

Emory Policies for Purchasing Text Mining Materials

  • Emory regularly negotiates contracts for collections that explicitly allow text/data mining, or use of the collections for AI learning models, at a reasonable cost. Where required, updated licenses will be requested and renegotiated.
  • Creating and preparing such licensed materials takes time, so please give as much advance notice as possible so that requests can be processed in a timely fashion.
  • Preference for purchases is given to graduate (e.g., dissertation- or thesis-level) or faculty research.
  • Generally, we do not provide licensing or funding for individual text-mining projects whose needs are not covered by university-wide licenses and whose materials cannot be shared by the Emory community (i.e., we prefer permanent rights where possible).
  • Some vendors may require walled-garden approaches to control and manage access to large amounts of their data, and they frequently charge fees by the project. The library is unable to pay project-by-project fees, but will attempt to negotiate with the vendor for a more institutional solution. We therefore highly encourage scholars to consider grant funding in these cases.
  • Using third-party AI tools is not recommended because (1) their use is restricted by our licenses, as they are not secure, and (2) many commercial LLM providers may repurpose user data for training. It is advisable to opt out of any data sharing.

A more thorough policy statement, which includes guidelines for utilizing Emory resources for large language modeling, is also available.