Resources for Text and Data Mining

Guide to text mining resources available through Emory Libraries and the Emory Center for Digital Scholarship.

Copyright Implications in Text Data Mining

When using text data mining (TDM) as part of a research strategy, remember that the materials you mine may be protected by copyright (see our site on Deciding If You Can Use a Copyrighted Work for info on determining the copyright status of a corpus). For copyrighted corpora, the practice of TDM is often allowed by fair use, which is an exception to copyright (for more info, see Issue Brief: Text and Data Mining and Fair Use in the United States [PDF] from the Association of Research Libraries). However, fair use may not protect you in the following situations:

For more information on copyright, see the Emory Scholarly Communications website. If you have questions about copyright and TDM, please contact Emory's Scholarly Communications Office.

Contract Law and TDM

Contract law can prevent you from performing TDM on a corpus. If you want to mine content on a website, you must follow the website's terms of use, which form a contract between the owner of the website and the user. Often the terms of use will stipulate that you cannot mine or scrape content without permission.

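Terms of use are enforced under contract law rather than copyright law. Fair use is an exception to copyright only, so it does not excuse a violation of terms you have agreed to.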

If you want to mine content from a library database, you must follow the license agreement that the library has negotiated on your behalf with the owner of the database (e.g., JSTOR or EBSCO). This agreement may or may not allow TDM. For info about specific databases available through Emory Libraries, please see these pages on this guide: Purchased Resources and Restricted Resources.

Privacy, Ethics, and Institutional Policy

In addition to copyright and contract law, privacy, ethics, and institutional policy also bear on TDM research.

Privacy

Privacy law protects individuals against the public disclosure of private information about them; this protection expires at the death of that individual. If you will be sharing content from your research corpus, think carefully about any private information it contains, and disclose none of it or as little as possible.

Ethical Considerations

In conducting TDM research, also be mindful of ethical considerations. Would conducting or sharing your research put an individual or community at risk for harm or punishment? For online content, does the website's metadata or robots.txt file specifically prohibit using the site for TDM?
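
As a rough illustration of the robots.txt question above, the sketch below uses Python's standard urllib.robotparser module to test whether a given path is disallowed for a crawler. The robots.txt content, the "my-research-bot" user agent, and the example.org URLs are hypothetical placeholders, not taken from any real site. A robots.txt check is only one signal: it does not replace reading the site's terms of use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""


def is_path_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical user agent and URLs.
allowed_home = is_path_allowed(
    SAMPLE_ROBOTS_TXT, "my-research-bot", "https://example.org/index.html"
)
allowed_archive = is_path_allowed(
    SAMPLE_ROBOTS_TXT, "my-research-bot", "https://example.org/archive/1900.txt"
)
print("index.html allowed:", allowed_home)
print("archive page allowed:", allowed_archive)
```

To check a live site instead of pasted text, the same parser can load the file itself with set_url() followed by read(), which does make a network request.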

Institutional Policy

Emory University researchers must be sure to abide by Emory policies. When conducting TDM projects, review and follow Emory's Copyright and IT Conditions of Use policies.

Additional Information

More information about legal implications of text data mining is becoming available all the time. We will update this page with resources as they appear. 

Contact for Copyright Questions

John Morgenstern
Contact:
Robert W. Woodruff Library
540 Asbury Circle, Atlanta, GA 30322
(404) 727-8286