When using text data mining (TDM) as part of a research strategy, remember that the materials you mine may be protected by copyright (see our site on Deciding If You Can Use a Copyrighted Work for information on determining the copyright status of a corpus). For copyrighted corpora, TDM is often permitted under fair use, which is an exception to copyright (for more information, see Issue Brief: Text and Data Mining and Fair Use in the United States [PDF] from the Association of Research Libraries). However, fair use may not protect you in the following situations:
Contract law can prevent you from performing TDM on a corpus. If you want to mine content on a website, you must follow the website's terms of use, which are a contract between the owner of the website and the user. Terms of use often stipulate that you cannot mine or scrape content without permission.
Because terms of use are a contract you have agreed to, they outweigh copyright law, so fair use does not apply here.
If you want to mine content from a library database, you must follow the license agreement that the library has negotiated on your behalf with the owner of the database (e.g., JSTOR or EBSCO). This agreement may or may not allow TDM. For information about specific databases available through Emory Libraries, please see these pages on this guide: Purchased Resources and Restricted Resources.
For more information on copyright, see the Emory Scholarly Communications website. If you have questions about copyright and TDM, please contact Emory's Scholarly Communications Office.
In addition to copyright and contract law, several other considerations apply when conducting TDM research.
Privacy
Privacy law protects against the public disclosure of private information about an individual; this protection expires at the death of that individual. If you will be sharing content from your research corpus, think carefully about any private information it contains, so that you disclose no private information or, failing that, the smallest amount possible.
Ethical Considerations
In conducting TDM research, also be mindful of ethical considerations. Would conducting or sharing your research put an individual or community at risk of harm or punishment? For online content, do the website's metadata and robots.txt file specifically prohibit using the site for TDM?
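One way to check the robots.txt question in practice is with Python's standard-library urllib.robotparser module. The following is a minimal sketch, not a definitive method: the helper name may_fetch, the user agent string "tdm-research-bot", and the example.org rules are all hypothetical and stand in for your own crawler and target site. Note that a permissive robots.txt file is not legal permission; the site's terms of use still apply.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, url: str, user_agent: str = "tdm-research-bot") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    may_fetch and the default user agent are hypothetical names for this sketch.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, standing in for a real site's file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""

# The /archive/ path is disallowed for all crawlers; other paths are not.
print(may_fetch(EXAMPLE_ROBOTS_TXT, "https://example.org/archive/1901.txt"))
print(may_fetch(EXAMPLE_ROBOTS_TXT, "https://example.org/about.html"))
```

To check a live site instead of a pasted file, RobotFileParser also provides set_url() and read(), which download the site's robots.txt before you call can_fetch().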
Institutional Policy
Emory University researchers must be sure to abide by Emory policies. When conducting TDM projects, review and follow Emory's Copyright and IT Conditions of Use policies.
More information about legal implications of text data mining is becoming available all the time. We will update this page with resources as they appear.
Courtney, K. K., Samberg, R., & Vollmer, T. (2020). Big data gets big help: Law and policy literacies for text data mining. College & Research Libraries News, 81(4).
Althaus, S., et al. (n.d.). Building legal literacies for text data mining. University of California, Berkeley.