When using text data mining (TDM) as part of a research strategy, remember that the materials you mine may be protected by copyright (see our site on Deciding If You Can Use a Copyrighted Work for info on determining the copyright status of a corpus). For copyrighted corpora, the practice of TDM is often allowed by fair use, which is an exception to copyright (for more info, see Issue Brief: Text and Data Mining and Fair Use in the United States [PDF] from the Association of Research Libraries). However, fair use may not protect you in the following situations:
If you want to mine content from a library database, you must follow the license agreement that the library has negotiated on your behalf with the owner of the database (e.g., JSTOR or EBSCO). This agreement may or may not allow TDM. For info about specific databases available through Emory Libraries, please see these pages on this guide: Purchased Resources and Restricted Resources.
In addition to copyright and contract law, there are a few other things to pay attention to when conducting TDM research.
Privacy law protects against the public disclosure of private information about an individual; this protection expires at the death of that individual. If you will be sharing content from your research corpus, think carefully about any private information it contains so that you disclose no private information, or as little as possible.
In conducting TDM research, also be mindful of ethical considerations. Would conducting or sharing your research put an individual or community at risk of harm or punishment? For online content, does the website's metadata or robots.txt file specifically prohibit using the site for TDM?
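As a rough illustration of the robots.txt check, the sketch below uses Python's standard-library urllib.robotparser to test whether a given page is open to a given crawler. The helper name is_crawl_allowed, the user agent "my-tdm-bot", the example.org URLs, and the sample rules are all hypothetical and are there only to show the idea. A robots.txt file expresses a site's crawling preferences; it does not replace reading the site's terms of use.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper for illustration; it parses robots.txt text that you
    have already downloaded, so it makes no network requests itself.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules: every crawler is asked to stay out of /archive/.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""

if __name__ == "__main__":
    for page in (
        "https://example.org/archive/letters.html",
        "https://example.org/about.html",
    ):
        allowed = is_crawl_allowed(SAMPLE_ROBOTS_TXT, "my-tdm-bot", page)
        print(page, "allowed" if allowed else "disallowed")
```

To check a live site instead, RobotFileParser also offers set_url() and read(), which download the file from the address you supply before you call can_fetch().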
More information about legal implications of text data mining is becoming available all the time. We will update this page with resources as they appear.
Courtney, K. K., Samberg, R., & Vollmer, T. (2020). Big data gets big help: Law and policy literacies for text data mining. College & Research Libraries News, 81(4).