Compared with some other social media platforms, Twitter makes it relatively easy to get data. There are, however, some considerations to keep in mind if you are interested in getting and using tweets as a data source:
This guide is not an exhaustive treatment of using Twitter as raw material for research. Instead, it is an introduction to getting data from Twitter. For an excellent introduction to using Twitter as a data source, consider reading the book Twitter as Data, written by Zachary C. Steinert-Threlkeld and published by Cambridge University Press. It covers many of the ins and outs of working with data from Twitter and is well worth the time to read.
Twitter has multiple APIs (Application Programming Interfaces) that you can use to access and download tweets. One is Twitter's REST (Representational State Transfer) API, which you can use to access past tweets and information from profiles of Twitter users. With this API, you can search for tweets on particular topics, but you are limited to tweets from the past 6-9 days. This API will also let you get tweets from a specific account. Here, you can get data further back in time, but only up to a maximum of 3200 tweets. Twitter also has a "streaming" API for collecting tweets in real time. This API provides you with a sample of up to 1% of the total volume of tweets.
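As a rough illustration of the account-level limit described above, the following Python sketch pages backwards through one account's tweets and stops at the 3200-tweet ceiling. It is a sketch under stated assumptions, not Twitter's own client: `fetch_page` is a hypothetical stand-in for whatever library call you use, the `max_id` paging convention and the page size of 200 are assumptions, and the fake fetcher exists only so that the example runs without credentials.

```python
from typing import Callable, Dict, List, Optional

TIMELINE_CAP = 3200  # per-account ceiling mentioned in the text above

# Hypothetical fetcher signature: (max_id, count) -> tweets, newest first.
# In real use this would wrap a call to your Twitter client of choice.
PageFetcher = Callable[[Optional[int], int], List[Dict]]


def collect_timeline(fetch_page: PageFetcher,
                     page_size: int = 200,
                     cap: int = TIMELINE_CAP) -> List[Dict]:
    """Page backwards through a timeline until it is exhausted or capped."""
    tweets: List[Dict] = []
    max_id: Optional[int] = None  # None means "start from the newest tweet"
    while len(tweets) < cap:
        count = min(page_size, cap - len(tweets))
        page = fetch_page(max_id, count)[:count]
        if not page:
            break  # nothing older is available
        tweets.extend(page)
        # Ask next for tweets strictly older than the oldest one seen so far.
        max_id = min(t["id"] for t in page) - 1
    return tweets


def make_fake_fetcher(total: int) -> PageFetcher:
    """Simulate an account with `total` tweets whose IDs are total..1."""
    ids = list(range(total, 0, -1))  # newest (largest ID) first

    def fetch(max_id: Optional[int], count: int) -> List[Dict]:
        eligible = [i for i in ids if max_id is None or i <= max_id]
        return [{"id": i} for i in eligible[:count]]

    return fetch


if __name__ == "__main__":
    print(len(collect_timeline(make_fake_fetcher(5000))))  # 3200
    print(len(collect_timeline(make_fake_fetcher(150))))   # 150
```

The practical consequence is that for a prolific account, only the most recent tweets are reachable this way; anything older requires one of the other access routes discussed in this guide.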
Twitter has also recently added an "Academic Research Product Track", which provides free access to "historical" tweets upon application and approval. You can read more about this option here and here.
To make use of Twitter's APIs, you need to have a Twitter account. You also need to get the appropriate credentials from Twitter - see https://apps.twitter.com/.
Here are some useful tools for collecting data from Twitter:
George Washington University's "Where to get Twitter data for academic research" has other suggestions for tools to use in collecting tweets and is, in general, a good primer on options for how/where to get data from Twitter.
You might also consider making use of Twitter datasets that have been collected by various academics and organizations. As noted above, Twitter places various limits on the extent to which you can share tweet data with others. As a result, these datasets generally consist of IDs of tweets rather than the content of tweets themselves. To get the contents of tweets, you can "hydrate" the IDs via the "Hydrator", a free tool that was developed by Documenting the Now. (Think of Rey in "Star Wars: The Force Awakens" putting flour in a dish of water to make a muffin, and you'll have a visual metaphor for what hydrating a tweet ID does.)
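To make the idea of hydration concrete, here is a minimal Python sketch of what a hydration tool does conceptually: it sends tweet IDs to a lookup service in batches and records which IDs no longer resolve (for example, because the tweet was deleted or the account was made private). This is not the Hydrator's actual code. The `lookup` callable is a hypothetical stand-in for a real API call, and the batch size of 100 is an assumption about the per-request limit; the fake archive is there only so that the example runs offline.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

BATCH_SIZE = 100  # assumed per-request limit for ID lookups

# Hypothetical lookup: takes a batch of tweet IDs and returns a mapping
# from ID to tweet for those IDs that are still available.
Lookup = Callable[[List[str]], Dict[str, Dict]]


def batched(ids: Iterable[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` IDs."""
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def hydrate(ids: Iterable[str], lookup: Lookup,
            size: int = BATCH_SIZE) -> Tuple[Dict[str, Dict], List[str]]:
    """Return (tweets found, IDs that could not be hydrated)."""
    hydrated: Dict[str, Dict] = {}
    missing: List[str] = []
    for batch in batched(ids, size):
        found = lookup(batch)
        hydrated.update(found)
        missing.extend(i for i in batch if i not in found)
    return hydrated, missing


if __name__ == "__main__":
    # Fake archive: tweets 1..250 exist, except every 50th was "deleted".
    archive = {str(i): {"id": str(i), "text": f"tweet {i}"}
               for i in range(1, 251) if i % 50 != 0}

    def fake_lookup(batch: List[str]) -> Dict[str, Dict]:
        return {i: archive[i] for i in batch if i in archive}

    tweets, gone = hydrate([str(i) for i in range(1, 251)], fake_lookup)
    print(len(tweets), gone)  # 245 ['50', '100', '150', '200', '250']
```

IDs are kept as strings here because tweet IDs are large integers that some tools round or reformat. Expect a hydrated dataset to be somewhat smaller than the original ID list, since deleted or protected tweets cannot be recovered.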
Some sources to consult for locating datasets consisting of tweet IDs:
Of these various collections, Documenting the Now is the most comprehensive and is generally the best place to start.
If neither pre-existing datasets nor collecting tweets yourself is suitable, e.g. for reasons of time coverage, then you may need to pay to get the data you need. Twitter now offers historical access to its archive for a monthly fee. GWU's "Where to get Twitter data for academic research" guide, mentioned above, also covers fee-based options for access to Twitter data. As noted above, there is also Twitter's new Academic Research Product Track.
As noted above, Twitter is not meant to be a data source for researchers. That does not mean, however, that the ethical considerations that accompany more "traditional" sources of data are irrelevant for users of Twitter data:
(1) Sensitivity and Harm: Research using data from the likes of Twitter has the potential to cause harm to would-be research subjects. The content of tweets may, for instance, be damaging to the people who posted them, as with tweets about illegal or illicit behavior. The tweets may also come from vulnerable populations, such as children, people with mental illnesses, or people living in authoritarian states or in violent contexts. Compiling such data creates the potential for information to spread beyond what was originally intended, with resulting potential for harm.
(2) Privacy and Consent: Surveys of Twitter users suggest that people who post on Twitter are not always comfortable with their posts being aggregated and used for research purposes. Even though this is social media, you should keep in mind that Twitter allows users to delete tweets and make accounts private. In addition, Twitter users may intend their posts to be limited to their own networks and friends. The point is that there is a disconnect between researchers and the public here. The former are using tweets and sharing data from them in ways that the latter may not have intended and with which they may not be comfortable.
For more intensive and detailed discussions of this topic, we have compiled a list of suggested readings on Twitter and research ethics (pdf).