This page collects links to guides for using R, SAS, SPSS, and Stata. The guides start from the ground up and cover a range of topics, from getting data into the program, through common data-management tasks, to introductory data analysis.
These guides generally focus on using syntax (written commands rather than point-and-click menus) to work with and analyze data in statistical software. Each of these applications has a learning curve, some steeper than others, but a syntax-based approach is a more robust and reproducible way of doing empirical analysis. Like proper citation, it is part of what makes quantitative research transparent. Simply put, syntax is documentation that spells out what you did to process and analyze the data and produce your findings, so access to that syntax shows others how you got your results. Some journals, such as the American Economic Review and the American Journal of Political Science, even require authors to submit the syntax used to clean and analyze data as part of their data availability policies.
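As a rough illustration of syntax serving as documentation, here is a minimal sketch of a scripted workflow. It is written in Python only for the sake of a compact example; the guides on this page cover R, SAS, SPSS, and Stata, where the same idea applies. The data values, column names (`id`, `group`, `income`), and function names are all made up for the illustration and do not come from any real data set.

```python
import csv
import io
import statistics

# Hypothetical toy data, inlined so the sketch runs on its own.
# In a real project this would be a raw data file that is never edited by hand.
RAW_CSV = """id,group,income
1,a,100
2,a,
3,b,300
4,b,500
"""


def load_rows(text):
    """Step 1: read the raw records exactly as delivered."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Step 2: drop records with missing income and convert income to a number."""
    return [
        {**row, "income": float(row["income"])}
        for row in rows
        if row["income"].strip() != ""
    ]


def mean_income_by_group(rows):
    """Step 3: compute the mean income within each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row["group"], []).append(row["income"])
    return {name: statistics.mean(values) for name, values in groups.items()}


raw = load_rows(RAW_CSV)
clean = clean_rows(raw)
result = mean_income_by_group(clean)

# Reporting how many records were dropped makes the cleaning decision visible.
print(f"dropped {len(raw) - len(clean)} of {len(raw)} records")
print(result)
```

Anyone who reads or reruns this script can see each decision that was made, for example that records with missing income were excluded rather than imputed, which is information that a sequence of menu clicks would not leave behind.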
For additional guidance on working with data, see the following:
- "Coding for Economists" - An introduction to coding that uses Python and focuses on economics, but is applicable more generally.
- "Data Organization in Spreadsheets" - Excellent guidance and suggestions for how to format spreadsheets for data analysis.
- "DIME Wiki" - The World Bank's Development Impact Evaluation group maintains an extensive guide to data collection, management, and publication. The guide includes a handbook at https://worldbank.github.io/dime-data-handbook/ with additional guidance and recommendations. Many of the coding examples are Stata-specific, but the principles and recommendations are applicable more generally.
- "Stata for Researchers: Project Management" - Written for Stata users, but applicable more generally.
- The Teaching Integrity in Empirical Research Project's recommended specifications and processes for working with data.
- "Tidy Data" - Written for R users, but applicable more generally.