Proper documentation of sources is a key component of scholarly research, and the need for documentation is no less important for data sources than for bibliographic sources. It is vital that scholars in the social sciences and elsewhere who use quantitative data in their work be clear and explicit as to the sources for their data. This is in part because of the importance of reproducibility in the sciences: in principle, other scholars must be able to reproduce your analysis so that they can better assess the soundness of your work, which requires that they have access to the data used in your original analysis.
Greater transparency in sources also encourages greater accountability in research and increases confidence that the data used in your work are suited for the questions you are asking, which in turn will make others more confident in whatever conclusions your research presents. In addition, thorough documentation of data sources makes it easier for scholars to assess whether sources used in your research are appropriate for use in their own work.
More generally, the scholarly community is placing a growing emphasis on greater transparency in and documentation of data sources, to the point where some scholarly journals even require authors to submit their data to archives where others can download them. We have also included data availability policies from journals in several fields as examples of how those fields choose to operationalize disciplinary norms of transparency in empirical research.
https://aeadataeditor.github.io/aea-de-guidance/addtl-data-citation-guidance.html - The editors of the American Economic Association have put together a guide for citing secondary data that provides general guidelines, specific examples/scenarios, and explanations for why citation of data is important for replication/reproducibility and greater rigor in empirical research.
http://www.icpsr.umich.edu/files/ICPSR/enewsletters/iassist.html - This guide will help you figure out how to cite your data in a way that is informative and useful to others. See http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/citations.html and https://youtu.be/xTUiefaq128 for additional information on and discussion of data citation, including its benefits for both collectors/producers of data and users of data.
http://www.esrc.ac.uk/files/funding/guidance-for-grant-holders/data-citation-what-you-need-to-know/ - This .pdf on data citation from the UK's Economic and Social Research Council is another guide to help you figure out how to cite your data in transparent and useful ways. See http://www.esrc.ac.uk/funding/guidance-for-grant-holders/data-citation/ for additional discussion from the ESRC on the need for data citation.
http://journals.plos.org/plosone/s/data-availability - "Data Availability:" The Public Library of Science's journals "require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception."
http://dx.doi.org/10.1126/science.1257891 - "Raising the Bar:" Science "has established (...) a Statistical Board of Reviewing Editors (SBoRE), consisting of experts in various aspects of statistics and data analysis, to provide better oversight of the interpretation of observational data" as part of its process for reviewing manuscripts submitted for publication. See http://www.sciencemag.org/authors/science-editorial-policies for additional information about Science's data-availability policies.
http://www.nature.com/authors/policies/availability.html - "Availability of Data, Material, and Methods:" Nature has likewise established a policy on the availability of data and methods for replication of findings in published articles.
http://dx.doi.org/10.1093/toxsci/kfv330 - "More than Manuscripts: Reproducibility, Rigor, and Research Productivity in the Big Data Era" - a commentary in Toxicological Sciences that argues in favor of two things: (1) providing both the data underlying publications and the code that transforms and analyzes those data to produce the results in those publications, and (2) treating data and code as citable contributions to science in their own right. The commentary was co-authored by Dr. Lance Waller in Emory's School of Public Health.
https://replicationnetwork.com/2018/12/21/hookworms-and-malaria-and-replications-oh-my/ - "Hookworms and Malaria and Replications, Oh My!" - a commentary from development economist David Roodman on replication and reanalysis and the importance of assessing empirical findings with new data and methods.
http://ineteconomics.org/ideas-papers/blog/economics-needs-replication - "Economics Needs Replication" - a commentary on the need for transparency in data sources and processing in Economics.
https://blog.repec.org/2020/08/04/a-replication-database-for-economics-and-social-sciences-the-replicationwiki/ - "A Replication Database for Economics and Social Sciences: The ReplicationWiki" - A commentary on the ReplicationWiki database of replication studies, which can be filtered by variables such as software used, keywords for the contents of the studies, and journals in which the studies were published. The database focuses largely on Economics, but it includes studies from other fields as well.
http://media.wix.com/ugd/fa8393_d55bef088ac44830bd194b5f80190479.pdf - "Openness in Political Science: Data Access and Research Transparency": A symposium published in PS: Political Science & Politics in January of 2014 that articulates the need for greater transparency in research practices, for both quantitative and qualitative data.
http://doi.org/10.1177/0049124107306658 - "Introduction to the Special Section on Replication and Data Access" - an introduction to a symposium in Sociological Methods & Research on replication of quantitative research in Sociology.
http://www.theguardian.com/news/datablog/2011/jul/28/data-journalism - "Data Journalism at the Guardian: What Is It and How Do We Do It?" - The Guardian newspaper is a first-rate practitioner of data journalism and has much practical advice on how to do data journalism well. It also has a nice visualization of its workflow for prepping data for stories. Its key, and very valuable, insight: working with data is "80% perspiration, 10% great idea, 10% output".