Emory University
Emory Libraries
Discovery Seminar - Intro to Data Science - Le - Spring 2025
Selecting a Dataset
This guide is for students in Dr. Le's Spring 2025 Discovery Seminar at Oxford College.
Dataset Reflection Questions
The Data
How large was the sample, and what data may have been left out? What controls were in place?
What other information do you need to contextualize the information presented in the data?
What other questions does it raise for you?
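A first pass at the sample-size and missing-data questions above can be sketched in Python. This is only an illustrative sketch: the sample data, the `summarize` helper, and the list of strings treated as "missing" are all assumptions made up for this example, and a real dataset may mark missing values differently.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data, invented for illustration only.
SAMPLE_CSV = """id,age,city
1,34,Atlanta
2,,Oxford
3,29,
4,NA,Atlanta
"""

# Assumption: these strings mark a missing value; real datasets vary,
# so check the dataset's documentation before relying on this list.
MISSING_MARKERS = {"", "NA", "N/A", "null", "?"}


def summarize(csv_text):
    """Return (row_count, missing_per_column) for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    row_count = 0
    missing = Counter({name: 0 for name in reader.fieldnames})
    for row in reader:
        row_count += 1
        for name, value in row.items():
            if name is None:
                # Extra fields beyond the header; ignored in this sketch.
                continue
            if value is None or value.strip() in MISSING_MARKERS:
                missing[name] += 1
    return row_count, dict(missing)


rows, missing = summarize(SAMPLE_CSV)
print(f"rows: {rows}")
for column, count in missing.items():
    print(f"{column}: {count} missing")
```

Counts like these only show what is absent within the file. They cannot tell you who or what was never sampled in the first place, which is why the contextual questions still matter.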
The People
What factors not measured in the dataset could have affected how the data was collected or represented?
Who collected this data? When? What was their goal?
What is explicitly stated in the data? What is inferred?
Choosing a Dataset Handout
The handout we will be using in class.
Dataset Sources
UC Irvine Machine Learning Repository
The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.
Kaggle
Search for repositories and datasets, but be sure to check each dataset's source for quality and accuracy.