Sociology 504: Advanced data analysis for the social sciences
Princeton University, Spring 2015
Preceptor: Angela Dixon
Overview
This course is the second in the two-semester sequence for Ph.D. students in Sociology. In this course, students will learn the statistical and computational principles necessary to perform modern, flexible, and creative analysis of quantitative social data. This course aims to transform students from consumers of quantitative research into producers of it.
See the logistics page for more information about time and location, prerequisites, software, code conventions, collaboration policy, Piazza, and inspirations.
Goals
By the end of the semester, you will be able to:
- Conduct, interpret, and communicate results from analysis using multiple regression (including dummy variables and interactions).
- Conduct, interpret, and communicate results from analysis using logistic regression (including dummy variables and interactions).
- Describe the relationship between multiple regression, logistic regression, and the generalized linear model.
- Explain the limitations of observational data for making causal claims, and begin to use existing strategies for attempting to make causal claims from observational data.
- Write clean, reusable, and reliable R code.
- Build a solid, reproducible research pipeline to go from raw data to final paper.
- Feel empowered working with data.
Further, because we cannot possibly cover everything that you will need to know during your career as a researcher, there are two final long-term goals. After this course is over, you will be able to:
- Learn new statistics
- Learn new programming
Assignments
There are three main types of assignments for students:
- Preparing for class: For many classes there will be some reading (or video watching) that you must do before class. I expect you to come to class 100% prepared. I will assign a reasonable amount of stuff, and you must do it. I will not spend valuable class time summarizing readings that you should have done before class. Rather, we are going to use class time for more valuable learning activities.
- Weekly homework: Learning data analysis takes practice. There will be weekly homework assignments, which are described in detail on the homework page.
- Replication and extension project: Students will replicate and extend a published paper. For more information see the project page.
GitHub
All class materials are available from our class GitHub page.
Open access
I have marked each reading with an icon showing whether it is open access or closed access. If you do not have access to a university library, copies of many of the closed access articles can be found through Google Scholar.
Schedule
Introduction, 2015-02-02
Before class:
Optional after class:
Lab-Transforming data with dplyr, 2015-02-02
Before class:
Optional after class:
Doing data analysis: An introduction to software engineering, 2015-02-04
Before class:
Optional after class:
Visualization, 2015-02-09
Before class:
Optional after class:
Lab-Visualizing data with ggplot2, 2015-02-09
Before class:
- Watch Visualizing Data Using ggplot2 by David Robinson.
- Introduction (about 3 minutes)
- Scatter Plots (about 8 minutes)
- Faceting and Additional Options (about 4 minutes)
- Histograms and Density Plots (about 4 minutes)
- Boxplots and Violin Plots (about 3 minutes)
- Input- Getting Data into the Right Format (about 9 minutes) [note: we will not use qplot(), but you should know that it exists]
- Output- Saving Your Plots (about 3 minutes)
Optional after class:
Version control with git and github, 2015-02-11
Before class:
Optional after class:
Regression and diagnostics, 2015-02-16
Before class:
Optional after class:
Lab-Running regressions, 2015-02-16
Before class:
Optional after class:
Multiple regression and diagnostics, 2015-02-18
Before class:
Optional after class:
Dummy variables and interactions, 2015-02-23
Before class:
- Fox, Chapter 7. (skim 7.2.1).
Optional after class:
Lab-Dummy variables and interactions, 2015-02-23
Before class:
Optional after class:
Dummy variables and interactions in practice, 2015-02-25
Before class:
Optional after class:
Statistical inference for regression, 2015-03-02
Before class:
- Fox, Chapter 6. (Available from Blackboard)
- Berk, Chapter 4 (skip Section 4.6). (Available from Blackboard)
Optional after class:
Lab-Loops and functions, 2015-03-02
Before class:
Optional after class:
Beyond star gazing, 2015-03-04
Before class:
- Nuzzo, R. (2014). Scientific method: Statistical errors. Nature.
- Cohen, J. (1994). The earth is round (p < .05). American Psychologist.
- Simmons, J. et al. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science.
- King, G., Tomz, M., and Wittenberg, J. (2000). Making the Most of Statistical Analyses: Improving Interpretation and Presentation. American Journal of Political Science.
- Ward, M.D., Greenhill, B.D., and Bakke, K.M. (2010). The perils of policy by p-value: Predicting civil conflicts. Journal of Peace Research.
Optional after class:
Matrix approach to regression, 2015-03-09
Before class:
Optional after class:
Lab-Reproduction studio, 2015-03-09
Before class:
Optional after class:
Maximum likelihood approach to regression, 2015-03-11
Before class:
- Fox, Chapter 9, Sections 9.3 - 9.5. (Available from Blackboard)
Optional after class:
Causal inference and potential outcomes, 2015-03-23
Before class:
Lab-Turning tables into graphs, 2015-03-23
Before class:
Causal graphs, 2015-03-25
Before class:
Optional after class:
Conditioning and matching for causal inference, 2015-03-30
Before class:
- Morgan and Winship (2015). Counterfactuals and Causal Inference: Chapter 4 (Models of causal exposure and identification criteria for conditioning estimators) and Chapter 5 (Matching estimators of causal effects). (Available from Blackboard)
Optional after class:
Lab-Replication studio, 2015-03-30
Before class:
Regression, causal inference, and shoe leather, 2015-04-01
Before class:
Logit and probit models for categorical response variables, 2015-04-06
Before class:
Lab-Working with logit and probit coefficients, 2015-04-06
Before class:
Logit and probit models: Not as simple as you thought, 2015-04-08
Before class:
Optional after class:
Models for polytomous data, 2015-04-13
Before class:
Lab-Working with models for polytomous data, 2015-04-13
Before class:
Generalized linear model and models for count data, 2015-04-15
Before class:
Making simple (and complex) models more flexible and interesting, 2015-04-20
Before class:
Lab-Hurricanes!, 2015-04-20
Before class:
Multilevel modeling, 2015-04-22
Before class:
Sampling, networks, and hidden populations, 2015-04-27
Before class:
Optional after class:
Lab-Project presentations, 2015-04-27
Before class:
NOTE: This lab will end at 4:30 so that we can attend the Tumin Lecture.
Cautions, warnings, and wisdom, 2015-04-29
Before class:
- Fox, Chapter 1.
- Berk (2003). Regression Analysis: A Constructive Critique: Chapter 11 (What to do). [Available on Blackboard]
- Rosenbaum (2002). Observational Studies: Chapters 11 (Planning an observational study) and 12 (Some strategic issues). [Available on Blackboard]