Computational Social Science:
Social Research in the Digital Age


Sociology 596
Princeton University
Fall 2016

Tuesday 2pm-5pm (first half of semester)
165 Wallace Hall
Course materials: class github repository
Instructor: Matthew Salganik

Overview

Changes in technology---specifically the transition from the analog age to the digital age---mean that we can now collect and analyze social data in new ways. This six-week mini-class is about doing social research in these new ways. Unlike some other courses on computational social science, this course will emphasize "social science" and de-emphasize "computation." We will focus on how traditional concepts of research design in the social sciences can inform our understanding of new data sources, and how these new data sources might require us to update our thinking on research design. The course should be helpful for social scientists who want to do more data science and data scientists who want to do more social science.

Course goals and learning objectives

  1. Students will describe the opportunities and challenges that the digital age creates for social research.
  2. Students will evaluate modern social research from the perspectives of both social science and data science.
  3. Students will create modern research proposals that blend ideas from social science and data science.
  4. Students will practice the techniques needed to actually conduct their proposed research (optional).

Course activities

Meeting structure

Each class meeting will be split into four main parts:

In general, the class will be a mix of professor-led discussion and student-led discussion. As the semester progresses, I will expect the students to take an increasingly active role in the course.

Logistics

See the logistics page for more information about time and location, prerequisites, collaboration policy, Piazza, grading, and open access.


Introduction and Ethics (September 20, 2016)

In this first class we will cover a broad overview of computational social science, focusing on blending ideas from social science and data science. Ethics is a theme that runs throughout the course, so we will cover it in the first week.

Slides

Big data (September 27, 2016)

Human behavior in the digital age often leaves behind traces, and these traces are being aggregated by companies and governments on a massive scale. This week we will discuss the strengths and weaknesses of using these big data sources for social research. Then, I'll describe three approaches that can help you learn from these big data sources: counting things, forecasting, and approximating experiments.

Slides

Surveys (October 4, 2016)

This week I'll begin by explaining that big data sources will not replace surveys. In fact, the abundance of big data sources increases---not decreases---the value of surveys. Given that motivation, I’ll summarize the total survey error framework that was developed during the first two eras of survey research. This framework enables us to understand new approaches to representation (e.g., non-probability samples) and new approaches to measurement (e.g., new ways of asking questions to respondents). Finally, I’ll describe two research templates for linking survey data to big data sources.

Slides

Running experiments (October 11, 2016)

Randomized controlled experiments have proven to be a powerful way to learn about the social world, and this week we will see how you can use them in your research. We will describe the differences between lab experiments and field experiments and between analog experiments and digital experiments. Further, I’ll argue that digital field experiments can offer the best features of analog lab experiments (tight control) and analog field experiments (realism), all at a scale that was not possible previously. Next, I’ll describe three concepts---validity, heterogeneity of treatment effects, and mechanisms---that are critical for designing rich experiments. With that background, I’ll describe the trade-offs involved in the two main strategies for conducting digital experiments: doing it yourself or partnering with the powerful. Finally, I’ll conclude with some design advice about how you can take advantage of the real power of digital experiments and describe some of the responsibility that comes with that power.

Slides

Mass collaborations (October 18, 2016)

Wikipedia is amazing. A mass collaboration of volunteers created a fantastic encyclopedia that is available to everyone. The key to Wikipedia’s success was not new knowledge; rather, it was a new form of collaboration. The digital age, fortunately, enables many new forms of collaboration. Thus, we should now ask: what massive scientific problems---problems that we could not solve individually---can we now tackle together? Mass collaboration has a long, rich history in fields such as astronomy and ecology, but it is not yet common in social research. However, by describing successful projects from other fields and providing a few key organizing principles, I hope to convince you of two things. First, mass collaboration can be harnessed for social research. And, second, researchers who use mass collaboration will be able to solve problems that had previously seemed impossible. Although mass collaboration is often promoted as a way to save money, it is much more than that. As I will show, mass collaboration doesn’t just allow us to do research cheaper, it allows us to do research better.

Slides

Student-selected topic and pitch day (October 25, 2016)

For the final week of class, students will select the topic. I'll update the syllabus once the choice is complete. In this final class, the students will also pitch their final projects.

For the final week of class, students will present and discuss their final research proposals. Then, we will collectively generate a set of ideas about possible next steps for continuing your training in computational social science.

Slides

Acknowledgements

This class was shaped by conversations with Brandon Stewart, especially his class on Text as Data from Spring 2016.



This work is licensed under a Creative Commons Attribution 4.0 International License.