(Sociology 596) Computational Social Science: Social Research in
the Digital Age
Note: This course is officially listed as "Web-based Social Research"
Princeton University
Fall 2012
Time: Tuesday 2pm-5pm (second half of semester)
Location: 190 Wallace Hall
Instructor: Matthew Salganik
In the last decade we have witnessed the birth and rapid growth of Wikipedia, Google, Facebook, iPhones, Wi-Fi, YouTube, Twitter, and numerous other marvels of the digital age. In addition to changing the way we live, these tools---and the technological revolution they are a part of---have fundamentally changed the way that we can learn about the social world. We can now collect data about human behavior on a scale never before possible and with tremendous granularity and precision. The ability to collect and process "big data" enables researchers to address core questions in the social sciences in new ways and opens up new areas of inquiry.
This course on computational social science will emphasize social
science rather than computation. We will focus on how traditional
concepts of research design in the social sciences can inform our
understanding of new data sources, and how these new data sources
might require us to update our thinking on research design.
Now a little about mechanics. Each three-hour class will begin with a general discussion based on several readings. Then, students will take turns presenting specific papers that apply the ideas from the general discussion. Students are expected to come to class prepared for the general discussion and to present a few articles during the course of the semester. There will be no exam.
Your grade will be based on the following components:
- Class participation and in-class presentations: 25%
Each student will be expected to present a few articles during the course of the semester. Each presentation should begin with a 30-second summary of the article and then move to a more elaborate discussion of the key issues in the paper. The student presenter will be expected to answer any questions that come up from the class.
- Response papers: 75%
Each student will be expected to write a short response paper
(2-3 pages) every week, except the first week. Students should view
them as a chance to play with the ideas in the readings: look for
contradictions, establish connections to your own research, develop
empirical tests, etc. The response papers should not be simple
summaries of the readings. All response papers should be sent to me by midnight on the Monday preceding the class.
There are no official prerequisites for the course, and students from all departments are welcome. Undergraduates interested in taking the course should contact the instructor for permission.
Meeting 1 (11/6/12) Introduction and Ethics
In this first class we will cover a broad overview of web-based
research, focusing on both strengths and weaknesses. We will also
discuss ethical issues that will arise throughout the course.
For general discussion
Lazer, D. et al. 2009. Computational social
science. Science, 323:721-723.
Watts, D.J. 2007. A
twenty-first century science. Nature, 445:489.
Anderson, C. 2008. The
End of Theory: The Data Deluge Makes the Scientific Method
Obsolete. Wired.
Simonite, T. 2012. What
Facebook Knows. MIT Technology Review.
Barbaro and Zeller. 2006. A Face is Exposed for AOL Searcher No. 4417749. New York Times, August 9.
Nissenbaum, H. 2010. Privacy in Context, Stanford University
Press. Introduction.
King, G. 2011. Ensuring the Data-Rich Future of the Social
Sciences. Science, 331(6018):719-721.
Zimmer, M. 2010. "But the data is
already public": on the ethics of
research in Facebook. Ethics and Information
Technology, 12:313-325.
boyd and Crawford. 2011. Six Provocations for Big
Data. Working paper.
Meeting 2 (11/13/12) Individual experiments
The web offers numerous advantages over the traditional laboratory
for the conduct of social science experiments. First, the web allows
researchers to conduct experiments on a completely different scale;
lab experiments are limited to hundreds of participants, but
web-based experiments involving tens of thousands of participants
have already been conducted and larger experiments are becoming
increasingly practical. The web also allows researchers access to a
much broader pool of participants and allows researchers to study
decision making in a more natural environment. But conducting
experiments on the web also has drawbacks, including unknown
participant pools and limited control over participants. In this
meeting we will discuss four types of web-based experiments where the
unit of analysis is an individual: A/B tests on existing sites, overlaid experiments on existing sites, quasi-experiments, and experiments using micro-payment platforms (e.g. Amazon's Mechanical Turk). The strengths and weaknesses of the various approaches will be compared.
For general discussion
Doleac and Stein. 2010. The Visible Hand: Race and
Online Market Outcomes. Working paper.
Kohavi, Deng, Frasca, Longbotham, Walker, and
Yu. 2012. Trustworthy Online Controlled Experiments: Five Puzzling
Outcomes Explained. KDD.
Bakshy, Eckles, Yan, and Rosenn. 2012. Social Influence in Social Advertising: Evidence from Field Experiments. EC.
Mas and Moretti. 2009. Peers at Work. American Economic Review, 99: 112-45.
Einav, Kuchler, Levin, and Sundaresan. 2011. Learning from Seller
Experiments in Online Markets. NBER Working Paper No. 17385.
Berinsky, Huber, and Lenz. 2012. Evaluating Online
Labor Markets for Experimental Research: Amazon.com's Mechanical
Turk. Political Analysis, 20:351-368.
Horton, Rand, and Zeckhauser. 2011. The online
laboratory: conducting experiments in a real labor market.
Experimental Economics, 14(3):399-425.
For presentation
Kohli, Bachrach, Stillwell, Kearns, Herbrich, and Graepel. 2012. Colonel Blotto on
Facebook: The Effect of Social Relations on Strategic Interaction.
Web Science'12.
Bakshy, Rosenn, Marlow, and Adamic. 2012. The role of
social networks in information diffusion. WWW.
Aral and Walker. 2011. Creating Social Contagion
Through Viral Product Design: A Randomized Trial of Peer Influence in
Networks. Management Science, 57: 1623-1639.
Restivo and van de Rijt. 2012. Experimental Study of
Informal Rewards in Peer Production. PLoS ONE, 7:e34358.
Broockman and Green. 2012. Can Facebook Advertisements
Increase Political Candidates' Name Recognition and Favorability?
Evidence from a Randomized Field Experiment. Working paper.
Horton. 2012. Computer-Mediated Matchmaking:
Facilitating Employer Search and Screening. Working paper.
Tucker and Zhang. 2011. How Does Popularity
Information Affect Choices? A Field Experiment. Management Science, 57:828-842.
Mason and Watts. 2009. Financial incentives and the
performance of crowds. KDD.
Horton. 2010. Employer
Expectations, Peer Effects and Productivity: Evidence from a Series
of Field Experiments. Working paper.
Mason and Suri. 2012. A Guide to
Behavioral Experiments on Mechanical Turk. Behavior Research Methods, 44(1), 1-23.
Meeting 3 (11/20/12) Collective experiments
The web also allows for collective experiments, where the unit of
analysis is a group, not an individual. These collective experiments
introduce numerous logistical complications, but can be used to
address questions that are otherwise extremely difficult to study.
For general discussion
Hedstrom. 2006. Experimental macro sociology:
Predicting the next best seller. Science, 311:786-787.
Salganik, Dodds, and Watts. 2006. Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311:854-856 (also read supporting online materials).
Suri and Watts. 2011. Cooperation and contagion in
web-based, networked public goods experiments. PLoS
One, 6:e16836.
van der Leij. 2011. Experimenting with
Buddies. Science, 334:1220-1221.
Centola. 2010. The spread of behavior in an online social network experiment. Science, 329:1194-1197 (also read supporting online materials).
Centola. 2011. An Experimental Study of Homophily in
the Adoption of Health Behavior. Science,
334:1269-1272 (also read supporting
online material).
For presentation
Salganik and Watts. 2008. Leading
the herd astray: Experimental study of self-fulfilling prophecies in
an artificial cultural market. Social Psychology
Quarterly, 71:338-355.
Salganik and Watts. 2009. Web-Based
Experiments for the Study of Collective Social Dynamics in Cultural
Markets. Topics in Cognitive Science, 1:439-468.
Mason and Watts. 2012. Collaborative learning in
networks. Proceedings of the National Academy of
Sciences, 109(3):764-769.
Wang, Suri, and Watts. 2012. Cooperation and
assortativity with dynamic partner updating. Proceedings of the National Academy of
Sciences, 109(36):14363-14368.
Isaac, Walker, and Williams. 1994. Group size and the
voluntary provision of public goods: Experimental evidence utilizing
large groups. Journal of Public Economics, 54:1-36.
Meeting 4 (11/27/12) Mobile phones and wearable sensors
There are approximately four billion mobile phones in the world.
While these devices are often thought of as "phones," the newest
wave of "smart phones" that are increasingly dominant in developed
countries are actually sophisticated mobile computers that offer
amazing opportunities for researchers. In this class we will
discuss the two main forms of research using mobile phones and
wearable sensors: research that uses data collected from individual
devices and research that uses aggregate data collected by mobile
phone companies. Within the category of research that uses
individual devices, we will distinguish between research that uses
phones and research that uses custom-built devices. We will also
distinguish between active and passive data collection.
For general discussion
Eagle. 2010. Mobile Phones as Sensors for Social Research. in The
Handbook of Emergent Technologies in Social Research. Hesse-Biber
(Ed.). [to be posted on blackboard]
Kaplan and Stone. 2012. Bringing
the Laboratory and Clinic to the Community: Mobile Technologies for
Health Promotion and Disease Prevention. Annual Review of
Psychology, in press.
Palmer, Espenshade, Bartumeus, Chung, Ozgencil, and Li. New Approaches to Human Mobility: Using Mobile Phones for
Demographic Research. Demography, forthcoming.
Gething and Tatem. 2011. Can Mobile Phone Data
Improve Emergency Response to Natural Disasters? PLoS
Medicine, 8(8):e1001085.
Bengtsson, Lu, Thorson, Garfield, von Schreeb. 2011. Improved
Response to Disasters and Outbreaks by Tracking Population Movements with Mobile Phone Network Data: A Post-Earthquake
Geospatial Study in Haiti. PLoS Medicine.
Salathe, Kazandjieva, Lee, Levis, Feldman, Jones. 2010. A high-resolution human contact network for infectious
disease transmission. Proceedings of the National Academy
of Sciences, 107(51):22020-22025.
For presentation
Miller. 2012. The Smartphone Psychology Manifesto. Perspectives on
Psychological Science, 7(3):221-237.
Raento, Oulasvirta, and Eagle. 2009. Smartphones: An
Emerging Tool for Social Scientists. Sociological
Methods and Research, 37(3):426-454.
Kazandjieva, Lee, Salathe, Feldman, Jones, Levis. 2010. Experiences in Measuring Human Contact Network for
Epidemiological Research. HotEmNets.
Chittaranjan, Blom, and Gatica-Perez. 2011. Mining
large-scale smartphone data for personality studies. Personal and Ubiquitous Computing.
Blumenstock, Eagle, and Fafchamps. 2011. Charity and
Reciprocity in Mobile Phone-Based Giving: Evidence in the Aftermath
of Earthquakes and Natural Disasters. Working paper.
Wyatt, Choudhury, Bilmes, and Kitts. 2011. Inferring
Colocation and Conversation Networks from Privacy-Sensitive Audio
with Implications for Computational Social Science. ACM
Transactions on Intelligent Systems and Technology, 2(1).
Bagrow, Wang, and Barabasi. 2011. Collective
Response of Human Populations to Large-Scale Emergencies. PLoS ONE, 6(3):e17680.
Onnela, et al. 2007. Structure and tie
strengths in mobile communications networks. Proceedings
of the National Academy of Sciences, 104(18):7332-7336.
Wuchty. 2009. What is a social
tie? Proceedings of the National Academy of
Sciences, 106(36):15099-15100.
Eagle, Pentland, Lazer. 2009. Inferring social
network structure using mobile phone data. Proceedings
of the National Academy of Sciences, 106(36):15274-15278. with Comment and Reply.
Wesolowski, Eagle, Noor, Snow, and Buckee. 2012. Heterogeneous Mobile Phone Ownership and Usage Patterns in
Kenya. PLoS ONE 7(4): e35319.
Wesolowski and Eagle. 2010. Parameterizing
the dynamics of slums. 2010 AAAI Spring Symposium Series.
Meeting 5 (12/4/12) Digital traces
Human behavior in the digital age often leaves behind traces, and
these traces are being aggregated on a scale that is difficult to
comprehend. In this meeting we will discuss the strengths and
weaknesses of using these traces for social research.
For general discussion
Polgreen, Chen, Pennock, Nelson, Weinstein. 2008. Using Internet Searches for Influenza Surveillance.
Clinical Infectious Diseases, 47(11):1443-1448.
Helft. 2008. Google Uses Searches to Track Flu's
Spread. New York Times.
Ginsberg, Mohebbi, Patel, Brammer, Smolinski, and
Brilliant. 2008. Detecting influenza epidemics using
search engine query data. Nature, 457:1012-1014.
Butler. 2008. Web
data predict flu. Nature, 456, 287-288.
Goel, Hofman, Lahaie, Pennock, and Watts. 2010. Predicting
consumer behavior with Web search. Proceedings of the National Academy of
Sciences, 107(41):17486-17490.
Cook, Conrad, Fowlkes, and Mohebbi. 2011. Assessing
Google Flu Trends Performance in the United States during the 2009
Influenza Virus A (H1N1) Pandemic. PLoS ONE, 6(8):e23610.
Correlate: The Comic Book.
Ugander, Backstrom, Marlow, and Kleinberg. 2012. Structural
diversity in social contagion. Proceedings of the National Academy of
Sciences, 109(16):5962-5966.
Kossinets, G. and Watts, D.J. (2009). Origins of Homophily in an Evolving Social Network. American Journal of Sociology, 115(2):405-450.
For presentation
Schneider and Buckley. 2002. What do parents want from schools? Evidence from the internet. Educational Evaluation and Policy Analysis, 24(2):133-144.
Backstrom, Sun, and Marlow. 2010. Find Me If You Can:
Improving Geographical Prediction with Social and Spatial
Proximity. WWW.
Wimmer and Lewis. 2010. Beyond and below racial homophily: ERG models of a friendship network documented on Facebook. American Journal of Sociology, 116(2):583-642.
Wuchty and Uzzi. 2011. Human Communication Dynamics in
Digital Footsteps: A Study of the Agreement between Self-Reported
Ties and Email Networks. PLoS ONE, 6(11):e26972.
De Choudhury, Mason, Hofman, Watts. 2010. Inferring
relevant social networks from interpersonal communication. WWW.
Aral and Van Alstyne. 2011. The Diversity-Bandwidth
Trade-off. American Journal of Sociology, 117(1):90-171.
Lewis, Gonzalez, and Kaufman. 2012. Social selection
and peer influence in an online social network. Proceedings of the National Academy of
Sciences, 109(1):68-72.
Baker and Fradkin. 2011. What drives job
search? Evidence from Google search data. Working paper.
Stephens-Davidowitz. 2012. The Effects of Racial
Animus on a Black Presidential Candidate: Using Google Search Data
to Find What Surveys Miss. Working paper.
Golder and Macy. 2011. Diurnal and
Seasonal Mood Vary with Work, Sleep and Daylength Across Diverse
Cultures. Science, 333:1878-81.
Meeting 6 (12/11/12) Crowdsourcing, Citizen Science, and Conclusions
Anyone who has used Wikipedia understands the power of
large-scale social collaboration. How can we harness this collective
power for other intellectual challenges?
For general discussion
Watch Luis von Ahn's talk at Google on human computation.
The Economist. 2007. Spreading the load. The Economist, Dec 8.
Markoff. 2010. In a Video Game, Tackling the Complexities of Protein Folding. New York Times, August 9.
Barker. 2008. Trying to Design a Truly Entertaining Game Can Defeat Even a Certified Genius. Wired.
Cooper et al. 2010. Predicting protein
structures with a multiplayer game. Nature.
Fortson, Masters, Nichol, Borne, Edmondson, Lintott, Raddick,
Schawinski, and Wallin. 2011. GalaxyZoo: Morphological
Classifications and Citizen Science. Advances in Machine
Learning and Data Mining for Astronomy, in press.
Tuite, Snavely, Hsiao, Tabing, and Popovic. 2011. PhotoCity:
Training Experts at Large-scale Image Acquisition Through a
Competitive Game. CHI.
For presentation
von Ahn and Dabbish. 2008. Designing games with a purpose. Communications of the ACM, 58-67.
von Ahn, et al. 2008. reCAPTCHA:
Human-based character recognition via web security measures. Science, 321(5895):1465-1468.
Khatib, Cooper, Tyka, Xu, Makedon, Popovic, Baker, and FoldIt
Players. 2011. Algorithm discovery by protein folding
game players. Proceedings of the National Academy
of Sciences, 108(47):18949-18953.
Thompson. 2008. If You Liked This, You're Sure to
Love That. New York Times.
Bell, Koren, and Volinsky. 2010. All Together Now: A
Perspective on the Netflix Prize. Chance, 23(1):24-29.