DATA ANALYSIS NOTES: LINKS AND GENERAL GUIDELINES
Oscar Torres-Reyna
DSS Data Consultant
Finding the question is often more important than finding the answer
John Tukey
I do not understand the output of my regression!!!
Data Analysis: Annotated Output
Exploring data
http://dss.princeton.edu/training/StataTutorial.pdf
Linear regression
http://dss.princeton.edu/training/Regression101.pdf
Logit regression, ordered logit regression
http://dss.princeton.edu/training/Logit.pdf
Factor analysis
http://dss.princeton.edu/training/Factor.pdf
Panel data, fixed effects, random effects
http://dss.princeton.edu/training/Panel101.pdf
Multilevel analysis
http://dss.princeton.edu/training/Multilevel101.pdf
Time Series
http://dss.princeton.edu/training/TS101.pdf
Descriptive Statistics
http://www.princeton.edu/~otorres/Excel
Data Analysis: Annotated Output
http://www.ats.ucla.edu/stat/AnnotatedOutput/default.htm
Regression with Stata
http://www.ats.ucla.edu/STAT/stata/webbooks/reg/default.htm
Regression
http://www.ats.ucla.edu/stat/stata/topics/regression.htm
How to interpret dummy variables in a regression
http://www.ats.ucla.edu/stat/Stata/webbooks/reg/chapter3/statareg3.htm
Logit output: what are the odds ratios?
http://www.ats.ucla.edu/stat/stata/library/odds_ratio_logistic.htm
Is my model OK?
Regression diagnostics: A checklist
http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm
Logistic regression diagnostics: A checklist
http://www.ats.ucla.edu/stat/stata/webbooks/logistic/chapter3/statalog3.htm
Time series diagnostics: A checklist (pdf)
http://homepages.nyu.edu/~mrg217/timeseries.pdf
Time series: Dickey-Fuller (dfuller) test for unit roots (for R and Stata)
http://www.econ.uiuc.edu/~econ472/tutorial9.html
http://dss.princeton.edu/training/TS101.pdf (Stata)
http://www.stata.com/support/faqs/stat/panel.html
http://www.stata.com/support/faqs/stat/xtreg.html
http://www.stata.com/support/faqs/stat/xt.html
http://dss.princeton.edu/online_help/analysis/panel.htm
Generating confidence intervals
http://fhss.byu.edu/polsci/Goodliffe/504/stataci.pdf
Confidence intervals in logistic regression
http://www.stata.com/support/faqs/stat/prep.html
Chow Test
http://dss.princeton.edu/training/TS101.pdf#page=23
http://www.stata.com/support/faqs/stat/awreg.html
Marginal effects
http://www.stata.com/help.cgi?margins
http://www.stata.com/support/faqs/stat/mfx_ologit.html
http://www.stata.com/support/faqs/stat/mfx_size.html
Outliers, influential observations, and leverage (using SPSS)
http://faculty.chass.ncsu.edu/garson/PA765/regress.htm#outlier2
Quandt likelihood ratio (QLR test) or sup-Wald statistic
http://dss.princeton.edu/training/TS101.pdf#page=24
How to create dummies
http://www.stata.com/support/faqs/data/dummy.html
http://www.ats.ucla.edu/stat/stata/faq/dummy.htm
Making publication-style tables in Stata
http://www.fiu.edu/~tardanic/make.pdf
How can I create variables containing percent summaries?
http://www.stata.com/support/faqs/data/percentvars.html
Topics in Statistics
http://www.ats.ucla.edu/stat/mult_pkg/whatstat/default.htm
Statnotes: Topics in Multivariate Analysis, by G. David Garson
http://www2.chass.ncsu.edu/garson/pa765/statnote.htm
Elementary Concepts in Statistics
http://www.statsoft.com/textbook/stathome.html
Introductory Statistics: Concepts, Models, and Applications
http://www.psychstat.missouristate.edu/introbook/sbk00.htm
Statistical Data Analysis
http://www.ats.ucla.edu/STAT/stata/library/GraphExamples/default.htm
http://www.indiana.edu/~statmath/stat/all/ttest/
Online Training Section at DSS
http://dss.princeton.edu/training/
BOOK: Stock, James H. and Mark Watson, Introduction to Econometrics, Addison Wesley, 2003.
A very general guideline…
Once you have defined the question and, hopefully, have a clear idea of what you want to know, you can proceed to apply the statistical technique suitable for your data.
First you need to answer two questions:
1. What is your dependent variable?
2. What is (are) your independent variable(s)?
There is no straightforward answer as to what kind of technique you need to use for your data. Three factors play a role:
1. Your theory
2. Your data
3. Your knowledge of the topic
For practical purposes, the statistical technique you choose will depend mostly on the type of your dependent variable. The following site lists types of analysis for different types of dependent variables: http://www.ats.ucla.edu/stat/mult_pkg/whatstat/default.htm.
In general, if your dependent variable is:
1. Dichotomous (0, 1 / male, female): use logit or probit. Logit is the most common choice.
2. Ordered (1, 2, 3, 4 / bad, not so bad, not so good, good), going from low to high or negative to positive: use ordered logit (or probit).
3. Different categories with no natural order (1, 2, 3 / democrat, independent, republican): use multinomial logit.
4. Continuous (1, 1.01, 1.02,…): use linear regression (simple or multiple).
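As a rough illustration of the four cases above, the following Stata sketch fits one model of each type. It assumes Stata's built-in auto dataset (loaded with sysuse), which is not mentioned in these notes; the variables are chosen only because they have the right type, not because the models make substantive sense.

```stata
* Illustrative only: the auto dataset and the variable choices below are
* assumptions of this sketch, not part of the original notes.
sysuse auto, clear

* 1. Dichotomous dependent variable (foreign is coded 0/1): logit
logit foreign mpg weight

* 2. Ordered dependent variable (rep78 runs from 1 to 5): ordered logit
ologit rep78 mpg weight

* 3. Different categories: multinomial logit
*    (rep78 is treated as unordered here purely for illustration)
mlogit rep78 mpg

* 4. Continuous dependent variable (price): linear regression
regress price mpg weight
```

Replacing logit with probit, or ologit with oprobit, gives the probit versions mentioned in the list.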
Other things to consider:
1. Is your data organized by groups or entities (panel data, cross-sectional)?
2. What about time (years, months, days, quarters, etc.)?
If one or both of these apply, you may need to control for variables that vary across time but not entities (like public policies) or variables that vary across entities but not time (like cultural factors).
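A minimal sketch of how such controls might look in Stata, assuming the nlswork example dataset (fetched with webuse, so it needs an internet connection). The dataset and variable names are assumptions of this example and do not come from these notes.

```stata
* Illustrative only: nlswork and its variables are assumptions of this sketch.
webuse nlswork, clear

* Declare the panel structure: idcode identifies entities, year is time
xtset idcode year

* Entity fixed effects absorb variables that vary across entities
* but not over time
xtreg ln_wage age tenure, fe

* Adding year dummies also controls for variables that vary over time
* but not across entities
xtreg ln_wage age tenure i.year, fe
```

The Panel101.pdf tutorial linked earlier covers the choice between fixed and random effects in more detail.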
Once you define your dependent and independent variables you can start exploring the relationships between them. For this you can do the following:
1. Create a correlation matrix for all variables. This gives you an idea of the nature of the relationship not only between the dependent and independent variables but also among the latter. In Stata type spearman [list of variables], star(0.05) or pwcorr [list of variables], sig. Type help spearman or help pwcorr for more details.
2. Create a scatter plot between the dependent variable and each of the independent variables. In Stata type scatter [dep. var] [indep. var]. Type help scatter for more options, or visit the DSS training pages for examples (http://dss.princeton.edu/training/) or the general DSS help pages.
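The two exploratory steps above can be sketched as follows, again assuming Stata's built-in auto dataset, with price standing in for the dependent variable and mpg and weight for the independent variables. These names are illustrative assumptions, not part of the notes.

```stata
* Illustrative only: dataset and variable names are assumptions of this sketch.
sysuse auto, clear

* Pairwise Pearson correlations with significance levels;
* star(0.05) marks coefficients significant at the 5% level
pwcorr price mpg weight, sig star(0.05)

* Spearman rank correlations, starred at the 5% level
spearman price mpg weight, star(0.05)

* Scatter plots: dependent variable first (y-axis), independent second (x-axis)
scatter price mpg
scatter price weight
```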