Practitioners typically use the critical values ±1.96 for the IV t-ratio when conducting a hypothesis test at the 5 percent
level of significance, while being aware that problems arise when the instrument is not “sufficiently strong”. The work of
Staiger and Stock (1997) and Stock and Yogo (2005) analyzed these problems, and showed precisely how practitioners need to
qualify their inferences when using the usual ±1.96 critical values. One of the specific findings of Stock and Yogo (2005)
(see their Figure 5.2) is the following: if one uses ±1.96 critical values, and if one is willing to assert a particular
minimum value (specifically, 6.88) for E[F] (the expected value of the first-stage F statistic), then the significance level
of the test is 10 percent (and the corresponding intervals using ±1.96*(std. error) are 90 percent confidence intervals).
Stock and Yogo’s equations imply that E[F] must be at least 142.6 for a 5 percent test (95 percent confidence).
See the calculation in A.7 in the Online Appendix below.
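The two thresholds quoted above can be roughly checked with a few lines of Python. This is only a sketch under stated assumptions, and it may not be how the calculation in A.7 proceeds: it assumes a single instrument, homoskedastic errors, the weak-IV asymptotic setup in which E[F] = f0² + 1, and ρ = 1 (treating ρ = 1 as the least favorable case is an assumption of the sketch). The function names are illustrative.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def worst_case_rejection(expected_f, crit=1.959964):
    """Asymptotic probability that |t| > crit when rho = 1 (assumed worst case).

    Sketch assumptions: one instrument, homoskedastic errors, and
    E[F] = f0**2 + 1. With rho = 1 and w ~ N(0, 1), the weak-IV limit of
    the t-ratio reduces to t = w + w**2 / f0, so the rejection region is
    a union of intervals in w bounded by the roots of two quadratics.
    """
    f0 = math.sqrt(expected_f - 1.0)

    # t > crit  <=>  w**2 / f0 + w - crit > 0, i.e. w outside the two roots.
    d_up = math.sqrt(1.0 + 4.0 * crit / f0)
    upper_root = 0.5 * f0 * (-1.0 + d_up)
    lower_root = 0.5 * f0 * (-1.0 - d_up)
    prob = (1.0 - normal_cdf(upper_root)) + normal_cdf(lower_root)

    # t < -crit  <=>  w**2 / f0 + w + crit < 0, i.e. w between the two
    # roots, which are real only when f0 > 4 * crit.
    if f0 > 4.0 * crit:
        d_lo = math.sqrt(1.0 - 4.0 * crit / f0)
        right = 0.5 * f0 * (-1.0 + d_lo)
        left = 0.5 * f0 * (-1.0 - d_lo)
        prob += normal_cdf(right) - normal_cdf(left)
    return prob


for ef in (6.88, 10.0, 142.6):
    print(f"E[F] = {ef:6.2f}: rejection probability = {worst_case_rejection(ef):.4f}")
```

Under these assumptions the printed probabilities for E[F] = 6.88 and E[F] = 142.6 should come out close to 0.10 and 0.05, in line with the figures quoted above.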
No. The conclusions described above apply to “large samples”, after we have – as usual – used the central limit theorem to derive
the limiting distribution of the t-ratio. In this case, the limiting distribution is not standard normal (which is the presumption
behind the ±1.96 critical values). The econometric literature has long recognized that, in the case of instrumental
variables – including the just-identified case – the normal approximation can be quite poor, especially so when E[F] is low and the
correlation between the errors of the main and first-stage equations, ρ, is relatively high.
Motivated by these problems, Staiger and Stock (1997) developed the “weak-IV” asymptotic approximation,
which allows us to see precisely what the (non-normal) distribution of the IV t-ratio should look like in large samples.
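To make the weak-IV approximation concrete, here is a minimal Python sketch that draws t-ratios from the limiting distribution for a single instrument. It is an illustration under stated assumptions, not code from the paper: it assumes homoskedastic errors, a first-stage t-statistic f ~ N(f0, 1) with f0 = sqrt(E[F] − 1), an Anderson-Rubin statistic t_AR ~ N(0, 1) under the null, and correlation ρ between the two. The function name `weak_iv_t_draws` is made up for this example.

```python
import numpy as np


def weak_iv_t_draws(expected_f, rho, reps=200_000, seed=0):
    """Draw t-ratios from the weak-IV limiting distribution (one instrument).

    Sketch assumptions: f ~ N(f0, 1) with f0 = sqrt(E[F] - 1),
    t_AR ~ N(0, 1) under the null, corr(f, t_AR) = rho, and |rho| < 1.
    In this setup the t-ratio satisfies
        t = sign(f) * t_AR / sqrt(1 - 2*rho*t_AR/f + (t_AR/f)**2).
    """
    rng = np.random.default_rng(seed)
    f0 = np.sqrt(expected_f - 1.0)
    w = rng.standard_normal(reps)          # first-stage noise
    e = rng.standard_normal(reps)          # independent noise
    f = f0 + w
    t_ar = rho * w + np.sqrt(1.0 - rho**2) * e
    ratio = t_ar / f
    # The term under the root equals (ratio - rho)**2 + 1 - rho**2 > 0.
    return np.sign(f) * t_ar / np.sqrt(1.0 - 2.0 * rho * ratio + ratio**2)


draws = weak_iv_t_draws(expected_f=10.0, rho=0.8)
print("share of |t| > 1.96:", np.mean(np.abs(draws) > 1.96))
```

With E[F] = 10 and ρ = 0.8 the printed share should sit noticeably above 0.05. Raising `expected_f` to a very large value should bring it back toward 0.05.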
The Stata program below demonstrates the high accuracy of the Staiger and Stock (1997) approximation.
The program draws 10,000 Monte Carlo samples, each of size 1,000, using the same Monte Carlo design used in
“Mostly Harmless Econometrics” (Angrist and Pischke, 2009), and produces the histogram of the 10,000 t-ratios.
In the figure below, the Monte Carlo was based on E[F]=10 and ρ=0.8. The histogram clearly does not follow the standard normal density,
also shown in the figure. Instead, it closely matches the theoretical density, as predicted by the formulas given by Staiger and Stock
(1997) and Stock and Yogo (2005).
The bottom line is that the usual t-ratio test based on the ±1.96 critical values does not deliver (unqualified) valid inference at
the intended 5 percent level of significance. That usual procedure presumes that the large-sample distribution for the t-ratio is normal
in all cases, but the distribution actually departs from normality.
Stata program used to create the graph: weakivdensity
Using this program, you can experiment with different values of E[F] or ρ. You will see that the distribution
departs from the standard normal to varying degrees, but the density predicted by the Staiger and Stock (1997) approximation consistently matches it well.
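For readers without Stata, a similar experiment can be sketched in Python. This is not a port of weakivdensity, and the design below is a hypothetical one chosen for simplicity: z ~ N(0, 1), errors (u, v) bivariate normal with unit variances and correlation ρ, x = πz + v, true β = 0, no intercept, and π set so that nπ² + 1 equals the target E[F].

```python
import numpy as np


def simulate_iv_t_ratios(expected_f=10.0, rho=0.8, n=1000, reps=10_000, seed=0):
    """Monte Carlo t-ratios for just-identified IV under H0: beta = 0.

    Hypothetical design (not necessarily the one in the Stata program):
    z ~ N(0, 1); (u, v) bivariate normal, unit variances, correlation rho;
    x = pi*z + v; y = u; pi chosen so that n*pi**2 + 1 = expected_f.
    """
    rng = np.random.default_rng(seed)
    pi = np.sqrt((expected_f - 1.0) / n)
    t = np.empty(reps)
    for r in range(reps):
        z = rng.standard_normal(n)
        v = rng.standard_normal(n)
        u = rho * v + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
        x = pi * z + v
        y = u                                   # true beta is 0
        zx, zy, zz = z @ x, z @ y, z @ z
        beta_hat = zy / zx                      # IV estimate
        resid = y - beta_hat * x
        # Conventional homoskedastic IV standard error.
        se = np.sqrt(resid @ resid / n * zz) / abs(zx)
        t[r] = beta_hat / se
    return t


t_ratios = simulate_iv_t_ratios()
print("share of |t| > 1.96:", np.mean(np.abs(t_ratios) > 1.96))
print("5th, 50th, 95th percentiles:", np.percentile(t_ratios, [5, 50, 95]))
```

Changing `expected_f` or `rho` shows how far the simulated t-ratios drift from the standard normal benchmark, for which the share would be 0.05 and the percentiles roughly −1.645, 0, and 1.645.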
One option is to abandon the t-ratio and use a different statistic. For example, you could use the test of Anderson and Rubin (1949) (AR).
The inversion of the AR test can similarly be used to form a valid confidence set with the intended confidence level.
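A minimal sketch of the AR idea for a single instrument follows. It is an illustration rather than a reference implementation. The assumptions are: homoskedastic errors, variables already demeaned (no intercept), a chi-squared(1) critical value in place of an exact finite-sample one, and inversion over a user-chosen grid. The simulated data and all names are hypothetical.

```python
import numpy as np

CHI2_1_CRIT_5PCT = 3.841  # 95th percentile of chi-squared with 1 d.o.f.


def ar_statistic(y, x, z, beta0):
    """AR-type statistic for H0: beta = beta0 with one instrument.

    Under H0, y - beta0*x should be unrelated to z, so we regress it on z
    (no intercept, an assumption of this sketch) and square the slope's
    t-statistic.
    """
    e0 = y - beta0 * x
    slope = (z @ e0) / (z @ z)
    resid = e0 - slope * z
    sigma2 = resid @ resid / (len(y) - 1)
    return (z @ e0) ** 2 / (sigma2 * (z @ z))


def ar_confidence_set(y, x, z, grid):
    """Grid values of beta0 that the AR test does not reject at 5 percent."""
    grid = np.asarray(grid, dtype=float)
    stats = np.array([ar_statistic(y, x, z, b) for b in grid])
    return grid[stats <= CHI2_1_CRIT_5PCT]


# Simulated data for illustration only (strong first stage, true beta = 1).
rng = np.random.default_rng(0)
n = 500
z = rng.standard_normal(n)
v = rng.standard_normal(n)
u = 0.8 * v + 0.6 * rng.standard_normal(n)
x = 1.0 * z + v
y = 1.0 * x + u

kept = ar_confidence_set(y, x, z, np.linspace(-1.0, 3.0, 401))
print("AR confidence set on the grid spans:", kept.min(), "to", kept.max())
```

With a weak instrument the set of non-rejected grid values can be very wide or can run to the edges of the grid, which is how unbounded AR confidence sets show up in a grid search.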
“Valid t-ratio Inference for IV” provides an alternative:
use the tF critical value function. The paper uses and builds on the work of Staiger and Stock (1997) and Stock and Yogo (2005), who developed the first methods for using
the first-stage F statistic to address the inferential problem illustrated above. “Valid t-ratio Inference for IV” offers a simple fix:
adjust your 2SLS standard errors according to a particular smooth function of the observed first-stage F statistic (i.e., use Tables 3a and 3b in the paper).
After adjusting the standard errors in this way, the usual confidence intervals (±1.96 * std. error) will have the correct confidence level (95 percent).
Like the Anderson and Rubin (1949) test, the tF procedure requires no assumption about E[F] or ρ for validity. Note that the paper also shows that, in expectation,
AR confidence intervals will be longer than tF intervals (when both tF and AR produce bounded intervals).
Anderson, T. W., and Herman Rubin. 1949. “Estimation of the Parameters of a Single Equation in a Complete System of Stochastic Equations.”
Annals of Mathematical Statistics, 20: 46–63.
Angrist, Joshua, and Jörn-Steffen Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press.
Staiger, Douglas, and James H. Stock. 1997. “Instrumental Variables Regression with Weak Instruments.” Econometrica, 65: 557–586.
Stock, James H., and Motohiro Yogo. 2005. “Testing for Weak Instruments in Linear IV Regression.”
In Identification and Inference in Econometric Models: Essays in Honor of Thomas J. Rothenberg, ed. Donald W.K. Andrews and James H. Stock,
Chapter 5, 80–108. Cambridge University Press.