Supplementary Material for "Valid t-ratio Inference for IV"

Practitioners typically use the critical values ±1.96 for the IV t-ratio when conducting a hypothesis test at the 5 percent level of significance, while being aware that problems arise when the instrument is not “sufficiently strong”. The work of Staiger and Stock (1997) and Stock and Yogo (2005) analyzed these problems, and showed precisely how practitioners need to qualify their inferences when using the usual ±1.96 critical values. One of the specific findings of Stock and Yogo (2005) (see their Figure 5.2) is the following: if one uses ±1.96 critical values, and if one is willing to assert a particular minimum value (specifically, 6.88) for E[F] (the expected value of the first stage F statistic) then the significance level of the test is 10 percent (and the corresponding intervals using ±1.96*(std.error) are 90 percent confidence intervals).

Stock and Yogo’s equations imply that E[F] must be at least 142.6 for a 5 percent test (95% confidence). See the calculation in A.7 in the Online Appendix below.

No. The conclusions described above apply to “large samples”, after we have, as usual, used the central limit theorem to derive the limiting distribution of the t-ratio. In this case, the limiting distribution is not standard normal (which is the presumption behind the ±1.96 critical values). The econometric literature has long recognized that in the case of instrumental variables, including the just-identified case, the normal approximation can be quite poor, especially when E[F] is low and the correlation between the errors of the main and first-stage equations, ρ, is relatively high.

Motivated by these problems, Staiger and Stock (1997) developed the “weak-IV” asymptotic approximation, which allows us to see precisely what the (non-normal) distribution of the IV t-ratio should look like in large samples. The Stata program below demonstrates the high accuracy of the Staiger and Stock (1997) approximation. The program draws 10,000 Monte Carlo samples, each of size 1,000, using the same Monte Carlo design used in “Mostly Harmless Econometrics” (Angrist and Pischke, 2009), and produces the histogram of the 10,000 t-ratios. In the figure below, the Monte Carlo was based on E[F] = 10 and ρ = 0.8. The histogram clearly does not follow the standard normal density, also shown in the figure. Instead, it closely matches the theoretical density predicted by the formulas given by Staiger and Stock (1997) and Stock and Yogo (2005).

The bottom line is that the usual t-ratio test based on the ±1.96 critical values does not deliver (unqualified) valid inference at the intended 5 percent level of significance. That usual procedure presumes that the large-sample distribution for the t-ratio is normal in all cases; but it actually departs from normality.

Stata program used to create graph: weakivdensity
Using this program, you can experiment with different values of E[F] or ρ. You will see that the distribution will depart from the standard normal to varying degrees, but the density predicted by the Staiger and Stock (1997) approximation consistently does well.
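
For readers who do not use Stata, the following Python sketch runs a comparable Monte Carlo. It is not the weakivdensity program and does not claim to reproduce the Angrist and Pischke design: the data-generating process (one standard-normal instrument, no intercept, unit-variance errors), the function name simulate_t_ratios, and the smaller default number of draws are all assumptions made here for illustration. The first-stage coefficient is chosen so that E[F] is approximately 1 + n·π².

```python
import numpy as np


def simulate_t_ratios(reps=2000, n=500, expected_f=10.0, rho=0.8, beta=0.0, seed=0):
    """Monte Carlo draws of the just-identified IV t-ratio for H0: beta = true value.

    Assumed design (illustrative only): x = pi*z + v, y = beta*x + u,
    with z, u, v standard normal and corr(u, v) = rho.
    """
    rng = np.random.default_rng(seed)
    # With Var(z) = Var(v) = 1, E[F] is approximately 1 + n * pi**2.
    pi = np.sqrt((expected_f - 1.0) / n)
    z = rng.standard_normal((reps, n))
    v = rng.standard_normal((reps, n))
    # u has unit variance and correlation rho with v.
    u = rho * v + np.sqrt(1.0 - rho**2) * rng.standard_normal((reps, n))
    x = pi * z + v
    y = beta * x + u

    # Just-identified IV estimator (no intercept): beta_hat = z'y / z'x.
    zx = np.sum(z * x, axis=1)
    zy = np.sum(z * y, axis=1)
    zz = np.sum(z * z, axis=1)
    beta_hat = zy / zx

    # Conventional homoskedastic IV standard error.
    resid = y - beta_hat[:, None] * x
    sigma2 = np.sum(resid**2, axis=1) / n
    se = np.sqrt(sigma2 * zz) / np.abs(zx)
    return (beta_hat - beta) / se


t = simulate_t_ratios()
print("share with t >  1.96:", np.mean(t > 1.96))
print("share with t < -1.96:", np.mean(t < -1.96))
```

Under a standard normal distribution each tail share would be close to 0.025. Changing expected_f or rho shows how far the two tails drift apart; a histogram of t (for example with matplotlib) can be set against the standard normal density for a visual comparison.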

One option is to abandon the t-ratio and use a different statistic. For example, you could use the test of Anderson and Rubin (1949) (AR). Inverting the AR test yields a valid confidence set at the intended confidence level.
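
As a rough illustration of what inverting the AR test involves, the Python sketch below handles the just-identified case with a single instrument, no intercept, and homoskedastic errors. For each candidate value beta0 it regresses y − beta0·x on z and compares the squared t-statistic on z with the 5 percent chi-squared(1) critical value (1.96²); the confidence set collects the candidates that are not rejected. The function names, the grid search, and the simulated data are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def ar_statistic(y, x, z, beta0):
    """AR statistic for H0: beta = beta0 (one instrument, no intercept)."""
    e = y - beta0 * x                      # structural residual under H0
    gamma = (z @ e) / (z @ z)              # coefficient from regressing e on z
    resid = e - gamma * z
    sigma2 = (resid @ resid) / (len(y) - 1)
    return gamma**2 * (z @ z) / sigma2     # squared t-statistic on z


def ar_confidence_set(y, x, z, grid, crit=1.96**2):
    """Grid points beta0 that the AR test does not reject at the given critical value."""
    stats = np.array([ar_statistic(y, x, z, b) for b in grid])
    return grid[stats <= crit]


# Simulated data with a fairly strong instrument (illustrative values only).
rng = np.random.default_rng(0)
n = 500
z = rng.standard_normal(n)
v = rng.standard_normal(n)
u = 0.5 * v + np.sqrt(0.75) * rng.standard_normal(n)
x = 0.5 * z + v
y = 1.0 * x + u

beta_hat = (z @ y) / (z @ x)
grid = np.linspace(-1.0, 3.0, 401)
accepted = ar_confidence_set(y, x, z, grid)
print("IV estimate:", beta_hat)
print("AR set on grid spans:", accepted.min(), "to", accepted.max())
```

A grid search only reveals the part of the confidence set that lies inside the grid. With a weak instrument the AR set can be unbounded, so the grid limits should be read with that in mind.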

“Valid t-ratio Inference for IV” provides an alternative: use the tF critical value function. The paper uses and builds on the work of Staiger and Stock (1997) and Stock and Yogo (2005), who developed the first methods for using the first-stage F statistic to address the inferential problem illustrated above. “Valid t-ratio Inference for IV” offers a simple fix: adjust your 2SLS standard errors according to a particular smooth function of the observed first-stage F statistic (i.e. use Tables 3a and 3b in the paper). After adjusting the standard errors in this way, the usual confidence intervals (±1.96 × std. error) will have the correct confidence level (95 percent). Just like the Anderson and Rubin (1949) test, the tF procedure requires no assumption about E[F] or ρ for validity. Note that the paper also shows that, in expectation, AR confidence intervals will be longer than tF intervals (when both tF and AR produce bounded intervals).

Download Current Version of Online Appendix to "Valid t-ratio Inference for IV"

Anderson, T. W., and Herman Rubin. 1949. “Estimation of the Parameters of a Single Equation in a Complete System of Stochastic Equations.” Annals of Mathematical Statistics, 20: 46–63.

Angrist, Joshua, and Jörn-Steffen Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press.

Staiger, Douglas, and James H. Stock. 1997. “Instrumental Variables Regression with Weak Instruments.” Econometrica, 65: 557–586.

Stock, James H., and Motohiro Yogo. 2005. “Testing for Weak Instruments in Linear IV Regression.” In Identification and Inference in Econometric Models: Essays in Honor of Thomas J. Rothenberg, ed. Donald W.K. Andrews and James H. Stock, Chapter 5, 80–108. Cambridge University Press.

Lee, David S., Justin McCrary, Marcelo J. Moreira, and Jack Porter. 2021. “Valid t-ratio Inference for IV.” NBER Working Paper.

