By: David Alfaro Serrano*
Proper estimation of the standard errors of the estimators of regression coefficients is important. These estimates are needed when analyzing statistical significance, which is the basis of the interpretation of the results of an econometric analysis. In impact evaluation practice, analysis of statistical significance is what allows the researcher to say whether there is evidence for the effectiveness of an intervention.
In this post I tell you something I discovered recently about the calculation of estimators’ standard errors: there are cases in which the correlation of the errors of the regression model can be ignored when calculating them. Moreover, these cases arise frequently with experimental data.
To cluster or not to cluster standard errors? That is the question
One of the main decisions a researcher has to make about standard error estimation is whether or not to use clustered standard errors. The usual population expression of the standard errors (the one that Stata tries to estimate by default) is based on the assumption that the error term is independent across observations. A common situation in which this assumption doesn’t hold is when there are groups of units within which the error term is correlated: for example, when the units of observation are families that are affected by the characteristics of the neighborhoods or villages in which they live. In this case, the usual expression has to be adjusted. This adjusted expression is known as the clustered standard error. Clustered standard errors are usually larger than the value given by the expression based on error term independence, so failing to cluster when it is appropriate can lead to erroneously finding effects of the intervention.
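To make the distinction concrete, the two population expressions can be written in matrix form. This is standard textbook notation rather than anything specific to this post: X is the regressor matrix, σ² the error variance, g = 1, …, G indexes clusters, X_g holds the rows of X for cluster g, and Ω_g is the (assumed) covariance matrix of the errors within cluster g.

```latex
\begin{align*}
\text{independent, homoskedastic errors:}\quad
  & V_{\text{default}}(\hat\beta) = \sigma^2 (X'X)^{-1} \\
\text{errors correlated within clusters:}\quad
  & V_{\text{cluster}}(\hat\beta) = (X'X)^{-1}
    \Big(\sum_{g=1}^{G} X_g'\,\Omega_g\,X_g\Big)(X'X)^{-1}
\end{align*}
```

If every Ω_g equals σ² times the identity matrix, the middle term becomes σ² X'X and the second expression collapses to the first.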
Let’s imagine that we are trying to use a randomized experiment to evaluate the effectiveness of a program assigned to households in a certain region. Households are our unit of observation and they’re grouped into villages. Assignment to the program in the experiment is random, regardless of the village to which the households belong. Is it necessary to use standard errors clustered at the village level?
There are two possible answers:
No. Because randomization occurred at the household level, it isn’t necessary to account for the correlation of the error term: the right thing is “to use standard errors clustered at the level of randomization” (this phrase and its variations are widely used), and this is so even if the error term of the model does exhibit within-village correlation.
Yes. For the usual expression to hold, errors must be independent across observations. If they are not, it’s necessary to use clustered standard errors. Whether the unobserved variables are independent depends on the nature of the phenomenon, and it is not affected by whether or not we randomly assign an intervention.
The solution from a population perspective: with experimental data, it’s the same
For a long time, my position in this debate was “yes”. After all, if an assumption is needed for the usual expression to be valid and it doesn’t hold, the usual expression cannot be the correct one. I always thought that “using standard errors clustered at the level of randomization” was nothing more than an adage, useful in many situations but incorrect in this case. However, it seems I was wrong.
As noted in Cameron and Miller (2013), a literature review on cluster-robust inference, if the regressor of interest is randomly assigned, the population expressions of the clustered and unclustered standard errors coincide. In this study, the authors consider the magnitude of the adjustment required when unobserved variables exhibit intragroup correlation. A simple way to analyze this issue is to calculate the quotient of the population expressions of the clustered and unclustered standard errors. This quotient, whose formula can be found in section IIB.1 of that paper, indicates how much the adjustment matters in different contexts.
As expected, that quotient shows that the higher the intragroup correlation of the error term (i.e., the bigger the failure of the assumption of independence of the error), the greater the required correction. This makes perfect sense and supports those who say “yes”.
However (and here is where the magic happens), the magnitude of the necessary correction also depends on the intracluster correlation of the regressor being analyzed. In particular, if the intracluster correlation of the regressor is zero, the correction to account for the correlation of the error term, while necessary in principle, has zero magnitude (it’s equivalent to multiplying by 1!).
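For reference, the approximation usually quoted in this literature (often called the Moulton factor) can be written as below. The symbols are notation chosen here, and the approximation is normally stated for the variance rather than the standard error, so it is worth checking against the paper before relying on it: τ is the ratio of the clustered to the unclustered variance, ρ_x is the intracluster correlation of the regressor, ρ_u is the intracluster correlation of the error term, and N̄_g is the average cluster size.

```latex
\tau \;\approx\; 1 + \rho_x\,\rho_u\,\big(\bar{N}_g - 1\big),
\qquad
\frac{\text{se}_{\text{cluster}}}{\text{se}_{\text{default}}} \;\approx\; \sqrt{\tau}
```

Setting ρ_x = 0 gives τ ≈ 1 whatever the value of ρ_u, which is the “multiply by 1” case described above.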
This is what is commonly ignored by those who say “yes”. If our interest lies in the effect of a randomized treatment (as in an experiment), we are in this particular case, and we can avoid taking the correlation of the error term into account when thinking about the standard error of the estimator of the causal effect.
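A small simulation can make this easier to believe. The sketch below is only an illustration under made-up assumptions: 100 villages of 20 households, a village-level shock and a household-level shock each with variance one, a true effect of 0.5, and a hypothetical function name `simulate_standard_errors`. It fits a simple regression of the outcome on the treatment dummy and computes both the conventional standard error and a cluster-robust one (without finite-sample correction), with treatment assigned either household by household or to whole villages.

```python
import math
import random


def simulate_standard_errors(assign_level, n_villages=100, hh_per_village=20, seed=0):
    """Return (conventional_se, cluster_se) for the treatment coefficient.

    assign_level: "household" (treatment drawn per household) or
                  "village" (all households in a village share one draw).
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    village, d, y = [], [], []
    for g in range(n_villages):
        village_shock = rng.gauss(0.0, 1.0)  # shared by every household in village g
        village_treated = 1.0 if rng.random() < 0.5 else 0.0
        for _ in range(hh_per_village):
            if assign_level == "household":
                treated = 1.0 if rng.random() < 0.5 else 0.0
            else:
                treated = village_treated
            u = village_shock + rng.gauss(0.0, 1.0)  # error correlated within village
            village.append(g)
            d.append(treated)
            y.append(1.0 + 0.5 * treated + u)

    # OLS of y on a constant and d
    n = len(y)
    d_bar = sum(d) / n
    y_bar = sum(y) / n
    d_dev = [di - d_bar for di in d]
    sxx = sum(x * x for x in d_dev)
    beta = sum(x * (yi - y_bar) for x, yi in zip(d_dev, y)) / sxx
    alpha = y_bar - beta * d_bar
    resid = [yi - alpha - beta * di for yi, di in zip(y, d)]

    # Conventional standard error (assumes independent, homoskedastic errors)
    s2 = sum(e * e for e in resid) / (n - 2)
    se_conventional = math.sqrt(s2 / sxx)

    # Cluster-robust sandwich standard error: sum the scores within each
    # village, then square and add across villages
    score = [0.0] * n_villages
    for g, x, e in zip(village, d_dev, resid):
        score[g] += x * e
    se_cluster = math.sqrt(sum(s * s for s in score)) / sxx

    return se_conventional, se_cluster


if __name__ == "__main__":
    for level in ("household", "village"):
        conv, clus = simulate_standard_errors(level, seed=1)
        print(level, "cluster/conventional SE ratio:", clus / conv)
```

With household-level assignment the ratio should come out close to one, even though the errors are strongly correlated within villages; with village-level assignment it should be well above one. The exact numbers depend on the seed and on the assumed variances.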
The solution in practice: in this kind of situation, it’s better not to use clustered standard errors
In practice, it’s advisable to heed those who say “no” when analyzing a situation like the one in the example. You might ask, “why all this fuss?” If the clustered and unclustered expressions coincide in the special case of a randomly assigned treatment, surely the best strategy is to use clustered standard errors in every case and be done with it. Not so fast.
Cameron and Miller (2013) show that the clustered and unclustered population expressions coincide when the regressor has been randomly assigned. However, in practice these values aren’t directly observable; they have to be estimated.
The estimator of the usual expression of the standard error (the one that Stata applies by default) has better small-sample performance than the cluster-robust estimator, a generalization of the White estimator, which is the one used to estimate clustered standard errors (the one that Stata uses when the option vce(cluster clustervar) is chosen).
Therefore, it’s advisable to avoid using clustered standard errors for calculating the variance of the estimates whenever possible. In our example, considering that the correlation of the error term can be ignored, it’s better to estimate the usual expression of the standard error.
That said, this post shouldn’t be taken as a recommendation never to use clustered standard errors with experimental data. A simple counterexample is the following: if, in our initial example, the unit of observation were the individual rather than the household, but the treatment were still assigned at the household level, it would be necessary to use standard errors clustered at the household level, because the intra-household correlation of the treatment variable would be one (the maximum possible) and not zero. Moulton (1990) uses a similar case, in which the level of treatment assignment doesn’t match the level of observation, to show that even a low intracluster correlation of the error term can lead to large corrections in the variances of the estimators.
Another case in which we must be careful arises when analyzing spillovers. While the variable of direct treatment can be randomly assigned into groups, the variable of indirect treatment will, by its nature, exhibit large intracluster correlation and, therefore, the correction required when calculating the variance of the estimator of the indirect effect can be large.
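A quick back-of-the-envelope calculation shows how large the correction can get in such a case. The sketch below uses the approximation commonly quoted in this literature, variance ratio ≈ 1 + ρ_x ρ_u (N̄ − 1), where ρ_x and ρ_u are the intracluster correlations of the regressor and of the error term and N̄ is the average cluster size. The function names and all the numbers are hypothetical, chosen only for illustration; they are not taken from Moulton (1990).

```python
import math


def variance_inflation(rho_x, rho_u, avg_cluster_size):
    """Approximate ratio of clustered to unclustered variance (assumed formula)."""
    return 1.0 + rho_x * rho_u * (avg_cluster_size - 1)


def se_inflation(rho_x, rho_u, avg_cluster_size):
    """Approximate ratio of clustered to unclustered standard error."""
    return math.sqrt(variance_inflation(rho_x, rho_u, avg_cluster_size))


# Treatment randomized observation by observation: rho_x = 0, so no correction
print(se_inflation(rho_x=0.0, rho_u=0.3, avg_cluster_size=5))

# Individuals observed, treatment assigned per household of 5: rho_x = 1
print(se_inflation(rho_x=1.0, rho_u=0.3, avg_cluster_size=5))

# Low error correlation but large clusters: the variance is about 5 times larger
print(variance_inflation(rho_x=1.0, rho_u=0.05, avg_cluster_size=81))
```

The last line is the point of the counterexample: once the treatment is perfectly correlated within clusters, even an intracluster error correlation of 0.05 multiplies the variance several times over when clusters are large.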
A final note
The practical conclusion of this post is probably “keep doing business as usual”. However, it may be helpful to keep in mind the reasons behind the idea of “using standard errors clustered at the level of randomization” when facing complex scenarios. I strongly suggest reading Cameron and Miller (2013), which discusses several issues of practical relevance for cluster-robust inference.