Inter-American Development Bank
Impacto

Errors: the simpler the better?

July 15, 2014, by Guest author


By: David Alfaro Serrano*

Properly estimating the standard errors of regression coefficient estimators is important. These estimates are needed to assess statistical significance, which underpins the interpretation of the results of an econometric analysis. In impact evaluation practice, the analysis of statistical significance is what allows the researcher to say whether there is evidence for the effectiveness of an intervention.

In this post I share something I recently discovered about calculating estimators’ standard errors: there are cases in which the correlation of the errors of the regression model can be ignored when calculating them. Moreover, these cases arise frequently with experimental data.

To cluster or not to cluster standard errors? That is the question

One of the main decisions a researcher has to make when estimating standard errors is whether or not to cluster them. The usual population expression of the standard errors (the one that Stata tries to estimate by default) is based on the assumption that the error term is independent across observations. A common situation in which this assumption doesn’t hold is when there are groups of units within which the error term is correlated, for example, when the units of observation are families that are affected by the characteristics of the neighborhoods or villages in which they live. In this case, the usual expression has to be adjusted. The adjusted expression is known as the clustered standard error. Clustered standard errors are usually larger than the value given by the expression based on error term independence, so failing to cluster when it is needed can lead to erroneously finding effects of the intervention.
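To make the two expressions concrete, here is a sketch for the simplest case, a regression of the outcome on a single regressor. The notation (tildes for deviations from the mean, g for groups) is my own choice for illustration and is not taken from the original post.

```latex
% Simple regression y_i = \alpha + \beta x_i + u_i, with \tilde{x}_i = x_i - \bar{x}.
% The estimation error of the slope is a weighted sum of the errors:
\hat{\beta} - \beta = \frac{\sum_i \tilde{x}_i u_i}{\sum_i \tilde{x}_i^2}

% Usual expression: errors independent across observations, common variance \sigma_u^2.
V_{\text{usual}}[\hat{\beta}] = \frac{\sigma_u^2}{\sum_i \tilde{x}_i^2}

% Clustered expression: errors may be correlated within a group g, not across groups.
V_{\text{clu}}[\hat{\beta}]
  = \frac{\sum_g \sum_{i \in g} \sum_{j \in g} \tilde{x}_i \tilde{x}_j \, E[u_i u_j]}
         {\left( \sum_i \tilde{x}_i^2 \right)^2}
```

If E[u_i u_j] = 0 whenever i and j differ and every E[u_i^2] equals \sigma_u^2, the clustered expression collapses to the usual one. The difference between the two comes entirely from the within-group cross terms.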

Let’s imagine that we are trying to use a randomized experiment to evaluate the effectiveness of a program assigned to households in a certain region. Households are our unit of observation and they’re grouped into villages. Assignment to the program is random at the household level, regardless of the village to which each household belongs. Is it necessary to use standard errors clustered at the village level?

There are two possible answers:

No. Because randomization occurred at the household level, it isn’t necessary to consider the correlation of the error term: the right thing is “to use standard errors clustered at the level of randomization” (this phrase and its variations are widely used). This is so even if the error term of the model does exhibit within-village correlation.

Yes. For the usual expression to hold, errors must be independent across observations. If they are not, it’s necessary to use clustered standard errors. Whether the unobserved variables are independent depends on the nature of the phenomenon, and is not affected by the fact that we do or do not randomly assign an intervention.

The solution from a population perspective: with experimental data, it’s the same

For a long time, my position in this debate was “yes”. After all, if an assumption is needed for the usual expression to be valid and it doesn’t hold, the usual expression cannot be the correct one. I always thought that “using standard errors clustered at the level of randomization” was nothing more than an adage, useful in many situations but incorrect in this case. However, it seems I was wrong.

As noted in Cameron and Miller (2013), a literature review on cluster-robust inference, if the regressor of interest is randomly assigned, the population expressions of the clustered and unclustered standard errors coincide. In that study, the authors examine how large an adjustment is required when unobserved variables exhibit intragroup correlation. A simple way to analyze this issue is to calculate the quotient of the population expressions of the clustered and unclustered standard errors. This quotient, whose formula can be found in section IIB.1 of that paper, indicates how much the adjustment matters in different contexts.

As expected, the quotient shows that the higher the intragroup correlation of the error term (i.e., the more severe the failure of the assumption of independent errors), the greater the required correction. This makes perfect sense and supports those who say “yes”.

However (and here is where the magic happens), the magnitude of the necessary correction also depends on the intracluster correlation of the regressor being analyzed. In particular, if the intracluster correlation of the regressor is zero, the correction to account for the correlation of the error term, while necessary, has zero magnitude (it’s equivalent to multiplying by 1!).

This is what is commonly ignored by those who say “yes”. If our interest lies in the effect of a treatment randomized at the level of the unit of observation (as in the experiment above), we are in this particular case, and we need not take the correlation of the error term into account when thinking about the standard error of the estimator of the causal effect.
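For reference, the approximation usually quoted for this correction, as I recall it from Cameron and Miller (2013), is stated for the variance. The corresponding quotient of standard errors is its square root. The symbols below follow common usage and should be checked against the paper.

```latex
% Approximate factor by which the usual variance of \hat{\beta}_j must be inflated:
\tau_j \simeq 1 + \rho_{x_j} \, \rho_u \, (\bar{N}_g - 1)

% \rho_{x_j} : within-cluster correlation of the regressor x_j
% \rho_u     : within-cluster correlation of the error term
% \bar{N}_g  : average cluster size

% With \rho_{x_j} = 0 the factor equals 1, whatever the value of \rho_u.
```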
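The argument can be checked with a small simulation. The sketch below is only illustrative: the data-generating process, the sample sizes, the effect size of 0.5 and names such as `slope_and_ses` and `village_level_treatment` are my own assumptions, and the clustered estimate is the basic sandwich formula without any finite-sample correction. It compares the village-clustered and the usual standard error when treatment is randomized household by household, and, for contrast, when whole villages are assigned together.

```python
import math
import random


def slope_and_ses(x, y, cluster):
    """OLS slope of y on x (with intercept) and two standard-error estimates.

    Returns (slope, usual_se, clustered_se).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    resid = [(yi - y_bar) - slope * d for d, yi in zip(dx, y)]

    # Usual estimate: assumes independent errors with a common variance.
    usual_se = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)

    # Clustered estimate: add up (x_i - x_bar) * e_i within each cluster first,
    # so that within-cluster cross products are kept.
    scores = {}
    for d, e, g in zip(dx, resid, cluster):
        scores[g] = scores.get(g, 0.0) + d * e
    clustered_se = math.sqrt(sum(s * s for s in scores.values())) / sxx
    return slope, usual_se, clustered_se


def simulate(n_villages, n_households, village_level_treatment, rng):
    """Draw one data set whose errors share a village-level shock."""
    x, y, cluster = [], [], []
    for g in range(n_villages):
        village_shock = rng.gauss(0.0, 1.0)   # common to all households in g
        village_draw = rng.random() < 0.5     # used only for village-level assignment
        for _ in range(n_households):
            if village_level_treatment:
                treated = village_draw
            else:
                treated = rng.random() < 0.5  # household-level randomization
            x.append(1.0 if treated else 0.0)
            y.append(0.5 * x[-1] + village_shock + rng.gauss(0.0, 1.0))
            cluster.append(g)
    return slope_and_ses(x, y, cluster)


rng = random.Random(12345)

_, usual, clustered = simulate(400, 20, False, rng)
ratio_household = clustered / usual

_, usual, clustered = simulate(400, 20, True, rng)
ratio_village = clustered / usual

print("clustered / usual SE, household-level assignment:", round(ratio_household, 2))
print("clustered / usual SE, village-level assignment:  ", round(ratio_village, 2))
```

In both runs the error term has the same within-village correlation. Only the within-village correlation of the treatment differs, so the first ratio should come out close to 1 and the second well above 1.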

The solution in practice: in this kind of situation, it’s better not to use clustered standard errors

In practice, it’s advisable to heed those who say “no” when analyzing a situation like the one in the example. You might ask, “Why all this fuss? If the clustered and unclustered expressions coincide in the special case of a randomly assigned treatment, the safest strategy is to use clustered standard errors in every case and be done with it.” Not so fast.

Cameron and Miller (2013) show that the clustered and unclustered population expressions coincide when the regressor has been randomly assigned. However, in practice these values aren’t directly observable and have to be estimated.

The estimator of the usual expression of the standard error (the one that Stata applies by default) performs better in small samples than the White-type sandwich estimator used to estimate clustered standard errors (the one that Stata uses when the option vce(cluster clustervar) is specified).

Therefore, whenever the two population expressions coincide, it’s advisable to avoid using clustered standard errors to calculate the variance of the estimates. In our example, given that the correlation of the error term can be ignored, it’s better to estimate the usual expression of the standard error.
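The small-sample point can be illustrated with a rough Monte Carlo sketch. Everything here is an assumption made for illustration: 6 villages of 10 households, household-level randomization, a village shock in the errors, 300 replications, and the hypothetical helper names `slope_ses` and `relative_spread`. The clustered estimate again omits the finite-sample correction that Stata applies, so this is not a replica of its output. The sketch measures how much each standard-error estimate varies from sample to sample.

```python
import math
import random
import statistics


def slope_ses(x, y, cluster):
    """Return (usual_se, clustered_se) for the OLS slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    resid = [(yi - y_bar) - slope * d for d, yi in zip(dx, y)]
    usual_se = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    scores = {}
    for d, e, g in zip(dx, resid, cluster):
        scores[g] = scores.get(g, 0.0) + d * e
    clustered_se = math.sqrt(sum(s * s for s in scores.values())) / sxx
    return usual_se, clustered_se


def one_sample(n_villages, n_households, rng):
    """One small experiment with treatment randomized household by household."""
    x, y, cluster = [], [], []
    for g in range(n_villages):
        village_shock = rng.gauss(0.0, 1.0)
        for _ in range(n_households):
            x.append(1.0 if rng.random() < 0.5 else 0.0)
            y.append(village_shock + rng.gauss(0.0, 1.0))  # true effect is zero
            cluster.append(g)
    return slope_ses(x, y, cluster)


def relative_spread(values):
    """Standard deviation across replications divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


rng = random.Random(2014)
draws = [one_sample(6, 10, rng) for _ in range(300)]

spread_usual = relative_spread([u for u, _ in draws])
spread_clustered = relative_spread([c for _, c in draws])

print("relative spread of the usual SE estimate:    ", round(spread_usual, 2))
print("relative spread of the clustered SE estimate:", round(spread_clustered, 2))
```

With so few villages, the clustered estimate is built from only a handful of village-level sums, so it should fluctuate noticeably more across samples than the usual estimate.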

Still, this post shouldn’t be taken as a recommendation never to use clustered standard errors with experimental data. A simple counterexample: if, in our initial example, the unit of observation were the individual rather than the household, but the treatment were still assigned at the household level, it would be necessary to use standard errors clustered at the household level, because the intra-household correlation of the treatment variable would be one (the maximum possible) rather than zero. Moulton (1990) uses a similar case, in which the level of treatment assignment doesn’t match the level of observation, to show that even a low intracluster correlation of the error term can lead to large corrections in the variances of the estimators.

Another case in which we must be careful arises when analyzing spillovers. While the direct treatment variable can be randomly assigned within groups, the indirect treatment variable will, by its nature, exhibit large intracluster correlation. Therefore, the correction required when calculating the variance of the estimator of the indirect effect can be large.
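A quick back-of-the-envelope calculation shows how fast the correction grows once the regressor is perfectly correlated within clusters. The sketch assumes the commonly quoted approximation 1 + rho_x * rho_u * (average cluster size - 1) for the variance inflation. The function name and the numbers (households of 5 people, clusters of 81 units, an error correlation of 0.05) are made up for illustration.

```python
import math


def variance_inflation(rho_x, rho_u, avg_cluster_size):
    """Approximate factor by which the usual variance must be multiplied."""
    return 1.0 + rho_x * rho_u * (avg_cluster_size - 1.0)


# Households observed, treatment randomized across households, village clusters:
# the regressor is uncorrelated within villages, so nothing needs correcting.
household_case = variance_inflation(rho_x=0.0, rho_u=0.05, avg_cluster_size=20)

# Individuals observed, treatment assigned by household of 5: rho_x = 1.
individual_case = variance_inflation(rho_x=1.0, rho_u=0.05, avg_cluster_size=5)

# Large clusters with the same low error correlation: the correction is large.
large_cluster_case = variance_inflation(rho_x=1.0, rho_u=0.05, avg_cluster_size=81)

for label, tau in [
    ("household-level randomization", household_case),
    ("individuals within households", individual_case),
    ("large clusters", large_cluster_case),
]:
    # The standard error is inflated by the square root of the variance factor.
    print(label, "variance factor:", round(tau, 2), "SE factor:", round(math.sqrt(tau), 2))
```

Under these assumed numbers the variance factor is 1 in the first case, about 1.2 in the second and about 5 in the third, even though the error correlation is only 0.05 throughout.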

A final note

The practical conclusion of this post is probably “keep doing business as usual”. However, when facing complex scenarios it may be helpful to keep in mind the reasons behind the idea of “using standard errors clustered at the level of randomization”. I strongly suggest reading Cameron and Miller (2013), which covers several issues of practical relevance in cluster-robust inference.

* David Alfaro is an economist. He holds a master’s degree from Universidad de San Andres (Argentina). He currently works as a consultant in the Office of Strategic Planning and Development Effectiveness at the IDB. His work focuses mainly on productive development policies.


Filed Under: Evaluation methods and techniques Tagged With: Cluster, clustered errors, estimator, standard errors
