Development that Works
  • The synth_runner Package: Utilities to Automate Synthetic Control Estimation Using synth



    By Brian Quistorff.

    The new “synth_runner” module for Synthetic Control Methods helps conduct multiple estimations, inference, and diagnostics, and generates visualizations of results.



    The Synthetic Control Methodology (SCM) (Abadie and Gardeazabal, 2003; Abadie et al., 2010, hereafter ADH) is a data-driven approach to small-sample comparative case studies for estimating treatment effects.

    Similar to a difference-in-differences design, SCM exploits the differences in treated and untreated units across the event of interest. However, in contrast to a difference-in-differences design, SCM does not give all untreated units the same weight in the comparison.

    Instead, it generates a weighted average of the untreated units that closely matches the treated unit over the pre-treatment period and uses that as the counterfactual. Along with their paper, ADH released the -synth- Stata command for single estimations. This blog post details the new module -synth_runner- that builds on top of the previous command to help conduct multiple estimations, inference, diagnostics, and generate visualizations of results.

    Let’s take a look at the example used in ADH in which they estimate the effect of Proposition 99 in California on cigarette sales. The proposition passed in 1988 and increased the tax on cigarettes and instituted several other restrictions on tobacco. First, have a look at the state trends in per-capita cigarette sales.

    [Figure: synth 1]

    California is graphed along with states that did not enact large tobacco regulations or change their tobacco tax rates. These 38 other states are termed “donors” as they can potentially be used to form the counterfactual. As one can see, California looks quite different from most of these other states during the pre-treatment period.

    A simple difference-in-differences strategy would therefore not be appropriate as the parallel trends assumption is not satisfied. Instead of using the untreated states equally, SCM finds an optimal weight over the untreated states to construct a counterfactual.

    The weights are found so that the counterfactual’s pre-treatment outcomes (and any other important pre-treatment variables) match those of the treated unit. Below is the estimated counterfactual for California.

    [Figure: synth 2]

    One sees that the counterfactual matches California well during the pre-treatment period. In the post-treatment period, however, California’s cigarette sales are much lower than those of its counterfactual. The estimated effect is then the difference between the treated unit and its synthetic control for the post-treatment period. Below is the difference between California and its counterfactual.

    [Figure: synth 3]
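
    The weighting and differencing steps described above can be sketched in a few lines of Python. This is only an illustration, not the optimizer used by -synth-: it matches on pre-treatment outcomes only, uses a simple Frank-Wolfe iteration (a swapped-in technique) to keep the weights non-negative and summing to one, and runs on made-up numbers for three hypothetical donors. The function names are assumptions introduced for the example.

```python
import math


def fit_simplex_weights(donors_pre, treated_pre, iters=500):
    """Frank-Wolfe with exact line search: minimise the squared pre-treatment
    gap over weights that are non-negative and sum to one.
    donors_pre[j][t] is donor j's outcome in pre-treatment period t."""
    n, periods = len(donors_pre), len(treated_pre)
    w = [1.0 / n] * n  # start from equal weights
    for _ in range(iters):
        # gap between the current synthetic control and the treated unit
        resid = [sum(w[j] * donors_pre[j][t] for j in range(n)) - treated_pre[t]
                 for t in range(periods)]
        # gradient (up to a factor of 2) of the squared error for each weight
        grad = [sum(donors_pre[j][t] * resid[t] for t in range(periods))
                for j in range(n)]
        s = min(range(n), key=grad.__getitem__)  # most promising donor
        d = [(1.0 if j == s else 0.0) - w[j] for j in range(n)]
        xd = [sum(d[j] * donors_pre[j][t] for j in range(n))
              for t in range(periods)]
        denom = sum(v * v for v in xd)
        if denom == 0.0:
            break
        slope = sum(grad[j] * d[j] for j in range(n))
        step = min(1.0, max(0.0, -slope / denom))  # exact line search, clipped
        if step == 0.0:
            break
        w = [w[j] + step * d[j] for j in range(n)]
    return w


def synthetic_series(donors, w):
    """Weighted average of the donor series in every period."""
    return [sum(w[j] * donors[j][t] for j in range(len(w)))
            for t in range(len(donors[0]))]


def rmspe(gaps):
    """Root mean-squared prediction error of a list of gaps."""
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))


# Made-up data: 4 pre-treatment periods followed by 2 post-treatment periods.
n_pre = 4
donors = [
    [120.0, 118.0, 115.0, 113.0, 110.0, 108.0],
    [100.0, 99.0, 97.0, 96.0, 95.0, 94.0],
    [140.0, 137.0, 133.0, 130.0, 127.0, 125.0],
]
treated = [118.0, 116.0, 113.0, 110.0, 95.0, 90.0]

weights = fit_simplex_weights([d[:n_pre] for d in donors], treated[:n_pre])
synthetic = synthetic_series(donors, weights)
gaps = [y - s for y, s in zip(treated, synthetic)]

print("weights:", weights)
print("pre-treatment RMSPE:", rmspe(gaps[:n_pre]))
print("post-treatment effects:", gaps[n_pre:])
```

    The post-treatment entries of the gap series play the role of the estimated effects, and the pre-treatment RMSPE summarizes how well the synthetic control tracks the treated unit before treatment.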

    ADH show that if weights can be found so that the counterfactual matches the treated unit well in the pre-treatment period, then the estimated effect will be unbiased even in the presence of unobserved confounders that take a factor structure.

    The allowed factor structure is more general than the standard panel estimation framework where unobserved confounders are limited to time-invariant characteristics. In SCM, the factor structure can accommodate units on different time trends.
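
    For reference, the factor model in ADH (2010) for the outcome a unit would have without treatment can be written as below. The notation follows their paper rather than this post.

```latex
Y_{it}^{N} = \delta_t + \theta_t Z_i + \lambda_t \mu_i + \varepsilon_{it}
```

    Here \delta_t is a common time effect, Z_i are observed covariates with time-varying coefficients \theta_t, \lambda_t are unobserved common factors, and \mu_i are unit-specific factor loadings. The standard fixed-effects (difference-in-differences) model is the special case in which \lambda_t is constant over time.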

    For inference, SCM conducts a series of in-place placebo tests. For each untreated unit, temporarily assume that it received treatment at the same time and construct its synthetic control from the remaining untreated units. Collect the placebo effects (differences between units and their synthetic controls) to get a distribution against which one can gauge the relative size of the main effect.

    [Figure: synth 4]

    To calculate a p-value for each post-treatment effect, find the share of placebo effects that are at least as large as the main effect.

    If many placebo effects are as large as the main effect (i.e. the p-value is high), then it is likely that the main effect was observed by chance. As one can see in the distribution of differences above, post-treatment differences will be larger if the pre-treatment match was bad. A common alteration, then, is to scale each post-treatment effect by a measure of pre-treatment match quality (the pre-treatment root mean-squared prediction error, RMSPE). Comparing those “pseudo t-statistics”, the following are the p-values for Proposition 99’s effects.

    [Figure: synth 5]
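
    As a rough sketch of the placebo inference just described, the Python below computes per-period p-values both from raw effects and from effects scaled by pre-treatment RMSPE. The function names and the numbers are hypothetical, and comparing absolute values (a two-sided test) is an assumption of this example rather than something stated above.

```python
import math


def rmspe(gaps):
    """Root mean-squared prediction error of a list of gaps."""
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))


def placebo_p_values(treated_post, placebo_post):
    """For each post-treatment period, the share of placebo effects that are
    at least as large in absolute value as the treated unit's effect."""
    p_values = []
    for t, effect in enumerate(treated_post):
        hits = sum(1 for gaps in placebo_post if abs(gaps[t]) >= abs(effect))
        p_values.append(hits / len(placebo_post))
    return p_values


def scaled_p_values(treated_pre, treated_post, placebos):
    """Same comparison after dividing every post-treatment effect by that
    unit's own pre-treatment RMSPE (the "pseudo t-statistic").
    placebos is a list of (pre_gaps, post_gaps) pairs, one per donor."""
    treated_fit = rmspe(treated_pre)
    treated_stats = [g / treated_fit for g in treated_post]
    placebo_stats = [[g / rmspe(pre) for g in post] for pre, post in placebos]
    return placebo_p_values(treated_stats, placebo_stats)


# Made-up gaps: two pre-treatment and two post-treatment periods per unit.
treated_pre, treated_post = [1.0, -1.0], [-10.0, -20.0]
placebos = [
    ([2.0, -2.0], [1.0, -2.0]),
    ([4.0, 4.0], [-12.0, 5.0]),   # poor pre-treatment fit, large raw effect
    ([1.0, 1.0], [3.0, 25.0]),
    ([0.5, 0.5], [0.0, 1.0]),
]

print(placebo_p_values(treated_post, [post for _, post in placebos]))
print(scaled_p_values(treated_pre, treated_post, placebos))
```

    In this toy data the second donor has a large raw post-treatment gap but also a poor pre-treatment fit, so scaling by RMSPE shrinks its influence on the comparison distribution.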

    Cavallo et al. (2013) extend SCM by allowing more than one unit to experience treatment, possibly at different times.

    An overall treatment effect is constructed as an average over the treated units, with each unit’s effects measured relative to its own treatment date. This averaging removes noise from the estimate, so the same averaging should be applied to the comparison distribution when conducting inference.

    For each treatment, consider the group of placebo estimates where the never-treated donors are thought of as experiencing treatment at that treatment’s period. Select one placebo effect from each group and then take the average to construct a member of the comparison distribution. There will be many such averages, with the size of the comparison distribution growing exponentially in the number of treatments.
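
    The construction of the averaged comparison distribution can be sketched as follows. This is a minimal illustration with hypothetical names and made-up numbers: each treated unit contributes one scalar effect (for example, the effect a fixed number of periods after its own treatment date), and the comparison of absolute values is again an assumption of the example.

```python
from itertools import product


def average(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


def comparison_distribution(placebo_groups):
    """Every way of picking one placebo effect from each treatment's group,
    averaged. The number of averages is the product of the group sizes, so
    it grows exponentially with the number of treatments."""
    return [average(combo) for combo in product(*placebo_groups)]


def p_value(main_effect, distribution):
    """Share of averaged placebo effects at least as large in absolute value."""
    hits = sum(1 for e in distribution if abs(e) >= abs(main_effect))
    return hits / len(distribution)


# Two treated units, each with its own group of placebo effects (made up).
treated_effects = [-3.0, -2.0]
placebo_groups = [
    [1.0, -2.0, 3.0],   # placebos dated at the first unit's treatment period
    [0.0, 4.0],         # placebos dated at the second unit's treatment period
]

main_effect = average(treated_effects)
distribution = comparison_distribution(placebo_groups)
print(len(distribution), p_value(main_effect, distribution))
```

    Full enumeration is only feasible for a handful of treatments; with many treated units one would need to sample from the set of combinations instead.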

    The -synth_runner- package also performs several diagnostics used in the aforementioned papers. The first is to check whether a weighted average of donors is able to approximate the treated unit in the pre-treatment period. This should be satisfied if the treated unit lies within the convex hull of the control units. From the distribution of pre-treatment RMSPEs, -synth_runner- calculates the proportion of control units that match worse than the treated unit.

    The second diagnostic is that if one constructs the counterfactual by matching only on the initial part of the pre-treatment period, then the counterfactual should still match the rest of the pre-treatment period well. The initial section of the pre-treatment period is often designated the “training” period, with the later part being the “validation” period. As an example, Cavallo et al. (2013) set aside the first half of the pre-treatment period as the training period. When a training period is used, -synth_runner- will provide the proportion of control units that match worse than the treated unit during the validation period.
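
    Both diagnostics reduce to the same comparison of RMSPEs, which the sketch below illustrates in Python. It is not the package’s implementation: the function names are hypothetical, the gaps are made up, and the example assumes the gaps were already produced by synthetic controls fitted on the training period only.

```python
import math


def rmspe(gaps):
    """Root mean-squared prediction error of a list of gaps."""
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))


def share_matching_worse(treated_gaps, control_gaps):
    """Proportion of control units whose RMSPE over the given periods
    exceeds that of the treated unit."""
    treated_fit = rmspe(treated_gaps)
    worse = sum(1 for gaps in control_gaps if rmspe(gaps) > treated_fit)
    return worse / len(control_gaps)


# Made-up pre-treatment gaps (unit minus its synthetic control), 4 periods.
# The first half is treated as the training period, the second as validation.
n_training = 2
treated_gaps = [1.0, -1.0, 2.0, -2.0]
control_gaps = [
    [3.0, 3.0, 3.0, 3.0],
    [0.5, 0.5, 0.5, 0.5],
    [4.0, 4.0, 1.0, 1.0],
]

# First diagnostic: fit over the whole pre-treatment period.
whole_period = share_matching_worse(treated_gaps, control_gaps)

# Second diagnostic: fit over the validation period only.
validation = share_matching_worse(
    treated_gaps[n_training:], [g[n_training:] for g in control_gaps]
)
print(whole_period, validation)
```

    A high proportion in either case suggests the treated unit is matched at least as well as most control units over the periods being compared.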

    Hopefully, this package will be useful for those using this new method. The synth_runner module file may be downloaded from here.


    Brian Quistorff has a degree in Computer Science from Stanford University and a Master’s in Economics from the University of British Columbia, and he will finish his Ph.D. in Economics at the University of Maryland at College Park in May 2016.
