by Eva Vivalt.
Increasingly rigorous studies have been done on the effects of development programs, in the hope that their results will inform policy decisions.
However, the same program often has different effects in different contexts, because many variables can shape the outcome.
The key question is then: to what extent can we generalize from a research study’s conclusions?
If a policymaker were to decide to implement a program based on the results from impact evaluations, how different could they expect their own project’s results to be?
Recently, I answered this question using data from over 600 studies in international development. The data focused on 20 different types of development programs, from conditional cash transfers to microfinance.
These data were gathered in the course of meta-analyses and systematic reviews by AidGrade, a non-profit research institute I set up in 2012. They are ideal for examining this question, because it requires a body of evidence on each of many different types of development programs, with all the data collected in the same way.
Many economists would not be surprised that results from impact evaluations of the same type of program varied from context to context. However, several surprising facts stand out:
1) The extent of the variation was much greater than expected. In particular, if one used data from earlier studies to predict the results of later studies, then no matter how one modeled the results or which prediction methods one used, it would be unusual for a prediction to come within 50 percent of the observed value.
2) While we often focus on a few successful projects, most programs are statistically indistinguishable from each other in terms of achieving a particular goal. This means that although studies of two programs might report different results, we cannot be confident that the difference is real; it could just be a fluke.
3) Research studies seem to “run away from each other” in terms of the outcomes that they study. Very few papers on a particular intervention have more than a couple of outcomes in common with any other paper on the same intervention. This makes sense from a research incentives angle: everybody wants to write the first paper on a topic, as it will likely publish better. However, this poses a great problem for the discipline, as the only way we can come to generalizable conclusions is by studying the same thing over and over again.
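To make the first two points concrete, here is a minimal sketch in Python. It is an illustration only, not the method used in the underlying analysis: the effect sizes and standard errors are invented, the helper names `within_tolerance` and `difference_z` are hypothetical, the prediction is just the simple mean of earlier results (one of many possible methods), and the comparison assumes the two estimates are independent.

```python
import math


def within_tolerance(predicted, observed, tol=0.5):
    """Return True if the prediction lies within tol (a fraction) of the observed value."""
    return abs(predicted - observed) <= tol * abs(observed)


def difference_z(est_a, se_a, est_b, se_b):
    """z-statistic for the difference between two independent estimates."""
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Invented effect sizes from three earlier studies of one type of program,
# and the result of one later study. None of these numbers are real data.
earlier = [0.10, 0.30, 0.05]
later = 0.40

# Naive prediction: the mean of the earlier results (an assumption for illustration).
prediction = sum(earlier) / len(earlier)
hit = within_tolerance(prediction, later)

# Invented estimates and standard errors for two different programs.
z = difference_z(0.30, 0.12, 0.10, 0.10)
distinguishable = abs(z) > 1.96  # conventional 5 percent two-sided threshold

print(f"prediction={prediction:.2f}, within 50 percent: {hit}")
print(f"z={z:.2f}, statistically distinguishable: {distinguishable}")
```

In this made-up case the prediction misses the later result by more than 50 percent, and the two programs cannot be told apart at the conventional threshold even though one estimate is three times the other.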
Since we see that there is a lot of variation in the effects of a type of program, a natural question is how policymakers and technical advisors interpret the evidence that an impact evaluation produces.
For example, if results appear very heterogeneous and people are overly optimistic, they may tend to prefer those types of programs that have had a wider range of effects, since a wider range includes larger best-case results.
There is also the issue of how policymakers, practitioners and researchers aggregate information from studies done in different contexts. I am currently working on another set of papers to answer these questions.
As researchers, we can do a lot more to provide better evidence to inform policy, including coordinating, replicating, building models that explain dispersed results, and being humble and transparent about the limitations of any single study.
Eva Vivalt is a Visiting Assistant Professor at Stanford, Lecturer (i.e. Assistant Professor) at Australian National University, and the founder of AidGrade, a non-profit research institute that gathers and synthesizes data from impact evaluations. She has a Ph.D. in Economics and an M.A. in Mathematics from the University of California, Berkeley and spent two years as a Young Professional at the World Bank, where she worked in the Development Economics Research Group.