In Search of the Lost Data - IDB Improving Lives

An objection I often hear when discussing rigorous impact evaluation techniques with practitioners is “…but these evaluations cost a lot of money”.

I always answer that the cost of an evaluation does not depend on the method, but mainly on whether or not you need to collect primary data (I was glad to hear Paul Gertler responding in the same way during a recent meeting at the IDB). So, if you want to do quantitative research and use micro-data, you may end up needing a good amount of resources, because collecting data (good data) is expensive.

However, data, like all information, have very high production costs but usually very low reproduction costs. That is, once produced, cleaned and properly organized, data can be used for many purposes other than the one for which they were originally collected.

Micro-data are nowadays collected by a variety of organizations and for a variety of reasons.

First and foremost, statistical institutes collect censuses and surveys for statistical purposes, but they rarely use them for econometric analysis. Other public and private organizations also collect a significant amount of valuable statistics and administrative records.

Financial supervision authorities collect data on access to finance. Authorities that monitor the use of natural resources collect data on agricultural production and farmers. Customs authorities collect data on imports and exports.

Social security agencies and tax authorities collect data on workers and firms. Most of the time, these datasets can also be merged with one another, because they often use the same codes to identify individuals and firms.
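To make the merging point concrete, here is a minimal sketch of linking two administrative sources through a shared identifier. Everything in it is an assumption made for illustration: the field names (`firm_id`, `annual_sales`, `employees`), the records themselves, and the helper `merge_on_key` are hypothetical and do not come from any real agency's files.

```python
# Hypothetical records; field names and values are invented for illustration.
tax_records = [
    {"firm_id": "F001", "annual_sales": 120000},
    {"firm_id": "F002", "annual_sales": 45000},
    {"firm_id": "F003", "annual_sales": 300000},
]
social_security_records = [
    {"firm_id": "F001", "employees": 12},
    {"firm_id": "F003", "employees": 40},
    {"firm_id": "F004", "employees": 5},
]


def merge_on_key(left, right, key):
    """Inner join of two lists of dicts on a shared identifier.

    Assumes each identifier appears at most once in `right`.
    Units found in only one source are dropped.
    """
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Combine the fields of both sources into one record.
            merged.append({**row, **match})
    return merged


merged = merge_on_key(tax_records, social_security_records, "firm_id")
for record in merged:
    print(record)
```

Only the firms present in both sources survive the merge, so in practice it is worth checking how many units are lost at this step before drawing conclusions from the linked data.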

Although the cost of using these datasets is much lower than that of collecting new data, other “costs” should be considered when one plans to use them.

These data are always protected by confidentiality agreements between the data-recipients and the information providers.

Therefore, their use requires agreements between the user and the recipients to guarantee that this confidentiality is not violated.

These agreements imply restrictions on:

(i) the type of information accessible to the user (firms’ identifiers and names are never provided),

(ii) the modality of use (sometimes users are asked to work in situ, i.e. through systems under the direct control of the recipients), and

(iii) the type of results that can be published (aggregate results only, i.e. nothing that can reveal information on individual firms or households).
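Restriction (iii) can be illustrated with a small sketch that reports only group averages and withholds any group too small to hide an individual unit. The minimum cell size rule and its threshold of 3 are assumptions chosen for illustration, as are the sector names, the values, and the helper `aggregate_with_suppression`; an actual agreement would set its own disclosure rules.

```python
from collections import defaultdict

# Hypothetical threshold: groups with fewer units are not published.
MIN_CELL_SIZE = 3

# Hypothetical firm-level records: (sector, value).
firm_records = [
    ("textiles", 10),
    ("textiles", 20),
    ("textiles", 30),
    ("mining", 500),
]


def aggregate_with_suppression(records, min_cell_size):
    """Return the mean value per group, or None where the group is too small."""
    groups = defaultdict(list)
    for sector, value in records:
        groups[sector].append(value)

    result = {}
    for sector, values in groups.items():
        if len(values) >= min_cell_size:
            result[sector] = sum(values) / len(values)
        else:
            # Publishing this cell could reveal an individual firm's value.
            result[sector] = None
    return result


published = aggregate_with_suppression(firm_records, MIN_CELL_SIZE)
print(published)
```

In this example the single mining firm is suppressed, because its group average would be identical to its own value.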

Yes, there is a real “island of misfit toys” for evaluators out there. Valuable datasets that could be used for evaluation and monitoring purposes often sit on the servers of organizations that are unaware of most of their potential uses.

Don’t get me wrong. I am not advocating for data-driven evaluations, which would imply focusing on studies for which data are already available or, even worse, limiting the evaluation questions to those that can be answered with existing information.

I am simply suggesting that, before starting a new data collection, one should thoroughly review the existing data and ask what can be done with them.

In the last few years, I have had the opportunity to work on a series of impact evaluations based on secondary data. In the next posts, I will review some of these, discussing the specific challenges related to the use of these sources of information.

Comments

Rachel Kasumba says

November 17, 2011 at 9:32 am

This is great advice as it saves resources that would have been spent/wasted by starting from scratch. However, before placing reliance on this data from other organizations, it is best to ensure it is current, regularly updated, not corrupted, and most importantly, that the organization is credible.

With transparency and open data being encouraged and embraced by a lot of governments and agencies, soon this “lost” data will be easily accessible, which will speed up evaluation and monitoring mechanisms.

- Francisco Mejía says
  
  November 17, 2011 at 10:04 am
  
  Thank you Rachel for your comment. We should definitely do a lot more to make administrative data, and data in general, more easily accessible.
  
Alessandro Maffioli
