Inter-American Development Bank

Abierto al público


Meet the Atypical Data Classifier, a system to review data quality for social programs

September 14, 2018 by Luis Tejerina and Carlos Tejada


This open-source tool developed in collaboration with the IDB applies machine learning to review the quality of data used to determine eligibility for social programs in Colombia and the Dominican Republic.

One of the keys to implementing or evaluating a social program is having good data, which makes well-designed information systems essential to a program's effectiveness. These programs depend on massive exercises to gather household information in order to determine eligibility for benefits connected to health, education or conditional transfers, for example. However, the surveys typically used to gather this household data do not always allow manual verification of every variable collected. This leaves room for errors or atypical data in the information obtained. For this reason, improving the quality of this data can strengthen the delivery of social programs, allowing us to provide services with greater precision and efficiency.

As in so many other fields, the digital revolution is making it possible to apply technology, including artificial intelligence, to obtain better results in social policies. A new open-source application that applies machine learning techniques to improve and accelerate the review of this data is one example. The Atypical Data Classifier, formerly known as the Identification System for Potential Beneficiaries of Social Programs in Colombia (SISBEN ML), was designed to automate a quality control process: it takes into account all the information available from household surveys in an objective way to select the cases that merit verification. The tool automatically classifies atypical cases to improve data quality and make the review of potential beneficiaries of social programs more efficient.

The tool was intentionally developed as open source and is available through Code for Development, an Inter-American Development Bank initiative to promote the use of open-source technology in Latin America and the Caribbean for the public good. We hope to replicate this result at the IDB as we collaborate with countries on developing other IT solutions.

Classifying and visualizing social program data

Traditional processes for reviewing social program data use logical validation meshes (sets of consistency rules) or manual validation systems. For example, a logical validation mesh verifies that someone born in 1975 is not recorded as a child in the database. Manual review depends on a person checking and cross-checking each of these points, one by one, across possibly thousands of household surveys.
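To make the idea of a logical validation rule concrete, here is a minimal sketch of the birth-year check described above. The record fields, the age cutoff and the survey year are all hypothetical choices made for illustration; they are not taken from the SISBEN survey or from the tool's code.

```python
from dataclasses import dataclass

# Hypothetical cutoff: anyone older than this is not treated as a child.
CHILD_MAX_AGE = 17


@dataclass
class PersonRecord:
    person_id: str
    birth_year: int
    recorded_as_child: bool  # hypothetical field name


def violates_child_rule(record: PersonRecord, survey_year: int) -> bool:
    """Return True when a record is marked as a child but the birth year implies an adult."""
    age = survey_year - record.birth_year
    return record.recorded_as_child and age > CHILD_MAX_AGE


records = [
    PersonRecord("A1", 1975, True),   # marked as a child, but born in 1975
    PersonRecord("A2", 2012, True),   # consistent
    PersonRecord("A3", 1975, False),  # consistent
]

# Collect the ids of the records that break the rule.
flagged = [r.person_id for r in records if violates_child_rule(r, survey_year=2018)]
print(flagged)
```

A rule like this only catches the inconsistencies someone thought to write down in advance, which is the limitation the classifier described next tries to address.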

The Atypical Data Classifier uses machine learning to automatically review all the information available from the survey, and it can also flag atypical cases that may be less obvious, such as the use of a construction material that is unusual for the area. The current version applies unsupervised machine learning to generate the classifications, meaning the system continually learns from the data itself, without labeled examples, what is atypical and what is not. This learning is contextualized by geographic area: the algorithm classifies cases based on the local conditions where the surveyed family resides.
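The idea of judging a household against its own geographic area can be sketched with a simple frequency heuristic. This is only an illustration under our own assumptions: the post does not describe the algorithm the tool actually uses, and the function name, field names and the 5% threshold below are all hypothetical.

```python
from collections import Counter, defaultdict


def flag_rare_values(households, variable, area_key="area", min_share=0.05):
    """Flag households whose value for `variable` is rare within their own area."""
    # Count how often each value appears in each area.
    counts = defaultdict(Counter)
    for h in households:
        counts[h[area_key]][h[variable]] += 1

    flagged = []
    for h in households:
        area_counts = counts[h[area_key]]
        share = area_counts[h[variable]] / sum(area_counts.values())
        if share < min_share:
            # Report the household, the variable and the value that looks atypical.
            flagged.append((h["id"], variable, h[variable], round(share, 3)))
    return flagged


# Made-up data: wood walls are normal on the coast but unusual in the mountains.
households = (
    [{"id": f"c{i}", "area": "coast", "wall_material": "wood"} for i in range(30)]
    + [{"id": f"m{i}", "area": "mountain", "wall_material": "brick"} for i in range(30)]
    + [{"id": "m_odd", "area": "mountain", "wall_material": "wood"}]
)

result = flag_rare_values(households, "wall_material")
print(result)
```

In this made-up data, only the single wood-walled household in the mountain area is flagged, even though wood is the most common material overall. Because the output names the variable along with the household, a reviewer can see which answer on the form triggered the flag.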

The Atypical Data Classifier includes two components. The first is the classifier itself: the algorithms that review and monitor the survey data to classify households and detect atypical cases. The second is the visualizer, a web interface that shows which cases the classifier has identified as atypical and highlights the exact variables within each form that it considers inconsistent. This streamlines the process of reviewing the household indicators for possible errors.

Numerous advantages to automating the data quality review process

Automating this process has a series of advantages for the institution that manages social programs:

  1. First, it allows you to reduce costs by minimizing the personnel required to review the data collected.
  2. It also allows you to increase the quality of the final database. In a manual review, random sampling would be carried out in order to deliver the results on time and keep costs down. Here, it is the algorithm that selects the sample, after making a first pass over all the files.
  3. Finally, the tool allows you to correct logistical problems or flaws in the survey instrument while the operation is still under way, since the algorithm delivers its analysis in real time as data is deposited in the central database.
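The difference between the two sampling approaches in the second point can be sketched as follows. The atypicality scores are invented for illustration, and both function names are hypothetical; the tool's actual scoring and selection logic is not described in this post.

```python
import random


def random_sample(case_ids, k, seed=0):
    """Manual-style review: pick k cases at random, whatever their content."""
    rng = random.Random(seed)
    return rng.sample(list(case_ids), k)


def top_atypical(scores, k):
    """Algorithm-driven review: pick the k cases with the highest atypicality score."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [case_id for case_id, _ in ranked[:k]]


# Invented scores, as if produced by a first pass over every file.
scores = {"h1": 0.02, "h2": 0.91, "h3": 0.10, "h4": 0.75, "h5": 0.05}

to_review = top_atypical(scores, k=2)
baseline = random_sample(scores, k=2)
print(to_review)
print(baseline)
```

With the same review budget of two cases, the score-based selection always sends reviewers to the two most unusual households, while the random sample may or may not include them.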

The tool was developed in coordination with the National Planning Department of Colombia and can be adapted for use in other countries. Any organization can calibrate the components using the weights it applies in its own calculations for determining eligibility for social programs.

In the process of creating the tool, we learned a lot about the types of validation that are necessary for a tool to be reusable by third parties. We are currently working on a more modular version that has fewer dependencies on technologies from a particular provider. This would allow any institution to use the system, regardless of its infrastructure, operating system or the other tools it might be using.

In the near future, we hope to go a step further by combining information from social databases with machine learning tools to further enrich this information and make the use of public resources in social programs more efficient.

Interested in using this tool? Get the code for the Atypical Data Classifier on our GitHub.

 

By Luis Tejerina, Lead Specialist in the IDB Social Protection and Health Division, and Carlos Tejada, information systems development consultant

 

 


Filed Under: Open Source Tagged With: Actionable Resources, Code for Development, Data Analysis, Knowledge Products

Luis Tejerina

Luis Tejerina is a lead specialist in the Social Protection and Health Division of the Inter-American Development Bank, where he contributes his experience in digital transformation projects in the social sector and in tools that promote more efficient and effective use of technology.

Carlos Tejada

Carlos Tejada is an information systems consultant. He has collaborated with the IDB on the design and implementation of technology projects in several countries since 2010.

    Blog posts written by Bank employees:

    Copyright © Inter-American Development Bank ("IDB"). This work is licensed under a Creative Commons IGO 3.0 Attribution-NonCommercial-NoDerivatives. (CC-IGO 3.0 BY-NC-ND) license and may be reproduced with attribution to the IDB and for any non-commercial purpose. No derivative work is allowed. Any dispute related to the use of the works of the IDB that cannot be settled amicably shall be submitted to arbitration pursuant to the UNCITRAL rules. The use of the IDB's name for any purpose other than for attribution, and the use of IDB's logo shall be subject to a separate written license agreement between the IDB and the user and is not authorized as part of this CC- IGO license. Note that link provided above includes additional terms and conditions of the license.


    For blogs written by external parties:

    For questions concerning copyright for authors that are not IADB employees please complete the contact form for this blog.

    The opinions expressed in this blog are those of the authors and do not necessarily reflect the views of the IDB, its Board of Directors, or the countries they represent.

    Attribution: in addition to giving attribution to the respective author and copyright owner, as appropriate, we would appreciate it if you could include a link back to the IDB Blogs website.


