How does AI work? Consider the Iceberg Model

December 12, 2018 by Kyle Strand and Daniela Collaguazo


When it comes to thinking about how artificial intelligence works, at some level all AI projects are primarily data projects. To help think about projects from this perspective, we have developed a model that breaks down the essential AI aspect of any project into three main phases: 1) prepare the relevant data, 2) train the algorithm(s), and 3) test the trained algorithm(s).

We can compare these phases to an iceberg to demystify the work of artificial intelligence a little. Artificial intelligence techniques like machine learning, deep learning, and natural language processing are not magic, and much of their success, if not most of it, depends on extensive data preparation. We estimate that the most significant effort in a successful AI project (over half) is the time spent preparing the data, assuming an adequate dataset has not already been cleaned and prepared, which is likely the case if you are using your organization's data in an AI project for the first time. Yet despite this significant effort, the work of preparing the data largely goes unseen, and its complexity is not always appreciated, because it is rarely reflected in the visible results of a project, just as only a relatively small part of an iceberg can be seen above the surface.

Have you ever used a voice assistant, like Alexa, Siri, or Google Home? Let’s imagine an interaction with Google Home and explore an overview of what happens during each of these phases.

Phase 1) Prepare the Relevant Data

Google Home works by understanding spoken voice commands and taking appropriate actions, like answering a question, setting a timer, or controlling a connected device. For these kinds of results to be possible, the first phase, preparing the data, involves activities such as:

  • collecting millions of voice recordings from a wide range of voices;
  • cleaning the sound of the recordings by removing background noise and the like;
  • normalizing the recordings to a single audio format such as MP3;
  • labeling the recordings appropriately;
  • and other related activities.
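To make these activities a bit more concrete, here is a minimal sketch, in Python, of what this kind of preparation could look like. It assumes a folder of raw WAV recordings; the library choices (librosa, soundfile) and the remove_background_noise helper are illustrative placeholders, not a description of how Google actually prepares its data.

import csv
from pathlib import Path

import librosa          # loads audio and resamples it
import soundfile as sf  # writes the normalized audio back to disk

RAW_DIR = Path("raw_recordings")      # assumed folder of collected voice clips
CLEAN_DIR = Path("clean_recordings")  # cleaned, normalized output
CLEAN_DIR.mkdir(exist_ok=True)

TARGET_SR = 16_000  # normalize every clip to a single sample rate

def remove_background_noise(samples):
    # Hypothetical placeholder for a real noise-reduction step
    return samples

labels = []
for clip in sorted(RAW_DIR.glob("*.wav")):
    samples, _ = librosa.load(clip, sr=TARGET_SR)  # resample to one common format
    samples = remove_background_noise(samples)     # clean the recording
    out_path = CLEAN_DIR / clip.name
    sf.write(out_path, samples, TARGET_SR)         # save the normalized clip
    labels.append((out_path.name, "transcript goes here"))  # label each clip

# Persist the labels so they can be paired with the audio later on
with open(CLEAN_DIR / "labels.csv", "w", newline="") as f:
    csv.writer(f).writerows([("file", "transcript"), *labels])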


Lastly, the data needs to be separated into at least two groups: a group to be used in phase two for modeling the algorithm (training data), and another to be used to test the trained algorithm in phase three (test data).
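Continuing the hypothetical sketch above, this split can be as simple as holding out a fraction of the labeled clips; scikit-learn's train_test_split is one common way to do it.

import csv
from sklearn.model_selection import train_test_split

with open("clean_recordings/labels.csv") as f:
    labeled_clips = list(csv.DictReader(f))

# Hold out 20% of the labeled clips for phase three (test data);
# the remaining 80% becomes the training data used in phase two.
train_clips, test_clips = train_test_split(labeled_clips, test_size=0.2, random_state=42)
print(len(train_clips), "training examples,", len(test_clips), "test examples")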

For an organization like Google, we can imagine that all of the tasks in this phase were carried out over the course of years by a talented team of engineers and developers paid to tackle the challenge across multiple iterations and updates. We can also remember that for large technology firms like this, data is their business: they have access to massive amounts of data relevant to compiling robust training and test datasets for the development of their products. And yet, despite all of this, as consumers we have probably all experienced the imperfections of these devices at one moment or another, when a word has been misunderstood or a face has not been recognized.

Now compare those conditions to the resources and data available to any given researcher working in AI, and we begin to understand the magnitude of the data preparation phase and its overall importance in developing a functional AI feature.

Phase 2) Train the Algorithm(s)

Phase two is training the algorithm. First, we choose which algorithm(s) we are going to train. To understand this phase, think of the process of modeling with clay. Imagine four blocks of clay, each representing a different basic algorithm type: a red block for linear regression, an orange one for k-means, a purple one for neural networks, and a turquoise one for support vector machines.

What happens during the training process? You have probably heard the expression that you need to find the algorithm with the best fit. Here is where the clay analogy helps express the idea: during training, each algorithm is molded to the training data by finding patterns in it, so by the end of the phase each block of clay has taken on a shape of its own.
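In code, this "molding" step amounts to calling a training routine on each candidate algorithm with the same training data. The sketch below uses scikit-learn and a small built-in dataset as a stand-in for features extracted from the prepared recordings; it swaps in logistic regression for plain linear regression because the toy task is classification, and k-means, being unsupervised, simply clusters the data.

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# A toy dataset standing in for whatever features were extracted
# from the prepared recordings in phase one
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# One "block of clay" per basic algorithm type from the analogy
candidates = {
    "linear model": LogisticRegression(max_iter=2000),
    "k-means": KMeans(n_clusters=10, n_init=10),
    "neural network": MLPClassifier(max_iter=1000),
    "support vector machine": SVC(),
}

# Training: each algorithm is molded to the same training data
for name, model in candidates.items():
    model.fit(X_train, y_train)  # KMeans ignores y and just clusters X_train
    print("trained the", name)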

Phase 3) Test the Trained Algorithm(s)

In phase three, testing, each of the trained models is given the test data to see which one provides the most accurate results and to determine how successful our training was. To continue the clay analogy, imagine that a successful test result means being able to roll well. Both the red and the purple trained algorithms might look like they could roll, but if we test them by sending them down a ramp, the purple one proves most effective. In the case of Google Home, if a successful result means giving an adequate response to a verbal command from the test data, then the algorithm that provides the best responses is the best fit.
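Continuing the same hypothetical sketch, the testing phase gives each trained model the held-out test data and compares how accurate the results are. K-means is left out of the comparison because, being unsupervised, it does not produce a directly comparable accuracy score.

# Testing: score each trained candidate on data it has never seen
scores = {}
for name, model in candidates.items():
    if name == "k-means":
        continue  # unsupervised: no directly comparable accuracy
    scores[name] = model.score(X_test, y_test)  # accuracy on the test data

best_fit = max(scores, key=scores.get)
print(scores)
print("best fit on the test data:", best_fit)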

If the results of the testing phase need improvement, you have essentially two possible next steps: 1) change the algorithms or 2) prepare and introduce additional relevant data. This is an iterative process, but the general sequence follows the three phases of the Iceberg Model.

There are many other considerations when designing and implementing a project that uses artificial intelligence. Our hope is that the Iceberg Model can be a useful conceptual framework for the general approach to your next project, as well as for communicating the kind of effort that takes place behind the scenes.


What AI projects are you working on?


By Kyle Strand and Daniela Collaguazo from the IDB's Knowledge, Innovation and Communication Sector




Kyle Strand

Kyle Strand is a Senior Knowledge Management Specialist in the IDB's Knowledge, Innovation and Communication Sector. Since 2007, his work has focused on initiatives to improve access to knowledge within the IDB and across the Latin America and Caribbean region. He promotes the idea of software as a knowledge product that can be reused and adapted for development, and works to integrate artificial intelligence and natural language processing as the frontier of knowledge management. Kyle holds a degree in economics from the University of Michigan and a master's degree in Latin American Studies from George Washington University in Washington, DC.

Daniela Collaguazo

Born in Quito, Ecuador, in April 1984, Daniela completed her undergraduate studies at the Universidad San Francisco de Quito. She then lived for three years in Germany, where she earned a master's degree in Technology and Innovation Management at the Brandenburg University of Technology Cottbus-Senftenberg. After finishing her studies, Daniela taught web technologies at the Faculty of Architecture, Design and Arts of the Pontificia Universidad Católica del Ecuador. She is currently collaborating with the IDB as a consultant on projects related to machine learning and natural language processing. She is passionate about sports and has taken part in several competitions in her home country, including an open-water swim and the country's first two half-distance triathlons.
