Inter-American Development Bank
Abierto al público

How does AI work? Consider the Iceberg Model

December 12, 2018 by Kyle Strand and Daniela Collaguazo


At some level, all artificial intelligence projects are primarily data projects. To help think about projects from this perspective, we've developed a model that breaks down the essential AI work of any project into three main phases: 1) prepare the relevant data, 2) train the algorithm(s), and 3) test the trained algorithm(s).

We can compare these phases to an iceberg to demystify the work of artificial intelligence. Techniques like machine learning, deep learning, and natural language processing are not magic, and much of their success depends on extensive data preparation. We estimate that the most significant effort (over half) of a successful AI project is the time spent preparing the data, assuming an adequate dataset has not already been cleaned and prepared, which is a likely assumption if you are using your organization's data in an AI project for the first time. Yet this preparation work largely goes unseen, just as most of an iceberg sits under water. The complexity of this part of the process is therefore not always appreciated, because it is rarely reflected in the visible results of a project, in the same way that only a small part of an iceberg is visible from the surface.

Have you ever used a voice assistant, like Alexa, Siri, or Google Home? Let’s imagine an interaction with Google Home and explore an overview of what happens during each of these phases.

Phase 1) Prepare the Relevant Data

Google Home works by understanding spoken voice commands to take appropriate actions, like answering a question, setting a timer, or controlling some connected device. For these kinds of results to be possible, the first phase, preparing the data, must consist of activities such as:

  • collecting millions of voice recordings from a wide range of voices;
  • cleaning the recordings by removing background noise and similar artifacts;
  • normalizing the recordings to a single audio format, such as mp3;
  • labeling the recordings appropriately;
  • and other related activities.

Lastly, the data needs to be separated into at least two groups: one used in phase two to train the algorithm(s) (training data), and another used in phase three to test the trained algorithm(s) (test data).
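In code, this split is often a simple random partition. Below is a minimal sketch in Python using only the standard library; the filenames and labels are hypothetical, and real speech datasets require more care (for example, keeping each speaker's recordings in only one split):

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=42):
    """Randomly partition labeled samples into training and test sets."""
    rng = random.Random(seed)
    shuffled = samples[:]  # copy so the original order is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical labeled recordings: (filename, transcript label)
recordings = [(f"clip_{i}.mp3", f"label_{i}") for i in range(100)]
train_data, test_data = train_test_split(recordings)
print(len(train_data), len(test_data))  # 80 20
```

Libraries such as scikit-learn provide a ready-made `train_test_split` with the same idea; a fixed seed makes the split reproducible.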

For an organization like Google, we can imagine that all of the tasks connected to this phase were carried out over the course of years, by a talented team of engineers and developers getting paid to tackle this challenge over the course of multiple iterations and updates. Also, we can imagine that with large technology firms like this, data is their business. They have access to massive amounts of data relevant to compiling robust training data sets and testing data sets for the development of their products. And yet despite all of this, as consumers we’ve probably experienced the imperfections of these devices at one moment or another, where a word or more has been misunderstood or a face has not been recognized.

Now compare those conditions to the resources and data available to any given researcher working in AI, and we begin to understand the magnitude of the data preparation phase and its overall importance in developing a functional AI feature.

Phase 2) Train the Algorithm(s)

Phase two is training the algorithm. First, we choose which algorithm(s) we are going to train. To understand this phase, think of modeling with clay. Imagine four blocks of clay, each representing a different basic algorithm type: a red block for linear regression, an orange block for k-means, a purple block for neural networks, and a turquoise block for support vector machines.

What happens during the training process? You've probably heard the expression that you need to find the algorithm with the best fit. Here is where the clay analogy helps. During training, each algorithm is molded to the training data by finding patterns in it, so by the end of the phase each block of clay has taken on a shape determined by the data.
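Training means adjusting an algorithm's internal parameters until it fits the training data. The sketch below is a deliberately simple stand-in for that "molding" step: it fits a straight line y = a*x + b to training points by ordinary least squares, the simplest of the four algorithm types named above.

```python
def fit_linear(points):
    """Fit y = a*x + b by ordinary least squares: the 'molding' step."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Training data following the pattern y = 2x + 1
train = [(x, 2 * x + 1) for x in range(10)]
a, b = fit_linear(train)
print(round(a, 6), round(b, 6))  # 2.0 1.0
```

Other algorithm types (k-means, neural networks, support vector machines) have very different internals, but training always follows this same shape: parameters are adjusted until the model matches the patterns in the training data.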

Phase 3) Test the Trained Algorithm(s)

In phase three, testing, each of the trained models is given the test data to see which one provides the most accurate results and to determine how successful our training was. To continue the clay analogy, imagine that a successful test result means being able to roll well. Both the red and the purple trained shapes might look like they could roll, but if we test them by sending them down a ramp, the purple one proves most effective. In the case of Google Home, if a successful result means giving an adequate response to a verbal command in the test data, the algorithm that provides the best responses is the best fit.
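This selection step can be sketched as scoring each trained candidate on the held-out test data and keeping the winner. The two "models" below are hypothetical stand-ins (simple functions rather than real trained algorithms), but the scoring logic is the same one used in practice:

```python
def accuracy(model, test_data):
    """Fraction of test examples the model predicts correctly."""
    correct = sum(1 for x, y in test_data if model(x) == y)
    return correct / len(test_data)

# Hypothetical trained models: each maps an input to a predicted label
models = {
    "red":    lambda x: x % 2,  # predicts the input's parity
    "purple": lambda x: x % 3,  # predicts the input's remainder mod 3
}

# Test data whose true labels happen to be x mod 3
test_data = [(x, x % 3) for x in range(30)]

scores = {name: accuracy(m, test_data) for name, m in models.items()}
best = max(scores, key=scores.get)
print(best)  # purple
```

Here the purple model "rolls best" because its predictions match the test labels most often; real projects use the same comparison with richer metrics such as precision and recall.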

If the results of the testing phase need improvement, you have essentially two possible next steps: 1) change the algorithms or 2) prepare and introduce additional relevant data. This is an iterative process, but the general sequence follows the three phases of the Iceberg Model.

There are many other considerations when designing and implementing a project that uses artificial intelligence. Our hope is that the Iceberg Model can be a useful conceptual framework for approaching your next project, as well as for communicating the kind of effort that takes place behind the scenes.

What AI projects are you working on?

By Kyle Strand and Daniela Collaguazo from the IDB's Knowledge, Innovation and Communications Sector


Filed Under: Open Source Tagged With: Artificial Intelligence, How to

Kyle Strand

Kyle Strand is Lead Knowledge Management Specialist and Head of the Felipe Herrera Library in the Knowledge, Innovation and Communication Sector of the Inter-American Development Bank (IDB). For more than a decade, his work has focused on initiatives to improve access to knowledge both at the Bank and in the Latin American and Caribbean region. Kyle designed the first open repository of knowledge products at the IDB and spearheaded the idea of software as a knowledge product to be reused and adapted for development purposes, which led the IDB to become the first multilateral to formally recognize it as such. Currently, Kyle coordinates library services within the organization, supports the open knowledge product lifecycle including publications and open data, and promotes the use of artificial intelligence and natural language processing as a cornerstone of knowledge management in the digital age. Kyle is also executive editor of Abierto al Público, a blog in Spanish that promotes the opening and reuse of knowledge. He has a B.A. from the University of Michigan and an M.A. from the George Washington University.

Daniela Collaguazo

Born in Quito, Ecuador, in April 1984, Daniela completed her undergraduate studies at the San Francisco de Quito University. She then lived for three years in Germany, where she completed her master's degree in Technology Management and Innovation at the Brandenburg Technical University Cottbus-Senftenberg. Upon completing her studies, Daniela taught Web Technologies at the Faculty of Architecture, Design and Arts at the Pontificia Universidad Católica del Ecuador. Currently, she collaborates with the IDB as a consultant on projects related to machine learning and natural language processing. She is passionate about sports and has participated in several competitions in her native country, including one in open water and the first two medium-distance triathlons.



    Blog posts written by Bank employees:

    Copyright © Inter-American Development Bank ("IDB"). This work is licensed under a Creative Commons IGO 3.0 Attribution-NonCommercial-NoDerivatives. (CC-IGO 3.0 BY-NC-ND) license and may be reproduced with attribution to the IDB and for any non-commercial purpose. No derivative work is allowed. Any dispute related to the use of the works of the IDB that cannot be settled amicably shall be submitted to arbitration pursuant to the UNCITRAL rules. The use of the IDB's name for any purpose other than for attribution, and the use of IDB's logo shall be subject to a separate written license agreement between the IDB and the user and is not authorized as part of this CC- IGO license. Note that link provided above includes additional terms and conditions of the license.



    The opinions expressed in this blog are those of the authors and do not necessarily reflect the views of the IDB, its Board of Directors, or the countries they represent.

Attribution: in addition to giving attribution to the respective author and copyright owner, as appropriate, we would appreciate a link back to the IDB Blogs website.
