Inter-American Development Bank
Abierto al público

Meet SmartReader, our open-source text analytics tool

September 7, 2018 by Kyle Strand, Daniela Collaguazo, and Guest Author


At the IDB we are motivated to learn how open source tools for text analytics and other technology can help guide the task of finding relevant knowledge. With that in mind, we teamed up with the Institute for the Future to create SmartReader, a tool to share with others interested in open source collaboration and artificial intelligence.

If you have ever worked on a literature review, we’re sure you know the feeling: you are hunkered down, buried in a pile of journals, books, and open browser windows, trying to make sense of it all and keep track of the winding threads of the topic you’re researching. You’re on the fifth document, which the author didn’t bother to write in a very engaging way. But you can’t skip it; after all, what if the missing insight you’ve been searching for is hidden in its depths? So, you read a page, but it doesn’t stick, so you read the same page again. Finally, you read it a third time and you turn the page. Your mind is wandering. Should you go back or just read something else? What if you miss something important?

The SmartReader could be your answer. It’s an experiment in using natural language processing techniques to make the literature review process more efficient, keep your research on track, and help point out key arguments you might have otherwise missed in the reading. The prototype version of the text analytics tool and its code in Python are now open and available to the public as part of the IDB’s Code for Development initiative.

What does the SmartReader do?

The SmartReader takes a body of text documents you have collected to support a specific research question and, in minutes, generates insights for your literature review.

The results include:

Keywords

Word maps of the most relevant words, entities, and locations at the level of the overall topic, as well as at the level of each subtopic that you specify.

Relevant Content

A list of sentences that are both unique and highly relevant to a specific subtopic. These sentences are also linked to and highlighted in the source text so that you can explore them in context.
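To make the idea of "relevant and unique" sentences more concrete, here is a minimal sketch in Python. It is not the SmartReader's actual scoring code: the function names, the keyword weights, and the 0.8 overlap threshold are all assumptions chosen for illustration. It scores each sentence by the weights of the keywords it contains and skips sentences that are near-duplicates of one already kept.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_sentences(sentences, keyword_weights, max_overlap=0.8):
    """Return (score, sentence) pairs, best first, skipping near-duplicates.

    Hypothetical helper: relevance is the summed weight of the distinct
    keywords in a sentence; uniqueness is checked with Jaccard overlap.
    """
    scored = []
    for sentence in sentences:
        tokens = set(tokenize(sentence))
        score = sum(keyword_weights.get(tok, 0.0) for tok in tokens)
        if score > 0:  # ignore sentences with no keyword at all
            scored.append((score, sentence, tokens))
    scored.sort(key=lambda item: item[0], reverse=True)

    kept = []
    for score, sentence, tokens in scored:
        # Compare against every sentence already kept.
        is_duplicate = any(
            len(tokens & other) / len(tokens | other) > max_overlap
            for _, _, other in kept
        )
        if not is_duplicate:
            kept.append((score, sentence, tokens))
    return [(score, sentence) for score, sentence, _ in kept]


# Illustrative weights and sentences (made up for this example).
weights = {"blockchain": 2.0, "taxation": 1.5, "informal": 1.0}
sentences = [
    "Blockchain could simplify taxation for informal workers.",
    "Blockchain could simplify taxation for informal workers!",
    "The weather was pleasant.",
    "Informal firms rarely keep records.",
]
for score, sentence in rank_sentences(sentences, weights):
    print(score, sentence)
```

In this toy run the repeated sentence and the off-topic sentence are dropped, leaving one high-scoring sentence and one lower-scoring one.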

Now we are sure you are thinking, “I need this very much!” And you are right! But first, let’s go over the mechanism that makes all this magic happen in more detail.

How do I use the SmartReader?

First, you must formulate a research question such as “How will technology impact the informal economy over the next decade?”

Next, you collect a set of publications (a corpus of documents) that seem relevant to the chosen question, just like you would for a literature review. Then, to set up a framework for analyzing the corpus, you identify a main topic (e.g., “Informal Economy”) and a set of relevant subtopics (e.g., “innovation, productivity, blockchain, and taxation”) and give them to the SmartReader. With these inputs, the SmartReader queries Google to contextualize the subtopics and uses the real-time results to generate a model. Last, it compares the model to your corpus and extracts the most salient terms and entities, while highlighting phrases within the documents that contain relevant and unique information. Here’s a more detailed description of the process:

Step 1: Model Definition

This is where you tell SmartReader what topic you’re interested in, and what subtopics you want to focus on within that topic, in order to define the model. “What is a model?” you ask. Well, in this context, it is a set of keywords built from the results of a Google search and weighted by their relevance to your research question.
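If you like to think in code, a model of this kind can be pictured as a dictionary that maps each subtopic to weighted keywords. The sketch below is only an illustration under stated assumptions: `build_model`, the stopword list, and the frequency-based weighting are hypothetical, and hard-coded text snippets stand in for live search results (no Google query is made).

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption, not SmartReader's).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "on", "with"}


def build_model(snippets_by_subtopic, top_n=5):
    """Map each subtopic to its top_n keywords, weighted by relative frequency."""
    model = {}
    for subtopic, snippets in snippets_by_subtopic.items():
        counts = Counter(
            word
            for snippet in snippets
            for word in re.findall(r"[a-z]+", snippet.lower())
            if word not in STOPWORDS
        )
        total = sum(counts.values())
        model[subtopic] = {
            word: count / total for word, count in counts.most_common(top_n)
        }
    return model


# Stand-in snippets; in the real tool these would come from search results.
snippets = {
    "taxation": [
        "Taxation of the informal economy",
        "Digital taxation tools",
    ],
}
model = build_model(snippets)
print(model["taxation"])
```

Here "taxation" receives the largest weight for its own subtopic because it appears most often in the stand-in snippets.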

Step 2: Model Status 

The second step is to check the Model Status to see whether the SmartReader has finished creating the model from the topic and subtopics you specified. A model’s status is “Queued” immediately after the subtopics are submitted, “Processing” while the Google search and content analysis are underway, and “Done” when the model is created and ready to be used.
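The three statuses form a simple one-way lifecycle. As a rough sketch (the class and function names are our own for this example, not identifiers taken from the SmartReader code base), it could be modeled like this:

```python
from enum import Enum


class ModelStatus(Enum):
    """The three statuses described above."""
    QUEUED = "Queued"
    PROCESSING = "Processing"
    DONE = "Done"


# Each status leads to exactly one next status; "Done" has no successor.
NEXT_STATUS = {
    ModelStatus.QUEUED: ModelStatus.PROCESSING,
    ModelStatus.PROCESSING: ModelStatus.DONE,
}


def advance(status):
    """Return the status that follows `status`; "Done" stays "Done"."""
    return NEXT_STATUS.get(status, status)


def is_ready(status):
    """A model can be applied to a corpus only once it is "Done"."""
    return status is ModelStatus.DONE


status = ModelStatus.QUEUED
while not is_ready(status):
    print(status.value)
    status = advance(status)
print(status.value)
```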

Step 3: Model Application 

Once the model is created, you can tell SmartReader to use it to analyze a set of documents. All you have to do is upload a .zip file with the documents in .txt format, and choose from a drop-down list which model to use to analyze the corpus. Now the magic happens!
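If your documents are already saved as .txt files in one folder, a few lines of standard-library Python can bundle them for upload. This is a minimal sketch: the function name and the folder and archive names are placeholders, and we assume a flat archive (files at the top level of the .zip), so check the user guide for the layout the tool expects.

```python
import zipfile
from pathlib import Path


def zip_corpus(source_dir, archive_path):
    """Write every .txt file directly under source_dir into a .zip archive.

    Returns the list of file names that were added.
    """
    source = Path(source_dir)
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.glob("*.txt")):
            # arcname drops the folder so files sit at the top of the archive.
            archive.write(path, arcname=path.name)
            names.append(path.name)
    return names


if __name__ == "__main__":
    # "my_corpus" and "corpus.zip" are placeholder names for this example.
    if Path("my_corpus").is_dir():
        print(zip_corpus("my_corpus", "corpus.zip"))
```

Files with other extensions in the folder are simply left out of the archive.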

Step 4: View Results

Lastly, you’re ready to see the Results! You’ll find word maps of the most relevant keywords, locations, and entities for each of the subtopics as well as for the overall corpus. Below each of the subtopics, you’ll also see a list of sentences worth checking out, with links back to their location in the source text. Oh, and you can also download the results in .json if that’s your thing.
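If .json is your thing, the downloaded file can be explored with Python's built-in `json` module. Note that the layout used below (a top-level "subtopics" object whose entries hold "keywords" with numeric weights) is purely hypothetical; inspect your own download to see how the real results are organized and adapt the keys accordingly.

```python
import json


def top_keywords(results_json, limit=3):
    """Return the `limit` highest-weighted keywords for each subtopic.

    Assumes a hypothetical layout:
    {"subtopics": {"<name>": {"keywords": {"<word>": <weight>, ...}}}}
    """
    results = json.loads(results_json)
    summary = {}
    for subtopic, data in results["subtopics"].items():
        ranked = sorted(
            data["keywords"].items(), key=lambda pair: pair[1], reverse=True
        )
        summary[subtopic] = [word for word, _ in ranked[:limit]]
    return summary


# A made-up results document, only to exercise the function.
sample = json.dumps({
    "subtopics": {
        "taxation": {"keywords": {"tax": 0.9, "revenue": 0.4, "vat": 0.7}},
        "blockchain": {"keywords": {"ledger": 0.8, "token": 0.5}},
    }
})
print(top_keywords(sample, limit=2))
```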

Now it is in YOUR hands! How will you help improve this text analytics tool?

We know that the tool can be refined and become a useful time-saver for researchers and curious minds all over. That is why we have made SmartReader available via Code for Development as an open source tool for text analytics in Python 3.6.5, and we could not be more eager to hear about your experience with it! You’ll find installation instructions, a user guide, and other documentation that will help you set it up and experiment. And if you love programming in Python and are interested in getting your hands dirty right now, we’ve already compiled a backlog of improvements to work on, such as making the model results visible, incorporating Google Scholar, tweaking the query strings used to create the model, and improving scoring for the model.

Did we hear you say “challenge accepted”?

By Kyle Strand and Daniela Collaguazo from the Knowledge, Innovation, and Communication Sector of the IDB and Seaford Bacchas from the University of the West Indies, Mona

Seaford Bacchas

Seaford Bacchas is a graduate student at the University of the West Indies Mona. He completed a BSc (Hons) in Software Engineering at the same institution and is currently pursuing an MPhil in Computing. His research focuses on Big Data analysis on parallel computing.


Filed Under: Open Source Tagged With: Actionable Resources, Code for Development, Text Analytics

Kyle Strand

Kyle Strand is Lead Knowledge Management Specialist and Head of the Felipe Herrera Library in the Knowledge, Innovation and Communication Sector of the Inter-American Development Bank (IDB). For more than a decade, his work has focused on initiatives to improve access to knowledge both at the Bank and in the Latin American and Caribbean region. Kyle designed the first open repository of knowledge products at the IDB and spearheaded the idea of software as a knowledge product to be reused and adapted for development purposes, which led the IDB to become the first multilateral to formally recognize it as such. Currently, Kyle coordinates library services within the organization, supports the open knowledge product lifecycle including publications and open data, and promotes the use of artificial intelligence and natural language processing as a cornerstone of knowledge management in the digital age. Kyle is also executive editor of Abierto al Público, a blog in Spanish that promotes the opening and reuse of knowledge. He has a B.A. from the University of Michigan and an M.A. from the George Washington University.

Daniela Collaguazo

Daniela was born in Quito, Ecuador, in April 1984. She completed her undergraduate studies at the San Francisco de Quito University. Subsequently, she lived for 3 years in Germany, where she completed her master's degree in Technology Management and Innovation at the Brandenburg Technical University Cottbus-Senftenberg. Upon completing her studies, Daniela taught Web Technologies at the Faculty of Architecture, Design and Arts at the Pontificia Universidad Católica del Ecuador. Currently, she is collaborating with the IDB as a consultant on projects related to machine learning and natural language processing. She is passionate about sports and has participated in several competitions in her native country, including one in open water and the first two medium distance triathlons.


    Blog posts written by Bank employees:

    Copyright © Inter-American Development Bank ("IDB"). This work is licensed under a Creative Commons IGO 3.0 Attribution-NonCommercial-NoDerivatives. (CC-IGO 3.0 BY-NC-ND) license and may be reproduced with attribution to the IDB and for any non-commercial purpose. No derivative work is allowed. Any dispute related to the use of the works of the IDB that cannot be settled amicably shall be submitted to arbitration pursuant to the UNCITRAL rules. The use of the IDB's name for any purpose other than for attribution, and the use of IDB's logo shall be subject to a separate written license agreement between the IDB and the user and is not authorized as part of this CC- IGO license. Note that link provided above includes additional terms and conditions of the license.


    For blogs written by external parties:

    For questions concerning copyright for authors that are not IADB employees please complete the contact form for this blog.

    The opinions expressed in this blog are those of the authors and do not necessarily reflect the views of the IDB, its Board of Directors, or the countries they represent.

    Attribution: in addition to giving attribution to the respective author and copyright owner, as appropriate, we would appreciate if you could include a link that remits back the IDB Blogs website.


