What does it say about someone that he frequently uses words like “idiot”, “fake”, or “bed” in his tweets? According to a recent paper and IDB seminar by Johannes Eichstaedt, his use of “idiot” signifies hostility or aggression; “fake”, hate and interpersonal tension; and “bed,” boredom and fatigue. Moreover, it means that he is more at risk for heart disease, a condition closely linked to negativity, anxiety, and depression.
With 320 million monthly users worldwide on Twitter and more than 1.5 billion on Facebook, linguistic analysis of social media has become a powerful tool for everyone from political candidates to retailers. They use it to generate profiles of age, gender, and income. They determine personality and mood, infer inclinations and desires. They more precisely target their campaigns. No wonder people get those continuous ads on their computers for the shoes, cars and weight machines they desperately want but can’t afford.
But if linguistic analysis of social media helps to zero in on consumers, it can also be a profound social instrument. Most especially, it may be used to track heart disease and other psychologically-related ailments across wide areas and huge populations. It eventually could even orient timely interventions.
The process already has begun. In 2013, a team at the University of Pennsylvania, consisting of Martin Seligman, a founder of positive psychology, Eichstaedt and others, began an experiment. They took hundreds of millions of words, phrases and topics from the Facebook postings of 75,000 volunteers. They compared those words against personality tests administered to the volunteers and created computer algorithms linking words and psychological traits. They found, for example, that extroverts use words like “party” more, and that emotionally stable individuals use words like “exercise,” indicating they engage more in team sports.
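The idea of linking words to traits can be illustrated with a toy sketch. This is not the Pennsylvania team's algorithm, which the seminar did not spell out in code; it simply assumes that each volunteer has one block of text and one questionnaire score, and it measures how strongly the relative frequency of each word tracks that score. The posts, the scores, and the function names below are all invented for illustration.

```python
from collections import Counter
from math import sqrt


def relative_frequencies(text):
    """Return each word's share of all words in one person's text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def word_trait_correlations(posts, scores):
    """Correlate every word's relative frequency with the trait score."""
    freqs = [relative_frequencies(post) for post in posts]
    vocabulary = set().union(*freqs)
    return {
        word: pearson([f.get(word, 0.0) for f in freqs], scores)
        for word in vocabulary
    }


# Made-up example data: four hypothetical volunteers and their
# hypothetical extraversion scores on a 1-5 scale.
posts = [
    "party tonight with friends party",
    "quiet night reading at home",
    "big party this weekend",
    "reading alone at home tonight",
]
extraversion = [4.5, 2.0, 4.0, 1.5]

correlations = word_trait_correlations(posts, extraversion)
for word in ("party", "home"):
    print(word, round(correlations[word], 2))
```

In this invented sample, “party” correlates positively with the extraversion score and “home” negatively. With four people that means nothing statistically; the real study drew on 75,000 volunteers, and a serious analysis would also have to account for the many thousands of words being tested at once.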
Then they used the tool to analyze words and word clusters in 100 million tweets from counties representing nearly 90% of the United States population. The results were remarkable. Identifying counties that most used words reflecting negative emotions and those using words reflecting optimism, among other positive personality traits, they were able to identify those counties at greatest and least risk of atherosclerotic heart disease. That is the leading cause of death in the United States. Moreover, they were able to do so with stunning accuracy. Their predictions outperformed any other model using official government statistics for risk factors like smoking, diabetes, hypertension and obesity. Twitter was even predictive of heart disease over and above income and education.
It is important to add that they did not predict risk in any particular individual. Instead, individual users served as the canaries in the coal mine for the general state of their communities, indicating where psychological health is better and where it is worse. Nonetheless, the potential is immense. In 2013, Microsoft Research used prenatal tweets to predict postpartum depression in several hundred women with 71% accuracy. Mexico’s National Institute of Statistics and Geography (INEGI) also has begun to use Twitter to study health trends. Other countries are sure to join.
Where this will all lead is anyone’s guess. But with computational capacity growing and analytical tools becoming ever more refined, it seems safe to say that linguistic analyses of social media will continue to expand to cover a whole range of diseases in both the developed and developing world. The tool is far cheaper and faster than household surveys. And it could eventually be used for a whole range of other areas, including measuring the wellbeing of students. Tailored psychological interventions might then be brought to bear. Social media is here to stay; its capacity to improve human welfare will only grow.