Twitter is growing up, evolving from a simple message system into a public health research tool.
Computer scientists at Johns Hopkins University in Baltimore are using Twitter to track fluctuations in influenza activity in the U.S.
JHU doctoral candidate Michael Paul said Twitter will never be able to provide hard and fast statistics the way hospitals do when they report flu admissions to the Centers for Disease Control and Prevention, but monitoring tweets can show trends at a broader population level.
And it doesn’t stop with the flu. Paul said an earlier project used tweets to track seasonal allergies, mapping geographic trends and comparing winter with spring. That could evolve into mapping allergy prevalence within a community or state.
Monitoring tweets could also be used to determine how people take medications. “We think it could be interesting to see how people are self-medicating,” Paul said. Are people, for example, using antibiotics to treat the flu, which is caused by a virus and does not respond to them?
All of this may sound simple, but it actually required a lot of foundational work by researchers. Paul said the project began with a word search query based on 400 keywords related to health, including the word “flu.”
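A keyword search of this kind might look roughly like the Python sketch below. It is only an illustration: the article names just one of the roughly 400 keywords (“flu”), so the rest of the `HEALTH_KEYWORDS` list, the function name and the sample tweets are all invented here.

```python
import re

# Hypothetical keyword list. The article mentions about 400 health terms
# but names only "flu"; the others here are placeholders.
HEALTH_KEYWORDS = {"flu", "influenza", "fever", "cough"}

def matches_health_keyword(tweet: str) -> bool:
    """Return True if any keyword appears as a whole word in the tweet."""
    words = set(re.findall(r"[a-z']+", tweet.lower()))
    return not words.isdisjoint(HEALTH_KEYWORDS)

# Made-up sample tweets, for illustration only.
tweets = [
    "In bed with the flu, send soup",
    "Great game last night!",
    "Getting my flu shot today",
]
candidates = [t for t in tweets if matches_health_keyword(t)]
```

In this sample the third tweet passes the filter even though its author does not report being sick, which is why a keyword match alone cannot count flu cases.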
“What we needed to do for this to work was to look for tweets of people who say specifically that they had the flu,” Paul said.
Using supervised machine learning, the JHU researchers showed the computer how to sort tweets. They began with 5,000 tweets that matched flu keywords, sorted them, and fed them to a learning system so the computer could pick up patterns. Paul said they taught the computer to look for phrases such as “in bed with the flu” or “sick with the flu” that people who had influenza would be likely to tweet.
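The article does not say which learning algorithm the researchers used. As a rough sketch of the general idea, the Python example below trains a small naive Bayes text classifier, chosen here only because it is simple, on a handful of made-up labeled tweets. The labels “sick” and “other”, the function names and the training examples are all assumptions for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def train(labeled_tweets):
    """Count words per label from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in labeled_tweets:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest naive Bayes log score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(n / total)
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are skipped
                score += math.log((counts[w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training examples; the real study started from 5,000 sorted tweets.
training = [
    ("in bed with the flu", "sick"),
    ("sick with the flu all week", "sick"),
    ("i think i have the flu", "sick"),
    ("getting my flu shot today", "other"),
    ("flu season news report", "other"),
    ("worried about the flu outbreak on the news", "other"),
]
model = train(training)
print(predict(model, "home sick with the flu"))
print(predict(model, "got my flu shot"))
```

With only six examples the model is a toy, but it shows the pattern: people sort a sample, and the computer learns which words separate the two piles.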
The computer then sorted through millions of tweets and pulled those that met the criteria, giving the researchers an estimate of how many people were sick with flu on any given day. While they could not say 1 million people had the flu on a Thursday, Paul said they could say “there is a rise in the flu” or “it looks like it is rising faster than usual.”
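One simple way to turn classified tweets into a trend statement is sketched below in Python. It is not the researchers' method: dividing by each day's total tweet volume and comparing the most recent three days with the three before are assumptions made for illustration, and all the numbers are invented.

```python
from collections import Counter
from datetime import date

def daily_rates(flu_dates, total_by_day):
    """Fraction of each day's tweets that were classified as flu reports."""
    flu_by_day = Counter(flu_dates)
    return {
        day: flu_by_day[day] / total
        for day, total in sorted(total_by_day.items())
        if total
    }

def is_rising(rates, window=3):
    """Compare the mean of the last `window` days with the `window` before."""
    values = [rates[d] for d in sorted(rates)]
    if len(values) < 2 * window:
        raise ValueError("need at least 2 * window days of data")
    recent = sum(values[-window:]) / window
    earlier = sum(values[-2 * window:-window]) / window
    return recent > earlier

# Invented data: six days, 1,000 tweets collected per day.
days = [date(2020, 1, d) for d in range(1, 7)]
flu_counts = [2, 3, 2, 5, 6, 7]
flu_dates = [day for day, n in zip(days, flu_counts) for _ in range(n)]
total_by_day = {day: 1000 for day in days}

rates = daily_rates(flu_dates, total_by_day)
print(is_rising(rates))
```

The output is a relative signal (rising or not), not a head count, which matches the kind of claim Paul describes.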
The trade-off in using Twitter for this type of research is that it is much faster than traditional public health information gathering, but less accurate.
As for Paul, what started off as something he and other researchers just “happened upon” will likely be the topic of his thesis.