Social media data could be key to tracking disease patterns

Optum Group shares data on social media's link to tracking the COVID-19 pandemic.
By Dr. Caroline Yang
09:17 am

Photo: Josep M Rovirosa/Getty Images 

Social media has grown rapidly over the past couple of decades, with some of the biggest players including Facebook, formed in 2004, and Twitter, started in 2006. In just the past 12 months, the number of active social media users has grown by over 400 million, and Twitter alone reaches 211 million active users daily. This all translates to volumes of data that could be leveraged to draw population-level insights.

The COVID-19 pandemic gave researchers such as those at the Optum Group an opportunity to investigate whether social media data could be correlated with disease patterns and trends. In a HIMSS22 presentation, Danita Kiser, VP of Optum, took us on a deep dive into one such project, in which more than 20 million Twitter posts were reviewed.

The organization posed the question: “How strongly is social media data correlated with actual COVID-19 cases, and does that signal remain stable through the course of the pandemic?”

“We collected a set of geolocated tweets … read the tweets, and labeled them [to categorize them]. Then using those classified tweets we built natural language processing models to … categorize unlabeled data,” Kiser said.
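The label-then-classify workflow Kiser describes can be sketched in a few lines. The presentation did not say which model the team used, so the example below substitutes a small multinomial naive Bayes classifier as a stand-in. The class name, the helper function and the eight training tweets are all invented for illustration; only the four category names come from the project.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTweetClassifier:
    """Hypothetical stand-in model: multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of tweets
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_tweets = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_tweets in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_tweets / total_tweets)  # log prior
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up example tweets, NOT data from the Optum project.
hand_labeled = [
    ("I just tested positive for covid", "confirmed"),
    ("my test came back positive today", "confirmed"),
    ("I have a fever and a cough and cannot smell anything", "showing symptoms"),
    ("sore throat and fever since yesterday", "showing symptoms"),
    ("finally recovered and feeling healthy again", "recovered"),
    ("I am fully recovered after two weeks", "recovered"),
    ("covid is a hoax and the numbers are fake", "hoax"),
    ("this whole pandemic is a fake hoax", "hoax"),
]

texts, labels = zip(*hand_labeled)
model = NaiveBayesTweetClassifier().fit(texts, labels)

# Categorize previously unlabeled text with the trained model.
print(model.predict("tested positive this morning"))
```

A production system would need far more labeled data and a held-out evaluation set; the point here is only the shape of the pipeline: hand-label a sample, train on it, then apply the model to unlabeled tweets.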

The team of researchers then ran the models on real-time data and the categorized tweets were monitored over time.

“We spent quite a bit of time on collecting and monitoring … before we were able to start defining trends,” Kiser said.

More than 15,000 tweets were hand-labeled and placed into categories, including “confirmed,” “showing symptoms,” “recovered,” and “hoax.” The researchers also labeled whether the content of each tweet concerned the area near where it was posted. What the Group found was interesting.

At the beginning of the pandemic, there was a very strong correlation between tweets labeled “confirmed” and actual case counts.

“Tweets correlated most strongly when we shifted tweets by seven to 10 days. ... People would tweet about cases before case rates started increasing, [and this was found to be] a leading indicator of COVID cases,” Kiser said. “This was important because at the time, there were no leading indicators.”
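The shift Kiser mentions amounts to a lagged correlation: pair the tweet count on one day with the case count some days later, and look for the offset at which the two series line up best. The sketch below shows one way to compute it. It assumes daily counts and a plain Pearson correlation, neither of which the presentation spelled out, and the function names and the two synthetic series are illustrative rather than Optum's.

```python
from math import exp, sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def lagged_correlation(tweets, cases, lag):
    """Correlate tweets on day t with cases on day t + lag (lag >= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(tweets, cases)
    return pearson(tweets[:-lag], cases[lag:])


def best_lag(tweets, cases, max_lag=14):
    """Return the lag with the highest correlation, plus all scores."""
    scores = {lag: lagged_correlation(tweets, cases, lag)
              for lag in range(max_lag + 1)}
    return max(scores, key=scores.get), scores


def signal(day):
    """Synthetic bump of activity peaking on day 25 (illustration only)."""
    return 100 * exp(-((day - 25) / 5) ** 2)


# Synthetic series: cases follow the same curve as tweets, seven days later.
days = range(60)
tweets = [signal(d) for d in days]
cases = [3 * signal(d - 7) for d in days]

best, scores = best_lag(tweets, cases)
print(best, round(scores[best], 3))
```

Because the synthetic case curve is built as a seven-day-delayed copy of the tweet curve, the search recovers a seven-day lead. Real series are noisy, so the peak would be less sharp and would need to be re-estimated over time.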

Interestingly, however, in the latter part of the Delta wave, the lag shortened. In Pennsylvania, for example, it dropped from seven days to two, meaning case counts began rising much sooner after the tweets were posted.

The greatest challenge was working against a moving “ground truth.” The chosen categories were ultimately correlated against this defined “truth,” but what counted as fact kept evolving as people came to understand the disease better and navigated multiple COVID-19 variants.

Social media is a powerful tool for drawing insights at both the individual and population level. Through collaboration with university partners and data scientists, the Optum Group learned that, particularly when COVID-19 cases were on the rise, Twitter signals could serve as leading indicators for predicting case counts.

The hope is that such data analytics could be used for future pandemic preparedness and response. As Gina Debogovich, senior director of UnitedHealth Group, stated, “There are a multitude of data sources that can help us more accurately predict course of disease, but digital surveillance could be one of our most effective offensive mechanisms. … We need to vigilantly monitor social media so we can proactively identify next big outbreak.”

 

Tags: 
Optum