Twitter Provides Access to COVID-19-Related Tweet Dataset to Assist Researchers

Twitter will provide selected academic teams with full access to the public tweet conversation about COVID-19 via a new dataset designed to aid research into the virus and its spread.

Specifically, Twitter is releasing a new endpoint in Twitter Developer Labs that will enable developers and researchers to study the public conversation about COVID-19 in real time.

As per Twitter:

"This is a unique dataset that covers many tens of millions of Tweets daily and offers insight into the evolving global public conversation surrounding an unprecedented crisis. Making this access available for free is one of the most unique and valuable things Twitter can do as the world comes together to protect our communities and seek answers to pressing challenges."

The COVID-19 dataset will facilitate research into how people are discussing the pandemic and what they're saying about it, the spread of misinformation and hoaxes, the prediction of future hot spots to support healthcare management, and more.

Facebook has also provided its own dataset for a similar purpose: earlier this month, The Social Network launched new COVID-19 location tracking and individual connectivity maps, which aim to highlight where people are more connected and where they're traveling, in order to help predict potential spread.

Twitter's dataset, however, focuses on the conversation itself, which past research has shown can also be a highly effective tool for anticipating future concerns.

The US Geological Survey, for example, uses tweet data to track earthquakes and their potential impacts, while tweet insights have also been used in various regions to predict civil unrest, and even crime.

Probably the most relevant example, however, comes from 2013, when researchers showed how tweet conversation can be used to map flu outbreaks and help local health authorities prepare to mitigate their impact.

As per the research paper:

"Our system detected the weekly change in direction (increasing or decreasing) of influenza prevalence with 85% accuracy, a nearly twofold increase over a simpler model, demonstrating the utility of explicitly distinguishing infection tweets from other chatter."

The research team further notes that real-time monitoring of relevant Twitter conversation can enable clinicians to better anticipate surges in influenza-like illness, "up to two weeks in advance of existing data collection strategies".

That's the kind of predictive capacity Twitter is hoping to facilitate with this new access, but the company also knows it needs to be very cautious about enabling such usage.

The Cambridge Analytica debacle at Facebook changed how platforms approach this kind of access. The data that CA used was originally gathered for academic use, and Twitter, like all social platforms, is acutely aware of how such insights can be misused if they fall into the wrong hands. Because of this, researchers looking to access the new dataset will need to apply through a dedicated process, and Twitter will review each application "to ensure they support the public good".

The dataset does not include any private data or protected Tweets, so it covers only content that has been made publicly available. But at such scale, even public data can reveal more than people might intend, which is why Twitter is being extra cautious with access.

Given how such data has been used in the past, the dataset could be a valuable resource, and could well help researchers better understand the COVID-19 conversation, and how it relates to, and may indicate, the virus' path.

Researchers who want to use the new tweet data endpoint need to apply via Twitter's application form.