Using Social Media Data to Predict the Result of the 2016 US Presidential Election
One of the most significant benefits of social media is the capacity of the data generated on social platforms to enable better predictive modeling. This is one of the reasons why social platforms are so valuable - LinkedIn's data, for example, was the key factor behind Microsoft's decision to spend $26.2 billion to acquire it.
The potential uses of social data for prediction range from stock market spikes to natural disasters, with many examples showing how social interactions can be tracked, logged and then analyzed to reveal usage patterns and trends leading up to major shifts.
Given this capacity, and the widespread adoption of social media as a discussion platform, one of the more interesting use cases is predicting elections.
There's been a lot of research conducted on this front - Dublin City University released a report back in 2011 which suggested that Twitter data could be used as an accurate indicator of election outcomes, while researchers from Germany came to similar conclusions. Viewed with an analytical eye, social data clearly provides some measure of voter sentiment and attention, but how much, exactly, and how accurate that insight is, remain key questions.
With that in mind, here's a look at the current state of the US Presidential Election race based on tweet data, using the key metrics identified by previous academic research.
There are three key measures to consider when analyzing Twitter data - share of voice, sentiment and audience growth. None of these measures is enough on its own to provide an indicative result, but in combination, they can give some idea of where the electorate is headed.
Share of Voice
The first measure to consider is share of voice. Identified by various studies as the best indicator of election outcomes, the number of mentions a candidate receives can be seen as a reflection that their message is gaining more attention, and thus, more traction amongst voters.
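For readers who want to run the numbers themselves, share of voice is typically calculated as each candidate's fraction of all mentions over the same time window. Here's a minimal Python sketch of that calculation. The function name and input format are assumptions for illustration, and the counts in the example are placeholders, not actual Twitter data.

```python
from typing import Dict


def share_of_voice(mentions: Dict[str, int]) -> Dict[str, float]:
    """Return each candidate's fraction of total mentions.

    `mentions` maps a candidate name to a raw mention count collected
    over the same time window (a hypothetical input format).
    """
    total = sum(mentions.values())
    if total == 0:
        raise ValueError("no mentions to compare")
    return {name: count / total for name, count in mentions.items()}


if __name__ == "__main__":
    # Placeholder counts for illustration only, not real Twitter data.
    example = {"Candidate A": 600, "Candidate B": 400}
    for name, share in share_of_voice(example).items():
        print(f"{name}: {share:.0%}")  # Candidate A: 60%, then Candidate B: 40%
```

Because the shares always sum to 100%, share of voice only tells you who dominates the conversation, not whether that attention is positive, which is why it needs to be read alongside the other measures.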
So which candidate, Donald Trump or Hillary Clinton, is being mentioned more on Twitter? Using insights from Twitter's @gov handle, we can get a sense of each candidate's mention share around the recent debates.
It's clear that Donald Trump has dominated Twitter attention, but in this case, that attention may be for the wrong reasons. A lot of the discussion around Trump has been negative, with topics like #TrumpTapes trending, which actually works against his campaign. In this sense, the high-profile nature of the candidate would need to be discounted from the numbers to get a truly accurate indication - but then, of course, we have other measures to cross-check those results against.
Sentiment
Automated sentiment detection can be problematic, with human intervention generally needed to get a reliable level of accuracy. One of the biggest knocks against automated sentiment analysis is that it can't reliably detect sarcasm, which can unfairly skew the results - and this is, no doubt, a significant consideration in the case of Donald Trump. But reservations aside, what does the sentiment data say about the two leading candidates in the current US Presidential campaign?
Using a basic polling tool called HappyGrumpy, we can see that overall sentiment for the two candidates has shifted over time - most notably, since the first debate on September 27th.
Clinton's edge on sentiment makes some sense, given the related media coverage, though the gap is small - there's no definitive divide between the two candidates.
So far, then, Trump leads in mention volume, while Clinton takes the points on sentiment.
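The sarcasm problem noted above is easy to reproduce. The toy Python scorer below simply counts positive words minus negative words, using a small hypothetical lexicon. It's not how HappyGrumpy or any other specific tool works, just an illustration of why naive word matching can misread a sarcastic tweet as praise.

```python
import re

# Hypothetical toy lexicon; real tools use much larger word lists or trained models.
POSITIVE = {"great", "love", "best", "brilliant", "win"}
NEGATIVE = {"bad", "hate", "worst", "disaster", "lose"}


def lexicon_score(text: str) -> int:
    """Naive sentiment: +1 per positive word, -1 per negative word."""
    words = re.findall(r"[a-z']+", text.lower())
    # In Python, True - False == 1 and False - True == -1, so each word adds +1, -1 or 0.
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


if __name__ == "__main__":
    print(lexicon_score("Worst debate ever, a total disaster"))  # -2
    # Sarcastic and clearly negative in intent, yet it scores as positive:
    print(lexicon_score("Oh great, another brilliant answer. Just what we needed."))  # 2
```

The scorer has no notion of context or tone, so "great" and "brilliant" count as praise regardless of how they're used, which is exactly the kind of error that human review is needed to catch.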
Audience Growth
That leads us to the final comparative measure - follower growth.
Audience growth can be seen as an indication of how well each candidate's message is being received - if more people are signing up as followers, that may be a sign that the candidate's message is getting through.
Using Twitter Counter, we can see that over the past month, Hillary Clinton has gained 1,004,342 new Twitter followers.
Donald Trump, on the other hand, has gained 979,729 new followers in the same period.
So Clinton has gained more followers over the period of the Presidential debates, but not a lot more - 24,613, to be exact.
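The gap is easier to judge in relative terms. This short Python sketch, using the Twitter Counter figures quoted above, works out both the absolute difference and how much larger Clinton's gain is as a percentage of Trump's. The helper function is hypothetical, written just for this comparison.

```python
def follower_gap(gain_a: int, gain_b: int) -> tuple[int, float]:
    """Return A's lead in new followers, absolute and relative to B's gain."""
    if gain_b <= 0:
        raise ValueError("gain_b must be positive to compute a relative lead")
    diff = gain_a - gain_b
    return diff, diff / gain_b


if __name__ == "__main__":
    # Past-month follower gains quoted in this article (via Twitter Counter).
    clinton, trump = 1_004_342, 979_729
    diff, rel = follower_gap(clinton, trump)
    print(f"Clinton gained {diff:,} more followers ({rel:.1%} more than Trump)")
    # Clinton gained 24,613 more followers (2.5% more than Trump)
```

On these figures, Clinton's gain is only around 2.5% larger than Trump's, which backs up the point that the difference is small.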
Overall, the data suggests the race is close, and both candidates still have a chance to win. Looking at the three measures on balance, Clinton appears to be in a stronger position on both sentiment and follower growth, which could reflect more people aligning with her message. But as noted, previous research has suggested that mention volume is the better indicator, and on this front, Trump is the clear winner. That said, various reports have suggested that up to a third of all pro-Trump tweets have been generated by bots, which would inflate his mention numbers, so that too needs to be factored in.
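One way to make that weighing explicit is to tally which candidate leads on each measure, discounting mention counts by an assumed bot share first. The Python sketch below does that. The data structure, the placeholder numbers and the simplification of applying a single bot share to all of a candidate's mentions are assumptions for illustration, not a validated model.

```python
from dataclasses import dataclass


@dataclass
class CandidateMetrics:
    mentions: int            # raw mention count in the period
    sentiment: float         # net sentiment score from whichever tool is used
    follower_gain: int       # new followers in the same period
    bot_share: float = 0.0   # assumed fraction of mentions generated by bots


def leaders(metrics: dict[str, CandidateMetrics]) -> dict[str, str]:
    """Name the leading candidate on each measure, after discounting bot mentions."""
    # Simplification: one bot share is applied to all of a candidate's mentions.
    adjusted = {name: m.mentions * (1 - m.bot_share) for name, m in metrics.items()}
    return {
        "mentions (bot-adjusted)": max(adjusted, key=adjusted.get),
        "sentiment": max(metrics, key=lambda name: metrics[name].sentiment),
        "follower growth": max(metrics, key=lambda name: metrics[name].follower_gain),
    }


if __name__ == "__main__":
    # Placeholder values for two hypothetical candidates, not real data.
    example = {
        "Candidate A": CandidateMetrics(mentions=1500, sentiment=-0.2, follower_gain=980, bot_share=1 / 3),
        "Candidate B": CandidateMetrics(mentions=900, sentiment=0.1, follower_gain=1000),
    }
    for measure, leader in leaders(example).items():
        print(f"{measure}: {leader}")
```

In this placeholder example, Candidate A keeps the mention lead even after a one-third bot discount, while Candidate B leads on the other two measures. Whether a real discount flips the mention lead depends entirely on the inputs, which is why the bot reports matter.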
The 2016 US Presidential Election seems something of an anomaly, purely because of the high-profile and scandalous nature of the coverage, which has led to more mentions and more attention on the race than there would have been had Donald Trump not been involved. In this sense, it's hard to use Twitter data as a truly indicative measure, but it's interesting to see where each candidate stands based on social coverage, and what that might mean for the final result.
Follow Andrew Hutchinson on Twitter