Using Social Media Data to Predict the 2016 US Presidential Election
One of the more analyzed aspects of the masses of data we now have access to via social media and digital platforms is the capacity of that data to enable better predictive modeling. One of the most obvious applications of that is the ability to predict stock market fluctuations, and there are various examples of how social data can be used to benefit - even totally automate - stock trading. But another, more interesting social experiment is to analyze whether social media insights can show us who's going to win an election.
In theory, this should be possible. Social media data can't provide an accurate prediction of an event that humans don't influence, like a natural disaster or the weather. But given that social is 'social' - a virtual data log of human interactions - and that so many people in the world now interact and share information via these platforms, it should be able to provide an indicative measure of human-influenced processes, and elections are entirely people-defined. It makes sense, then, that we should be able to use social media as an indicator of probable electoral outcomes - the trick is in the how.
And the how involves several important elements and considerations.
The Popular Vote
The first measure you might look to utilize in a predictive model for an election would be overall mention volume - if a candidate's getting mentioned a lot, that's a good sign their message is resonating - or at the least, their voice is reaching a lot of people.
Of course, there are several variables to this, but previous academic studies have shown that mention volume can play an important, indicative role in election results.
For example, a study by Dublin City University in 2011 found that tweet volume was "the single biggest predictive variable" in election results, based on their analysis of political sentiment and prediction modeling.
Their research indicated that mention volume was a more accurate indicator than sentiment because volume better represents the relative popularity among the population, while sentiment can be reactive and influenced by responses to specific news stories or events.
These findings were echoed by a study conducted by the Technical University of Munich - based on their research, they found that:
"The mere number of tweets reflects voter preferences and comes close to traditional election polls."
Going on this, the basic metric of tweet volume could, at least in some measure, be used to predict the upcoming US Presidential Election.
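As a rough illustration of what a volume-based measure looks like in practice, here's a minimal Python sketch that turns raw mention counts into each candidate's share of the conversation. The function name `volume_share` and the counts are placeholders made up for this example - they're not figures from Twitter's tracker or from either study.

```python
def volume_share(mention_counts):
    """Return each candidate's share of total mentions as a fraction of 1."""
    total = sum(mention_counts.values())
    if total == 0:
        raise ValueError("no mentions recorded")
    return {name: count / total for name, count in mention_counts.items()}


# Placeholder counts for illustration only - NOT real tracker data.
example_counts = {"Candidate A": 600, "Candidate B": 300, "Candidate C": 100}

shares = volume_share(example_counts)
leader = max(shares, key=shares.get)  # candidate with the largest share
print(leader, round(shares[leader], 2))
```

Using a share rather than a raw count makes it easier to compare candidates across days when overall tweet activity rises or falls.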
So who do the current numbers suggest will become the 45th President of the United States?
Twitter recently released a new "Election 2016 Candidate Buzz" tracker which can provide this data for us.
If you were to go with this, as the most basic and simple predictor, tweet volume suggests Donald Trump is on a path to The White House.
It may seem simplistic - it may be simplistic - but as noted, previous academic research has found that tweet volume is one of the biggest, if not the biggest, predictive indicators of election results.
But there are some problems with this.
The current US Presidential campaign provides us with a unique case study because the leading candidate, according to tweet volume, may actually be a political anomaly. Donald Trump already had a massive Twitter following before he started his campaign, and he has a huge international presence because of his media work. As such, Trump arguably has a bigger media profile than any other candidate before him, at least at this stage of a Presidential campaign. Because of this, and because of Trump's divisive nature and social media savvy (i.e. he knows how to get a lot of attention), the volume numbers may actually be misleading - you'd suspect that a large proportion of these mentions would be negative.
This being the case, we'd need to look at additional qualifiers to get a better handle on Trump's standing with voters - so what other measures can we use?
Automated sentiment detection is a minefield, in terms of accuracy. One of the biggest knocks against automated sentiment analysis is that it can't detect sarcasm, which can unfairly skew the results - and this is, no doubt, a significant consideration in the case of Trump. But that noted, and taking into account that the aforementioned studies discounted the value of sentiment in their predictive models, what does the data say about the two leading candidates in the current US Presidential campaign?
Using a basic polling tool called HappyGrumpy, we can see that overall sentiment for Donald Trump is actually considered quite positive.
Of course, you have to question how accurate such polls are, and that's impossible to judge without reviewing the full methodology. On the website, HappyGrumpy provides examples of their previous predictive success, using the US Election Campaign as a model.
It's worth noting, too, that the poll only covers the last month, and Trump did see a big dip in the polls in late April, according to those stats. But even then, he was only marginally lower than Clinton's average. And he bounced back.
Other sentiment analysis is not as definitively in Trump's favor - research by Hootsuite, published in Fortune in April, showed that Trump was well out in front in terms of impact, but his overall sentiment rating trails Clinton's.
Taking the two into account, it's hard to make any definitive judgments about sentiment ratings - which also, of course, means it's impossible to totally discount them. The best comparable measure would be to see if the HappyGrumpy real-time analysis holds up, as they suggest, when future polls come around - if the results are connected, it may be a valid indicator, though it's hard to rate it against the contrasting findings from Hootsuite's team at this time.
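To see why sarcasm trips up automated sentiment tools, consider a deliberately naive word-counting scorer, sketched below in Python. The word lists and the `naive_sentiment` function are assumptions invented for this example - this is not how HappyGrumpy or Hootsuite calculate their ratings, which would need their full methodology to know.

```python
import re

# Tiny hand-made word lists, purely illustrative - not a real sentiment lexicon.
POSITIVE = {"great", "love", "brilliant", "good"}
NEGATIVE = {"bad", "terrible", "hate", "awful"}


def naive_sentiment(text):
    """Score text as (positive word count) minus (negative word count)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


sincere = "I love this candidate, great speech"
sarcastic = "Oh great, another brilliant plan. Just what we needed"

# Both come out above zero, even though a human reader would
# take the second one as criticism.
print(naive_sentiment(sincere), naive_sentiment(sarcastic))
```

A scorer like this only sees the words, not the intent, so a sarcastic mention gets counted as support.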
So if automated sentiment can't provide a definitive qualifier, then what can?
Another measure to consider is follower growth - the combination of mention volume and follower growth in the lead-up to the most recent Canadian election correctly indicated that Justin Trudeau would emerge triumphant.
While Trudeau's percentage growth was not as significant, the raw numbers were higher, which suggests that more people were responding to, and looking to hear, his message.
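The difference between percentage growth and raw growth is easy to show with a quick calculation. The Python sketch below uses two hypothetical accounts with made-up follower counts - they're not the actual Canadian election figures - and the `follower_growth` helper is just a name chosen for this example.

```python
def follower_growth(start, end):
    """Return (raw follower gain, percentage gain) between two counts."""
    raw = end - start
    pct = raw * 100 / start if start else float("inf")
    return raw, pct


# Hypothetical accounts for illustration only - NOT real election data.
large_account = follower_growth(1_000_000, 1_100_000)
small_account = follower_growth(100_000, 150_000)

# The smaller account grows faster in percentage terms,
# but the larger account adds more people overall.
print(large_account, small_account)
```

In this made-up case the smaller account wins on percentage while the larger one wins on raw numbers, which is the same pattern described above.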
So how do the two leading US candidates stack up in terms of follower growth?
Using Twitter Counter, we can see that Trump has gained 595,777 followers over the last month.
Hillary Clinton, meanwhile, has gained 285,541 followers in that same time frame.
Those growth figures only reflect the last 30 days, but other data, released via Twitter's Government analysis handle (@gov), has reflected a similar trend - that Trump is gaining more than Clinton over time (they regularly post updates like this after major debates and events).
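If you want to put the two monthly follower gains quoted above side by side, a few lines of Python will do it. This is only a sketch of the arithmetic using the two Twitter Counter figures from this post - the variable names are arbitrary, and it says nothing about where those followers are located or why they followed.

```python
# 30-day follower gains quoted in this post (via Twitter Counter).
gains = {"Trump": 595_777, "Clinton": 285_541}

# How many followers Trump added for each one Clinton added.
ratio = gains["Trump"] / gains["Clinton"]

# Trump's share of the two candidates' combined gain.
trump_share = gains["Trump"] / sum(gains.values())

print(f"Ratio: {ratio:.2f}x, share of combined gain: {trump_share:.1%}")
```

On those numbers, Trump added roughly twice as many followers as Clinton over the month.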
Now, this is fairly rudimentary analysis - a more accurate measure will be when the final candidates are set and the real campaign begins, as the responses will be more in line with policy detail and discussion. And again, this may also be skewed by Trump's global presence - Trump would likely have a significantly larger proportion of international support. But as an indicative measure, Trump is out in front. You could actually argue that Trump is in front on all relevant metrics.
While there's still a long way to go, and a lot more campaigning to be had before we get into the real battles of the 2016 Presidential Campaign, if social media data - or Twitter data more accurately - is to be trusted as a predictor of the likely outcome, Donald Trump is winning.
Of course, you could conduct similar analysis yourself with the other candidates - for comparison, Bernie Sanders has added 217,612 Twitter followers in the last month. These are only basic indicators, and critics would no doubt just as easily dismiss such findings. But it'll be interesting to see how it all plays out, and whether these measures actually do point to the eventual winner.
Follow Andrew Hutchinson on Twitter