Earlier this year, researchers from the University of Cambridge and Stanford University released a report which looked at how people's Facebook activity could be used as an indicative measure of their psychological profile. What they found was pretty amazing. Using the results of a 100-question psychological study, which had been completed by more than 86,000 participants through an app and mapped against their respective Facebook likes, the researchers developed a system which could, based on Facebook activity alone, determine a person's psychological make-up more accurately than their friends or their family - better even than their partners.
The research was widely reported with headlines like "Facebook Knows You Better Than Your Therapist", stoking the coals of ongoing debates about internet privacy and the all-knowing power held by Facebook. But the thing that stood out most to me was just how much potential Facebook has for audience data and profiling purposes, for reaching the audience most receptive to your message. People don't seem to realize, or haven't yet been able to fully grasp, just how valuable this sort of insight is. Data at this level would have been the stuff of dreams for researchers of times past, and while there are more and more examples of how such information can be applied - from creating hit TV shows to predicting outbreaks of diseases ahead of time - its potential can sometimes get lost in the fearmongering and concern around our loss of privacy in the modern, connected world.
With the initial reactions to the report now out of the way, I recently got a chance to speak with one of the co-lead authors of the study, Dr. Michal Kosinski, to get his thoughts on their findings, how the report was received and - importantly - whether he'd personally changed his online habits as a result of their research. And as you'd expect, he provided some fascinating insights.
Everything is Predictable
I first asked Kosinski about his initial approach to the Facebook study and whether he had expected to find that Facebook activity could reveal so much detail about a person's psychological make-up.
"I did expect that Facebook activity and other types of digital footprint were going to be somewhat revealing," Kosinski told me. "But the thing that was most surprising - and is quite surprising to me still, three years since we first discovered it - is that our most intimate traits can be very easily predicted from our digital footprint."
"One of our most surprising findings was that we could even predict whether your parents were divorced or not, based on your Facebook likes," Kosinski continued. "Actually, when I saw those results, I started doubting my methods and I re-ran the analyses a few more times. I couldn't believe that what you like on Facebook could be affected by your parents' divorce, which could have happened many years earlier - we're talking here about people who might be 30 or 40 years old."
"There are many other intimate traits that are also predictable from your digital footprint: smoking, drinking, taking drugs, sexual orientation, religious and political views, and so on. Actually, everything we tried predicting was predictable, to a degree, and quite often it was very accurate. We created a website with a demo to show how much a computer can learn from your likes, if people are interested." (Note: you can go to www.applymagicsauce.com to try this out for yourself).
"The second surprising thing was that such a wide range of digital activities could be used in predictions - even broad measures, such as the number of your friends, number of your likes, how many times you log in to Facebook, how many tweets you've sent. Each one of these, while not a very strong predictor of anything on its own, becomes powerful when combined with different variables of this kind, enabling predictive systems to establish very accurate profiles of who you are."
This goes some way towards supporting my initial observation that people are unaware - or possibly prefer not to be aware - of the predictive power of their online activities. In isolation, of course, an action like logging onto Facebook three times a day means nothing, but when matched up against a broader data set, as the team has done with their 86,000+ respondents, those behaviors start to form patterns, the correlations of which can reveal very specific details about who you are and what you're about.
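To make that idea concrete, here is a minimal sketch in Python of how many weak signals can add up to a strong prediction. It is purely illustrative: the simulated users, the 55/45 probabilities, and the function names `simulate_users` and `accuracy_using` are all assumptions made for this example, and the simple majority vote is a stand-in, not the model the researchers actually used.

```python
import random


def simulate_users(n_users, n_signals, p_if_trait=0.55, p_if_not=0.45, seed=0):
    """Create made-up users, each with a hidden yes/no trait and a row of
    weak yes/no signals (think: individual Likes).

    The probabilities are illustrative assumptions: each signal is only
    slightly more likely to be present when the user has the trait.
    """
    rng = random.Random(seed)
    traits, signals = [], []
    for _ in range(n_users):
        trait = rng.random() < 0.5
        p = p_if_trait if trait else p_if_not
        traits.append(trait)
        signals.append([rng.random() < p for _ in range(n_signals)])
    return traits, signals


def accuracy_using(traits, signals, k):
    """Guess each user's trait by majority vote over their first k signals
    and return the share of correct guesses."""
    correct = 0
    for trait, row in zip(traits, signals):
        guess = sum(row[:k]) > k / 2
        correct += guess == trait
    return correct / len(traits)


if __name__ == "__main__":
    traits, signals = simulate_users(n_users=2000, n_signals=300)
    for k in (1, 10, 100, 300):
        print(f"{k:>3} signals -> accuracy {accuracy_using(traits, signals, k):.2f}")
```

Under these assumptions, a single signal is barely better than a coin flip, while a few hundred of them combined identify the hidden trait for the large majority of the simulated users. Real Likes are neither this uniform nor independent of one another, so treat this as a picture of the principle rather than of the study's results.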
Attention to Detail
In line with my thinking on the subject, I asked Kosinski whether he felt that people, generally, had a good grasp of Facebook's privacy settings and the amount of insight that can be gleaned from their online actions.
"I think people realize that their digital footprint is being tracked," Kosinski said. "I don't think, however, that they realize that their purchase records or music playlists can be used to extract so much more than just what they bought or listened to."
"One of the main points of my research is that seemingly simple data points - such as what you listen to or what you purchased at the grocery store this morning - these can be turned into very accurate predictions of your intimate traits, such as IQ or sexual orientation."
This is an excellent point, and it led into my next question - have brands, to Kosinski's knowledge, utilized his research, or similar data, to better target their marketing and advertising efforts?
"Certainly so. One of the most well-known cases is the one of Target, which used customers' purchase records to detect their pregnancies and send a timely baby-formula offer."
Given what he knows about the power of predictive algorithms - and Facebook data for such purposes, in particular - I was interested to know what Kosinski thought about how Facebook is currently using their troves of data to fuel their News Feed algorithm and deliver a more customized and targeted user experience.
"Facebook's doing quite an amazing job in terms of improving user experience," Kosinski said. "The News Feed feature is, in essence, an ingenuous information recommendation mechanism, selecting the stories that users are most likely to be interested in. Obviously, at the same time, some Facebook users may feel overly exposed, and I do hope Facebook will seek to do more to protect their users' data and experience within their platform on that front."
"For example, Facebook could offer its users full control over their data and who can access it. For historical reasons, it's widely accepted now that your data has to be stored and governed by third parties, such as Facebook. But does it have to be this way?" Kosinski said. "Imagine a social network or an online store that doesn't store your Likes, or purchase records, these are safely stored on your computer or personal cloud account. Predictions could still be made, but under an individual's control, allowing people, if they wish to do so, to approve resulting personal inferences."
"Predictive algorithms, like any other technology, are morally neutral. We can use them to improve our lives or to harm ourselves - just like a knife."
The Next Evolution
The next evolution, and one which is fueling a growing amount of concern in some quarters, is the development of artificial intelligence and machine learning. The most difficult hurdle with AI, however, is that computers can't process information the way a human brain can. They can only respond with logic: you input a command, the computer spits out a response. But what if computers could think through a problem and work it out the way we do? What if a computer could learn and develop solutions based on neural networks?
A big part of this development is giving computers access to how people think, how people respond to different stimuli and develop ideas. In some ways, social media gives us access to a 'data-fied' version of such info, so I asked Kosinski whether he thought data like Facebook content could one day be used to inform such an element of an artificial intelligence model.
"Certainly," Kosinski said. "Our brains are way more sophisticated than human-designed machine brains. Machine brains, however, while in many ways being rather simple, can process and store enormous amounts of data."
"Take chess, for example," Kosinski continued. "Humans brain can deploy a wide range of strategies while playing chess, but we're still easily beaten by computers that can simply analyze all the possible next moves. Similarly, I can make accurate predictions of others' personality based on their looks or behaviour, but as our research shows, machines can beat humans at this task by relying on lots of relatively simple data. Each single Like doesn't say much about a given user's personality. Hundreds or thousands of them combined, however, allow a relatively simple machine model to make a very accurate prediction of one's personality. That same data would be rather useless for humans."
As we came to the end of our discussion, I asked Kosinski whether his research had changed the way he, personally, uses or thinks about Facebook and social media and whether he had any reservations about sharing his data as a result of his findings.
"Not at all," Kosinski told me. "There are too many benefits that I'd deprive myself of - and I'd like to think that my friends and family would miss me, too."
"It's also worth noting that social media is only one source of intimate data," Kosinski continued. "Your web search logs, browsing history, purchase records, your geographical whereabouts - all of these are duly recorded by a number of gadgets and services which are potentially much more intrusive than social media."
"You can't function in today's world without leaving behind significant amounts of digital footprint."
An interesting perspective, and an interesting conversation which, no doubt, still has many iterations to go before we understand the full potential of big data and its many uses.