Why the Cambridge Analytica Discussion Needs to Shift to Data Misuse More Broadly

There are a few considerations that feel slightly off-center in the Facebook/Cambridge Analytica story.

Not the story itself – the misuse of people’s personal data is clearly a major concern, and the fact that such insights could, potentially, be used to influence people’s political leanings, in order to get a chosen candidate into power, through almost subconscious means, is frightening on many levels.

No, it’s definitely a major issue - possibly the major concern of our time. But there are a lot of misinterpretations out there as to what, exactly, may or may not have happened, and how, exactly, this situation came about.

Here are some of the core issues.

Facebook’s Business

First off, while everyone’s upset about Cambridge Analytica specifically, the actual process they’ve supposedly exploited is essentially Facebook’s very business model – they’ve used The Social Network’s vast data set to home in on very specific audience segments, those most susceptible to specific messaging.

That’s the core appeal of Facebook ads more broadly – through Facebook ads, you can focus on tiny, micro-audiences and refine your messaging to appeal to those users. This helps reduce costs, as you only need to connect with a small group, while also enabling businesses to tailor their messaging to fit each set.

That’s why Facebook advertising is so effective, and Facebook has allowed, and still does allow, for such targeting.

In a recent statement on Twitter, Facebook’s former VP of Ads Andrew Bosworth noted that:

This was unequivocally not a data breach. People chose to share their data with third party apps and if those third party apps did not follow the data agreements with us/users it is a violation. no systems were infiltrated, no passwords or information were stolen or hacked.
— Boz (@boztank) March 17, 2018

The issue is not that Cambridge Analytica was able to access such data – they, and many others, including Facebook, can do so, and have done over time. The issue is with how they’ve used it – but that does seem to miss the point. Yes, Cambridge Analytica got caught out, potentially using psychographic targeting to change the fate of a nation. But others have done the same.

The concern, then, is not so much with Cambridge Analytica itself, but the fact that Facebook has such complex data - and that it can, potentially, be used in this way. That’s where the discussion is now headed, and those are the fears Facebook CEO Mark Zuckerberg needed to allay when he issued his statement on how Facebook would respond.

He has tried to do this, and he’s largely done all he can by announcing new measures to further protect users from now on. But that doesn’t change what already exists – and what Facebook itself still has access to.

As such, the suggested seedy workings of Cambridge Analytica are not the real issue here, and shouldn’t cloud the wider discussion. The concern is with data targeting – and more specifically, how Facebook has failed in their duty to protect such information from misuse.

Changing Times

To be fair, Facebook has updated its data access provisions to limit such use, and did so some five years ago, when they first realized there was a problem. But the issue is, as I noted recently, once anyone has access to the initial data set – even if that data was from five years ago – it’s already too late.

Sure, you won’t have the most up-to-date insights, and Facebook’s massive database is expanding every day. Right now, Facebook is in a position where it could create the most complex, detailed and specific psychographic audience profiles of virtually everyone on the planet – even those who don’t use The Social Network can be framed via the templates that already exist (the notable exception would likely be Chinese citizens due to Facebook being banned in that nation).

But even without the latest data, you’d still be able to build accurate audience templates if you had access to the older data set.

The value of Facebook data in this regard is in scale, not in up-to-the-minute detail.

For example, if you had access to all of Facebook’s data points, you could go through and list all the likes of people who are members of, say, racist groups. You could then cross reference those likes and come out with a list of commonalities – people who like this group are also 95% likely to like ‘X’, ‘Y’ and ‘Z’.

Based on that insight, you could then take those commonalities and match them against all of the Facebook data you have. Now, even though those other members have not outright expressed support of the same group and/or viewpoints, you know that there’s a very high likelihood that they’ll be susceptible to the same messaging.

Extrapolate that example to the trillions of data points you have access to through Facebook activity and you can imagine just how powerful – and accurate – those predictions could be. For instance, it wouldn’t just be ‘X’, ‘Y’ and ‘Z’ as your commonalities, you could match up hundreds, even thousands of data points.
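To make the cross-referencing idea concrete, here is a minimal sketch in Python. It is purely illustrative: the user names, the page names, the 95% threshold and both helper functions are hypothetical, and it is not a description of how Facebook or Cambridge Analytica actually process data. It simply finds the likes shared by almost all members of a seed group, then scores everyone else by how many of those likes they also have.

```python
from collections import Counter


def common_likes(likes, seed_members, min_share=0.95):
    """Return the pages liked by at least `min_share` of the seed group."""
    counts = Counter(
        page for user in seed_members for page in likes.get(user, set())
    )
    group_size = len(seed_members)
    return {page for page, n in counts.items() if n / group_size >= min_share}


def lookalike_scores(likes, seed_members, commonalities):
    """For each non-member, the share of the commonalities they also like."""
    if not commonalities:
        return {}
    return {
        user: len(pages & commonalities) / len(commonalities)
        for user, pages in likes.items()
        if user not in seed_members
    }


# Hypothetical toy data: user -> set of liked pages.
likes = {
    "ann": {"X", "Y", "Z", "cats"},
    "bob": {"X", "Y", "Z"},
    "cat": {"X", "Y", "Z", "golf"},
    "dan": {"X", "Y", "Z", "jazz"},
    "eve": {"X", "Y", "chess"},
    "fay": {"cats", "golf"},
}
seed = {"ann", "bob", "cat"}  # members of the group being studied

shared = common_likes(likes, seed, min_share=0.95)
scores = lookalike_scores(likes, seed, shared)

# Rank non-members from most to least similar to the seed group.
for user, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{user}: {score:.2f}")
```

In this toy case, a non-member who shares all of the seed group's common likes scores highest, even though they never joined the group. With real data the same logic would run across hundreds or thousands of signals rather than three.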

Various research reports have confirmed this is entirely possible, and such insights would remain accurate, or at least indicative, throughout these people’s lives.

That would mean that as time goes on, the insights would become less relevant, as younger users and shifting opinions come into play. But Facebook’s data – whether it’s 5 years old, 10 years, even 20 – would still be effective.

As such, there’s not a heap more Facebook can do to fix it.

An Awakening

What’s also somewhat surprising within the current context is that many people have been raising concerns on this front for years, and few chose to listen.

Back in 2013, Facebook released Graph Search, which enabled Facebook users to conduct complex searches of their data. Say you wanted to find friends who also liked the same movies as you, you could. Say you wanted to see if you knew people who were in certain groups – no problem.

What about if you wanted to find…

This is why Graph Search was eventually scaled back, because it revealed too much personal insight – while scraping apps could also be built to extract that data for cross-matching - again, in much the same way that Cambridge Analytica is believed to have done.

This is smaller scale, of course, as the common user couldn’t access all of the millions of profiles CA reportedly could. But again, you can see the potential for data matching and audience profiling.

Even more recently, Facebook came under fire for enabling discriminatory ad targeting through their system, as their AI tools developed audience subsets, largely without human oversight.

This, again, is the very model Facebook’s ad system is built on. That’s not to say it's designed for this type of targeting, but you’d be naïve to assume it isn’t being used this way. If people can utilize such targeting, they will, and they’ve been able to do so via Facebook for some time.

Of course, this is what we’re now getting to - the Cambridge Analytica story in itself is more of a gateway into the wider privacy concerns linked with data misuse. CA is just the vehicle; at least at this stage, they simply seem to be the ones who got caught out. The actual concern is with big data more broadly, and how the various tools through which to track and measure our personalities and psychological leanings are expanding.

Did Cambridge Analytica actually use Facebook data against voters? We don’t know, and quite likely, we never will (they claim to have deleted any ‘paper trails’). But that, in itself, is not the issue – the narrative needs to shift from CA as the culprits, and to social networks, and other data collection providers, and their responsibilities with such insights.

In essence, the data that’s been collected has been collected, and it can be used, and misused, as it is. Restrictions will have limited effectiveness, and punishments can only be handed out in retrospect.

What we need to work out now is what we can do to detect and stop misuse by bad actors.

And the possible solutions are likely to be hugely complex.
