Court Rules that LinkedIn Cannot Stop Third-Party Data Scraping of Public Information

In a case that could have significant implications for the digital sector, LinkedIn has lost an appeal to stop a third-party company from scraping user profiles to gather publicly available information en masse, which that company then uses to build its own analytics engine.

As explained by Reuters:

"The 9th U.S. Circuit Court of Appeals let stand an August 2017 preliminary injunction that required LinkedIn, to give hiQ Labs Inc. access to publicly available member profiles. The 3-0 decision by the San Francisco appeals court sets back Silicon Valley’s battle against “data scraping,” or extracting information from social media accounts or websites, which critics say can equate to theft or violate users’ privacy."

hiQ Labs uses LinkedIn profile information to build data profiles that can predict when an employee is more likely to leave a company.

"hiQ’s retention platform scours the web for any publicly available information about a company’s employees and then its data science engine extracts strong signals from that noise that indicate someone may be a flight risk. Based on the statistical patterns observed across hundreds of thousands of employees, powerful machine learning models then assign each of those employees a risk score: high (red), medium (yellow), or low (green). Companies are able to pinpoint with laser-like accuracy the employees that are highest risk, focus retention efforts on those employees, and keep them engaged and contributing happily to the organization."

Essentially, hiQ's system tracks employee activity, based largely on their LinkedIn profiles and presence, and then matches it against other data points that align with likely staff movement. Given the cost of hiring and training new employees, it's easy to see why such an app would appeal to employers. But LinkedIn, which is generally very protective of its data, views this as "piggybacking" off its service in order to, as some would put it, "steal" its traffic.

If other providers are going to profit off its platform, it makes sense that LinkedIn would seek to stop them. But the court's decision in this case could also set a precedent that scraping publicly posted information, regardless of where it's sourced, is a valid business model. And that could open the door to many more cases along similar lines.

As you may recall, back in February, Twitter upset many third-party platforms by implementing new restrictions on its API usage. Those changes saw a range of well-known Twitter apps go down, but under the legal terms of this finding, those platforms may actually have a case to continue their usage, depending on how it's interpreted. Instagram has also been cutting off third-party providers as it seeks to impose tighter controls. This case could re-open the door for some of these apps to get re-connected, though restricting API access is a different matter from accessing publicly posted information. But if companies are now allowed to build their own tools that scrape public info, that could lead to new complications.

The social platforms themselves, of course, have been seeking to tighten such access in the wake of the Cambridge Analytica scandal at Facebook. For years, Facebook granted data access to academic organizations and the like, until it found that some of those groups were selling the data on for significant profit, and for nefarious purposes. That has triggered an industry-wide squeeze on such insights. But according to this case, if information is posted publicly and outsiders can access it, they may now be allowed to collect it.

It may seem like a small case without any significant impact, and as noted, the platforms can implement API restrictions and the like to make scraping more difficult. But the crux of this finding is that if users post information publicly, they're essentially inviting others to view and use it. Does LinkedIn have the right to restrict that, even if the information originates on its platform?

And if LinkedIn can't restrict such scraping, what does that mean for future data scandals? Is LinkedIn then responsible if the same information is used to, say, target people with more specific political ads based on their career moves and likely leanings?

It's not clear-cut, and there are a lot of implications tied to a seemingly minor legal case.

It'll be interesting to see how it plays out moving forward.