Facebook Showcases Examples of Advanced Image Recognition AI, Future Possibilities
Social media is the greatest source of insights into human activity that we've ever had. The possibilities of that data stream are endless - there's so much that we can discover as a result, ranging from interesting tidbits (people who like cats are more likely to be single) to potentially life-saving applications (predicting flu outbreaks). An article published in 2013 proclaimed that 90% of the world's data had been created in the preceding two years - in other words, all but 10% of the data available at that point simply hadn't existed two years earlier. And because of that rapid growth, no one knows exactly what it all means, or what it will mean in application. But we're slowly working through it - and soon, we'll have another huge data source to refer to and assess in our calculations.
This week, Facebook outlined their latest advances in image recognition AI. Facebook's been developing their image recognition capacity for some time, with artificial intelligence guru and New York University professor Yann LeCun at the helm. Last November, Facebook showcased the progress they'd made thus far, with their system able to distinguish between objects in a photo 30% faster, and using one-tenth of the training data, compared to previous industry benchmarks.
Those developments led to the implementation of their new automated image caption system, which was released in April, providing visually impaired users with an improved on-platform experience.
In their latest update, Facebook has outlined the additional progress they've made in this area, making specific note of their advances in image recognition accuracy and capacity.
"We've witnessed massive advances in image classification (what is in the image?) as well as object detection (where are the objects?), but this is just the beginning of understanding the most relevant visual content of any image or video. Recently we've been designing techniques that identify and segment each and every object in an image, as in rightmost panel (c) of the image below, a key capability that will enable entirely new applications."
As noted, Facebook's latest developments focus not only on how to identify image content, but on how to more accurately delineate the various objects within an image frame to improve the system's accuracy. Facebook uses a three-stage process to refine this, starting with initial object identification, then moving to a more specific segmentation of object boundaries within the frame via a second stage called SharpMask.
Following on from this, Facebook uses a third analytical layer called MultiPathNet, which looks at each of the objects identified and seeks to clarify what they are in isolation.
The result is that each subject within the frame is now specified and given a label, based on what the AI believes each segment to be - a donut, a sheep, a giraffe, a person.
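To make the three stages a little more concrete, here's a minimal sketch of how such a pipeline could be chained together. This is not Facebook's code, and none of it uses their released tools - the function names, the toy integer "image" and the label table are all assumptions made for illustration, with trivial stand-ins where the real system would use trained neural networks.

```python
from dataclasses import dataclass
from typing import List

Grid = List[List[int]]  # toy image: 0 = background, other ints = object ids

# Hypothetical lookup standing in for a trained classifier (assumption).
TOY_LABELS = {1: "sheep", 2: "person"}


@dataclass
class Segment:
    mask: Grid   # 1 where the pixel belongs to the object, else 0
    label: str


def propose_masks(image: Grid) -> List[Grid]:
    """Stage 1 stand-in: one coarse mask per distinct non-zero pixel value."""
    values = sorted({v for row in image for v in row if v != 0})
    return [[[1 if v == target else 0 for v in row] for row in image]
            for target in values]


def refine_mask(mask: Grid) -> Grid:
    """Stage 2 stand-in for boundary refinement: drop isolated pixels
    that have no up/down/left/right neighbour inside the mask."""
    height, width = len(mask), len(mask[0])

    def has_neighbour(r: int, c: int) -> bool:
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < height and 0 <= cc < width and mask[rr][cc]:
                return True
        return False

    return [[1 if mask[r][c] and has_neighbour(r, c) else 0
             for c in range(width)] for r in range(height)]


def classify(image: Grid, mask: Grid) -> str:
    """Stage 3 stand-in: label a segment by its most common pixel value."""
    values = [image[r][c]
              for r in range(len(mask))
              for c in range(len(mask[0])) if mask[r][c]]
    if not values:
        return "unknown"
    most_common = max(set(values), key=values.count)
    return TOY_LABELS.get(most_common, "unknown")


def run_pipeline(image: Grid) -> List[Segment]:
    """Chain the stages: propose masks, refine each, then label each."""
    segments = []
    for coarse in propose_masks(image):
        sharp = refine_mask(coarse)
        segments.append(Segment(mask=sharp, label=classify(image, sharp)))
    return segments


toy_image = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
    [1, 0, 0, 2],  # the lone 1 in this row is noise that stage 2 removes
]
print([segment.label for segment in run_pipeline(toy_image)])
```

The point of the structure is that each stage only has to do one job: the first finds candidate regions, the second tidies their edges, and the third names what's inside each one.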
As Facebook continues to develop and evolve their image recognition AI, its accuracy is constantly improving - it won't be long until you're able to search for content or analyze social media data based on image content, in addition to text.
And that is a huge development - as noted by Facebook:
"There are wide ranging potential uses for visual recognition technology. Building off this existing computer vision technology and enabling computers to recognize objects in photos, for instance, it will be easier to search for specific images without an explicit tag on each photo. People with vision loss, too, will be able to understand what is in a photo their friends share because the system will be able to tell them, regardless of the caption posted alongside the image."
In addition to this, Facebook has also flagged commercial potential - imagine being able to overlay images with information based on the image content, such as nutritional information about food products, eCommerce details like product info attached to a photographed item, or health assessments based on visual cues.
That next level of image recognition technology is something that feels distant, unreal even, but these latest advances show that it's not as far off as you might think. The time is coming when you'll be able to search for images based on image content, to monitor visual brand mentions as well as text ones, and to gain more understanding about your social audience based not only on what they say, but on what they do, in terms of their posted photos.
What's more, Facebook is also looking to apply the same image recognition tools to video:
"We've already made some progress with computer vision techniques to watch videos and understand and classify what's in them in real time. Real-time classification could help surface relevant and important Live videos on Facebook, while applying more refined techniques to detect scenes, objects, and actions over space and time could one day allow for real-time narration."
The specific mention of Live video is interesting here - one of the major challenges with live-stream content (as noted by both Meerkat and Blab in their exit announcements) is that providing compelling, entertaining live-stream content on a regular basis is tough. Live content is hard, and there are not many people who can do it well. That's why Facebook and Twitter are both working with established broadcasters to bring more high-quality live content to their platforms - without regular, entertaining content, viewers simply won't come.
But video image recognition could add another dimension to this - if Facebook were able to classify video content in real time, it could push more relevant alerts to users, boosting engagement by showing you Live content you're more likely to be interested in.
A basic example - if you like watching rollercoaster videos (which, evidently, a lot of people do), Facebook could alert you to a rollercoaster live-stream in progress. If there were an event happening in a city, Facebook could gather all the relevant Live streams into a separate tab to provide a range of perspectives - this is already possible, to some degree, via the Live map, but image recognition would make it more specific.
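As a rough illustration of that idea, here's a small sketch of how labels produced by a video classifier could be matched against a user's interests to decide which Live streams to alert them to. Everything in it is hypothetical - the function name, the stream IDs, the interest sets and the overlap threshold are assumptions for the example, not anything Facebook has described.

```python
from typing import Dict, List, Set


def streams_to_alert(user_interests: Set[str],
                     live_streams: Dict[str, Set[str]],
                     min_overlap: int = 1) -> List[str]:
    """Return stream ids whose detected labels overlap the user's interests,
    ordered by number of shared labels (most first), then by stream id."""
    matches = []
    for stream_id, labels in live_streams.items():
        shared = user_interests & labels  # labels the user cares about
        if len(shared) >= min_overlap:
            matches.append((len(shared), stream_id))
    matches.sort(key=lambda match: (-match[0], match[1]))
    return [stream_id for _, stream_id in matches]


# Hypothetical data: labels a classifier might attach to streams in progress.
interests = {"rollercoaster", "theme park"}
streams = {
    "stream_a": {"rollercoaster", "theme park", "crowd"},
    "stream_b": {"cooking", "kitchen"},
    "stream_c": {"rollercoaster"},
}
print(streams_to_alert(interests, streams))
```

In practice, the hard part is the classification itself - once each stream carries reliable labels, matching them to what people like is comparatively simple.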
In addition to the latest update on image recognition advances, Facebook has also announced that they're making the code for these tools available to all, in the hopes that they'll be able to advance their technology faster through open-source - another reason why that next stage might be closer than you think.
As noted, social media provides us with an amazing data source, one which we're nowhere near fully utilizing or understanding as yet. And it's about to get even deeper.
It's worth considering the possibilities.
Follow Andrew Hutchinson on Twitter