Facebook Automated Captions Improve Accessibility, Provide Additional Insights
Yesterday, Facebook announced the release of automatic alternative text - or automatic alt text - for images posted to Facebook. Automatic alt text uses object recognition technology to generate a description of a photo, running each image through Facebook's artificial intelligence engine to identify its content.
It's the latest advancement in Facebook's image recognition technology, a system they've been working on for the last few years with artificial intelligence guru and New York University professor Yann LeCun at the helm. Last November, Facebook showcased the progress they'd made with their image recognition AI, with the system able to distinguish between objects in a photo 30% faster, using 10x less training data, than previous industry benchmarks.
The live launch of automated captions shows just how far the system has advanced, and while it's still not able to provide full, detailed descriptions of everything in each image, the fact that it can be reliably used at all in a live environment is relatively impressive. In the accompanying release notes, Facebook says that the system is able to identify a wide range of objects and scenes, which the research team funnels into "concepts" to keep it on track, rather than letting it get confused by the specifics. At launch, the team has focused the system on approximately 100 concepts, chosen based on their prominence in Facebook photos as well as the accuracy of the visual recognition engine.
"The current concepts, for example, cover people's appearance (e.g., baby, eyeglasses, beard, smiling, jewelry), nature (outdoor, mountain, snow, sky), transportation (car, boat, airplane, bicycle), sports (tennis, swimming, stadium, baseball), and food (ice cream, pizza, dessert, coffee). And settings provided different sets of information about the image, including people (e.g., people count, smiling, child, baby), objects (car, building, tree, cloud, food), settings (inside restaurant, outdoor, nature), and other image properties (text, selfie, close-up)."
Working within this defined set of concepts, the system is able to deliver highly accurate results.
"We make sure that our object detection algorithm can detect any of these concepts with a minimum precision of 0.8 (some are as high as 0.99). Even with such a high quality bar, we can still retrieve at least one concept for more than 50 percent of photos on Facebook."
And that's a lot - every day, people share more than 2 billion photos across Facebook, Instagram, Messenger, and WhatsApp. On Facebook, the new system is able to describe at least one key element in more than half of all photos with a very high degree of accuracy (if no concept clears that 0.8 precision bar, no caption is generated). That still leaves some room for improvement, but it's a technical feat beyond what many would have believed possible - and it's only achievable due to Facebook's massive scale and its capacity to test and refine its image recognition engine.
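Facebook hasn't published code for this step, but the quoted precision bar suggests how such a filter could work. The Python sketch below is an assumption on our part, not Facebook's implementation: for each concept it picks the lowest detector-score cutoff that still reaches 0.8 precision on labelled validation photos, then builds a caption only from concepts that clear their own cutoff. The function names, the (score, is_correct) validation format and the "Image may contain:" prefix are all illustrative.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Facebook's stated bar: every concept must be detected with >= 0.8 precision.
MIN_PRECISION = 0.8


def calibrate_threshold(
    scored: Sequence[Tuple[float, bool]], min_precision: float = MIN_PRECISION
) -> Optional[float]:
    """Pick the lowest score cutoff for ONE concept that still meets min_precision.

    `scored` holds (detector_score, tag_was_correct) pairs from labelled
    validation photos (a hypothetical format). Returns None if no cutoff
    reaches the target, in which case the concept should never be shown.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best: Optional[float] = None
    predicted = 0
    true_pos = 0
    for i, (score, correct) in enumerate(ranked):
        predicted += 1
        true_pos += int(correct)
        # A cutoff of `score` admits every example tied at that score, so only
        # measure precision once the whole tie group has been counted.
        end_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] < score
        if end_of_tie and true_pos / predicted >= min_precision:
            best = score  # lower cutoff = more tags kept, still precise enough
    return best


def describe(
    scores: Dict[str, float], thresholds: Dict[str, Optional[float]]
) -> Optional[str]:
    """Turn one photo's per-concept scores into alt text, or None if nothing passes."""
    kept: List[str] = []
    # List concepts in order of raw detector score, highest first.
    for concept, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        cutoff = thresholds.get(concept)
        if cutoff is not None and score >= cutoff:
            kept.append(concept)
    if not kept:
        return None  # no concept is reliable enough, so no caption is generated
    return "Image may contain: " + ", ".join(kept)
```

Choosing the lowest qualifying cutoff keeps as many correct tags as possible while still meeting the precision target. Photos where nothing clears its cutoff get no description at all, which is consistent with only a share of photos (more than 50 percent, per Facebook) receiving one.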
Over time, the research team is planning to "keep increasing the vocabulary of automatic alt text to provide even richer descriptions".
A New Perspective
The evolution of Facebook's image recognition technology has far-reaching applications, beyond the core purpose of helping the visually impaired share in the wider Facebook experience. As with Twitter, which recently announced the addition of manual alt-text for images on their platform, the addition of photo descriptions provides a whole new data stream to work with, and a new way to gather insights and intelligence from within social networks.
Imagine if you could search social networks by the content of images, and get insights into which types of pictures are most popular, and where. At the moment we can access a whole range of personalized data on what people like and what they're interested in, and those data points can reveal deeply insightful correlations that tell us more about who people are and what our audiences are likely to respond to. The addition of visual context will only add to this - one day soon, you'll be able to set up alerts not only for keywords, but for image content within posts too.
Say, for example, you're a pizza company and you want to know which pizzas are most popular in your region - you could set up an image alert for 'pizza' and get a complete record of every time someone in your area has posted an image including the food. Maybe you sell premium dog food - targeting people who post a lot of pictures of their dogs would likely mean you're reaching the type of audience that's willing to spend a little extra on their pets. You could set up an alert for your products or brand representatives, enabling you to respond to posts faster and capitalize on 'in-the-moment' buzz, and as the system advances, you might even be able to set up alerts for instances where your logo appears.
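No such image-alert tool exists yet, so the following is purely hypothetical. As a rough Python sketch, assuming you had access to public posts along with their generated alt text (the `Post` record, its fields and the "Image may contain:" format are all invented for illustration), a 'pizza in my region' alert boils down to parsing each description into concepts and filtering on them:

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

ALT_PREFIX = "Image may contain: "  # illustrative alt-text format (assumption)


@dataclass
class Post:
    """Hypothetical record for a public post with an image."""
    post_id: str
    region: str
    alt_text: str


def concepts_from_alt_text(alt_text: str) -> Set[str]:
    """Split an 'Image may contain: a, b, c' string into a lowercase set of concepts."""
    if not alt_text.startswith(ALT_PREFIX):
        return set()  # no generated description, so nothing to match on
    body = alt_text[len(ALT_PREFIX):]
    return {part.strip().lower() for part in body.split(",") if part.strip()}


def image_alert(posts: Iterable[Post], concept: str, region: str) -> List[str]:
    """Return IDs of posts in `region` whose image description includes `concept`."""
    wanted = concept.lower()
    return [
        p.post_id
        for p in posts
        if p.region == region and wanted in concepts_from_alt_text(p.alt_text)
    ]
```

Matching on parsed concepts rather than raw substrings avoids false hits, such as a 'dog' alert firing on a caption that merely contains the word 'hotdog'.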
There's a whole range of ways in which image recognition could be used to track and correlate user behaviors and build a more accurate, more insightful overview of relevant mentions. Really, you're limited only by your own imagination as to how that additional data could be used for brand benefit.
The addition of image recognition is a big step for social media marketing, and for marketing more generally. There's a heap of ways such data could be used - travel agents, for example, could get a better idea of the places people are more likely to want to visit based on the photos they post, then target them with relevant ads accordingly. While the main focus is on assisting the visually impaired - and that, in itself, is a game-changing initiative - the expanded applications for image recognition technology shouldn't be underestimated.
While there's still some way to go with Facebook's image recognition AI - and no doubt people are keen to see how accurate the real-world results are before getting too far ahead of themselves - the release of this new tool shows just how far the technology has come, and how much Facebook and their AI Research team have been able to develop their image recognition tools in a relatively short space of time. Such tools have the capacity to better connect the world - most directly through greater inclusion, but more widely through increased understanding and context. In fact, Facebook themselves are already finding new uses for this technology - they've built a system that can automatically analyze satellite images of the Earth's surface and identify where people actually live, in order to decide where to focus their connectivity efforts via their internet.org initiative.
There's also Facebook's controversial facial recognition AI, which is banned in some nations for being too intrusive. That technology has wide-ranging implications of its own, from security to personal tracking - and, of course, stalking, which is why it raises various concerns. But such tools do have the capacity to change the way we think, and to advance the way we're able to collect and collate data.
The addition of automatic alt text is the latest development in this movement, and while it hasn't been heralded as a major breakthrough, its implications are significant, no matter how you look at it.
Automatic alt text is now available on iOS screen readers set to English, with plans to expand the functionality to other languages and platforms soon.
Follow Andrew Hutchinson on Twitter