With social interactions becoming more visual, social platforms need to develop ways to detect and classify image and video content if they want to keep up, enabling them to better surface relevant posts in search and better detect potentially offensive material.
On this front, Facebook has been working for years on its advanced image recognition technology, which can now automatically categorize images based on their content. For example, run a search for "black shirt photo," and Facebook’s system is able to "see" whether there's a black shirt in a photo, and search based on that, even if the photo wasn't tagged with that information. You can also search for a location or event.
But that’s only the start – while Facebook’s image recognition tools have continued to evolve, the changing way in which people use images has also forced Facebook’s team to come up with additional elements and qualifiers to help detect and categorize content.
For example, memes have become a popular social sharing option, and generally contain text overlaid on an image. Is it possible for Facebook to extract that text and use that as another data point?
This is the focus of Facebook’s new Rosetta text-in-images detection system, which covers not just memes, but any text contained within an image posted on Facebook or Instagram.
The Rosetta system, Facebook says, is already extracting text from “more than a billion public Facebook and Instagram images and video frames (in a wide variety of languages), daily and in real time”.
That’s a huge number of extra data points, which will facilitate a wide range of uses. For one, it'll provide more context for visually impaired users, and it will also enable better search and discovery of relevant content, based on visual cues.
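To make the search use case a little more concrete, here's a minimal sketch of how extracted text could become a searchable data point. It assumes the text has already been pulled out of each image; the image IDs, sample strings, and function names are all hypothetical, and this is a plain inverted index for illustration, not a description of how Facebook's search actually works.

```python
import re
from collections import defaultdict

WORD_PATTERN = re.compile(r"[a-z0-9']+")


def build_text_index(extracted):
    """Map each lowercase word to the set of image IDs whose text contains it."""
    index = defaultdict(set)
    for image_id, text in extracted.items():
        for word in WORD_PATTERN.findall(text.lower()):
            index[word].add(image_id)
    return index


def search(index, query):
    """Return the IDs of images whose extracted text contains every query word."""
    words = WORD_PATTERN.findall(query.lower())
    if not words:
        return set()
    # .get() avoids creating empty entries in the defaultdict for unknown words.
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)


# Hypothetical output of a text extraction step (not real data).
extracted = {
    "img_001": "Monday mornings be like",
    "img_002": "Summer SALE 50% off all shirts",
    "img_003": "Black shirt sale this weekend",
}
index = build_text_index(extracted)

print(search(index, "shirt sale"))  # only img_003 contains both exact words
```

Note that this matches exact words only, so "shirt" won't find "shirts". A real system would need stemming, multi-language handling, and ranking on top.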
For brands, the technology could also have significant utility. A couple of examples:
- By being able to search for images based on text, you could find people who already buy your products, or related products, if those items are visible in the background of images. This could enable you to reach out to these users with related offers.
- By being able to detect that certain users wear clothes with your branding on them, and regularly post images in those clothes, you could target those users with special offers. That would enable you to reach people who are not only more likely to be interested in such offers, but who will also likely continue to post images in the same clothes, giving you an additional promotional boost.
- If image data is provided as another insights tool, you could gain more perspective on your target audience by cross-matching their product purchases (based on image recognition) with their other usage and demographic data points, helping to target your outreach.
There’s a wide range of ways in which image recognition can be used. What’s more, Facebook's also improving its text translation tools, with an extra 24 languages added to its automatic translation services this week.
These advancing processes provide a whole new range of research and discovery implications, but with the volume of visual posts increasing, it’s likely the text-in-image tools that will provide the most significant shift.
Facebook is also looking at how to extend text extraction to video: “The naive approach of applying image-based text extraction to every single video frame is not scalable, because of the massive growth of videos on the platform, and would only lead to wasted computational resources. Recently, 3D convolutions have been gaining wide adoption given their ability to model temporal domain in addition to spatial domain. We are beginning to explore ways to apply 3D convolutions for smarter selection of video frames of interest for text extraction.”
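To illustrate why frame selection matters, here's a rough sketch of the general idea of skipping frames that add nothing new. To be clear, this is not the 3D convolution approach mentioned in the quote. It swaps in a much simpler frame-differencing heuristic, and the function names, threshold, and toy frames (flat lists of grayscale pixel values) are all assumptions made for the example.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute per-pixel difference between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def select_frames(frames, threshold=10.0):
    """Return indices of frames that differ enough from the last kept frame.

    Only the returned frames would be passed on to the (expensive) text
    extraction step; near-duplicate frames are skipped.
    """
    selected = []
    last_kept = None
    for i, frame in enumerate(frames):
        if last_kept is None or mean_abs_diff(frame, last_kept) > threshold:
            selected.append(i)
            last_kept = frame
    return selected


# Toy "video": five tiny 4-pixel grayscale frames (hypothetical data).
frames = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],        # almost identical to frame 0, skipped
    [200, 200, 0, 0],    # scene change, kept
    [200, 201, 0, 0],    # almost identical to frame 2, skipped
    [0, 0, 255, 255],    # scene change, kept
]

print(select_frames(frames))  # [0, 2, 4]
```

In this toy case, text extraction would run on three frames instead of five. A learned selector, like the one Facebook describes exploring, would aim to pick the frames most likely to contain text rather than just the ones that look different.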
The capacity to search through the billions of posts and updates across Facebook and Instagram each day, based on more advanced methodology, will open up a huge range of new opportunities. It takes time, but Facebook’s systems are advancing, and will provide increased utility on this front in the very near future.