The holy grail of speech recognition is not just getting a computer to transcribe the words you say, but getting it to truly understand what those words mean. Context is the hardest thing for programs that parse human language to 'get.' Cracking voice recognition that is effective, consistent, and contextually appropriate would fundamentally change the way we interact with our electronic environment. So it shouldn't be surprising that the following video from Soundhound is getting quite a bit of attention:
As of this writing, the video has gotten nearly 1 million views in two days, made the front page of Reddit, and has the technosphere buzzing. In it, Soundhound CEO Keyvan Mohajer demonstrates the company's new digital assistant program, Hound. The program seems to roar past a number of the voice recognition speed bumps that have stalled such efforts in the past.
The program adroitly handles long and complicated questions. It infers the intent behind a query from context clues, so it doesn't mistake Washington State for Washington, D.C. It can modify a previous question with new criteria without the user having to restate the original question. And it does all of this very, very quickly.
Until now, Soundhound was best known as the not-Shazam app for identifying music. Unlike Shazam, the Soundhound app lets you hum a tune for it to identify. With Hound, Soundhound seems to be announcing its intention to compete with Microsoft's Cortana, Google Now, and Apple's Siri in the growing digital assistant market.
The app, nine years in development, is not perfect, as the video's YouTube description admits. As Popular Science found, it has trouble with some very subtle differences between inquiries, like those between "Where can I eat lunch near me?" and "Where's the closest place to eat lunch?" (It did respond to "What's the closest, cheapest option for lunch?" with robust results.) It had trouble setting up meetings, and its translation options are spotty, lacking the pronunciations a user would need.
Mohajer acknowledges these shortcomings, and his ambitions go further still: he says he wants to be able to do things like (verbally) feed Hound a list of ingredients and have it spit back recipes. Soundhound is also providing 'Houndify' tools that will allow developers to integrate voice commands into their own programs.
The most important thing to realize is that Hound seems to be another step toward smoothing human interaction with computers and other electronic devices. Whatever the future holds, it looks promising. We've come a long way since The Simpsons made a throwaway gag out of the Apple Newton's inability to do what it was built to do. Eat up Martha? Bah!