Most of the time, Alexa, Siri, and Bixby are handy assistants for everyday tasks, but sometimes rapidly advancing voice technology can get in the way. Voice UI (VUI) is suddenly everywhere, but there are challenges to overcome with the still-developing technology. While voice is neither a platform nor the UI paradigm of the future, it is an additional UI that's here to stay.
Finding the niches it works best in, and playing to voice’s strengths, are the key challenges designers and marketers will now need to overcome.
A UI in Search of an Application
Years of work have gone into making Siri and Google Assistant answer your questions and act as a conversational partner with access to all the information you might want. Despite this, we pretty much use them to check the weather, get the score of last night’s game, and maybe pose a random query or two to the internet. Alexa showed up and gave us a greater ability to manage our home audio experience, but if we’re honest, VUI is still a novelty for most people. While it's a powerful tool, most users don’t know what they can do with it, or how to integrate it into their lives.
The old maxim in real estate is that it’s all about location, and VUI shares that quality: how we interact with devices by voice is largely determined by where we are. The car has become the hottest location for VUI, because driving keeps your hands and eyes busy, which is exactly where hands-free voice control excels.
Around the home, being able to control your music while cooking or doing other chores has also proven popular, and the ability to get a quick answer to a question, without having to pull out your phone, is another useful feature.
Other locations, however, are less amenable to VUI. The library, for instance, provides a good opportunity for graphical user interfaces (GUI), but is a poor place for any communication by voice. Likewise, a business meeting or conference setting is a poor environment for VUI.
Where VUI Still Doesn’t Measure Up
While VUI has come a long way, and continues to improve, it's still hampered by difficulties that turn everyday users away.
Accuracy problems continue to plague the major voice-activated systems in the US. The problem is not so much speech recognition: recent measurements have Siri, Google, Cortana, and Alexa at greater than 90% accuracy in getting the words you speak right. But even when these assistants get your input correct, they often produce hilariously bad results. Simple queries like “I need a doctor” may yield a list of nearby doctors or a Wikipedia entry on doctors, while more complex questions often fall outside their ability to process and understand. This is to be expected, perhaps, at this stage of development, but it remains a huge turn-off for users.
Certainly, if you use VUI to ask about the weather, the answer will begin almost instantaneously. However, the information provided often takes longer to process than a visual response. Google’s Assistant will pop up a weather card on-screen if you ask about the weather, and in the time it takes Google to read out the information about where I am, today’s forecast, and the current temperature, I’ve had enough time to scan all that information off the screen three times, as well as get an idea of the temperature forecasts for the next few hours.
A recent article complains about Alexa’s tendency to provide far too much information when you ask it to play a song. Alexa doesn’t simply start the song; it tells you the song it's playing, the artist performing, the source of the song (Spotify, your library, etc.), and the device it's playing on. For songs featuring multiple artists, this can result in a nightmarishly long introduction when you just wanted to hear some music.
Reading Lists Is Boring
Designers talk about cognitive load: the amount of effort required to process the information provided. Users interacting with a GUI tend to need a steady and manageable amount of attention to process the information on-screen. VUI, however, demands no attention when the user is not actively interacting with it, and a high degree of attention when the user asks a question and has to listen for a response.
This presents a problem when you use VUI to deliver information that comes back in the form of a list. Ask for a quick dinner recipe for chicken, and you may be presented with a list of options to narrow that down. Unfortunately, the list may be nine entries long, and by the time you reach the end, you’ve forgotten what the third entry was. The brain simply isn’t able to carry that much information at one time - in fact, by around the seventh item on a list, your maximum attention is likely to be reached and your eyes will begin to glaze over as you head to the nearest GUI and give up on VUI.
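One common way to ease this problem is to read long lists in short chunks and let the user ask for more. The sketch below is only an illustration of that idea, not any assistant's actual behavior: the function name `spoken_chunks`, the default chunk size of three, and the "Say 'more'" wording are all assumptions made for the example.

```python
from typing import List


def spoken_chunks(options: List[str], chunk_size: int = 3) -> List[str]:
    """Split a long list of options into short prompts suitable for speech.

    Hypothetical helper: the chunk size and wording are illustrative only.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    prompts = []
    total = len(options)
    for start in range(0, total, chunk_size):
        chunk = options[start:start + chunk_size]
        # Join the chunk the way a person would say it aloud.
        if len(chunk) <= 2:
            spoken = " or ".join(chunk)
        else:
            spoken = ", ".join(chunk[:-1]) + ", or " + chunk[-1]
        # Tell the listener how much is left instead of reading it all.
        remaining = total - (start + len(chunk))
        if remaining > 0:
            spoken += f". Say 'more' to hear {remaining} more."
        else:
            spoken += "."
        prompts.append(spoken)
    return prompts


if __name__ == "__main__":
    recipes = [
        "lemon chicken", "chicken stir-fry", "chicken tacos",
        "chicken curry", "grilled chicken salad", "chicken soup",
        "chicken fajitas",
    ]
    for prompt in spoken_chunks(recipes):
        print(prompt)
```

Each prompt stays well under the point where a listener starts to lose track, and the user decides whether to keep going.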
Presenting the Future
Of course, there are solutions to the difficulties currently experienced by users with VUI, and many companies are pouring billions of dollars into finding them and improving the experience. One way in which voice interaction is improving is by making the experience more conversational for users. As design enables us to personalize results and respond in more natural ways, conversational VUI will make users more comfortable interacting with the technology.
Incorporating both voice responses and visual data in a seamless way between the screen and the speaker is key to overcoming the limitations of VUI and creating a more natural interaction for the user. Currently, voice response is only marginally connected to any visual interaction, with devices like Amazon’s Echo almost completely cut off from any screen. As the Internet of Things grows, it will become imperative that information be presented by both voice and screen to fit the user’s needs.
Of course, technology will always be prone to error, but managing verbal faux pas effectively can make users more comfortable interacting by voice. This means delivering a better response than “I’m sorry, I didn’t quite understand that”, and not blindly guessing at the answer when a query isn’t well understood. By involving the user in the error correction process, the system helps the user learn more about its limitations and grow more confident in pushing the boundaries.
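As a rough sketch of what involving the user in error correction could look like, the example below picks a response based on how confident the system is in what it heard. Everything here is hypothetical: the function `recovery_prompt`, the thresholds 0.8 and 0.4, and the wording of the prompts are assumptions for illustration, not how any particular assistant works.

```python
from typing import List

# Illustrative thresholds only; a real system would tune these.
HIGH_CONFIDENCE = 0.8
LOW_CONFIDENCE = 0.4


def recovery_prompt(heard: str, confidence: float,
                    interpretations: List[str]) -> str:
    """Choose a spoken response based on recognition confidence.

    `interpretations` holds the candidate meanings, best first.
    """
    # Confident and unambiguous: act, and say what is being done.
    if confidence >= HIGH_CONFIDENCE and len(interpretations) == 1:
        return f"OK, {interpretations[0]}."
    # Unsure or ambiguous: repeat what was heard and offer choices.
    if confidence >= LOW_CONFIDENCE and interpretations:
        choices = " or ".join(interpretations[:2])
        return f"I heard '{heard}'. Did you mean {choices}?"
    # Very unsure: admit it and suggest something that does work.
    return ("I didn't catch that. You can ask me things like "
            "'find a doctor near me'.")


if __name__ == "__main__":
    print(recovery_prompt(
        "I need a doctor", 0.6,
        ["finding doctors near you", "reading about doctors"],
    ))
```

The middle branch is the important one: rather than guessing, the system shows its work and lets the user steer.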
Enhancement, Not a Replacement
Those who fall into the error of seeing voice interfaces as a replacement for the graphical interface fail to learn from history. Perhaps science fiction can be our guide instead: on the Starship Enterprise, the 23rd-century crew regularly communicates with the computer by voice while also using visual and touch interaction. While we're likely to surpass those methods by the 23rd century, the show still provides a picture of how we, collectively, imagine seamless computer interaction should take place.
Our greatest challenges, then, are not so much with bringing the technology up to speed as with imagining better ways to use it. Used well, VUI provides us with another tool for interacting with technology.