Designers are sometimes accused of being too finicky about details that don’t really matter. If it works, then it’s good enough, even if it isn’t pretty, right? I have a story that shows how good enough can be a tempting but dangerous philosophy to adopt.
I have a love/hate relationship with the speech system in my car. I drive a four-year-old Hyundai, so it’s not the newest or most sophisticated system available. Like most automotive speech systems, it pairs with my phone via Bluetooth so the system has access to my contact list. My sole use case is dialing by name, saying things like “Call Paul Sherman on mobile.”
I love this interaction because about 90% of the time, my husband’s phone is ringing a few seconds later and I forget all about the system. This is somewhat miraculous because the system’s response to my request breaks so many rules of voice interaction design:
Calling Paul Sherman on mobile. Say yes to proceed. Otherwise say back or cancel.
The first sentence is an echo of my request that confirms what the system heard, which is fine, but the rest of the response goes wrong in multiple ways. “Say yes to proceed” is backwards; it’s the equivalent of the waiter saying “Say yes if you’d like more bread.” The word proceed itself is odd because we don’t typically think of proceeding with phone calls. Another odd thing is that I’m supposed to say “yes” if everything is good, but where is the “no” option? In my car, the options are yes, back, or cancel, not yes or no. And what’s the difference between back and cancel? If something’s wrong, I want to cancel, but doesn’t that also mean going back? [End of rant.]
My point here is not that there are a lot of things wrong with the system’s response, but that I don’t care anymore. Why? Because the system offers me something useful, it works, and it doesn’t get in the way of completing my task. So despite my persnickety analysis, you can argue that this design is good enough.
But that analysis breaks down when the user tries to call one of two similar-sounding names in their phone’s contacts. A prime example for me is my mother and brother, listed in my contacts as Mom Hura and Thom Hura. When my car’s speech system is unsure which of the two I want, this is the flow of the conversation:
Susan: Call Thom Hura on mobile.
System: Who do you want to call? 1: Mom Hura. 2: Thom Hura.
Susan: 2.
System: Calling Thom Hura on mobile. Say yes to proceed. Otherwise say back or cancel.
Susan: Yes. (accompanied by disgruntled muttering under my breath.)
The system’s response that seemed a bit clunky in the first case is surprisingly awful in this new context. The issues that were easily ignored on the happy path spoil the interaction in the (relatively common) case of disambiguation.
In this case, the good enough response takes me out of the flow of making a call and draws my attention toward the interaction. The problem is that the system is double-confirming who I want to call. By choosing between the two options in the disambiguation question, I have already confirmed that I do indeed want to call Thom. When the system then requires me to explicitly confirm my choice again, it introduces friction into the experience. The speech system is no longer an invisible and intuitive tool for getting things done; it’s an obstacle I have to overcome.
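To make the design fix concrete, here is a minimal sketch in Python of dialogue logic that treats a disambiguation choice as the confirmation. It is an illustration only: `CallRequest` and `next_prompt` are hypothetical names, the fallback wording is invented, and nothing here reflects how my car's system is actually built.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CallRequest:
    """Hypothetical result of recognizing 'Call <name> on <line>'."""
    candidates: List[str]  # contact names the recognizer considers plausible
    line: str = "mobile"


def next_prompt(request: CallRequest, chosen: Optional[str] = None) -> str:
    """Return what the system should say next.

    `chosen` is the contact the user picked in a disambiguation turn, if any.
    """
    if chosen is not None:
        # The user just picked this name explicitly, so that choice already
        # serves as the confirmation: place the call without asking again.
        return f"Calling {chosen} on {request.line}."
    if not request.candidates:
        # Assumed fallback wording; not taken from the real system.
        return "Sorry, I couldn't find that contact."
    if len(request.candidates) > 1:
        # Ambiguous request: ask the user to choose among the candidates.
        options = " ".join(
            f"{i}: {name}." for i, name in enumerate(request.candidates, start=1)
        )
        return f"Who do you want to call? {options}"
    # Single candidate the user never explicitly chose: keep the confirmation.
    name = request.candidates[0]
    return (
        f"Calling {name} on {request.line}. "
        "Say yes to proceed. Otherwise say back or cancel."
    )
```

With this logic, the Mom/Thom conversation ends one turn earlier: after I pick option 2, the system simply announces the call instead of asking me to say yes. The explicit confirmation survives only on the path where I never made a choice.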
It’s important to note that the culprit here is not speech recognition failure. A person might just as easily mishear my request and ask me to clarify if I meant Mom or Thom; among humans, this clarification question doesn’t necessarily derail the conversation. The problem with my car’s system is not that it needed disambiguation, but that the design was not good enough to support this interaction. It’s a design problem, not a technical one.
Exactly this sort of design issue is in danger of spoiling a new generation of speech interactions. New conversational technologies are emerging quickly and have technical capabilities that significantly outpace older automated speech systems. The risk for new speech systems is that they will succumb to a good enough design philosophy. There is a strong desire among the makers of conversational technologies not to be lumped in with older, poorly received speech technologies. In the rush forward, let’s not forget the lessons we’ve learned in the past 20 years about the vital role of voice interaction design. A quick perusal of ratings for Amazon Alexa skills, for instance, shows that customers are not satisfied with the experiences they’re currently being offered, and the main complaints are about the interactions, not about technical performance. The next generation of conversational interfaces needs to strive to be more than good enough.