You ask Alexa to play a song, Siri for the weather or the Google Assistant to make a call. But what happens when your toddler asks a voice-activated device a question?
Your daughter pauses, stammers, mispronounces a few words. She’s a beginner, after all.
In return: Silence. Or the familiar, default robot apology.
For such popular household technology, that’s a missed opportunity to reach every member of the family, a new University of Washington study finds. Children communicate with technology differently than adults do, and a more responsive device — one that repeats or prompts the user, for example — could be more useful to more people.
“There has to be more than ‘I’m sorry, I didn’t quite get that,’” said co-author Alexis Hiniker, an assistant professor at the UW Information School. “Voice interfaces now are designed in a cut-and-dried way that needs more nuance. Adults don’t talk to children and assume there will be perfect communication. That’s relevant here.”
The study is published in the proceedings of the 17th Interaction Design and Children Conference, held in June in Trondheim, Norway.
Nearly 40 million U.S. homes have a voice-activated assistant like an Amazon Echo or Google Home, and it’s estimated that by 2022, more than half of U.S. households will own one. While some interfaces have features specifically aimed at younger users, research has shown that these devices generally rely on the clear, precise English of adult users — and specific ones, at that. People for whom English is not their first language, or even those who have a regional accent — say, a Southern accent — tend to hit snags with smart speakers, according to a recent Washington Post analysis.
The UW study shows how children will persist in the face of a communication breakdown, treating a device as a conversation partner and, in effect, showing developers how to design technologies that are more responsive to families.
“They’re being billed as whole-home assistants, providing a centralized, shared, collaborative experience,” Hiniker said. “Developers should be thinking about the whole family as a design target.”
In this study, the team recorded 14 children, ages 3 to 5 (and, indirectly, their parents), as they played a Sesame Workshop game, “Cookie Monster’s Challenge,” on a lab-issued tablet. As designed, the game features a cartoon duck waddling across the screen at random intervals; the child is asked to “say ‘quack’ like a duck!” each time he or she sees the duck, and the duck is supposed to quack back.
Only in this study, the duck has lost its quack.
That scenario was something of an accident, Hiniker said. The team, with funding from Sesame Workshop, was originally evaluating how various tablet games affect children’s executive function skills. But after configuring the tablet to record the children’s responses, researchers learned that their data-collection tool had shut off the device’s ability to “hear” the child.
What the team had instead was more than 100 recordings of children trying to get the duck to quack — in effect, attempting to repair a lapse in conversation — and their parents’ efforts to help. And a study of how children communicate with nonresponsive voice technology was born.
Researchers grouped children’s communication strategies into three categories: repetition, increased volume and variation. Repetition — in this case, continuing to say “quack,” repeatedly or after pausing — was the most common approach, used 79 percent of the time. Less common among participants was speaking loudly — shouting “quack!” at the duck, for instance — and varying their response, through their pitch, tone or use of the word. (Like trying an extended “quaaaaaack!” to no avail.)
In all, children persisted in trying, without any evidence of frustration, to get the game to work more than 75 percent of the time; frustration surfaced in fewer than one-fourth of the recordings. And in only six recordings did children ask an adult to help.
Parents were happy to do so, but the team found they were also quick to determine something was wrong and take a break from the game. Adults usually suggested the child try again and took a shot at responding themselves; once they pronounced the game broken, and only then, did the child agree to stop trying.
The results represented a series of real-life strategies families use when faced with a “broken” or uncommunicative device, Hiniker said. The scenarios also provided a window into young children’s early communication processes.
“Adults are good at recognizing what a child wants to say and filling in for the child,” Hiniker said. “A device could also be designed to engage in partial understanding, to help the child go one step further.”
For example, a child might ask a smart speaker to play “Wheels on the Bus,” but if the device doesn’t pick up the full name of the song, it could respond with, “Play what?” or fill in part of the title, prompting the child for the rest.
Such responses would be useful even among adults, Hiniker pointed out. Person-to-person conversation, at any age, is filled with little mistakes, and finding ways to repair such disfluencies should be the future of voice interfaces.
“AI is getting more sophisticated all the time, so it’s about how to design these technologies in the first place,” Hiniker said. “Instead of focusing on how to get the response completely right, how could we take a step toward a shared understanding?”
Hiniker has launched another study into how diverse, intergenerational families use smart speakers, and what communication needs emerge.