AllAboutVoice 2018 was the first Voice Assistant Conference in Germany.
150 attendees from various backgrounds and nationalities met to discuss
what’s happening and what’s ahead in the space of voice interfaces.
Here are my notes from the talks, along with a selection of tweets!
The conference started with an introduction by Tim Kahle from 169 Labs.
He began by showing how closely the trends in smart speaker sales and voice skill counts
mirror those of smartphone sales and app counts over the last decade.
Because voice can be faster and more convenient than other ways of consuming
audio content, one impact is that the average user now consumes more audio experiences.
From this trend, Tim described the rise of assistants. Going from voice-only
to voice-first to multi-modal interfaces, up to the recent release of Facebook Portal,
voice technologies are already shaping a generation (e.g. a kid swiping at a
TV screen without success): tomorrow, kids might well say “if it ain’t talking, …”
As one tweet put it: “I had a hard time explaining to my 3-year-old that he can’t just shout ‘Alexa,
stop’ when he doesn’t like the car radio. I had to tell him ‘Alexa is not here’.”
Tim concluded that, with 1 billion voice-enabled devices by the end of 2018,
you will soon be able to talk to any device.
Hannes Ricklefs from the BBC told us about the Design and Delivery of Conversational Experiences.
As lead of the BBC’s Voice & AI team, Hannes shared the challenges, approaches and
learnings they encountered.
Conversational experiences are a new frontier, and the BBC aims to be at its forefront.
There are, however, some challenges: staying true to their tone of
impartiality, credibility and relevance; focusing on the value for
individuals, communities and the country; and remaining exciting and competitive.
“We’ve been a trusted voice in the living room since the 1920s.”
Some early POCs around gamified stories helped answer some questions:
- How do users interact with that?
- Do they enjoy long audio responses?
- What tools can support content creation? (They created a Web UI for conversational experience crafting)
From those insights, they derived some design principles:
- It’s Already a conversation (the conversation can start before the skill starts)
- It’s like Washing up: you want it to get out of the way
- It might have a Spatial reference: factor in context
- It’s in a Public Space: privacy considerations are key
- Will it take Fewer Steps than web/app? Voice is a convenience, not added friction
These principles fueled the BBC’s next experiments, from a World Cup skill
that was “way more of a conversation” to a BBC for Kids interactive
experience with several personas.
Hannes concluded by sharing some learnings from all these experiments:
- 2 is the magic number: offering three options confused kids; choice is costly, so relevance is crucial
- Don’t bore us, get to the chorus: back to the Washing up principle of staying out of the way
- Take responsibility: you can’t blame the user for where the dialog goes
Asked about content discovery through voice interfaces, Hannes replied that they are not ready yet. Ultimately they want to offer it, but they chose to rely on other entry points (apps, web, …) until they can do it right via voice.
Karen Kaushansky talked about Designing the Future, with Voice.
Starting with some history of breakthroughs enabling good voice interfaces (NLU
models, specialized hardware, system-level integration), Karen then discussed
reasons for the rise of conversational interfaces. She touched on several
points: how these bidirectional channels fulfill users’ desire for control,
how conversation lets us go where the users are, and how app fatigue calls for
easier ways to access content. Voice also has unique potential to be a
conduit between channels that removes friction: “Alexa, ask the Food
Channel what’s this recipe”.
The interaction model can vary a lot: are you letting the user command
something, or giving them superpowers?
The relationship model is also important: are we building a tool, an
assistant, a companion?
The success of voice experiences depends on building a relationship with the user.
Karen mentioned some interesting trends in voice experiences:
- You+: with immersive applications from Star Trek to the NBA, the user goes
from consumer to actor
- Multiple assistants: you get to choose which one to use, to the point where one can trigger another, more specialized one (e.g. BMW’s car assistant)
- The most successful voice experiences (VX) help with conversations people love (connecting with
people, timely info, …) and hate (customer service, paying bills, …)
- Social robots: AIs supporting social relationships, e.g. Kids’ Court to settle arguments
Karen concluded with a question: “How do we make sure these technologies make us more human?”
UX and Voice: Dr Nick Fine presented Research, Design and Testing in the New Spoken World
Nick started by asking: “How do we bring the human experience into voice?”
His answer follows the usual UX process: research, design and test.
He insisted on the importance of research: is this a new functionality or an
existing one ported to voice? What’s the user problem this is solving?
User needs are at the center of voice interface design.
Does voice help or hinder in this context? Let’s not cause voice fatigue.
Researching context can be tricky: how does being at home/in a lab affect your users?
How does the context (acoustics, noise, user focus) affect the experience?
On testing, Nick advised applying the standard practice of guerrilla testing:
using a fake device or conversational role-play can help you iterate faster and more cheaply.
An important design consideration is the interface’s personality: what kind
of personality will your users enjoy? Do you even know your users?
This brings two challenges: understanding your users’ personality (are they
extraverts/introverts, friendly/unfriendly, dominant/submissive?), and
understanding what it implies (e.g. a friendly user enjoys a friendly
greeting, but a highly neurotic user might prefer a neuroticism-reducing background).
Once those two questions are answered, the creative part begins -
leveraging several voice parameters to convey personality:
tone/cadence, gender, vocabulary and consistency (does it adapt along the way?)
The German rail network has more than 280k stations. How can Deutsche Bahn (DB) make navigating it easier?
They explored building a voice assistant to search for
connections, departures and arrivals. The search experience will support filtering by travel
type, category and direct connections, offer synonyms for local terminology
(ICE = “long distance”), provide some personalization (“Shortest path to my office”) and leverage context (“Route to Munich Central”).
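To make the synonym idea concrete, here is a minimal Python sketch that maps spoken travel-type terms to canonical categories. Only the ICE / “long distance” pair comes from the talk; the category names, the other synonyms and the `resolve_travel_type` helper are hypothetical illustrations, not DB’s actual data model. (In an Alexa skill, custom slot types with synonyms typically fill this role.)

```python
from typing import Optional

# Hypothetical synonym table: canonical travel category -> spoken variants.
# Only ICE / "long distance" is from the talk; the rest is illustrative.
TRAVEL_TYPE_SYNONYMS: dict[str, set[str]] = {
    "ICE": {"ice", "long distance", "long-distance", "high speed"},
    "REGIONAL": {"regional", "regional train", "local train"},
    "S-BAHN": {"s-bahn", "suburban", "commuter train"},
}

# Invert the table once so each lookup is a single dict access.
_SPOKEN_TO_CANONICAL: dict[str, str] = {
    spoken: canonical
    for canonical, variants in TRAVEL_TYPE_SYNONYMS.items()
    for spoken in variants
}


def resolve_travel_type(spoken_value: str) -> Optional[str]:
    """Map a recognized travel-type phrase to its canonical category, or None."""
    # Lowercase and collapse whitespace so "Long  Distance" matches "long distance".
    key = " ".join(spoken_value.lower().split())
    return _SPOKEN_TO_CANONICAL.get(key)
```

For example, `resolve_travel_type("Long Distance")` returns `"ICE"`, while an unknown phrase such as `"tram"` returns `None`, letting the dialog ask the user to rephrase.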
The main challenges were around UX: how do users speak with us? What kind of
info should we give, and in which way? And how do we handle speech recognition of station names?
To address these, Kristian described the intensive user testing (29 sprints over
12 months!) they ran. These brought insights on what usage to expect, how much
information is appropriate, and the need to define a brand personality.
Speech recognition errors on German station names turned out to be
mostly due to inconsistent data formatting. The solution was to normalize the
station data, a real challenge given the sheer number of stations.
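The talk did not detail how the normalization worked. As a hedged illustration of the general idea, the Python sketch below turns differently formatted station names into one canonical key, so that “München Hbf” and “Muenchen Hauptbahnhof” compare equal. The abbreviation table, the transliteration rules and the `normalize_station_name` helper are assumptions for illustration, not DB’s pipeline.

```python
import re
import unicodedata

# Hypothetical abbreviation table; real station data would need a much larger one.
ABBREVIATIONS = {
    "hbf": "hauptbahnhof",
    "bf": "bahnhof",
}

# Transliterate German umlauts and sharp s the way they are commonly written in ASCII.
GERMAN_TRANSLITERATION = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def normalize_station_name(raw: str) -> str:
    """Return a canonical, ASCII-only key for a station name (illustrative rules)."""
    # NFC first so a decomposed umlaut (u + combining diaeresis) becomes one code point.
    text = unicodedata.normalize("NFC", raw).lower()
    text = text.translate(GERMAN_TRANSLITERATION)
    # Strip any remaining accents (e.g. "é" -> "e") by decomposing and dropping combining marks.
    text = "".join(
        ch for ch in unicodedata.normalize("NFKD", text) if not unicodedata.combining(ch)
    )
    # Treat punctuation such as "(", ")", "/", "." and "-" as word separators.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    # Expand known abbreviations token by token.
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)
```

Applying the same function to both the stored station records and the recognized utterance means formatting differences in the data no longer cause mismatches.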
Such a project requires resources: 19 DB colleagues from 8 departments + 3 Alexa
experts collaborated on this skill.
The next steps will be price info and voice booking, but not blindly: DB will first use sales analysis to understand what booking looks like today.
The search experience has already been well received: Kristian shared that a significant share of travel searches are now done by voice, with users citing convenience and ease of use while traveling as the main reasons to prefer that interface.
Max Amordeluso gave some insights on How to build your Voice Strategy with Alexa.
As Alexa’s EU Lead Evangelist, Max was well placed to give a high-level overview
of where voice is today and where it is headed.
Max started from the why: it’s All about people. We don’t do voice
interfaces because the tech is cool, but because it feels natural.
It’s Individual: voice experiences can put the user back in the center.
It’s also Communal, with 82% of Alexa households having several users.
Voice is already big today: more than 33% of US customers use voice interfaces weekly, and tens of
millions of devices have been sold.
Tomorrow looks promising: by 2022, an expected 70M households will have smart speakers,
spending $7.3B a year on retail.
Amazon is very active in this space: 13 new Alexa-enabled products this year, and the
Alexa Connect Kit, which lets other device makers build on the platform.
Max urged us to think multi-modal: new voice interfaces are no longer
voice-only, which can simplify interactions and enhance the experience.
Finally, the ecosystem is getting more connected:
- Systems are getting integrated: BMW has Alexa embedded, with an autonomous SIM card
- Networks are getting integrated: a partnership with Skype links user networks
Jess Williams shared learnings from building a successful voice experience.
Aiming to build a couple of skills that they could monetize, Jess and her
coworkers considered which successful domains they could explore:
interactive kids’ stories? Meditation/workout apps? Games?
“We decided to stop coding and come up with 10 ideas each.”
This brainstorming led them to the idea of Guess my name.
Jess described the steps to doing it right: writing a sample dialogue to
test its flow (discovering and removing sources of friction along the way), designing the
voice interaction model, and only then building the skill. Once it is built, run
more user testing to refine it until you feel ready for certification.
At this point the focus is mostly on polishing: having a nice logo
and description, diversifying the prompts, asking users for feedback in-skill.
Once you are ready, get a few good ratings to seed the skill, boost reviews
through your networks/your other skills/competitions/etc and continually
update with new content!
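Diversifying prompts, mentioned above as part of the polishing phase, usually means keeping several phrasings for the same response and picking one each time. A minimal Python sketch, assuming a hypothetical `PromptPicker` helper and made-up prompt texts (not taken from the Guess my name skill), that also avoids repeating the phrasing used the previous turn:

```python
import random
from typing import Optional, Sequence


class PromptPicker:
    """Pick a random prompt variant, never repeating the previous one (hypothetical helper)."""

    def __init__(self, variants: Sequence[str], rng: Optional[random.Random] = None) -> None:
        if not variants:
            raise ValueError("need at least one prompt variant")
        self._variants = list(variants)
        self._rng = rng or random.Random()
        self._last: Optional[str] = None

    def pick(self) -> str:
        # Exclude the last prompt; fall back to all variants if only one exists.
        candidates = [v for v in self._variants if v != self._last] or self._variants
        choice = self._rng.choice(candidates)
        self._last = choice
        return choice


# Made-up example prompts for the end of a game round.
replay_prompts = PromptPicker([
    "Want to try again?",
    "Shall we play another round?",
    "Ready for one more?",
])
```

Calling `replay_prompts.pick()` at the end of each round returns one of the three phrasings, never the same one twice in a row; passing a seeded `random.Random` makes the sequence reproducible in tests.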
The conference concluded with two talks on the topic from very different angles.
Thorsten Jansen from the DWF law firm hinted at the regulatory and compliance
challenges in the voice ecosystem: from mandatory info before an online sale
to the complexity of attribution in multi-actor supply chains, voice interfaces
bring a new layer of complexity to old issues and add their share of new ones.
Lars Schalkwijk from AMP discussed Sound Branding: although it’s not a
new topic (brands with a strong audio signature, like James Bond, have been around for decades
and are estimated to be worth several billion), voice interfaces are a new facet of a brand’s audio
experience. Brands face several challenges in being recognized
across channels or regions (e.g. Marge Simpson’s voice being
localized, yet keeping the same characteristics), and a clear design process is essential.