I’m just back from VoiceCon, a two-day conference about Conversational Interfaces in Berlin. This was the first edition, held alongside the second edition of MLConference; as one could attend talks from either conference during the two days, here’s a recap of the ones I attended from the Voice track!
We can see the ecosystem is maturing. Gone are the days of developers fiddling around with toy VUIs: with both Demand and Supply rising, the discussion now revolves around topics of User acquisition, Content discovery, and Monetization. Still, there is a long way to go: the right tools and methods have mostly yet to be discovered, and Developer eXperience will be crucial to the ecosystem’s growth.
Let’s see how VoiceCon speakers addressed those topics!
I’ll update this post with materials from the conference as they are released. In the meantime, if you want to see how a powerful search API could help your VUI serve its users, head to alg.li/voice to find out!
Wally Brill says Look Who’s Talking: Google’s Head of Conversational Design shared guidelines for creating voice personas
Voice is a medium to share information, which comes with a lot of metadata.
Having evolved with language for millennia, we cannot avoid picking up these cues: as Clifford Nass wrote in Wired for Speech, we unconsciously assign a personality to a voice.
This will depend on the voice’s age, gender, education as inferred by vocabulary, register of speech…
Which brings us to personas: they are what makes two individuals different, even when they share the same information.
Think about the difference between the classical Romeo & Juliet movie and the more recent version with new personas: the same characters inspire a very different feel, as their personas are quite different.
In Voice interfaces, the experience is the brand. So your VUI’s persona will
have a huge impact on the user’s perception of your service.
Sometimes our partners say "Thanks, but we don’t want a persona for our VUI."
Yet, there is no such thing as no persona: you can either design it
mindfully or let it happen on its own.
Designing a Brand Persona
Wally shared a four-step design process:
Understand the brand. What are its values? Its product(s)? What happens
in its call centers (which is a great source of insights on the business processes)?
Understand the user. What are their personas? Their values? Their budget?
The use-case they want to address (and at which frequency)?
Understand the task the conversational agent should solve
Create appropriate characters based on the above
Finding the right Register
The agent’s tone and vocabulary should depend on its relationship to the user: should it act as a peer? An employee? An advisor?
We need a North star, to guide all the people that are writing for our voice assistant.
-> Hence the tip of writing a Monologue: give depth to your character by writing a paragraph or two of something they might say to a friend. This will be key to collaborating towards the same goal!
Great assistant design is conversation design: you should never start from the code!
A lot has to be considered: should it be Submissive, or act as an Authority?
Should it sound Distant, or rather Friendly?
A last tip for testing your assistant: Don’t show it in writing, it won’t be representative of the experience… Bring it to life and test it as a VUI!
Florian Hollandt helped us see Voice as a Game changer
As with any new platform, the gaming industry is at the forefront of what it
enables. Voice games are no exception: their supply is growing quickly with
Amazon & Google casting a large net for developers, seeing games as a great way
to develop their usage beyond utility skills like setting an alarm.
As for the demand side, well, with every new tech people want to be entertained (who thought “smart watch games” would be a thing?)
(A point to keep in mind: there are many gamer personas. Voice games are used by many players for only a few minutes overall, by some a bit every day, and by a few users several hours per week!)
Game monetization schemes have been discussed for decades, but VUIs bring a new
perspective on them:
- Paid games: not working that well on mobile, even less on VUI
- Advertising: not an option for now
- In-skill payment: works, but hard to build a business on one-shots -> Subscription
- Get paid by Amazon: challenge prizes, etc… Nice but not sustainable
- Sell a product, with a skill that makes it more attractive!
- Purchase a product within a skill, sometimes indirectly (skill as funnel to shop)
- Have your skill increase your brand value
Florian shared several business models that are being proven around VUI:
- B2B: Work as agencies, provide consultancy, tooling, content, …
- Voice Game studio: companies making voice games with their own IP
- Marketing Skill: Pokemon, Duplo, SpongeBob…
- Accessory skill: buy a product, get a useful skill along
Destiny2’s Ghost: equip stuff, transfer items, …
- Trivia skill: same UX on mobile app & speaker, play together seamlessly
Escape room in a box: play with or without skill, giving extra info
- Product with a mandatory Skill component: product that is built around a skill
When in Rome: Board game with lots of audio content, skill as rulebook/Game Master
- Fan-built community skill: community integrations, amateur skills for D&D/FF/Pokemon/Poker…
- Not really a business model as this rarely pays back, but could trigger opportunities
When asked what limits a broader adoption of Voice Gaming, Florian shared
discoverability as the main challenge to solve: Voice gamers lack a great search interface!
Derek Chezzi & Nabeel Hussain helped us answer: Is Voice Relevant to your Business?
VUI supply and demand are rising. This brings new applications and contexts
where they could bring a lot of value, which makes everyone wonder: how is voice relevant to my business?
To answer this, ask yourself if voice might change:
- your collaboration methods?
- your User’s expectations?
- your Logistics, e.g. goods Delivery?
Derek and Nabeel used a made-up diaper brand as an example. The user experience looks like this:
- Ground level: going to the store, searching for the right model, purchasing
and bringing home
- Subscription models: provide more convenience, at the expense of choice
- Amazon: sudden game changer, takes all the friction away while leaving choice open
Digital shift changes the consumer mindset from ownership to access
Speed is important to users: e.g. how Shazam brings a greater UX than searching for fragmented lyrics online.
Humans speak about 150 words per minute but type only about 40. Which is why 43% say using voice search is quicker than using a website or app.
What does it mean for our diaper brand? What can we offer to make parents’ job easier?
- Is voice critical to the job my audience tries to get done?
- Will voice improve the experience of doing it?
- Can I use voice to market my product? Do we have expertise worth sharing?
A first axis for positioning yourself: how core to your product would the Voice experience be?
Core Product <> Enhanced feature <> Not Core Product
Black Friday Uber skill etc…
A second criterion will help you position yourself: will you provide a
transactional experience where users act, or an informational experience
where they learn?
Transactional vs Informational: do something vs know something.
- Transactional + Core: develop your skill as a value-add product
- Informational + Core: provide assistant features, e.g. an FAQ via voice
- Transactional + not Core: partner with existing publishers, e.g. listing in voice marketplaces
- Informational + not Core: content marketing, e.g. answer boxes on parenting tips
Robert C. Mendez talked about content discovery in Design Content for Amazon Vendors
Internet of Voice: “We make the Internet talk”
Compared to previous technologies introducing new ecosystems, voice is the
fastest tech adoption ever!
Robert took apart two kinds of disruption brought by Voice:
- Input: Voice complements Keyboard & Screen, but doesn’t replace
- Marketing: sites and apps can adapt more using intent, going from “who user is” to “who user is and what they want”
Big disruption in Content: make it voice-first or you’ll disappear!
This is a big risk, but also a big opportunity: Natural language based on Intents + Personalization can bring huge improvements in marketing success!
Smart Voice Shopping is an additional channel: 30% of voice users do shop with it
Currently goods purchased via voice are those that don’t need a screen
(returning purchases, well-known or commodity products, …)
These limitations need to be solved without penalizing other use-cases: ranking factors are not specific to voice!
The bottom-line is: adapt for your customer journey.
Design voice-first, not voice-only. This is especially true as voice brings some NLU challenges, like hearing “Brad herring” instead of “Brathering”
-> Prepare the content: product’s descriptions need to be in natural language, optimizing for what your customers are most likely to say when searching your products!
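To make the “optimize for what customers actually say” idea concrete, here is a small sketch of mapping a misheard transcript back to a catalog entry with fuzzy matching. The catalog, the helper name, and the 0.4 cutoff are all illustrative assumptions, not anything from the talk; only Python’s standard library `difflib` is used.

```python
import difflib

# Hypothetical product catalog, made up for illustration.
CATALOG = ["Brathering", "Bratwurst", "Herring Fillets", "Smoked Salmon"]

def match_product(transcript, catalog):
    """Map a (possibly misheard) voice transcript to the closest product name."""
    # cutoff=0.4 is an assumed tolerance: low enough to survive ASR mistakes,
    # high enough to reject nonsense queries.
    matches = difflib.get_close_matches(transcript, catalog, n=1, cutoff=0.4)
    return matches[0] if matches else None

print(match_product("Brad herring", CATALOG))  # recovers "Brathering"
```

In production you would match against natural-language product descriptions (not just names) and combine this with a real search engine, but the principle is the same: anticipate what the speech recognizer will hand you.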
Kasia Ryniak & Rafal Cymerys asked on Voice Shopping: is it the Future or a Failure?
Why are customers reluctant to voice purchasing?
Reliability : Will it do what I want?
Certainty : I need to see it!
Limitations : Can’t get to know all the product options
Habits : I’m used to buying via mobile app
These were already the arguments against e-commerce!
If we study the rise of e-commerce and m-commerce (mobile), it was not one big
leap from traditional commerce but lots of small leaps.
What does it take to leap from M-Commerce to V-Commerce?
- Attract more users
- Form habit of using voice
- More devices, more accessible use-cases
- Entertainment drives adoption -> the voice gaming trend is promising!
Voice Search: not only on voice speakers, which is key to onboarding users from where they already are
- First, simple actions providing value will onboard new users
- Provide great voice shopping experiences!
- Deliver value through convenience compared to previous channels
Instead of thinking Mobile first or Voice First, Kasia and Rafal recommend we think Multimodal First: design for the smaller capabilities, and embrace Progressive Enhancement practices!
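The Multimodal First idea can be sketched in a few lines: always build the voice baseline, then progressively enhance for richer devices. The capability flags and response shape below are my own illustrative assumptions, not an API from the talk.

```python
# Progressive enhancement sketch: every device gets speech; screens and touch
# surfaces get extra layers on top. Field names are hypothetical.

def build_response(text, device):
    response = {"speech": text}  # baseline every device can handle
    if device.get("has_screen"):
        response["display"] = {"title": "Your order", "body": text}
    if device.get("has_touch"):
        response["suggestions"] = ["Reorder", "Track package"]
    return response

speaker = {"has_screen": False, "has_touch": False}
phone = {"has_screen": True, "has_touch": True}

print(sorted(build_response("Your order is on its way.", speaker)))
print(sorted(build_response("Your order is on its way.", phone)))
```

Designing the baseline first keeps the smart-speaker experience complete on its own, instead of feeling like a stripped-down mobile app.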
Customer support is crucial to help your customers onboard
A good trick is lowering the bar of uncertainty for first experiences (e.g. reordering -> Feet in the door!)
Interesting trend: personalization of digital experiences (recommendations, virtual assistants, …) - these are opportunities for your commerce strategy that align well with what Voice Shopping can bring.
Your end goal should always be to create a seamless and easy voice shopping experience!
Three tips for being ready for the voice commerce revolution:
- Strategize for the future, but make small steps
- Think of voice in a broader context (it’s not just about Conversational Interfaces)
- Embrace Voice Marketing
Question: can you share examples of successful voice shopping?
Kasia: Asos. They made a great shopping assistant for in-store usage, package tracking, etc.
Rafal: Domino’s. They built a reordering POC, a good first step for this use-case!
Comment: Voice search is quite restrictive in what we can return -> this is an opportunity to solve the paradox of choice!
Question: Examples of good personalization?
Answer: Maybe Spotify? Or Amazon? Today we mostly saw existing great personalization coming to the voice ecosystem.
Tim Kahle & Dominik Meissner shared How we brought a world-famous quiz game on Alexa
Tim and Dominik co-founded 169 Labs, an agency creating voice user interfaces
for various brands. They shared their journey bringing a quiz game to Alexa, adapting to the voice platform while ensuring the gaming experience is seamless across devices.
This brought unique challenges, such as handling User Generated Content: one has
to adapt the game design to the new interface, sometimes changing the way
content is presented or queried, while making sure this doesn’t make the
game harder for some players. For example, if you have 30 seconds to answer a
question, users on their phone can read faster than voice users can listen: this
needs to be taken into account to ensure the game stays fair for all!
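One way to restore fairness across modalities is to start the answer clock only once the question has been fully delivered. This is a sketch of that idea; the speech rate and thinking-time budget are illustrative assumptions, not 169 Labs’ actual numbers.

```python
# Hypothetical timing model: every player gets the same *thinking* time,
# regardless of how long their modality takes to present the question.

WORDS_PER_SECOND_TTS = 2.5   # assumed speech synthesis rate
THINKING_TIME_S = 10.0       # identical budget for every modality

def answer_deadline(question, modality):
    """Seconds until the answer window closes, from the moment delivery starts."""
    if modality == "voice":
        # Voice users must first listen to the question being read out.
        presentation = len(question.split()) / WORDS_PER_SECOND_TTS
    else:
        # On a screen the question appears all at once.
        presentation = 0.0
    return presentation + THINKING_TIME_S

q = "Which planet in our solar system has the most moons?"
print(answer_deadline(q, "voice"))   # prints 14.0 (4s of TTS + 10s to think)
print(answer_deadline(q, "screen"))  # prints 10.0
```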
Ralf Eggert presented guidelines for Multimodal Development on Alexa
CEO@Travello, Alexa Champion
Ralf started by asking around:
Who owns an echo device with a display? 30% of the audience
Who developed a multimodal skill? 3-4 persons
Who has an APL-based skill? No-one
A definition of multimodal
Multimodality describes communication practices in terms of the textual, aural, linguistic, spatial, and visual resources used to compose messages.
- Many interfaces are Multimodal but not Smart: for example elevators, where you input with touch and get visual/voice feedback.
- Earlier devices were headless: the Echo is just a client that sends input to a voice server.
June '18: 5.9% of Alexa users in the US own an Echo device with a display
But keep in mind when designing that twice as many Echo Spots were sold as Echo Shows!
Former model: the Display Interface
Alexa’s Display Interface provides 7 templates for presenting data. However, the templates don’t adapt well to the Echo Spot: even Amazon’s docs show bad template display on the Spot…
Layouts are static, designs are hard to test: it was time for a new approach!
Introducing APL: Alexa Presentation Language
In public beta since October, it lets you define an APL Package: such
containers can hold documents and images, to be cached on the device.
Within packages, your templates can have conditional content using “when” conditions.
This new approach brings:
Reusability: composable blocks, imports, resources and styles
Frictionless testing on real device
Separation of concern by extracting the presentation from the logic
With a few drawbacks:
- Response is more complex
- No saving option in the authoring tool
- No graphical editor yet
Start using APL today, but consider the number of sold devices and adapt
to your audience accordingly!
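To make the conditional-content idea concrete, here is a minimal APL document built as a Python dict. The structure (an APL document with a mainTemplate and per-component “when” conditions on the viewport) follows the public APL docs, but treat it as an illustrative sketch rather than a production-ready layout.

```python
import json

# A minimal APL document: one Text component for round screens (Echo Spot)
# and one for rectangular screens (Echo Show), selected via `when` conditions.
apl_document = {
    "type": "APL",
    "version": "1.0",
    "mainTemplate": {
        "items": [
            {
                "type": "Text",
                "when": "${viewport.shape == 'round'}",
                "text": "Hello, Echo Spot!",
            },
            {
                "type": "Text",
                "when": "${viewport.shape == 'rectangle'}",
                "text": "Hello, Echo Show!",
            },
        ]
    },
}

print(json.dumps(apl_document, indent=2))
```

This is exactly the pain point the old Display Interface had: with APL, the Spot and the Show can share one document instead of one static template that only fits one screen.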
Joseph Jaquinta & Stacy Colella presented an approach to Designing for Voice
A core concept is Voice literacy: users have learned to navigate the web and
developed mental models there, but they can’t leverage those in voice… And this is great! It’s good to fail; it’s a learning opportunity that lets us teach new models to our users.
Voice lacks what Visual takes for granted
- Linear vs Multidimensional
- Skimming/Scanning capacity
- Clear Content organization
- Optional information hierarchy
- Established navigation tools (whereas with voice navigation is in the user’s mind)
This last point brings important Memory span requirements to consider.
- Limit what you listen for
- Train users to use a limited vocabulary and grammar
- Ensure projected speech respects syntax/pronunciation
- Monitor accuracy of NLU
- Create new navigation tools
Don’t try to have a conversation
The biggest challenge at the moment is design
Contextual prompting: track UX and tailor responses for where the user is currently
We should have a fair idea of what they want to do next.
-> e.g. gather stats on usage, then bias prompts towards less used paths!
-> e.g. you are with your friend Raven. You can say “inventory” to know what you are carrying. (Give information in context when it is likely useful next)
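The “bias prompts towards less used paths” tip can be sketched as a weighted random choice, where a feature’s chance of being hinted at is inversely proportional to how often it has been used. The feature names and counts below are made up for illustration.

```python
import random

# Hypothetical usage stats gathered from past sessions.
usage_counts = {"inventory": 120, "map": 45, "trade": 5}

def pick_prompt(counts, rng):
    """Pick the next feature to hint at, favoring the least-used paths."""
    features = list(counts)
    # +1 avoids division by zero for never-used features.
    weights = [1.0 / (counts[f] + 1) for f in features]
    return rng.choices(features, weights=weights, k=1)[0]

rng = random.Random(42)  # seeded for reproducibility
picks = [pick_prompt(usage_counts, rng) for _ in range(1000)]
# The rarely-used "trade" feature gets hinted at far more than "inventory".
print(picks.count("trade") > picks.count("inventory"))  # True
```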
- Initial experience: smooth learning curve, give the essentials only
- Tutorial: introduce more concepts, make visible the big picture
- Prompting: in context, show subtleties
Repeat as a function of relevance
Reduce repetition frequency over time
The navigation “beep”
The best user experience should be obvious in retrospect. You gotta be smart
to invent something that is obvious!
We have so much we’d like to tell the user, but we can’t tell it all: they would be dead!
-> Beep for more: train the user that #thebeep means "there is more information"
Super low friction!
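The beep pattern can be implemented as an earcon appended to the spoken response whenever deeper information exists. SSML audio tags are supported by both Alexa and Google Assistant; the sound URL below is a placeholder, and the function name is my own.

```python
# Sketch of "beep for more": append a short earcon when there is more to hear.
BEEP_URL = "https://example.com/sounds/more-info-beep.mp3"  # hypothetical URL

def render_response(text, has_more):
    """Wrap a reply in SSML, adding the 'more info' beep when relevant."""
    beep = f'<audio src="{BEEP_URL}"/>' if has_more else ""
    return f"<speak>{text}{beep}</speak>"

print(render_response("You are carrying a lantern.", has_more=True))
```

Once users have been trained that the beep means “there is more”, a single sound replaces an entire sentence of prompting.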
Method of Loci
Memory palace: link some information to a physical journey
-> Spatialize options and let your user map it intuitively!
North, you’ll find the User support counter.
Bonus: users already know how to navigate maps!
-> Enables Discovery with low friction
VoiceUIs are limited compared to GUIs. But current apps are far from reaching that limit!
A note on spatial navigation via audio, and how training is key:
we see visually impaired players sticking out as especially skilled at navigating audio worlds.
- Human speech is too complex to understand, work around it!
Educate users on what they can do (but keep track of how often you do)
- Convey the critical content, yet don’t swallow the rest
- Don’t just assume something is “too complex for voice”. Users will eventually want to explore your app’s landscape!
Jan König from Jovo talked about Building cross-platform apps for VUIs
Jovo provides open-source tooling for cross platform apps, Voice CMS through subscription, and paid consulting.
Today Alexa wins according to sales numbers, but Google has a head start on phones. So like with iOS/Android, both seem here to stay: we need to cater to both!
Voice App Development Lifecycle
Cross-platform shouldn’t just mean common denominator.
Both platforms have different conversational models. Google Assistant works like
a dinner party: “I don’t know the answer to your question, let’s ask Tim, who’s an expert”. Compare this to Alexa exposing a consistent personality with a unique voice.
-> Alexa sees itself as a consistent brand <> Google Assistant as a moderator/facilitator
This brings the Challenge of different mental models: play against Alexa VS play with MauMau?
Anticipate voice speakers’ obsolescence:
I bet we won’t have smart speakers in a few years, once voice assistants live in all our other devices!
Context matters: speakers = morning/evening, while phones are used everywhere
Cars/earbuds are new frontiers to explore!
-> Each device will have its own killer app
-> Design Context-First
Consider bandwidth cost: some successful voice studios stream 12 GB/day of content!
Voice App Architecture: build your interfaces as logicless clients
Jan announced that Jovo V2 is coming soon: more modular and extensible, with a focus on Developer eXperience, and especially continuous integration and deployment
This focus is key to a successful developer ecosystem, as this user feedback shows: “I’m using Jovo because I like your docs more than the platform’s”
Test with real users to get real feedback!
- Improve both Logic and Language Model
- Learn how people talk, learn NLU mistakes and adapt
Finally, two last talks that were quite interesting, although I didn’t take many notes!
Jeremy Wilken presented how you can Embed Google Assistant into your own devices
Today with DIY toolkits, cheap electronics and a great voice ecosystem, building
smart devices is easier than ever. Jeremy gave some instructions and tips on
building such devices, from use-case ideas (like adding voice-command to your
smart garden plot!) to constraints (you can’t work around having an internet connection) and implementation tips (knowing what Assistant Library supports that the
gRPC version doesn’t, e.g. Hotword Detection, can help you choose your technical stack).
Jochen Emig from ONSEI showed us steps to Creating a Google Action with Python
All? Not quite! Like a village of indomitable Gauls, a bunch of opinionated developers have come to love Python for its flexibility, ease of use, and vast standard library and ecosystem; of course they would want to keep using it in their VUIs!
In his talk, Jochen walked us through implementing a webhook for Google Actions
in Python. This results in a “best of both worlds” setup where you can leverage both
VUI tooling and powerful machine learning tools!
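As a flavor of what such a webhook looks like, here is a minimal fulfillment handler in pure Python. The request and response shapes follow Dialogflow’s v2 webhook format (queryResult.intent.displayName in, fulfillmentText out), but the intent name, parameters, and replies are made up, and this is my own sketch rather than Jochen’s code. In production you would expose the handler through Flask or a cloud function instead of calling it directly.

```python
# Minimal Dialogflow-style fulfillment handler (v2 webhook format).

def handle_webhook(request_json):
    """Route an incoming fulfillment request to a reply, by intent name."""
    query_result = request_json.get("queryResult", {})
    intent = query_result.get("intent", {}).get("displayName", "")
    params = query_result.get("parameters", {})

    if intent == "OrderPizza":  # hypothetical intent
        size = params.get("size", "medium")
        reply = f"Alright, one {size} pizza coming up!"
    else:
        reply = "Sorry, I didn't get that."

    return {"fulfillmentText": reply}

# Simulated incoming request, as Dialogflow would POST it to the webhook.
fake_request = {
    "queryResult": {
        "intent": {"displayName": "OrderPizza"},
        "parameters": {"size": "large"},
    }
}
print(handle_webhook(fake_request)["fulfillmentText"])
# prints: Alright, one large pizza coming up!
```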