Designing a VUI – Voice User Interface

Discover why conversational UIs and voice apps are surging in popularity and learn how to design voice user interfaces (VUIs) for both mobile and smart home speakers.


By Frederik Goossens

Frederik is a certified UX designer and product owner. He is a creative thinker experienced with user research and business analysis.


More and more voice-controlled devices, such as the Apple HomePod, Google Home, and Amazon Echo, are storming the market. Voice user interfaces are helping to improve all kinds of different user experiences, and some believe that voice will power 50% of all searches by 2020.

Voice-enabled AI can take care of almost anything in an instant.

  • “What’s next in my Calendar?”
  • “Book me a taxi to Oxford Street.”
  • “Play me some Jazz on Spotify!”

All five of the “Big Five” tech companies—Microsoft, Google, Amazon, Apple, and Facebook—have developed (or are currently developing) voice-enabled AI assistants. Siri, the AI assistant for Apple iOS and HomePod devices, is helping more than 40 million users per month, and according to ComScore, one in 10 households in the US already owns a smart speaker today.

Whether we’re talking about VUIs (Voice User Interfaces) for mobile apps or for smart home speakers, voice interactions are becoming more common in today’s technology, especially since screen fatigue is a concern.


What Can Users Do with Voice Commands?

Alexa is the AI assistant for voice-enabled Amazon devices like the Echo smart speaker and Kindle Fire tablet—Amazon is currently leading the way with voice technology (in terms of sales).

On the Alexa store, some of the trendiest apps (called “skills”) are focused on entertainment, translation, and news, although users can also perform actions like request a ride via the Uber skill, play some music via the Spotify skill, or even order a pizza via the Domino’s skill.

Another interesting example comes from commercial bank Capital One, which introduced an Alexa skill in 2016 and was the first bank to do so. By adding the Capital One skill via Alexa, customers can check their balance and due dates and even settle their credit card bill. PayPal took the concept a step further by allowing users to make payments via Siri on either iOS or the Apple HomePod, and there’s also an Alexa skill for PayPal that can accomplish this.

But what VUIs can do, and what users are actually using them for, are two different things.

ComScore stated that over half of users who own a smart speaker use their device for asking general questions, checking the weather, and streaming music, closely followed by managing their alarms, to-do lists, and calendars (note that these tasks are fairly basic by nature).

As you can see, a lot of these tasks involve asking a question (i.e., voice search).

Statistics for smart speaker usage in the US

What Do Users Search for with Voice Search?

People mostly use voice search when driving, although any situation where the user isn’t able to touch a screen (e.g., when cooking, exercising, or multitasking at work) offers an opportunity for voice interactions. Here’s the full breakdown by HigherVisibility.

Android Auto voice app and voice user interface

Conducting User Research for Voice User Interface Design

While it’s useful to know how users are generally using voice, it’s important for UX designers to conduct their own user research specific to the VUI app that they’re designing.

Customer Journey Mapping

User research is about understanding the needs, behaviors, and motivations of the user through observation and feedback. A customer journey map that includes voice as a channel can not only help user experience researchers identify the needs of users at the various stages of engagement, but it can also help them see how and where voice can be a method of interaction.

In the scenario that a customer journey map has yet to be created, the designer should highlight where voice interactions would factor into the user flow (this could be highlighted as an opportunity, a channel, or a touchpoint). If a customer journey map already exists for the business, then designers should see if the user flow can be improved with voice interactions.

For example, if customers are always asking a certain question via social media or live support chat, then maybe that’s a conversation that can be integrated into the voice app.

In short, design should solve problems. What frictions and frustrations do users encounter during a customer journey?

VUI Competitor Analysis

Through competitor analysis, designers should try to find out if and how competitors are implementing voice interactions. The key questions to ask are:

  • What’s the use case for their app?
  • What voice commands do they use?
  • What are customers saying in the app reviews and what can we learn from this?


To design a voice UI for an app, we first need to define the users’ requirements. Aside from creating a customer journey map and conducting competitor analysis (as mentioned above), other research activities such as interviewing and user testing can also be useful.

For voice interface design, these written requirements are all the more important since they will encompass most of the design specs for developers. The first step is to capture the different scenarios before turning them into a conversational dialog flow between the user and the voice assistant.

An example user story for a news application could be:

“As a user, I want the voice assistant to read the latest news articles so that I can be updated about what’s happening without having to look at my screen.”

With this user story in mind, we can then design a dialog flow for it.

issuing a voice command for voice controlled user interface

The Anatomy of a Voice Command

Before a dialog flow can be created, designers first need to understand the anatomy of a voice command. When designing VUIs, designers constantly need to think about the objective of the voice interactions (i.e., What is the user trying to accomplish in this scenario? ).

A user’s voice command consists of three key factors: the intent, utterance, and slot.

Let’s analyze the following request: “Play some relaxing music on Spotify.”

Intent (the Objective of the Voice Interaction)

The intent represents the broader objective of a user’s voice command, and this can be either a low utility or a high utility interaction.

A high utility interaction is about performing a very specific task, such as requesting that the lights in the sitting room be turned off, or that the shower be a certain temperature. Designing these requests is straightforward since it’s very clear what’s expected from the AI assistant.

Low utility requests are more vague and harder to decipher. For example, if the user wanted to hear more about Amsterdam, we’d first want to check whether or not this fits into the scope of the service and then ask the user more questions to better understand the request.

In the given example, the intent is evident: The user wants to hear music.

Utterance (How the User Phrases a Command)

An utterance reflects how the user phrases their request. In the given example, we know that the user wants to play music on Spotify because they said “Play some relaxing music…,” but this isn’t the only way a user could make this request. For example, the user could also say, “I want to hear music….”

Designers need to consider every variation of utterance. This will help the AI engine to recognize the request and link it to the right action or response.
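This mapping from utterance variations to a single intent can be sketched in code. The templates and intent name below are invented for illustration; production assistants use statistical NLU rather than keyword matching:

```python
import re

# Hypothetical utterance templates for one intent. Real assistants use
# statistical NLU; this keyword matcher only illustrates why designers
# enumerate phrasing variations up front.
UTTERANCE_TEMPLATES = {
    "PlayMusicIntent": [
        "play some {mood} music",
        "play me some {mood} music",
        "i want to hear {mood} music",
    ],
}

def match_intent(utterance):
    """Return (intent, mood) for the first template that matches, else None."""
    for intent, templates in UTTERANCE_TEMPLATES.items():
        for template in templates:
            # Turn the "{mood}" placeholder into a one-word wildcard.
            pattern = re.escape(template).replace(r"\{mood\}", r"(\w+)")
            hit = re.fullmatch(pattern, utterance.lower())
            if hit:
                return intent, hit.group(1)
    return None

print(match_intent("Play me some relaxing music"))
# ('PlayMusicIntent', 'relaxing')
```

Any phrasing that isn’t covered by a template falls through to `None`, which is exactly the gap designers try to close by gathering more variations.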

Slots (the Required or Optional Variables)

Sometimes an intent alone is not enough and more information is required from the user in order to fulfill the request. Alexa calls this a “slot,” and slots are like traditional form fields in the sense that they can be optional or required, depending on what’s needed to complete the request.

In our case, the slot is “relaxing,” but since the request can still be completed without it, this slot is optional. However, in the case that the user wants to book a taxi, the slot would be the destination, and it would be required. Optional inputs overwrite any default values; for example, a user requesting a taxi to arrive at 4 p.m. would overwrite the default value of “as soon as possible.”
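The required/optional distinction and the default-overriding behavior can be sketched as follows, using the taxi example. The slot names and defaults are illustrative, not taken from any real voice platform:

```python
# Hypothetical slot schema for the taxi example: the destination is
# required, while the pickup time is optional with a default that a
# spoken value overrides. Names and defaults are illustrative only.
SLOT_SCHEMA = {
    "destination": {"required": True, "default": None},
    "pickup_time": {"required": False, "default": "as soon as possible"},
}

def fill_slots(spoken):
    """Merge spoken slot values over defaults; flag missing required slots."""
    filled = {}
    for name, spec in SLOT_SCHEMA.items():
        if name in spoken:
            filled[name] = spoken[name]          # spoken value wins
        elif spec["required"]:
            raise ValueError(f"Elicit the required slot: {name}")
        else:
            filled[name] = spec["default"]       # fall back to the default
    return filled

print(fill_slots({"destination": "Oxford Street"}))
# {'destination': 'Oxford Street', 'pickup_time': 'as soon as possible'}
print(fill_slots({"destination": "Oxford Street", "pickup_time": "4 p.m."}))
# {'destination': 'Oxford Street', 'pickup_time': '4 p.m.'}
```

When a required slot is missing, the assistant’s job is to ask a follow-up question rather than fail, which is what the raised error stands in for here.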

Prototyping VUI Conversations with Dialog Flows

When prototyping, designers need to think like scriptwriters and design dialog flows for each of these requirements. A dialog flow is a deliverable that outlines the following:

  • Keywords that lead to the interaction
  • Branches that represent where the conversation could lead to
  • Example dialogs for both the user and the assistant

A dialog flow is a script that illustrates the back-and-forth conversation between the user and the voice assistant. Like a prototype, it can be depicted as an illustration (as in the example below) or built with a prototyping app.
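As a rough sketch, a dialog flow can also be represented as a branching script in code; the node names and dialog copy below are invented for illustration:

```python
# A dialog flow sketched as a branching script: each node holds the
# assistant's line and the keyword branches the conversation can follow.
# Node names and dialog copy are invented for illustration.
DIALOG_FLOW = {
    "start": {
        "say": "Hi! Want to hear the latest news headlines?",
        "branches": {"yes": "read_news", "no": "goodbye"},
    },
    "read_news": {
        "say": "Here are today's top stories...",
        "branches": {"stop": "goodbye"},
    },
    "goodbye": {"say": "Okay, talk later!", "branches": {}},
}

def run_turn(node, user_reply):
    """Follow a branch keyword from the current node; stay put if unmatched."""
    return DIALOG_FLOW[node]["branches"].get(user_reply, node)

state = run_turn("start", "yes")
print(DIALOG_FLOW[state]["say"])
# Here are today's top stories...
```

Keeping the keywords, branches, and sample dialog together in one structure mirrors the three elements a dialog flow deliverable should cover.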

An illustration of a dialog flow for VUI design

Apps for Prototyping VUIs

Once you’ve mapped out the dialog flows, you’re ready to prototype the voice interactions using an app. A few prototyping tools have entered the market already; for example, Sayspring makes it easy for designers to create a working prototype for voice-enabled Amazon and Google apps.

Prototyping VUI apps with Sayspring

Amazon also offers their own Alexa Skill Builder, which makes it easy for designers to create new Alexa Skills. Google offers an SDK; however, this is aimed at Google Action developers. Apple hasn’t launched their competing tool yet, but they’ll soon be launching SiriKit.


UX Analytics for Voice Apps

Once you’ve rolled out a “skill” for Alexa (or an “action” for Google), you can track how the app is being used with analytics. Both companies offer a built-in analytics tool; however, you can also integrate a third-party service for more elaborate analytics (such as voicelabs.co for Amazon Alexa, or dashbot.io for Google Assistant). Some of the key metrics to keep an eye out for are:

  • Engagement metrics, such as sessions per user or messages per session
  • Languages used
  • Behavior flows
  • Messages, intents, and utterances
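As an illustration of how two of these engagement metrics could be derived from a raw event log (the log format here is invented; the built-in Alexa and Google dashboards report these numbers directly):

```python
from collections import defaultdict

# A made-up event log of (user_id, session_id, message); real deployments
# would pull these events from the platform's analytics API.
events = [
    ("u1", "s1", "what's the weather"),
    ("u1", "s1", "and tomorrow?"),
    ("u1", "s2", "play jazz"),
    ("u2", "s3", "set an alarm for 7"),
]

sessions_per_user = defaultdict(set)
messages_per_session = defaultdict(int)
for user, session, _message in events:
    sessions_per_user[user].add(session)
    messages_per_session[session] += 1

print({u: len(s) for u, s in sessions_per_user.items()})  # {'u1': 2, 'u2': 1}
avg_messages = sum(messages_per_session.values()) / len(messages_per_session)
print(round(avg_messages, 2))  # 4 messages over 3 sessions -> 1.33
```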


Practical Tips for VUI Design

Keep the Communication Simple and Conversational

When designing mobile apps and websites, designers have to think about what information is primary, and what information is secondary (i.e., not as important). Users don’t want to feel overloaded, but at the same time, they need enough information to complete their task.

When designing UI for voice commands, designers have to be even more careful because words (and maybe a relatively simple GUI) are all there is to communicate with. This makes conveying complex information and data especially difficult. Fewer words are better, and designers need to make sure that the app fulfills the user’s objective and stays strictly conversational.

Confirm When a Task Has Been Completed

When designing an eCommerce checkout flow, one of the key screens will be the final confirmation. This lets the customer know that the transaction has been successfully recorded.

The same concept applies to voice assistant UI design. For example, if a user in the sitting room asks their voice assistant to turn off the lights in the bathroom, then without a confirmation, they’d need to walk to the bathroom and check, defeating the purpose of a “hands-off” VUI app entirely.

In this scenario, a “Bathroom lights turned off” response will do fine.

Create a Strong Error Strategy

As a VUI designer, it’s important to have a strong error strategy. Always design for the scenario where the assistant doesn’t understand or doesn’t hear anything at all. Analytics can also be used to identify wrong turns and misinterpretations so that the error strategy can be improved.

Some of the key questions to ask when checking for alternate dialogs:

  • Have you identified the objective of the interaction?
  • Can the AI interpret the information spoken by the user?
  • Does the AI require more information from the user in order to fulfill the request?
  • Are we able to deliver what the user has asked for?
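One common pattern for such an error strategy is an escalating re-prompt that eventually exits gracefully. The copy and the three-step escalation below are design choices for illustration, not a platform rule:

```python
# Escalating fallback prompts: a gentle re-prompt first, then a hint at
# what the app can do, then a graceful exit. The wording and retry limit
# are invented design choices, not a platform requirement.
FALLBACK_PROMPTS = [
    "Sorry, I didn't catch that. What would you like to do?",
    "You can ask for the news, the weather, or your calendar.",
    "Sorry, I'm still having trouble. Let's try again later. Goodbye!",
]

def fallback_response(misunderstood_count):
    """Pick a re-prompt based on how many times in a row we've failed."""
    index = min(misunderstood_count, len(FALLBACK_PROMPTS) - 1)
    return FALLBACK_PROMPTS[index]

print(fallback_response(0))  # gentle re-prompt
print(fallback_response(5))  # graceful exit after repeated failures
```

Capping the retries matters: looping on the same “I didn’t understand” prompt forever is one of the fastest ways to make users abandon a voice app.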

Add an Extra Layer of Security

Google Assistant, Siri, and Alexa can now recognize individual voices. This adds a layer of security similar to Face ID or Touch ID. Voice recognition software is constantly improving, and it’s becoming harder and harder to imitate a voice; however, at this moment in time, it may not be secure enough, and additional authentication may be required. When working with sensitive data, designers may need to include an extra authentication step such as a fingerprint, password, or face recognition. This is especially true in the case of personal messaging and payments.

Duer voice assistant with face recognition software

The Dawn of the VUI Revolution

VUIs are here to stay and will be integrated into more and more products in the coming years. Some predict that within 10 years we won’t be using keyboards to interact with computers at all.

Still, when we think “user experience,” we tend to think about what we can see and touch. As a consequence, voice as a method of interaction is rarely considered. However, voice and visuals are not mutually exclusive when designing user experiences—they both add value.

User research needs to answer the question of whether voice will improve the UX, and considering how quickly the market share for voice-enabled devices is rising, this research could be well worth the time and could significantly increase the value and quality of an app.


Understanding the basics

What is a tangible user interface?

A tangible user interface is one that can be interacted with via taps, swipes and other physical gestures. Tangible user interfaces are commonly seen on touchscreen devices.

What is a speech interface?

A speech interface, better known as a VUI (voice user interface), is an invisible interface that users interact with by voice. A common device with voice recognition software is the Amazon Echo smart speaker.

What does an Echo do?

Amazon’s Echo smart speaker uses voice recognition software to help users perform tasks using voice interactions, even from the other side of the room. Echo smart speakers are powered by a voice assistant called Alexa and run VUI apps called “Skills.”


A Definitive Guide to Voice User Interface Design (VUI)


Voice-controlled devices are on the rise today. According to one of Google’s articles, “How voice assistance is reshaping consumer behavior,” about 70% of requests to the Google Assistant are made in natural language rather than the keywords people type into a web search. In addition, 41% of smart speaker owners say using one feels like talking to a real person.

Many experts predict that voice user interface design will revolutionize how we interact with computers in the next decade. This post looks at voice user interface design and the critical aspects of designing voice interfaces.


Understanding Voice User Interface (VUI)

A voice user interface allows users to interact with a device through voice commands. The increased use of digital devices is known to cause screen fatigue, which has driven the development and adoption of voice user interfaces.

With VUIs, users don’t have to look at a screen to control devices and apps. The world’s leading tech companies, including Amazon, Google, Facebook, Apple, and Microsoft, have developed (or are developing) voice-controlled devices and voice-enabled AI assistants. Great examples include Google Assistant, Apple’s Siri, and Amazon’s Alexa.

Besides the AI assistants, smart speakers such as the Apple HomePod, Google Home, and Amazon Echo are available on the market today. Voice interfaces and interactions can only become more popular in the future. According to the Smart Audio Report, 25% of US adults own a smart speaker, and 33% of the US population uses voice search features.

If you plan to create a voice user interface design, ensure you understand how it works so that you can create a VUI that provides a better user experience instead of frustrating users.

Why Voice User Interface Design Matters

With the leading tech giants investing millions in voice technology, one may wonder whether voice will replace screens. While that has yet to happen, VUI is on an upward trajectory and taking off at significant speed. Here’s why:

The Technology

Artificial intelligence (AI) is gaining momentum, with many tech companies embracing it. Some experts think that we might experience a robot takeover in the future. Thanks to AI and cloud computing, machines can understand many human speech variations more accurately than before.

Intuitiveness is a critical component of speech communication. Other than mind-reading, speech has less friction than any other communication method.

Voice offers an excellent opportunity to build mutual trust, friendship, and affinity. Building a positive rapport benefits a company and helps create a better user experience that will bring repeat customers.

user research voice interaction

How Does a Voice Interface Work?

Several artificial intelligence (AI) technologies, including automatic speech recognition, named entity recognition, and speech synthesis, make up the voice user interface. You can add voice UIs to devices or embed them inside applications.

The VUI processes the user’s speech, backed by AI technology that enables it to understand the user’s intent and provide a response. The VUI’s speech components are hosted in a private or public cloud.

Like most companies, you may want to add a graphical user interface (GUI) to the VUI for a better user experience. Visuals and additional sound effects let the user know whether the device is listening, processing speech, or giving a response.
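The pipeline described above can be sketched with stubbed stages; every function below is a placeholder, and in production each would call a cloud AI service:

```python
# Stubbed pipeline stages: speech recognition, intent understanding,
# fulfillment, and a spoken confirmation. Every stage here is a
# placeholder; in production each would call a cloud AI service.

def speech_to_text(audio):
    """Pretend ASR: transcribe captured audio."""
    return "turn off the bathroom lights"

def understand(text):
    """Pretend NLU: map the transcript to an intent and its slots."""
    return {"intent": "LightsOff", "slots": {"room": "bathroom"}}

def fulfill(parsed):
    """Perform the action and return the confirmation copy."""
    room = parsed["slots"]["room"]
    return f"{room.capitalize()} lights turned off"

def handle_utterance(audio):
    return fulfill(understand(speech_to_text(audio)))

print(handle_utterance(b"..."))  # Bathroom lights turned off
```

The useful point of the sketch is the separation of concerns: recognition, understanding, and fulfillment are independent stages, which is why a GUI or sound effect can signal which stage the device is currently in.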

Advantages and Disadvantages of Voice Interface

There are endless possibilities when exploring the benefits of VUI. These include:

  • Ease of use: Users who struggle with technological devices can use voice to request tasks from AI assistants and VUI devices.
  • Saves time: It takes less time to dictate a request than to type it. Voice is more convenient for users than typing.
  • Eyes-free: VUI comes in handy when you need an eyes-free experience, especially if you suffer from screen fatigue or need to focus on a task rather than the device.
  • Hands-free: Sometimes it is more practical to speak than type, such as when cooking, driving, and doing similar tasks.

What are the Disadvantages of VUI?

  • Misinterpretation: Voice recognition software is not without flaws. It may fail to understand the context of language, leading to errors. VUIs may also struggle to differentiate homophones like “real” and “reel” or “road” and “rode,” leading to misinterpretation.
  • Privacy concerns in public spaces: Many users will find it hard to give voice commands to devices in public spaces due to privacy concerns and noise.

The Difference Between Voice-Only Interactions and Multimodal Ones

A multimodal interface eliminates the need to use your hands while letting you see the results of your voice commands on a screen. A great example of a multimodal interface is a voice-controlled TV. With this interface, a user can view more information than with a voice-only device. When it comes to voice-only devices, you need to consider cognitive overload and the quality and speed of information delivery.

Let’s look at the example below to shed more light on the difference between voice-only and multimodal interactions.

Suppose you want a CBD cookbook. A voice-only device will read the results to you at a reasonable pace. On the other hand, a multimodal device will display different results on your screen, and you could command it to open your preferred option from the list. While you control the device with voice, you see the results on the screen.

That means designers should consider both voice-only and multimodal interfaces when designing devices and apps.

VUI Design Fundamental Properties

Before we look at how to design a voice user interface, let’s look at the crucial properties of VUI design:

Hands- and Eyes-Free

You need to create a voice-first user interface design even when the VUI device has a screen. While the screen makes the voice interaction better, the user should be able to complete the operation without looking at it.

Of course, some tasks cannot be completed by voice alone. However, that doesn’t mean creating actions that require users to rely heavily on the screen alone to complete tasks. If a task relies on a screen, create a flow where users start with voice before switching to the visual interface.

Tone of Voice

Voice is far more than a medium of interaction. By listening to someone (even for a few seconds), you learn a lot about them: gender, age, education, trustworthiness, intelligence, etc.

As such, you need to give your VUI a personality. It needs to match your brand values and be specific to evoke a unique personality.

Personalization

Personalization is another critical component of VUI design. It goes beyond recognizing a user’s name: it is about identifying unique user needs and delivering information that matches them.

VUI provides an excellent opportunity for product designers to personalize each user’s interaction. It can identify new and returning users and create user profiles. As the system learns more about its users, it offers a more personalized experience.

Human-Like Conversation

No one wants to feel like they are communicating with a robot, and your VUI is no exception. The conversation should feel natural and resemble human conversation. If your system requires users to remember certain phrases to perform specific tasks, you are getting it wrong.

As a rule of thumb, let users use their everyday language. If the commands are unclear, something is wrong, and a redesign may be necessary.

Trust

You cannot create robust user engagement without trust. Trust is a critical component of a good user experience, and creating good interaction with the voice interface is a great way to build it.

Some of the ways you can achieve this include:

  • Be careful with private data: Don’t verbalize sensitive data, as it may lead to privacy issues, especially since the user might not be alone
  • Avoid purely promotional content: No one wants to be sold to. Avoid mentioning brands or products out of context, as users may view it as too salesy
  • No offensive content: Filter sensitive content by age or region

How to Design a Voice User Interface

Designing a voice user interface is different from any other UX project. In this section, we look at the process of VUI design.

Conduct User Research

To identify problems and users’ pain points, you need to conduct user research. User research will help you understand the interaction between the user persona and an assistant at different stages of engagement.

Aim to understand the needs, behaviors, and motivations of the user. The goal is to understand how you can use voice as an interaction method in the customer journey map. Is there an opportunity where voice interactions can enhance the user flow? If you have yet to create the customer journey map, think about how you can implement voice interactions as an opportunity in the user flow. If the user journey already exists, focus on how voice interactions can improve it.

Ideally, you need to solve users’ problems to improve the user flow.

Competitor Analysis

A VUI competitor analysis is critical to determine how competitors implement voice interactions. Some of the factors to focus on when analyzing a competitor’s product include:

  • The type of voice command they use
  • Customer reviews
  • The use cases of the app

Use the information to design a better product.

Define User Requirements

User research and competitor research are not enough. Conducting interviews and user testing will also help define users’ pain points and requirements. This way, you can focus on different scenarios before creating conversation flows. Capture user requirements as user stories and design dialog flows for each. Next, prototype VUI conversations using dialogs showing the interaction between the user and the device.

Key things to remember when prototyping VUI conversation with dialog flows include:

  • Keep the interaction conversational and simple
  • Have a strong error strategy
  • Confirm when a task is completed
  • Create an additional layer of strong security

The dialog flow guides users through the customer journey. It should consist of:

  • Keywords that encourage the interaction—this includes voice triggers such as “Hello @username.”
  • Branches showing the direction of the conversation
  • Sample dialogs

Ideally, a dialog flow is a script containing the entire conversation. Several prototyping apps can make the process of creating dialog flows simpler.

It is also important to test your dialogs. Ideally, start testing your VUI designs as soon as you have the sample dialogs. Getting feedback during the design process helps identify usability issues and fix them early enough.

A great way to test out dialog is to act it out. Have one person act as the system and the other as the user. As you practice the scripts, focus on how they sound when spoken aloud.

However, it is crucial to remember that non-verbal language does not apply to VUI systems. Ensure that the participants don’t have eye contact when testing your dialogs.

Another way to test your dialogs is by observing actual user behavior. Take note of users who use your product for the first time and observe any usability issues.

Consider the Anatomy of a Voice Command

VUI designers need to think about the possible interaction scenarios and objectives: what exactly the user wants to achieve. A user’s command consists of three factors: intent, utterance, and slot.

Intent

This refers to the primary objective of the user’s voice command. Voice interactions are categorized into two types: low utility and high utility interactions.

A low utility interaction involves vague and hard-to-decipher tasks. For instance, when a user needs more information about a topic. The user interface needs to confirm whether the information is available in its service scope before asking more questions to understand and respond better.

On the other hand, a high utility interaction involves specific tasks like requesting lights in the bedroom be turned off.

Utterance

This refers to how a user phrases the voice command that triggers a task. While some request phrasings can be easy to understand, UX designers should not ignore other variations. For instance, instead of saying, “play me song X,” a user could say, “could you play song X.”

Designers should consider these variations to make it easier for AI to understand and respond to requests.

Slot

Slots can be either optional or required depending on the task. For instance, if a user requests music on Spotify, they may say, “play me music.” Since the AI can respond to the request without a variable, the slot here is optional. However, if a user wants to book a reservation at a specific time, the slot is the time, and it is required.
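A minimal sketch of required-slot elicitation for such a reservation request might look like this; the slot names, default, and prompt wording are all invented:

```python
# Minimal required-slot elicitation for a hypothetical reservation
# intent: either ask for the next missing required slot or fulfill the
# request, merging spoken values over the optional slot's default.
REQUIRED_SLOTS = ["time"]
OPTIONAL_SLOTS = {"party_size": 2}  # invented default

def next_action(filled):
    """Return ('elicit', prompt) or ('fulfill', slots)."""
    for slot in REQUIRED_SLOTS:
        if slot not in filled:
            return ("elicit", f"What {slot} would you like the reservation for?")
    return ("fulfill", {**OPTIONAL_SLOTS, **filled})  # spoken values win

print(next_action({}))
print(next_action({"time": "7 p.m."}))
# ('fulfill', {'party_size': 2, 'time': '7 p.m.'})
```

The design point is that the assistant keeps the conversation going by asking only for what is still missing, instead of rejecting the whole request.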

Industries Likely to be Impacted by Voice User Interface

While all industries can use voice interactions, some will likely experience the most significant impact. These include:

Devices for the Visually Impaired

The growing popularity of VUI will bring large improvements in voice services for the people with visual impairments who rely on them most. Visually impaired people have long relied on tools like screen readers, but the experience has its shortcomings. Devices custom-designed for voice will open up a wealth of online information to them.

Automotive

The automobile industry can significantly benefit from voice interaction for these reasons:

  • Operating other devices while driving is not recommended
  • Looking at graphical interfaces while driving can lead to accidents
  • It allows extended periods of uninterrupted driving
  • Many customers are not new to voice assistants in cars

Enabling voice commands in cars will be a natural improvement: for example, “find an ATM nearby,” “play some music,” or “read my emails.”

Googling things like this while driving is not just inconvenient; it puts you at risk. Improved voice commands mean a better user experience and better safety.

Customer Service

As machines become increasingly reliable, they will significantly impact call centers; artificially intelligent bots may handle routine interactions in the future, giving people more time to deal with complex issues.

Wearable Electronics

Many wearable electronics rely on an operating system or smartphones to access information. However, we could see wearables that interact through voice eliminate this intermediary to perform various functions.


Designing for voice interfaces: The opportunities and challenges of UX design

There was a time when user interfaces could only be operated with a touchscreen, a keyboard, or a mouse. Today, however, there's been an explosion in voice user interfaces, allowing for hands-free interactions using just our voices.


These voice interfaces have a lot of promise, but they also come with many challenges. In some cases they can be extremely helpful, as when Siri tells you a fact you wanted to know or Alexa plays your favourite music on request. In others they present obstacles: if you call a store and a chatbot picks up, it can take a long time to get to what you want.

UX designers need to understand all the challenges and opportunities for voice UX design. This article will give you the tools to get there. We’ll cover:

  • What are voice interfaces and what is voice UX design?
  • 3 opportunities of voice UX design
  • 3 challenges of voice UX design
  • How to get started with UX design for voice interfaces
  • Voice UX design best practices

Let’s get started!


Voice user interfaces are made to interact with users through voice commands. Voice interfaces are used in a number of settings, including chatbots, AI assistants such as Google Assistant and Amazon's Alexa, smart devices like Amazon Echo and Google Home, and many others. Any user interface where you interact through voice and don't need to look at a screen is a voice interface.

It should come as no surprise that the big difference between traditional and voice UX design is a lack of visuals. In fact, voice UX designers must figure out how to create a system that’s easily understood and used without visuals. Instead, tasks must be completed easily through voice commands. Any graphical interface, if there is one, is secondary.

Voice UX design is similar to other forms of UX design in the tools it uses. Voice UX designers do user research; create personas, user flows, wireframes, and prototypes; and conduct user testing just like any other UX designer. However, the challenges of working with voice are quite different from those of graphical user interfaces. Voice UX designers create the complete script for voice interfaces while also thinking about how people may respond emotionally to the characteristics of different voices.

The voice interface market is on track to reach $24.9 billion by 2025, according to Rutgers Creative X. Clearly there's a lot of opportunity in voice UX design. Some of those opportunities include:

Conversation flow

It is vital to create a conversation flow where users are given clear responses and instructions when needed. Scripts for conversations should account for all possible user responses, including errors. Innovative conversation flows are still being created, and opportunities for more abound.

Personalization

Getting to know the user and adapting the information to their wants and needs is important for many voice interfaces. In fact, the more the voice interface learns about its users, the more personalised the experience it can offer. Learning what to use to personalise a voice system is different for different systems and represents an opportunity for voice UX designers.

Adaptability

The voice interface should have the ability to adapt to different users and their different needs over time. For example, if a user uses school supplies for 9 months of the year but not over the summer, the system should learn when it can ask them about whether they have enough notebooks and when it can let that go. This is a rich area that’s full of opportunities.


There are also some big challenges for voice UX design that are further away from being solved. These include:

Lack of capabilities

Without visuals available, it can be difficult to tell the user all that the system is capable of. This can lead some users to stick with only the bare minimum of what the voice interface can do while others may try a lot of commands but never arrive at the system’s biggest strengths. This creates a big challenge for voice UX designers to convey the breadth of their systems’ capabilities.

Lack of trust

Error prevention is all but impossible in voice systems, so error correction needs to be easy. In practice it isn't: voice interfaces mishear users, and fixing the mistake is hard or frustrating. Because of these errors, consumers struggle to trust voice interfaces and have limited desire to try new capabilities, which creates challenges.

Handling distractions

Voice interfaces are notoriously bad at handling distractions, yet in real life distractions happen all the time. For example, say you get distracted while you’re dictating a message to send. If you were writing on your laptop, you’d just handle the distraction and get back to your message. In the voice interface, though, it’s hard to say where they should pick up from. Should the system read everything and risk annoying the user? Or should the system give only the most important details and risk leaving out something important? These questions are areas of challenge for voice UX designers.

UX design for voice interfaces involves a few key steps, according to Mert Aktas at User Guiding .

Step 1: User research

Observe where the voice assistant can help the user and understand the interaction between the user and the voice interface at various stages of customer engagement. Understand the user persona, of course, but understand the voice persona, too. Is it friendly, businesslike, or funny?

Step 2: Create a competitive analysis

Find cases like the one you’re designing for and create a competitive analysis so you can understand how your competitors are designing voice interactions.

Step 3: Define conversation flows

A conversation flow is a script for a conversation between the user and voice assistant, considering all possible directions in which the user could take the conversation. You’ll start by defining user needs and pain points for different scenarios and then turn those scenarios into different conversation flows.

Conversation flows should start with the keywords that begin the interaction, from "Hello" to "Hey Google." All conversation flows should include sample dialogues and branches that indicate where the dialogue could go.

When defining conversation flows, be sure to keep the interaction conversational, create a strong error-handling strategy, and confirm when the task is completed.
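To make these ideas concrete, a conversation flow with branching, error handling, and task-completion confirmation can be sketched as a small state machine. This is a minimal illustration in Python; the intents, phrases, and keyword matching are hypothetical examples, not a real voice platform's API.

```python
# Minimal sketch of a conversation flow with branches, error
# handling, and completion confirmation. All states, prompts, and
# keywords are hypothetical.

FLOW = {
    "greeting": {
        "prompt": "Hi! You can order a pizza or track an order.",
        "branches": {"order": "choose_size", "track": "track_order"},
    },
    "choose_size": {
        "prompt": "What size would you like: small, medium, or large?",
        "branches": {"small": "confirm", "medium": "confirm", "large": "confirm"},
    },
    # Confirming when the task is completed, per the guidance above.
    "confirm": {"prompt": "Done! Your order is placed.", "branches": {}},
    "track_order": {"prompt": "Your order arrives in 20 minutes.", "branches": {}},
}

def respond(state: str, utterance: str) -> tuple[str, str]:
    """Return (next_state, system_prompt) for a user utterance."""
    node = FLOW[state]
    for keyword, next_state in node["branches"].items():
        if keyword in utterance.lower():
            return next_state, FLOW[next_state]["prompt"]
    # Error-handling branch: re-prompt instead of hitting a dead end.
    return state, "Sorry, I didn't catch that. " + node["prompt"]
```

A real system would use an NLP intent classifier rather than keyword matching, but the flow structure, with every state owning its own error branch, is the same.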

Step 4: User testing

Testing the conversation flows is essential, as it shows how successful they are. Test them with one person playing the device and another playing the user. It can be especially helpful when someone from the voice UX team plays the device and someone from outside the company plays the user.

In order to craft the best voice interface for whatever project you're working on, it helps to keep several best practices in mind. According to Thomas Cree at UX Planet, these include:

  • Use natural language processing: Natural language processing (NLP) is vital to voice interfaces but it is also important to understand its limitations. NLP still has trouble with any language that is complex or nuanced. In order to avoid the problems that could arise from this, make sure the conversational flow is straightforward and easy to understand. If you do this, the experience will be more intuitive.
  • Design for context: Understanding the context of the interaction is key for the voice interface. Context can include time of day, user location, and previous interactions. For example, the time of day will dictate if the voice interface asks the user about whether they want to know about the weather or whether they’d like music while they eat dinner. At the same time though, a conversation like this will dictate that this is a voice assistant in the home. If the same questions came out of an office voice assistant, they wouldn’t make sense within the context.
  • Design for accessibility: Make sure the voice interface is accessible to everyone. This is important, especially for those with disabilities, but will impact those with accents or dialects, too. Making the voice interface accessible to those who are deaf, hard of hearing, or who have speech impediments will make the system that much better. This involves incorporating features like speech-to-text and text-to-speech as input methods and considering the usability of the design for non-native speakers or those with accents.

UX design for voice interfaces is a field that’s increasingly in demand. Understanding the opportunities and challenges available to voice UX designers will help them make the best choices for the devices they’re creating.

If you're excited about UX design for voice interfaces and want to learn more about a career in UX, here are some things to explore: What is content design?, Pioneering UX in the enterprise, and Top 15 UX influencers you should be following in 2023.


Everything You Want To Know About Creating Voice User Interfaces

By Nick Babich & Gleb Kuznetsov, Feb 14, 2022


Voice is a powerful tool that we can use to communicate with each other. Human conversations inspire product designers to create voice user interfaces (VUI), a next-generation of user interfaces that gives users the power to interact with machines using their natural language.

For a long time, the idea of controlling a machine by simply talking to it was the stuff of science fiction. Perhaps most famously, in 1968 Stanley Kubrick released a movie called 2001: A Space Odyssey , in which the central antagonist wasn’t a human. HAL 9000 was a sophisticated artificial intelligence controlled by voice.

Since then the progress in natural language processing and machine learning has helped product creators introduce less murderous voice user interfaces in various products — from mobile phones to smart home appliances and automobiles.

A Brief History Of Voice Interfaces

If we go back to the real world and analyze the evolution of VUI, it’s possible to define three generations of VUIs. The first generation of VUI is dated to the 1950s. In 1952, Bell Labs built a system called Audrey. The system derived its name from its ability to decode digits — Automatic Digit Recognition. Due to the tech limitations, the system could only recognize the spoken numbers of “0” through “9”. Yet, Audrey proved that VUIs could be built.

The second generation of VUIs dates to the 1980s and 1990s. It was the era of interactive voice response (IVR). Early IVR systems, developed for telephony and later advanced by companies such as SpeechWorks and Nuance, revolutionized the business. For the first time in history, a digital system could recognize human speech over a phone call and perform the tasks asked of it. It was possible to get the status of your flight, make a hotel booking, or transfer money between accounts using nothing more than a regular landline phone and the human voice.

The third (and current) generation of VUIs started to get traction in the second decade of the 21st century. The critical difference between the 2nd and 3rd generations is that voice is being coupled with AI technology. Smart assistants like Apple Siri, Google Assistant, and Microsoft Cortana can understand what the user is saying and offer suitable options. This generation of VUIs is available in various types of products — from mobile phones to car human-machine interfaces (HMIs). They are fast becoming the norm.

Six Fundamental Properties Of VUI Design

Before we move to specific design recommendations, it’s essential to state the basic principles of good VUI design.

1. Voice-first Design

You need to design hands-free and eyes-free user interfaces. Even when a VUI device has a screen, we should always design for voice-first interactions. While the screen can complement the voice interaction, the user should be able to complete the operation with minimal or no glances at the screen.

Of course, some tasks become inefficient or impossible to complete by voice alone. For example, having users listen and browse through search results by voice can be tedious. But you should avoid creating an action that relies on users interacting with a screen alone. If you design one of those tasks, you need to consider an experience where your users start with voice and then switch to a visual or touch interface.

2. Natural Conversation

The interaction with VUI shouldn’t feel like an interaction with a robot. The conversation flow should be user-centric (resembling natural human conversation). The user shouldn’t have to remember specific phrases to get the system to do what they want to do.

It’s important to use everyday language and invite users to say things in the ways they usually do. If you notice that you have to explain commands, it’s a clear indication that something is wrong with your design and you need to go back to the drawing board and redesign it.

3. Personalization

Personalization is more than just saying “Welcome back, %username%”. Personalization is about knowing genuine user needs and wants and adapting information to them. VUI gives product designers a unique opportunity to individualize the user’s entire interaction. The system should be able to recognize new and returning users, create user profiles and store the information the system collects in it. The more the system learns about users, the more personalized experience it should offer. Product designers need to decide what kinds of information to collect from users to personalize the experience.
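As a rough illustration of the profile idea above, a system might recognize new versus returning users and accumulate preferences like this. The in-memory storage, field names, and greeting copy are hypothetical; a real assistant would use its platform's identity and persistence services.

```python
# Sketch: recognize returning users and build up a profile over
# time. `profiles` is a hypothetical in-memory store.

profiles: dict[str, dict] = {}

def greet(user_id: str) -> str:
    """Create a profile on first contact; personalise afterwards."""
    profile = profiles.get(user_id)
    if profile is None:
        profiles[user_id] = {"visits": 1, "preferences": {}}
        return "Welcome! Let's get you set up."
    profile["visits"] += 1
    return "Welcome back!"
```

The design point is that every interaction both reads from and writes to the profile, so the experience becomes more personalised the more the system learns.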

4. Tone Of Voice

Voice is more than just a medium of interaction. Within a few seconds of hearing another person's voice, we form an impression of that person — a sense of gender, age, education, intelligence, trustworthiness, and many other characteristics. We do it intuitively, just by listening. That's why it's vital to give your VUI a personality — create the right brand persona that matches brand values. A good persona is specific enough to evoke a unique voice and personality.

5. Context Of Use

You need to understand where and how the voice-enabled product will be used. Will it be used by one person or shared between many people? In public or private areas? How noisy is the environment? The context of use will impact many product design decisions you will make.

6. Sense Of Trust

Trust is a foundational principle of good user experience — user engagement is built on a foundation of trust. Good interaction with the voice user interface should always lead to the buildup of trust.

Here are a few things product designers can do to achieve this goal:

  • Never share private data with anyone. Be careful when verbalizing sensitive data, such as medical information, because users might not be alone.
  • Avoid offensive content. What counts as offensive or sensitive varies by age and region/country.
  • Avoid purely promotional content. Don't mention products or brand names out of context, because users may perceive it as promotional content.

Design Recommendations

When it comes to designing VUI, it’s possible to define two major areas:

  • Conversational Design
  • Visual Design

1. Designing The Conversation

At first glance, the significant difference between GUI and VUI is the interaction medium. In GUI, we use a keyboard, mouse, or touch screen, while for VUI, we use voice. However, when we look closer, we will see that the fundamental difference between the two types of interfaces is an interaction model. With voice, users can simply ask for what they want instead of learning how to navigate through the app and learn its features. When we design for voice, we design conversational interactions.

Learn About Your Users

Conversations with a computer should not feel awkward. Users should be able to interact with a voice user interface as they would with another person. That’s why the process of conversation design should always start with learning about the users. You need to find answers to the following questions:

  • Who are your users? (Demographics, psychological portrait)
  • How familiar are they with voice-based interactions? Are they currently using voice products? (Level of tech expertise)

Understand Problem Space And Define Key Use Cases

When you know who your users are, you need to develop a deep understanding of user problems. What are their goals? Build empathy maps to identify users’ key pain points. As soon as you understand the problem space, it will be easier for you to anticipate features that users want and define specific use cases. (What can a user do with the voice system?)

Think about both the problem your user is trying to solve and how the voice user interface can help the user solve this problem. Here are a few questions that can help you with that:

  • What are the user's key tasks? (Learn about user needs/wants.)
  • What situations trigger these tasks? (In what context will users interact with the system?)
  • How are users completing these tasks today? (What is the user journey?)

It’s also vital to ensure that a voice user interface is the right solution for the user problem. For example, voice UI might work well for the task of finding a nearby restaurant while you’re on the road, but it might feel clunky for tasks like browsing restaurant reviews.

Write Dialog Flow

At its core, conversation design is about the flow of the conversation. Dialog flow shouldn’t be an afterthought; instead, it should be the first thing you create because it will impact development.

Here are a few tips for creating a foundation for your dialog flow:

  • Start with a sample dialog that represents the happy path. The happy path is the simplest, easiest path to success a user could follow. Don’t try to make sample dialog perfect at this step.
  • Focus on the spoken conversation. Avoid writing dialog differently from how people actually speak it; that usually leads to well-structured but longer and more formal dialogs. When people want to solve a particular task, they get to the point when they speak.
  • Read a sample dialog aloud to ensure that it sounds natural. Ideally, you should invite people who don’t belong to the design team and collect feedback.

The sample dialog will help you identify the context of the conversation (when, where, and how the user triggers the voice interface) and the common utterances and responses.

After you finish writing sample dialogs, the next thing to do is add various paths (consider how the system will respond in numerous situations, adding turns in conversations, etc.). It doesn’t mean that you need to account for all possible variations in dialogs. Consider the Pareto principle (80% of users will follow the most common 20% of possible paths in a discussion) and define the most likely logical paths a user can take.

It’s also recommended to recruit a conversation designer — a professional who can help you craft natural and intuitive conversations for users.

Design For Human Language

The more an interface leverages human conversation, the fewer users have to be taught how to use it. Invest in user research and learn the vocabulary of your real or potential users. Try to use the same phrases and sentences in the system’s response. It will create a more user-friendly conversation.

  • Don’t teach commands. Let users speak in their own words.
  • Avoid technical jargon. Let users interact with the system naturally using the phrases they prefer.

The User Always Starts The Conversation

No matter how sophisticated the voice-based system is, it should never start the conversation. It would be awkward if the system reached out to the user with a topic they don't want to discuss.

Avoid Long Responses

When you design system responses, always take cognitive load into account. VUI users aren't reading, they are listening, and the longer you make system responses, the more information they have to retain in their working memory. Some of this information might not be useful to the user, and there is no way to fast-forward through a spoken response.

Make every word count and design for brief conversations. When you’re scripting out system responses, read them aloud. The length is probably good if you can say the words at a conversational pace with one breath. If you need to take an extra breath, rewrite the responses and reduce the length.

Minimize The Number Of Options In System Prompts

It’s also possible to minimize the cognitive load by reducing the number of options users hear. Ideally, when users ask for a recommendation, the system should offer the best possible option right away. If it’s impossible to do that, try to provide the three best possible options and verbalize the most relevant one first.
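The "at most three options, most relevant first" rule is easy to express in code. A sketch, assuming relevance scores for the candidate options are produced elsewhere by the system:

```python
# Sketch: cap system prompts at the three highest-scoring options,
# verbalized most relevant first. Scores are hypothetical relevance
# values computed by the rest of the system.

def best_options(candidates: list[tuple[str, float]]) -> list[str]:
    """Return at most three option names, ordered by descending score."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [name for name, _ in ranked[:3]]
```

Ideally the first element alone is offered; the remaining two serve as fallbacks if the user rejects it.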

Provide Definitive Choices

Avoid open-ended questions in system responses. They can cause users to answer in ways that the system does not expect or support. For example, when you design an introduction prompt, instead of saying "Hello, it's company ACME. What do you want to do?" you should say, "Hello, it's company ACME. You can do [Option A], [Option B], or [Option C]."

Add Pauses Between The Question And Options

Pauses and punctuation mimic actual speech cadence, and they are beneficial for situations when the system asks a question and offers a few options to choose from.

Add a 500-millisecond pause after asking the question. This pause will give users enough time to comprehend the question.
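On platforms that accept SSML, such a pause is typically expressed with a break element. Below is a sketch in Python that builds a prompt of this shape; the helper name and wording are illustrative, and exact SSML support varies by voice platform.

```python
# Sketch: build an SSML prompt with a 500 ms pause between the
# question and its options, using the standard SSML <break> element.

def question_with_options(question: str, options: list[str]) -> str:
    """Return SSML: question, 500 ms break, then the spoken options."""
    if len(options) == 1:
        spoken_options = options[0]
    else:
        spoken_options = ", ".join(options[:-1]) + f", or {options[-1]}"
    return (
        "<speak>"
        f"{question}"
        '<break time="500ms"/>'
        f"You can say {spoken_options}."
        "</speak>"
    )
```

Keeping the pause in one helper means the timing can be tuned in a single place after user testing.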

Give Users Time To Think

When the system asks the user something, they might need to think about answering the question. The default timeout for users to respond to the request is 8-10 seconds. After that timeout, the system should repeat the request or re-prompt it. For example, suppose a user is booking a table at a restaurant. The sample dialog might sound like that:

User: “Assistant, I want to go to the restaurant.”
System: “Where would you like to go?”
(No response for 8 seconds)
System: “I can book you a table in a restaurant. What restaurant would you like to visit?”
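The timeout-and-re-prompt behaviour can be sketched as follows. The `speak` and `listen` callbacks stand in for whatever the voice platform provides; their names and signatures are hypothetical.

```python
# Sketch: ask a question, wait for the user, and re-prompt once with
# more context if they stay silent past the timeout.

REPROMPT_TIMEOUT = 8.0  # seconds; within the 8-10 s range suggested above

def ask(prompt: str, reprompt: str, speak, listen):
    """`listen(timeout=...)` is a hypothetical platform callback that
    returns the user's utterance, or None if they stayed silent."""
    speak(prompt)
    answer = listen(timeout=REPROMPT_TIMEOUT)
    if answer is None:
        # Re-prompt with added context instead of repeating verbatim.
        speak(reprompt)
        answer = listen(timeout=REPROMPT_TIMEOUT)
    return answer
```

Note that the re-prompt restates what the system can do ("I can book you a table...") rather than just repeating the original question.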

Prompt For More Information When Necessary

It's pretty common for users to request something but not provide enough details. For example, when users ask the voice assistant to book a trip, they might say something like, "Assistant, book a trip to the sea." The user assumes that the system knows them and will offer the best possible option. When the system doesn't have enough information about the user, it should prompt for more information rather than offer an option that might not be relevant.

User: “I’d like to book a trip to the seashore.”
System: “When would you like to go?”

Never Ask Rhetorical Or Open-ended Questions

By asking rhetorical or open-ended questions, you put a high cognitive load on users. Instead, ask direct questions. For example, instead of asking the user “What do you want to do with your invitation?” you should say “You can cancel your invitation or reschedule it. What works for you?”

Don’t Make People Wait In Silence

When people don't hear or see any feedback from the system, they might think it's not working. Sometimes the system needs more time to process the user's request, but that doesn't mean users should wait in absolute silence or without any visual feedback. At a minimum, offer an audio signal and pair it with visual feedback.

Minimize User Data Entry

Try to reduce the number of cases where users have to provide phone numbers, street addresses, or alphanumeric passwords. Dictating strings of numbers or detailed information to a voice system can be difficult, especially for users with speech impediments. Offer alternative input methods for this kind of information, such as a companion mobile app.

Support Repeat

Whether users are using the system in a noisy area or they’re just having issues understanding the question, they should be able to ask the system to repeat the last prompt at any time.

Feature Discoverability

Feature discoverability can be a massive problem in voice-based interfaces. In GUI, you have a screen that you can use to showcase new features, while in voice user interfaces, you don’t have this option.

Here are two techniques you can use to improve discoverability:

  • Solid onboarding. A first-time user requires onboarding into the system to understand its capabilities. Make it practical — let users complete some actions using voice commands.
  • Contextual hints. On a user's first encounter with a particular voice app, briefly mention what is possible.
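The first-encounter technique boils down to surfacing a capability hint exactly once per user and app. A minimal sketch; the in-memory set is a placeholder for per-user state that a real system would persist.

```python
# Sketch: deliver a capability hint on a user's first encounter with
# a voice app, then stay quiet. Storage is a hypothetical in-memory
# set keyed by (user, app).

_seen_apps: set[tuple[str, str]] = set()

def first_run_hint(user_id: str, app: str, hint: str):
    """Return the hint once per (user, app); None on later visits."""
    key = (user_id, app)
    if key in _seen_apps:
        return None
    _seen_apps.add(key)
    return hint
```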

Confirm User Requests

People enjoy a sense of acknowledgment. Thus, let the user know that the system hears and understands them. It’s possible to define two types of confirmation — implicit and explicit confirmation.

Explicit confirmations are required for high-risk tasks such as money transfers. These confirmations require the user’s verbal approval to continue.

User: “Transfer one thousand dollars to Alice.”
System: “You want to transfer one thousand dollars to Alice Young, correct?”

At the same time, not every action requires the user’s confirmation. For example, when a user asks to stop playing music, the system should end the playback without asking, “Do you want to stop the music?”
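The split between explicit and implicit confirmation can be sketched as a simple risk check. The action names and the contents of the high-risk set are hypothetical; each product defines its own.

```python
# Sketch: require a verbal confirmation question only for high-risk
# actions; low-risk actions (e.g. stopping music) just execute.

HIGH_RISK_ACTIONS = {"transfer_money", "delete_account", "place_order"}

def confirmation_prompt(action: str, details: str):
    """Return an explicit confirmation question for high-risk actions,
    or None when the system should act immediately (implicit)."""
    if action in HIGH_RISK_ACTIONS:
        return f"You want to {details}, correct?"
    return None  # act without asking, e.g. "stop the music"
```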

Handle Error Gracefully

It's nearly impossible to avoid errors in voice interactions. Poorly handled error states can damage a user's impression of the system. No matter what caused the error, it's important to handle it with grace: the user should have a positive experience even when they hit an error condition.

  • Minimize the number of “I don’t understand you” situations. Avoid error messages that only state that they didn’t understand the user correctly. Well-designed dialog flow should consider all possible dialog branches, including branches with incorrect user input.
  • Introduce a mechanism of contextual repairs. Help the system recover when something unexpected happens while the user is speaking, for example, when loud background noise prevents the voice recognition system from hearing the user.
  • Clearly say what the system cannot do. When users face error messages like “I cannot understand you” they start to think whether the system isn’t capable of doing something or they incorrectly verbalize the request. It’s recommended to provide an explicit response in situations when the system cannot do something. For example, “Sorry, I cannot do that. But I can help you with [option].”
  • Accept corrections. Sometimes users make corrections when they know the system got something wrong or when they change their minds. When users want to correct their input, they will say something like “No,” or “I said,” followed by a valid utterance.
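Accepting corrections starts with detecting that an utterance is a correction at all. A minimal sketch; the English-only marker list is a hypothetical example and a real system would rely on its NLP layer.

```python
# Sketch: flag utterances that begin with common correction markers
# ("No," / "I said" / ...) so the dialog manager can re-open the
# previous slot instead of treating the input as a new request.

CORRECTION_MARKERS = ("no,", "i said", "i meant", "actually")

def is_correction(utterance: str) -> bool:
    """True if the utterance looks like a correction of prior input."""
    lowered = utterance.lower().strip()
    return lowered.startswith(CORRECTION_MARKERS)
```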

Test Your Dialogs

The sooner you start testing your conversation flow, the better. Ideally, start testing and iterating on your designs as soon as you have sample dialogs. Collecting feedback during the design process exposes usability issues and allows you to fix the design early.

The best way to test if your dialog works is to act it out. You can use techniques like Wizard of Oz , where one person pretends to be a system and the other is a user. As soon as you start practicing the script, you will notice whether it sounds good or bad when spoken aloud.

Remember that you should prevent participants from sharing non-verbal cues. When we interact with other people, we typically use non-verbal language (eye gaze, body language). Non-verbal cues are extremely valuable for conveying information, but VUI systems cannot understand them. When testing your dialogs, seat test participants back to back to avoid eye contact.

The next part of testing is observing real user behavior. Ideally, you should observe users who use your product for the first time. It will help you understand what works and what doesn’t. Testing with 5 participants will help you reveal most of your usability issues.

2. Visual Design

A screen plays a secondary role in voice interactions. Yet, it’s vital to consider a visual aspect of user interaction because high-quality visual experiences create better impressions on users. Plus, visuals are good for some particular tasks such as scanning and comparing search results. The ultimate goal is to design a more delightful and engaging multimodal experience.

Design For Smaller Screens First

When adapting content across screens, start with the smallest screen size first. It will help you prioritize what the most important content is.

When targeting devices with larger screens, don't just scale the content up. Take full advantage of the additional screen real estate. Pay attention to the quality of images and videos: imagery shouldn't lose quality as it scales up.

Optimize Content For Fast Scanning

As mentioned before, screens are very handy when you need to provide a few options to compare. Among the content containers you can use, cards work best for fast scanning. When you need to provide a list of options to choose from, you can put each option on a card.

Design With A Specific Viewing Distance In Mind

Design content so it can be viewed from a distance. The viewing range for small-screen voice-enabled devices is roughly 1–2 meters, while for large screens such as TVs it is about 3 meters. Ensure that the font size and the size of the imagery and UI elements you show on screen are comfortable for users at that distance.

Google recommends a minimum font size of 32 pt for primary text, like titles, and a minimum of 24 pt for secondary text, like descriptions or paragraphs.
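These guideline numbers can be captured in a small helper. The sketch below is illustrative only: the role names and the far-viewing scale factor are assumptions made for this example, not part of Google's guidance.

```typescript
// Illustrative helper: minimum font sizes (in pt) for voice-enabled screens.
// The 32 pt / 24 pt baselines come from the guideline quoted above; the
// role names, distance tiers, and the 1.5x far-viewing factor are
// hypothetical rules of thumb for this sketch, not a published spec.
type TextRole = "primary" | "secondary";

function minFontSizePt(role: TextRole, viewingDistanceMeters: number): number {
  // Baselines for a near-field smart display (roughly 1-2 m away).
  const base = role === "primary" ? 32 : 24;
  // For far viewing (e.g., a TV at ~3 m), scale up proportionally.
  return viewingDistanceMeters >= 3 ? Math.round(base * 1.5) : base;
}
```

A design system could expose a table like this so every screen template picks legible sizes automatically instead of hard-coding them per layout.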

Learn User Expectations for a Particular Device

Voice-enabled devices can range from in-vehicle to TV devices. Each device mode has its own context of use and set of user expectations. For example, home hubs are typically used for music, communications, and entertainment, while in-car systems are typically used for navigation purposes.

Further Reading: Designing Human-Machine Interfaces For Vehicles Of The Future

Hierarchy Of Information On Screens

When we design website pages, we typically start with the page structure. A similar approach should be followed when designing for a VUI: decide where each element should be located. The hierarchy of information should go from most to least important. Minimize the information displayed on the screen to only what is required to help users do what they want to do.

Keep The Visual And Voice In Sync

There shouldn’t be a significant delay between voice and visual elements. The graphical interface should be truly responsive — right after the user hears the voice prompt; the interface should be refreshed with relevant information.

Motion language plays a significant part in how users comprehend information. It’s essential to avoid hard cuts and use smooth transitions between individual states. When users are speaking, we should also provide visual feedback that acknowledges that the system is listening to the user.

Accessible Design

A well-designed product is inclusive and universally accessible. Users with visual impairments (such as blindness, low vision, and color blindness) shouldn't have any problems interacting with your product. To make your design accessible, follow the WCAG guidelines.

  • Ensure that text on the screen is legible and has a sufficiently high contrast ratio; text color and contrast should meet the AAA ratios.
  • Users who rely on screen readers should understand what is displayed on the screen, so add descriptions to imagery.
  • Don't design screen elements that flicker, flash, or blink. Generally, anything that flashes more than three times per second can cause seizures or headaches in sensitive users.

Related Reading: How A Screen Reader User Accesses The Web

We are at the dawn of the next digital revolution. The next generation of computers will give users the opportunity to interact using voice, but the foundation for that generation is being built today. It's up to designers to develop systems that feel natural to users.

Recommended Related Reading

  • “Alexa Design Guide,” Amazon Developer Documentation
  • “Conversation Design Process,” Google Assistant Docs
  • “Designing Voice User Interfaces: Principles Of Conversational Experiences,” Cathy Pearl (2017)
  • “Applying Built-In Hacks Of Conversation To Your Voice UI,” James Giangola (video)
  • “Creating A Persona: What Does Your Product Sound Like?,” Wally Brill (video)
  • “Voice Principles,” a collection of resources created by Clearleft



Voice User Interface (VUI): Designing for Voice-Enabled Web Experiences

By Mansi Garg | July 31, 2023

Imagine a world where you can speak your thoughts and desires, and the digital realm responds promptly, seamlessly integrating into your daily life.

Whether you want to search for information, control smart home devices, order groceries, or even book a ride, the voice is becoming the conduit that effortlessly bridges the gap between humans and machines.

Designing for voice-enabled web experiences requires a deep understanding of how humans communicate, think, and interact. It involves crafting user interfaces that are not limited to visual elements but extend to the power of speech.

In this blog, we will explore the intricacies of VUI design and dive into the best practices, techniques, and considerations that go into creating exceptional voice-enabled web experiences.

Table of Contents

  • 1. Overview of Voice-Enabled Web Experiences
  • 2. Importance of Voice-Enabled Web Experience
  • 3. Benefits of VUI Design
  • 4. Challenges of VUI Design
  • 5. Key Concepts in VUI Design
  • 6. Designing for Voice-Enabled Web Experiences
  • 7. Technical Considerations for VUI Design
  • 8. Future Trends in VUI Design
  • 9. Conclusion
  • 10. FAQs  

Overview of Voice-Enabled Web Experiences

Voice-Enabled Web Experiences refer to the use of voice as a primary interface for interacting with web-based applications and services. It allows users to interact with websites, web applications, and other online platforms using natural language and voice commands instead of traditional text-based input. With the proliferation of voice assistants and smart speakers, voice-enabled interactions have become increasingly popular and are transforming the way users engage with technology.

Importance of Voice-Enabled Web Experience

  • Accessibility: Voice-Enabled Web Experiences make technology more accessible to a wider range of users, including those with visual impairments, physical disabilities, or individuals who struggle with traditional keyboard input. It enables a more inclusive experience for all users.
  • Convenience and Efficiency: Voice interactions can be faster and more convenient than typing, especially in scenarios where users are multitasking or have limited mobility. Voice commands enable users to perform tasks hands-free and get immediate responses.
  • Natural and Intuitive: Voice is a natural mode of communication for humans, and voice-enabled experiences provide a more intuitive and user-friendly interface. It reduces the learning curve associated with complex user interfaces and enables a more natural interaction paradigm.
  • Contextual and Personalized: Voice assistants can leverage user data and context to provide personalized experiences. By analyzing user preferences, history, and contextual information, voice-enabled systems can deliver customized recommendations, content, and services.
  • Multimodal Experience: Voice-enabled interfaces can be combined with other modalities, such as visual displays or haptic feedback, to create a rich and immersive user experience. This allows for more flexible and adaptable interactions in different environments and across various devices.


Benefits of VUI Design

  • Enhanced User Experience: Voice interactions provide a more intuitive, natural, and conversational interface, resulting in a better user experience. They can simplify complex tasks, reduce cognitive load, and make interactions more engaging and interactive.
  • Increased Accessibility: Voice-enabled interfaces make technology accessible to a wider audience, including individuals with disabilities or those who have difficulty with traditional input methods.
  • Hands-free and Multitasking: Voice commands allow users to perform tasks and access information without using their hands, enabling multitasking and convenience in various scenarios, such as cooking, driving, or exercising.
  • Personalization and Contextualization: VUI design can leverage user data and contextual information to deliver personalized experiences, recommendations, and content tailored to the user's preferences and needs.

Challenges of VUI Design

Key Concepts in VUI Design

  • Natural Language Understanding (NLU): NLU involves the ability of a voice-enabled system to accurately understand and interpret user input, including the intent and context behind the spoken commands or queries.
  • Dialog Management: Dialog management focuses on designing conversational flows and interactions between the user and the voice-enabled system. It involves handling turn-taking, managing context, and guiding the user through the conversation.
  • Voice User Interface (VUI) Prototyping: VUI prototyping involves creating interactive prototypes of voice-enabled systems to test and refine the user experience. It helps designers and developers visualize and iterate on the voice interactions before implementation.
  • Persona and Voice Tone: VUI design often involves defining the persona and voice tone of the voice-enabled system. The persona represents the system's personality and characteristics, while the voice tone reflects the style and manner of communication.
  • Error Handling and Recovery: Effective error handling is crucial in VUI design to guide users when they make mistakes or encounter errors. Designers need to anticipate potential errors, provide clear error messages, and offer suggestions for recovery.
  • Multimodal Design: Multimodal design involves integrating voice interactions with other modalities, such as visual displays or tactile feedback, to create a cohesive and seamless user experience. It requires careful consideration of how different modalities complement each other.
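Several of these concepts (dialog management, turn-taking, context, and error recovery) can be illustrated with a toy dialog manager. This is a sketch only: the intent and slot names are invented for the example, and a real system would sit behind an NLU service rather than receiving parsed intents directly.

```typescript
// Toy dialog manager illustrating turn-taking, context, and error
// recovery. Intent/slot names are hypothetical examples.
type Intent = { name: string; slots: Record<string, string> };

class DialogManager {
  private context: Record<string, string> = {};
  private noMatchCount = 0;

  // One user turn in, one system turn (a prompt string) out.
  handleTurn(intent: Intent | null): string {
    if (intent === null) {
      // Error recovery: escalate the reprompt instead of repeating it.
      this.noMatchCount++;
      return this.noMatchCount === 1
        ? "Sorry, I didn't catch that."
        : "You can say things like 'order a pizza'.";
    }
    this.noMatchCount = 0;
    // Context management: remember slots across turns.
    Object.assign(this.context, intent.slots);
    if (intent.name === "OrderPizza" && !this.context["size"]) {
      return "What size would you like?"; // system guides the next turn
    }
    if (intent.name === "OrderPizza" || intent.name === "GiveSize") {
      return `Ordering a ${this.context["size"]} pizza.`;
    }
    return "Okay.";
  }
}
```

Notice how the follow-up turn only carries the missing slot: the manager's context is what lets the short answer "large" complete the earlier request.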

Designing for Voice-Enabled Web Experiences

A. User-Centered Design Approach

  • User Research and Personas: User research involves gathering insights about the target users, their needs, preferences, and pain points when interacting with voice-enabled web experiences. Personas are fictional representations of different user types that help designers empathize with and design for specific user groups.
  • User Journey Mapping: User journey mapping visualizes the user's end-to-end experience with the voice-enabled web experience, identifying touchpoints, pain points, and opportunities for improvement. It helps designers understand the user's goals, context, and interactions at each stage.
  • Voice User Flows: Voice user flows outline the sequence of steps and interactions between the user and the voice-enabled system. It focuses on designing the conversational flow, including prompts, user responses, system actions, and error handling.

B. Content Design and Information Architecture

  • Adapting Content for Voice Interaction: Content designed for voice interactions should be concise, conversational, and easily spoken aloud. It requires adapting written content for spoken delivery, considering factors like natural language, pacing, and readability for voice output.
  • Structuring Conversational Content: Conversational content should be structured logically, using a hierarchical approach to ensure clarity and ease of understanding. Breaking down complex information into smaller, contextually relevant chunks enhances the user's comprehension and reduces cognitive load.
  • Navigation and Command Design: Voice interfaces require intuitive and user-friendly navigation and command design. Designers should create clear and intuitive navigation structures and define easy-to-understand voice commands to enable users to access specific features or content.
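The command-design point above can be made concrete with a small matcher that maps several natural phrasings to one canonical command. The command names and phrase lists here are illustrative assumptions; production systems would use an NLU model rather than exact phrase matching.

```typescript
// Sketch of voice command design: several phrasings map to one canonical
// command. Command names and phrases are invented for this example.
const commands: Record<string, string[]> = {
  "open-settings": ["open settings", "go to settings", "show settings"],
  "go-back": ["go back", "back", "previous page"],
};

function matchCommand(utterance: string): string | null {
  const normalized = utterance.trim().toLowerCase();
  for (const [command, phrases] of Object.entries(commands)) {
    if (phrases.includes(normalized)) return command;
  }
  return null; // no match: let the dialog's error handling take over
}
```

Keeping the synonym lists in data (rather than scattered through code) also makes it easy to review whether the accepted phrasings match how users actually speak.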

C. Voice Interaction Design Principles

  • Clear and Concise Dialogue: Voice interactions should be designed to be clear, concise, and easy to understand. Avoiding jargon, using simple language, and providing relevant information help users comprehend and engage with the voice-enabled system effectively.
  • Error Handling and Recovery: Effective error handling is crucial in voice interactions. Designers should anticipate potential errors and provide clear error messages, prompts for correction, and suggestions for alternative actions to help users recover from mistakes.
  • Feedback and Confirmation: Providing feedback and confirmation during voice interactions is essential to reassure users and maintain their trust. Auditory cues, such as voice prompts or confirmation sounds, can indicate system responsiveness and validate user actions.
  • Personalization and Contextualization: Designing for personalization and contextualization involves leveraging user data and preferences to deliver customized experiences. Tailoring the content, recommendations, and responses based on the user's history and context enhances the overall user experience.
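The feedback-and-confirmation principle is often implemented as two strategies: implicit confirmation for low-risk actions (keep the dialog moving) and explicit confirmation for high-risk ones such as payments. The sketch below assumes a simple boolean risk flag and uses a naive "-ing" suffix for the implicit wording; real prompt copy would be hand-written.

```typescript
// Illustrative confirmation strategy: implicit for low-risk actions,
// explicit for high-risk ones. The risk flag and wording are assumptions.
type Action = { name: string; risky: boolean };

function confirmationPrompt(action: Action, detail: string): string {
  return action.risky
    ? `You want to ${action.name} ${detail}. Is that right?` // explicit
    : `Okay, ${action.name}ing ${detail}.`; // implicit; naive -ing suffix
}
```

Explicit confirmation costs the user an extra turn, so reserving it for irreversible or costly actions keeps conversations short without sacrificing trust.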

D. Multimodal Experiences

  • Combining Voice with Visual Elements: Multimodal experiences integrate voice interactions with visual elements, such as text, images, or graphical interfaces. Designers should ensure visual elements complement and enhance the voice interactions, providing additional context, feedback, or supplementary information.
  • Adapting for Different Devices and Platforms: Voice-enabled web experiences should be designed to work seamlessly across different devices and platforms. Consideration should be given to varying screen sizes, input capabilities, and interaction patterns to provide a consistent and optimized experience.
  • Handling Interruptions and Transitions: Voice interactions should be designed to handle interruptions or transitions smoothly. Users may pause, switch devices, or receive external notifications during an interaction. Designers should consider how the voice-enabled system handles these interruptions and seamlessly resumes the interaction.

Technical Considerations for VUI Design

  • Speech Recognition and NLP Technologies: VUI design relies on speech recognition and Natural Language Processing (NLP) technologies to accurately interpret user input. Designers should understand the capabilities and limitations of these technologies to optimize the design and ensure reliable voice recognition.
  • Integrating Voice Assistants and APIs: Voice-enabled web experiences often integrate with voice assistant platforms and APIs. Designers should be familiar with the integration requirements and guidelines to create seamless and interoperable experiences.
  • Accessibility and Inclusive Design: Ensuring accessibility and inclusive design is crucial for voice-enabled web experiences. Designers should consider accessibility standards, provide alternative input methods, and ensure that voice interactions are accessible to users with disabilities.
  • Performance Optimization for Voice Interfaces: Optimizing the performance of voice interfaces is essential for a smooth user experience. Designers should consider factors like response time, latency, and system efficiency to minimize delays and ensure real-time interactions.
  • Privacy and Security Considerations: Voice interactions involve processing sensitive user data, so privacy and security should be prioritized. Designers should follow best practices for data protection, secure communication channels, and obtain user consent for data collection and usage.

Future Trends in VUI Design

Voice-Enabled Web Experiences and VUI design are continuously evolving fields. Here are some future trends to watch for:

  • Emotion Detection and Sentiment Analysis: Voice-enabled systems may incorporate emotion detection and sentiment analysis to better understand user emotions and tailor responses accordingly. This could enable more personalized and empathetic interactions.
  • Improved Natural Language Understanding (NLU): NLU technology is expected to advance, allowing voice-enabled systems to better understand complex queries, handle ambiguity, and provide more accurate responses.
  • Multi-turn Conversations: Voice interactions may evolve to support more complex multi-turn conversations. This would enable users to have extended dialogues and perform intricate tasks with the voice-enabled system.
  • Integration with Augmented Reality (AR) and Virtual Reality (VR): Voice interactions can be combined with AR and VR technologies to create immersive and interactive experiences. Users may navigate virtual environments and interact with objects using voice commands.
  • Integration with the Internet of Things (IoT): Voice-enabled systems can integrate with IoT devices, allowing users to control and interact with their smart homes, appliances, and other connected devices using voice commands.
  • Voice Commerce: Voice-enabled systems are likely to play a significant role in voice commerce, enabling users to make purchases, place orders, and conduct transactions using voice commands.

Conclusion

Voice-enabled web experiences and VUI design offer tremendous potential for web development companies to create innovative and engaging user interfaces that drive business growth and enhance user satisfaction. With a focus on user-centric design, continuous learning, and strategic partnerships, you can unlock the full potential of this exciting technology for your app development company.

FAQs

What is an example of a voice user interface (VUI)?

An example of a voice user interface (VUI) is Amazon's voice assistant, Alexa, which allows users to interact with various smart devices using voice commands and natural language.

What is the goal of building system personas for voice user interface design?

The purpose of creating system personas for VUI design is to understand the target users and their characteristics, preferences, and needs. Personas help designers empathize with users, tailor voice interactions to their requirements, and create more personalized and effective experiences.

Which tools are used in VUI design?

Some tools used in the design of voice user interfaces include Dialogflow, Alexa Skills Kit, Bixby Developer Studio, and Watson Assistant. These tools provide platforms and frameworks for designing, building, and testing voice interactions and integrations with voice assistant platforms.

About Author


An extensive background working in Tech, Travel, and Education Industries. Currently involved in entire business operations process: Benefits strategy and implementation, systems integration, Human Resource handling, Outsourcing engagement & strategizing the company architecture. Learning different stages of the business cycle. Coached leaders in various areas, including - employee relations, complaints, and response management.


image

Mansi Garg | May 19, 2023

image

How to Make Learning Engaging through Gaming Technology?

What if you can combine the fun of gaming and the educational value of learning? This is an idea that has been gaining popularity in recent years as technology advances. There are several challenges

image

Mansi Garg | February 02, 2023

image

Find Out How Our Clients Reviewed Us On Clutch

It's no secret that the digital world has transformed many aspects of our lives, and it is only going to continue changing in ways we can't even imagine yet. To help businesses keep up with this rapid

image

How To Design A Social Media App and Features that Make it Popular?

Social media apps are all the rage these days. People use them to connect with friends and family, to learn about new products and services, and to stay up-to-date on the latest news. But as popular a

image

Sakshi Aggarwal | January 03, 2023

image

A Complete Guide to Implement AI and Machine Learning in Your Existing Application

When it comes to developing an app, there's a lot to consider. Not only do you need to create a user-friendly interface and design, but you also need to make sure your app is able to meet the demands

image

Mansi Garg | November 11, 2022

image

ChatGPT: Unlocking A New Wave Of AI-Powered Conversational Experiences

Table of Contents 1. What is ChatGPT? 2. What Are the Top Benefits of ChatGPT? 3. How Does ChatGPT Work? 4. Challenges With ChatGPT 5. ChatGPT and the Future of AI 6. Final Thoug

image

Mansi Garg | January 11, 2023

image

Benefits and Use of Blockchain Technology in the Banking Industry

Picture this: a world where traditional banking transforms into a cutting-edge, efficient, and transparent system that leaves everyone in awe. Blockchain, often met with skepticism and uncertainty, is

image

Mansi Garg | June 29, 2023

image

A Step-By-Step Guide To Creating Your First Game App

The world has dramatically changed when it comes to the technology and tools available to developers. Earlier, when creating an app for a single platform was the norm, developers now have access to mu

image

Sakshi Aggarwal | January 19, 2023

image

Steps to Keep in Mind To Build Your Next Gaming App With Zero Coding!

Do you want to build a simple app for your business? Do you want to create an app that enhances the experience of users who play games on their smartphones? Whatever your reason, I have created this g

image

Mansi Garg | February 14, 2022




Voice User Interface: Introduction, Benefits, and Trends

What is a voice user interface, and why should designers care? Learn about the pros, cons, and emerging trends in VUI design.

Written by Ramotion Aug 2, 2023 14 min read

Last updated: Feb 12, 2024

Introduction

The field of user experience design is advancing rapidly. With new technologies and smart devices being introduced every few months, the nature of interaction with physical and digital products is constantly changing. Now products and services can be directly manipulated with the help of touchscreen technology.

There are several other ways to interact with technology, such as virtual and augmented reality, and even with the use of voice commands. Google Assistant, Amazon Alexa, and similar technologies enable voice interaction, thus enriching the user experience.

The design of voice UI is an emerging area of interest in UI/UX design. All around the globe, user interface experts are increasingly paying attention to voice user interface design to highlight how this type of user experience can be enhanced. The unique nature of this interaction demands that voice user interfaces be studied in depth, focusing on the key aspects that impact the user experience.

Voice User Interface ( Justinmind )

In this article, we introduce the concept of voice user interfaces (VUIs) and their working mechanisms, advantages, and challenges. We also discuss popular trends in voice user interface design and some inspiring examples from the real world.

Read along as we discuss this unique, emerging topic that will gain more popularity in the days to come.

Understanding VUI

The voice user interface is a design concept that is continuously evolving. It is critical for designers to understand what a voice user interface is and how to design effective voice interactions. This type of design is not merely a matter of bolting voice commands onto an existing interaction or adding a new layer of features or a new dialog flow to the journey.

Instead, to design voice user interfaces, UI/UX professionals have to ensure that the entire user journey is smooth and that the introduction of voice commands genuinely improves the user experience.

What is a voice user interface in design?

The voice user interface is a technology based on the principles of speech recognition: users interact with such devices by talking to them, much as they would in any human-to-human conversation.

This is what makes voice interaction unique, attractive, and challenging at the same time. Additionally, designers must incorporate a conversational user interface within the existing graphical user interface to make the experience smooth and familiar for the target audience.

Understanding VUI ( Glance )

VUI Working Mechanism

As the name suggests, voice interface design depends on the intelligent use of voice commands and speech recognition to ensure a unique user experience. Using a voice command for efficient interaction requires a blend of graphical user interfaces with voice recognition software.

The working mechanism of a voice user interface is highly dependent on the quality and efficacy of this software, that is, the extent to which the software can recognize the human voice and extract valuable information from the voice commands.

What are the main components of VUI?

The major components of a VUI mechanism are as follows.

  • Speech recognition
  • Natural language processing
  • Speech synthesis
  • Feedback

Along with fulfilling the traditional role of a UX professional, it is essential to approach these interactions as a conversation designer would. This is where knowledge of conversational UX design , such as chatbots, can come in handy.

Designers can bring their expertise from such products and services to create compelling voice interactions, thus going beyond creating a simple dialog flow between humans and machines to add a new feature to the user experience.
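One way to prototype such a dialog flow is as a small state machine, where each recognized user intent advances the conversation to the next prompt. The states, intents, and prompts below are purely hypothetical, invented for this illustration:

```python
# Minimal dialog-flow sketch: each state maps a recognized user intent
# to the next state; unexpected intents leave the conversation in place.
FLOW = {
    "start":   {"prompt": "What would you like to do?", "next": {"order_pizza": "size"}},
    "size":    {"prompt": "What size pizza?",           "next": {"give_size": "confirm"}},
    "confirm": {"prompt": "Shall I place the order?",   "next": {"yes": "done"}},
    "done":    {"prompt": "Order placed. Goodbye!",     "next": {}},
}

def step(state: str, intent: str) -> str:
    """Advance the conversation; stay in the same state on an unexpected intent."""
    return FLOW[state]["next"].get(intent, state)

state = "start"
for intent in ["order_pizza", "give_size", "yes"]:
    state = step(state, intent)
print(state)  # → done
```

Laying the flow out as data like this also makes it easy for a conversation designer to review and edit the prompts without touching the dispatch logic.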

Working Mechanism of VUI ( vilmate )

Main components of VUI design

As noted above, speech recognition is one of the most critical aspects when designing voice user interfaces. Along with this fundamental element, several vital components of voice user interface design must be implemented carefully to ensure a successful user experience design.

A good voice user interface is not just good at automatic speech recognition; it can also hold a natural conversation, understand spoken commands, and respond accordingly. The most critical components of VUI design are as follows.

1. Speech recognition

The first and most crucial component of a voice user interface design is the ability to recognize human voice. Voice or speech recognition can be managed with the help of specialized software tools and advanced programming.

The underlying goal is to equip the interface to recognize and understand the user's voice and pick a voice command when spoken to.

2. Natural language processing

Natural language processing (NLP) is a highly advanced technology that takes voice recognition to the next level. NLP uses artificial intelligence and machine learning to understand human language and spoken commands.

This ability in a speech interface enables the product or service to understand how humans talk to each other, which is our natural language.
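To make the idea concrete, here is a toy rule-based intent parser. Production NLU relies on trained models rather than regular expressions, and the intents and patterns below are invented for illustration, but the input/output shape is the same: an utterance goes in, and an intent plus slot values come out.

```python
import re

# Toy rule-based natural-language understanding: map an utterance
# to an intent name plus extracted slot values (hypothetical rules).
RULES = [
    (re.compile(r"play (?:me )?(?:some )?(?P<genre>\w+)"), "PlayMusic"),
    (re.compile(r"book .* taxi to (?P<place>.+)"),         "BookRide"),
    (re.compile(r"what'?s next in my calendar"),           "NextEvent"),
]

def parse(utterance: str):
    """Return (intent, slots) for the first matching rule."""
    text = utterance.lower().strip("?!. ")
    for pattern, intent in RULES:
        match = pattern.search(text)
        if match:
            return intent, match.groupdict()
    return "Unknown", {}

print(parse("Play me some jazz on Spotify!"))  # → ('PlayMusic', {'genre': 'jazz'})
```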

3. Speech synthesis

Recognizing and understanding human language is only half the job; the machine must also be able to respond in kind. This is where speech synthesis comes into play.

Voice interaction between humans and technology demands that the interface process what is being said and then generate a spoken response based on the user's needs.

4. Feedback

Feedback is the part where the machine ensures a smooth dialog flow with humans. An interactive voice response mechanism listens to the users' commands, understands them, synthesizes the spoken language, and then issues a response aligning with the commands, thus coming full circle.

The efficiency of voice command devices also highly depends on the speed with which feedback is provided to users.
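The four components above can be composed into a single processing loop. Everything in the sketch below is an illustrative stand-in (the function names and canned replies are invented); a real system plugs an ASR engine, an NLU model, and a TTS engine into these seams.

```python
# End-to-end sketch of the four VUI stages described above.
def recognize(audio: bytes) -> str:
    # 1. Speech recognition: pretend the bytes already hold a transcript.
    return audio.decode()

def understand(text: str) -> dict:
    # 2. Natural language processing: a trivial stand-in intent classifier.
    return {"intent": "GetTime"} if "time" in text.lower() else {"intent": "Unknown"}

def respond(intent: dict) -> str:
    # 3. Compose a reply for the recognized intent.
    return "It is 9 o'clock." if intent["intent"] == "GetTime" else "Sorry, say that again?"

def synthesize(text: str) -> bytes:
    # 4. Speech synthesis / feedback: a real system returns audio samples.
    return text.encode()

def handle(audio: bytes) -> bytes:
    """Run one full turn: recognize -> understand -> respond -> synthesize."""
    return synthesize(respond(understand(recognize(audio))))

print(handle(b"What time is it?").decode())  # → It is 9 o'clock.
```

Keeping the stages behind separate functions mirrors how real VUI stacks are built: each stage can be swapped for a better engine without redesigning the whole loop.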

VUI Device Types

The history of voice user interface design goes back to 1984, when interactive voice response (IVR) systems were first introduced. Since then, several VUI devices have been launched over time. However, the progress in the last few years has been remarkable, ranging from virtual assistants – such as Google Assistant and Amazon Alexa – to smart speakers and TVs.

VUI has taken over the technological world. Our homes and offices are full of devices that can recognize and understand a voice command, taking interactions to a new level.

What are some standard VUI device types?

Some common types of VUI devices are as follows.

  • Smartphones
  • Wearable devices
  • Virtual assistants
  • Smart speakers, sound systems, and TVs

Some of the most common and advanced voice user interfaces are discussed below.

When we think of smart devices, the first thing that comes to mind is a mobile phone – or a smartphone: a small handheld device that has, quite literally, revolutionized our world, making communication more accessible and efficient than ever before. Hardly any smartphone on the market today lacks a voice user interface.

Several mobile apps have built-in voice search capabilities, giving users a different way to interact with the design. However, an additional voice app is not necessarily needed for a smartphone, as the operating system is capable of efficiently working with voice commands.

Smartphones and VUI ( Wikimedia Commons )

Along with mobile phones, wearable technologies – such as smartwatches and fitness trackers – are remarkable examples of voice user interface design. It is incredible how these small devices can provide quality voice and visual interfaces with a usable and user-friendly design.

Apple Watch, for example, can serve as a mini smartphone, where Siri can listen to a voice command and provide the desired feedback.

Wearable Technologies ( Iberdrola )

Virtual assistants are, arguably, the best and most efficient examples of voice interfaces. Technologies such as Google Assistant, Amazon Alexa, Siri, and Cortana have taken the world of voice interactions by storm.

A voice assistant can not only control devices in a smart home but also integrate with several other smartphone or wearable applications, thus making the Internet of Things (IoT) a reality. These devices can recognize and understand complete user phrases and sentences and give feedback in voice output.

VUI is Integral to Virtual Assistants ( Business Insider )

Other popular types of devices implementing voice interactions include smart speakers, sound systems, and televisions. Smart speakers and TVs have built-in virtual assistants, similar to smartphones, that can pick up on human language and voice commands.

Like a virtual assistant, a smart speaker can be activated by a wake word and then commanded to play anything based on the user's desire. A smart speaker can also be integrated with other products in the household, making it a part of the mesh of technologies.
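At the text level, wake-word gating can be sketched in a few lines. Real detectors run small, always-on acoustic models on raw audio rather than matching transcripts, so treat the wake words and logic below purely as an illustration:

```python
# Illustrative wake-word gate: only what follows a recognized wake
# word is forwarded to the assistant as a command.
WAKE_WORDS = ("alexa", "hey siri", "ok google")  # example set, not exhaustive

def starts_with_wake_word(transcript: str):
    """Return (matched, command), where command is the text after the wake word."""
    text = transcript.lower().strip()
    for wake in WAKE_WORDS:
        if text.startswith(wake):
            return True, text[len(wake):].lstrip(", ")
    return False, ""

print(starts_with_wake_word("Alexa, play my morning playlist"))
# → (True, 'play my morning playlist')
```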

Smart Speakers and VUI ( Gemba )

Pros and Cons of VUI

Voice user interface design is an incredible technology – one that has the potential to take user experience to a whole new level. With voice recognition being a part of many digital and physical devices, it is becoming increasingly important for designers to incorporate this technology into their products and services for better user engagement.

Voice interactions and voice technology, however, have their challenges. For starters, incorporating voice control in user interfaces would require a complete overhaul of the UX design process . Some critical pros and cons of voice user interfaces are discussed below.

Pros of VUI

There are several benefits of designing and using voice user interfaces. Some of the most important ones are listed below.

  • With the help of VUI devices, users can perform their tasks faster. Instead of typing out long commands, they can say what they are thinking, and the devices would oblige, thus saving much time and enriching the user flow.
  • Voice-enabled devices also provide more flexibility in interaction. The users no longer have to stick to one mode of interaction. They can directly manipulate devices by pushing buttons, touching the screens, or using voice commands.
  • Accessibility is an excellent benefit for VUI devices. For someone with limited mobility or physical limitations, the voice user interface simplifies the overall experience.
  • From the perspective of organizations, VUI is an excellent way of getting a competitive advantage and being a part of this advancing world. When a user asks for multiple ways to interact with devices, designers need to provide them with alternatives, and this is where the role of VUI becomes critical.

Cons of VUI

VUI design also has some specific cons and challenges. Some of the most important ones are as follows.

  • One of the most pressing concerns is that of privacy. Since voice-controlled devices are almost always connected to the internet, the personal identifying information of users can be put at risk. It is important to make a user mindful of these challenges in clear and transparent terms.
  • The cost of VUI devices is also another challenge, thus making it hard for everyone to access these technologies. On the one hand, there is the benefit of accessibility, but on the other, many users may not even be able to use these devices because of their high cost.
  • VUI devices are also, at least for now, suited only to certain tasks. This means users cannot rely entirely on voice commands for all their functions, as some direct interaction is still needed.

Emerging Trend - VUI Design

Almost all modern devices incorporate voice and graphical user interfaces in their products and services. From voice-enabled AI assistants and smart home devices to mobile phones and wearable devices, every new bit of technology is responsive to voice commands, changing how users interact with products and services.

With this increased interest, new avenues have emerged for designers to work on.

What are some essential factors to consider regarding the emerging trend in VUI design?

Following are the factors that designers and organizations are increasingly focusing on.

  • Speed and efficiency
  • Ease of use
  • Familiarity with human communication
  • Accessibility
  • Localized design

Like every other product, whether in the digital or physical world, there is always a need for improvement and the drive to meet users' needs and expectations. The case is similar when it comes to voice-enabled devices. Some of the modern trends are discussed below.

Emerging Trends in VUI Design

The speed of interaction is a major area of work for designers as well as product developers. A voice user interface must be responsive, so that an action is taken shortly after a user requests it.

Significant improvements have already been made in this regard, and on some voice projects the responsiveness of interfaces has improved considerably. One example is the speech-to-text feature in Microsoft Word and Google Docs , where users see instant feedback on their screens.

For UI/UX designers, a good experience is only possible by making the interaction straightforward. This is where voice user interface design still needs much work. Creating dialog flows between humans and machines, with human communication as the ideal model, is tricky.

Combining this with the existing interfaces that the users are familiar with makes it challenging to create a tangible user interface that meets all the target audience's needs.

As mentioned above, human communication is the best communication model for us, where a back-and-forth conversation is easily held based on certain unsaid dialog principles.

Designers and developers have been working on replicating this model in human-machine interaction, where both the user and technology can understand each other in a better way.

This also draws from the basic principle of familiarity in UX design, indicating that user satisfaction would improve if we start from a familiar experience.

Accessibility is an added benefit of voice interfaces. With technologies such as screen readers and text-to-speech interfaces, individuals with cognitive or physical disabilities can get better access to digital content.

However, more work is needed to make voice control technologies genuinely accessible. This is one area where user research is currently being conducted by leading organizations so that the overall experience of all individuals can be enhanced.

When it comes to human conversation and the use of voice recognition software, one important challenge is the localization of technologies. Since most digital technologies are produced in predominantly English-speaking countries, there are few resources for people living in the Global South.

Localization is the process that can make these voice-controlled devices available to people living outside of the Euro-America region. UI/UX designers are conducting extensive user research and working on the localization of voice user interfaces to ensure that different languages, accents, and cultures can also reap their benefits.

VUI Examples

Now that we've looked at the working mechanism of voice user interfaces and some emerging trends, let's look at some real-world examples. It is interesting to see how many products and services support voice recognition, thus qualifying – at least to some extent – as examples of voice user interfaces.

It is also incredible to note how much we rely on voice systems. For instance, smart home technology would be rendered far less useful without an effective VUI design.

What are some of the best VUI design examples?

The following are some examples of the best VUI designs.

  • Alexa: Managing the smart home
  • Android Auto: Improving the search process
  • Siri: Making smartphones accessible

Some of the best examples of voice user interface design are discussed below. Designers can use these examples as inspirations when working on their projects.

Smart home assistants and hubs like Google Assistant and Amazon Alexa are excellent examples of voice user interface design. Amazon Alexa offers a pretty easy-to-use and interactive voice user interface where the users can manage all the smart devices in their home with the help of a single mobile application and a physical device – such as an Echo or Echo Dot.

A variety of devices, such as smart lights, motion sensors, smart speakers, and TVs can be integrated into the smart home system. Voice assistants can then be used to control all the devices with simple voice commands.

Alexa also allows users to create routines that can be customized based on voice input. This adds to the flexibility for users, making the design easy to manage and use.
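A routine of this kind is essentially one utterance mapped to a list of device actions. A minimal sketch of that idea follows; the device names, actions, and routine phrases are hypothetical, not Alexa's actual API:

```python
# A "routine" bundles several smart-home actions behind one voice
# command. All names below are invented for illustration.
ROUTINES = {
    "good morning": [
        ("living_room_lights", "on"),
        ("thermostat", "set 21C"),
        ("kitchen_speaker", "play news"),
    ],
    "good night": [
        ("living_room_lights", "off"),
        ("front_door", "lock"),
    ],
}

def run_routine(utterance: str):
    """Look up the routine for an utterance and return the actions taken."""
    actions = ROUTINES.get(utterance.lower().strip("!. "), [])
    return [f"{device}: {action}" for device, action in actions]

print(run_routine("Good night!"))  # → ['living_room_lights: off', 'front_door: lock']
```

The appeal for users is exactly this indirection: one phrase fans out to many devices, and the routine can be edited without changing how it is invoked.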

Amazon Alexa Manages Smart Homes ( market )

Android Auto and Apple CarPlay are other excellent voice user interface design examples. Both technologies have their pros and cons.

Here, we will only look at Android Auto, notably how voice search is improved and enriched with the help of this product. When driving, whether or not the screen is being used for navigation, it is always safer for users to issue voice commands than to type text on the screen. This is where the voice search feature of Android Auto is beneficial.

It allows users to add stops to their journey and can also be used to search for music, connect with Bluetooth devices, pick up calls, and perform other essential tasks. This is an excellent example of making the lives of users easier with the help of an effective VUI design.

Android Auto’s Voice Search ( DroidWin )

As mentioned above, one area of user research in VUI design is focused on making technologies more accessible. Apple's Siri – available in all iPhones, Apple Watches, iPads, and Macbooks – expertly performs this job.

Siri is efficient enough to perform most tasks without users touching their phones. The users can activate this virtual assistant simply by saying, "Hey, Siri!" and then go on with their voice commands.

Siri on iPhone ( AARP )

The voice user interface is the current big thing in the design of products and services. This technology is not just limited to voice search or seeking the services of a voice assistant.

The possibilities of incorporating voice control into digital and physical designs are countless. UI/UX professionals must get acquainted with emerging voice user interface design trends to improve their designs significantly.

The discussion above can serve as a great starting point. We discussed the basic concept of VUI design, focusing on the working mechanism and critical components required to design voice devices.

The examples discussed above can also serve as inspiration for aspiring designers, highlighting areas where VUI design can be an excellent step forward in UI/UX design.


User Research and Design for Voice Applications

Ask UXmatters: Get Expert Answers

In this edition of Ask UXmatters , our experts consider how user research and design for voice applications differs from research and design for traditional, graphic user interfaces (GUIs). First, our expert panel discusses the importance of deeply understanding the context in which people would use an application, as well as the behavior of those who would use it.

Our panel of experts also recommends that we accurately understand the problems a voice application can solve, so it is truly helpful rather than just a cute gimmick. The panel also explores how to collect data from users when you’re designing a voice system or training artificial-intelligence (AI) algorithms.

In my monthly column Ask UXmatters , our panel of UX experts answers our readers’ questions about a broad range of user experience matters. To get answers to your own questions about UX strategy, design, user research, or any other topic of interest to UX professionals in an upcoming edition of Ask UXmatters , please send your questions to: [email protected] .

The following experts have contributed answers to this month’s edition of Ask UXmatters :

  • Richard Alvarez —UX Practice Manager at Saggezza
  • Bob Hotard —Lead UX Designer at AT&T
  • Gavin Lew —Managing Director at Bold Insight

Q: How is user research for voice applications different from user research for products with a traditional GUI?—from a UXmatters reader

“It’s extremely important to do in-field, contextual research or ethnographic studies when doing user research for voice applications,” advises Bob. “This is probably true for any new user interface or technology, but more so for voice.

“My experience has been that there is too wide a knowledge gap to ask users quantitative survey questions about voice applications. These applications are still not common enough to ensure that all users understand, or can properly respond to, questions about a voice user interface.

“For example, I watched a usability study in which a person who was stuck on the task of editing a search field on a mobile phone admitted that he would ‘delete the whole thing and just tap the microphone thingy on my keyboard….’ This same person had responded in a previous survey that he had never used speech-to-text or voice search. Unfortunately, this scenario wasn’t an edge case.

“Age demographics—boomers versus millennials—might influence the knowledge gap for voice user interfaces. You must observe how a person uses a voice app in the real world. This can mean the difference between your designing a good or a bad voice app.

“Some might look at this observation as stating the obvious in regard to researching any user interface. On one hand, that is true. On the other hand, when designing voice apps, it is critical that you actually see how someone speaks to Alexa in their home versus in a lab rather than your asking in a survey, ‘What commands do you use to ask Alexa to [perform a given task].’ Go where your users are or where they would engage with your application and observe them.”

Leverage Voice as a Solution, Not a Gimmick

“The obvious difference with voice is that, in many cases, you are working without a visual component or a multi-modal user interface,” replies Richard. “A traditional GUI has its own set of design patterns to which we’ve grown accustomed, as users. Menus, buttons, typography, spacing, and of course, our mouse pointer—to name just a few—are the conversation pieces that a visual GUI utilizes. We immediately recognize navigation in the header or know that it’s tucked away under a hamburger menu. Mouse movement on a page and the visual changes of menu items, links, and buttons as we hover over them are a GUI’s response to our interactions, telling us ‘I’m listening.’

“With voice-only user interfaces, no wider view of the content is immediately available. A good voice user interface (VUI) is conversational and guides the user through its responses. So, when we think about a voice assistant, we’re thinking about and planning the end-to-end conversation with users, their situation, and the problem they are trying to solve.

“In many ways, our research for voice and traditional GUIs is the same. We start with understanding the use case to build empathy and learn as much as we can about the what, why, and how. So, although we strongly believe that voice is a tremendous step forward in our interactions with the digital world, there are still cases where a traditional GUI would make more sense. We don’t want voice skills to be a gimmick, but rather a solution that improves our interaction with the digital world.

“Our research should identify areas where the use of voice can improve users’ interactions and outcomes,” continues Richard. “Consider a user baking a cake in the kitchen and wanting to know how many ounces there are in two cups. The user might put down the mixing bowl, wash his hands, go to his laptop, do a quick search, come back to the kitchen, wash his hands again—you get the picture. In this example, asking the question of a smart speaker using a VUI and getting an instant response, without ever having to stop mixing the batter, is much more convenient. It’s a hands-free user interface and lets the user continue his normal activities. It’s also faster, which could make a big difference when someone is cooking or doing other time-sensitive activities. Having a smart speaker in a kitchen to aid a user in performing such simple question-and-response tasks is also a very natural approach.
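
Richard’s cups-to-ounces exchange boils down to simple intent matching: recognize the shape of the question, pull out the quantity, and answer immediately. The sketch below is a toy, assuming US fluid ounces and a hand-written pattern; production skills instead declare intents and slots in the platform’s interaction-model schema.

```python
import re

CUPS_TO_OUNCES = 8.0  # US fluid ounces per cup
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4}  # illustrative subset

def handle_utterance(utterance: str) -> str:
    """Answer a 'how many ounces in N cups' question, else apologize."""
    text = utterance.lower()
    m = re.search(r"(\w+)\s+cups?\b", text)
    if "ounce" not in text or not m:
        return "Sorry, I didn't catch that."
    word = m.group(1)
    qty = NUMBER_WORDS.get(word)
    if qty is None:
        try:
            qty = float(word)  # digit forms such as "3" also work
        except ValueError:
            return "Sorry, I didn't catch that."
    return f"{qty * CUPS_TO_OUNCES:g} ounces."

print(handle_utterance("How many ounces are there in two cups?"))  # → 16 ounces.
```

The point of the example is the interaction cost, not the parsing: one spoken question replaces the put-down-the-bowl, wash-hands, search-on-laptop round trip.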

“This kitchen example is a fun way of illustrating how voice can improve such simple interactions with computers. At Saggezza, we’re looking at voice solutions for our clients in warehousing, manufacturing, and back-office solutions. The same advantages of hands-free interaction, speed, and ease of use apply. However, we can now see the results of improved safety, allowing users to continue performing critical jobs in situations where the use of their hands or visual attention on tasks is not only necessary, but a matter of preventing serious injury. Plus, in many cases, our voice solutions result in greater accuracy by providing confirmation before the user moves on to other tasks, as well as by displaying real-time views of data on command.”

Apply the Wizard-of-Oz Technique

“Early voice-application work in User Experience involved testing call flows for interactive voice response (IVR) systems, where the user received and interpreted a voice prompt, then the user’s response produced another voice prompt,” answers Gavin. “To support the design of such systems, UX researchers needed to separate the voice system from the design. Imagine that a user stutters or accidentally stumbles when speaking. If we are using a computer to interpret the user’s response, it is possible that the user might become confused about whether the system simply did not recognize an utterance or the flow was incorrect.

“By using the Wizard-of-Oz technique—one inspired by the movie of the same name, in which a man hiding behind a curtain pretends to be a wizard—a user researcher can play the role of the voice system, interpreting the user’s responses and providing voice prompts. This approach lets you test and refine a flow without any added confusion about the accuracy of a voice system.”
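
A Wizard-of-Oz session needs very little tooling: a loop in which the participant’s utterance is shown to the hidden researcher, who types the system’s next prompt, with every exchange logged for later flow analysis. The sketch below is an assumption on my part, not a standard protocol; the JSON log format and console prompts are illustrative.

```python
import json
import time

def log_turn(participant: str, wizard_prompt: str) -> str:
    """Serialize one participant/wizard exchange as a JSON log line."""
    return json.dumps({
        "t": time.time(),                # timestamp, for timing analysis later
        "participant": participant,      # what the participant said
        "system_prompt": wizard_prompt,  # what the "system" (wizard) replied
    })

def run_session(log_path: str = "woz_session.jsonl") -> None:
    """Console harness: the wizard types each reply; 'quit' ends the session."""
    with open(log_path, "a") as log:
        while True:
            utterance = input("PARTICIPANT> ")
            if utterance.strip().lower() == "quit":
                break
            # The human wizard interprets the utterance and answers in the
            # system's voice, so the dialog flow is tested without
            # speech-recognition errors muddying the results.
            prompt = input("WIZard> " if False else "WIZARD> ")
            log.write(log_turn(utterance, prompt) + "\n")
```

Because the “recognizer” is a person, any breakdown observed in the session can be attributed to the flow design itself, which is exactly the separation Gavin describes.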

Janet M. Six
Product Manager at Tom Sawyer Software
Dallas/Fort Worth, Texas, USA

Sensors (Basel)

The Investigation of Adoption of Voice-User Interface (VUI) in Smart Home Systems among Chinese Older Adults

1 College of Literature and Journalism, Sichuan University, Chengdu 610064, China; [email protected]

2 Digital Convergence Laboratory of Chinese Cultural Inheritance and Global Communication, Sichuan University, Chengdu 610207, China

3 School of Construction Machinery, Chang’an University, Xi’an 716604, China; yangyanpu@chd.edu.cn

Peiyao Cheng

4 Design Department, School of Social Science and Humanity, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China

Associated Data

The data used in this study are available upon request from the corresponding author.

Abstract

Driven by advanced voice interaction technology, the voice-user interface (VUI) has gained popularity in recent years. VUI has been integrated into various devices in the context of the smart home system. In comparison with traditional interaction methods, VUI provides multiple benefits. VUI allows for hands-free and eyes-free interaction. It also enables users to perform multiple tasks while interacting. Moreover, as VUI is highly similar to a natural conversation in daily life, it is intuitive to learn. The advantages provided by VUI are particularly beneficial to older adults, who suffer from decreases in physical and cognitive abilities, which hinder their interaction with electronic devices through traditional methods. However, the factors that influence older adults’ adoption of VUI remain unknown. This study addresses this research gap by proposing a conceptual model. On the basis of the technology acceptance model (TAM) and the senior technology acceptance model (STAM), this study considers the characteristics of VUI and of older adults by incorporating the construct of trust and aging-related characteristics (i.e., perceived physical conditions, mobile self-efficacy, technology anxiety, self-actualization). A survey was designed and conducted. A total of 420 Chinese older adults participated in this survey, and they were current or potential users of VUI. Data were analyzed through structural equation modeling. Results showed a good fit with the proposed conceptual model. Path analysis revealed that three factors determine Chinese older adults’ adoption of VUI: perceived usefulness, perceived ease of use, and trust. Aging-related characteristics also influence older adults’ adoption of VUI, but their influences are mediated by perceived usefulness, perceived ease of use, and trust. Specifically, mobile self-efficacy is demonstrated to positively influence trust and perceived ease of use but negatively influence perceived usefulness. Self-actualization exhibits positive influences on perceived usefulness and perceived ease of use. Technology anxiety only exerts a marginal influence on perceived ease of use. No significant influences of perceived physical conditions were found. This study extends the TAM and STAM by incorporating additional variables to explain Chinese older adults’ adoption of VUI. These results also provide valuable implications for developing suitable VUI for older adults as well as planning actionable communication strategies for promoting VUI among Chinese older adults.

1. Introduction

Supported by advanced voice interaction technology, interacting with products through speech is no longer a scenario from fiction. It now happens in our daily lives. When planning for the next day, we can simply ask a smart speaker what the weather will be. It will search for weather information automatically and respond to us by speech with the temperature and probability of rain tomorrow. It can even provide recommendations for bringing an umbrella or wearing a coat. VUI has gained popularity in this decade along with dramatic improvements in the relevant technologies. With improvements in speech-recognition technology, VUI-embedded systems can recognize voice commands accurately. For example, Google has announced that their speech recognition accuracy rate has reached 95% [ 1 ]. Natural language processing (NLP) has also been largely improved, enabling VUI-integrated systems to interpret the intended meanings of users. As a result, an increasing number of electronic devices integrate voice-user interfaces (VUI), such as the virtual assistant Siri developed by Apple, the smart speaker Google Home, and the Echo launched by Amazon.

Different from traditional interaction methods that require input and output devices [ 2 ], VUI allows users to interact with electronic devices through speech. Users can command electronic devices by talking to them, much as in a natural conversation in daily life. Such hands-free interaction provides huge benefits. As input devices such as a keyboard and mouse are no longer necessary, users can interact with devices in a shorter time; interaction speed can be largely improved [ 3 ]. Moreover, as VUI allows users’ eyes and hands to be free while interacting, users can complete multiple tasks [ 4 ]. For instance, users can search for information through VUI while driving a car, which can improve driving safety [ 5 ]. Because of the advantages of voice interaction, VUI has been adopted at a fast rate [ 6 ]. In particular, VUI has become an important modality for smart home systems. Various smart home devices integrate VUI, such as smart speakers, cleaning robots, and smart televisions. The integration of VUI in smart home systems largely improves interaction efficiency. For instance, instead of using a remote controller, users can directly talk to a smart television. In comparison with pressing buttons on a remote controller, voice commands largely reduce interaction time and improve interaction accuracy.

VUI can be particularly beneficial for older adults in smart home contexts. As older adults often suffer from the gradual loss of physical and cognitive capabilities, interacting with devices through traditional graphical user interfaces (GUI) can be difficult [ 7 ]. Older adults may have problems reading text on screens. They may also fail to type or click due to shaky hands. Instead, VUI can be a promising solution. As speech is a natural method of interpersonal communication, VUI can be much easier for older adults to learn and operate [ 8 , 9 , 10 ]. However, despite the potential benefits brought by VUI, how older adults perceive VUI remains unclear. Fragmented evidence shows that older adults are open to VUI embedded in smart speakers [ 8 , 9 , 11 ], but they also have concerns about adopting it [ 9 ]. Therefore, it is necessary to understand what factors drive older adults’ adoption of VUI.

The size of the older population is increasing worldwide every year. This phenomenon is even more pronounced in China. The number of older adults above age 60 has reached 264 million, accounting for 18.7% of the general population in China [ 12 ]. The percentage of older adults has increased by 5.44% in comparison with 2010. These changes will have profound consequences for Chinese society. Older adults are more susceptible to chronic diseases [ 13 ]. Due to the decline of physical and cognitive abilities, older adults may encounter difficulties in daily life and thus become less independent, which burdens families and societies. The adoption of new technology is a promising way to improve older adults’ well-being, such as maintaining independence, improving safety, and staying active in social networks [ 14 , 15 ].

However, although adopting technologies can assist older adults in their daily lives, they often show resistance to adopting new technologies in comparison with young people [ 16 , 17 , 18 , 19 , 20 ]. Such resistance can become even stronger with age [ 21 , 22 ], and VUI adoption is no exception. Thus, it is crucial to understand how Chinese older adults perceive VUI and what factors influence their adoption of VUI. Gaining these insights can help companies develop or adapt current VUI to fulfill the needs of the senior segment [ 23 ].

This study aims to fill in this gap. Specifically, this study investigates the factors that influence Chinese older adults’ adoption of VUI. Through literature review, this study firstly proposes a conceptual framework with eight variables that determine Chinese older adults’ adoption of VUI. Next, a survey was designed and conducted with 420 valid participants. Data analyses were conducted using structural equation modeling.

2. Literature Review

This research aims to investigate older adults’ adoption intention of VUI in China. To investigate users’ adoption of VUI, the theoretical models related to technology adoption are reviewed. On the basis of current technology adoption models, the characteristic of VUI is specifically considered. As VUI can threaten users’ privacy, users have to trust VUI systems in order to use them effectively. Thus, we include trust as an additional factor and review relevant theories. Furthermore, as we especially target older adults, we integrate aging-related characteristics into the framework. The literature related to aging-related characteristics is reviewed.

2.1. The Theoretical Models Related to Technology Adoption

To understand the driving factors of users’ adoption of technologies, several theoretical frameworks have been proposed. Rogers [ 24 ] proposed the diffusion of innovation model, which posits five factors that influence diffusion: complexity, trialability, observability, compatibility, and relative advantage. Davis [ 25 ] proposed the technology acceptance model (TAM), which suggests that users’ intention to adopt a technology is mainly influenced by its perceived usefulness and perceived ease of use. The features of the technology itself largely determine users’ perceived usefulness and perceived ease of use. Subsequent models have extended TAM by incorporating social norms (TAM2) [ 26 ] and enjoyment (TAM3) [ 27 ]. Extending the TAM, Venkatesh et al. [ 28 ] further established the unified theory of acceptance and use of technology (UTAUT), which pointed out that technology adoption is primarily influenced by effort expectancy, performance expectancy, social influence, and facilitating conditions. The diffusion of innovation model has been recommended for use in commercial contexts and for predicting organizational adoption of innovation [ 29 ]. TAM and UTAUT are considered more appropriate for explaining individuals’ adoption of technology [ 30 , 31 ].

Although TAM and UTAUT are robust and powerful models for predicting users’ adoption of technology, their explanatory power differs across contexts [ 32 , 33 ]. TAM and UTAUT have also been found to carry some limitations in explaining users’ adoption of new technologies [ 34 ]. To improve their explanatory power in specific contexts, new constructs have been identified and added to TAM and UTAUT [ 35 , 36 , 37 ]. For instance, Wang, Tao, Yu and Qu [ 38 ] extended the UTAUT by including the additional factors of technology characteristics and task characteristics to explain Chinese users’ acceptance of healthcare wearable devices. To understand users’ adoption of digital voice assistants, Fernandes and Oliveira [ 39 ] extended the TAM by considering the influence of trust, social interactivity, and social presence. Therefore, although TAM is a robust model for explaining users’ adoption of technology, it needs to be adjusted for specific contexts.

2.2. Trust and Technology Adoption

To understand users’ adoption of information technology-related applications, previous studies pointed out the uncertainty of the IT environment [ 40 ]. Thus, it is necessary to incorporate the construct of trust into the extended versions of TAM and UTAUT [ 41 , 42 , 43 , 44 ]. Trust is a multidimensional concept [ 45 ]. Mayer et al. [ 46 ] proposed three dimensions of trust: (1) competence, which indicates the skills and capabilities that allow a system to perform effectively; (2) benevolence, which refers to one’s willingness to believe that another party will not exploit one’s vulnerability; and (3) integrity, which is defined as one’s subjective evaluation of the appropriateness of another party’s behavior. When used in different contexts, the construct of trust can be interpreted in different ways. In the context of users’ adoption of new technology, trust mainly captures the ability dimension, and it refers to individuals’ subjective evaluation of the reliability, functionality, and helpfulness of the technology [ 47 ]. In e-commerce contexts, where transactions occur, trust reflects the dimensions of benevolence and integrity: trust is defined as one’s belief that the e-commerce systems will behave responsibly [ 46 , 48 ].

In the context of users’ adoption of VUI, the benevolence and integrity dimensions of trust can be more prominent. Specifically, while using VUI, users have to allow systems to record and track their speech in order to improve VUI system accuracy [ 49 ]. The VUI system records users’ voice commands as well as background sound in order to provide immediate feedback [ 50 ]. Because of this, users may, to some extent, feel at risk or even threatened while using VUI systems. In this case, users’ trust reflects their perception of VUI systems’ willingness to behave in a socially responsible way: the VUI systems will not leak or misuse their personal information, and their personal information is protected by the VUI systems. Previous research has demonstrated the significant influence of trust on users’ adoption in various contexts, such as e-commerce [ 51 ], 5G technology [ 52 ], Internet banking [ 53 ], digital voice assistants [ 39 ], and young people’s adoption of VUI [ 40 ]. Therefore, to understand older adults’ adoption of VUI, this study includes the construct of trust.

2.3. Older Adults’ Technology Adoption

To explain the technology adoption of a specific user group, previous studies found that the TAM and UTAUT may be insufficient [ 54 ]. Specifically, the models used for young users can be insufficient for older adults because the two groups value different facets of technology. Older adults show resistance to adopting new technologies. Such resistance comes from different sources, including physical and psychological factors. A number of studies have demonstrated that older adults can encounter more difficulties while adopting technologies due to the decline of physical capabilities, such as the gradual loss of the sensorial capabilities of vision and hearing [ 55 ] and dexterity problems that cause difficulties in typing [ 56 ]. Psychological factors can also cause problems for older adults adopting new technologies [ 57 , 58 , 59 ]. For instance, in comparison with young people, older adults are found to suffer more from anxiety when adopting technologies.

In order to gain a comprehensive understanding of older adults’ adoption of technology, the senior technology acceptance model (STAM) has been proposed [ 60 ], which highlights the importance of aging-related characteristics. Specifically, STAM extends TAM by incorporating self-efficacy, technology anxiety, and facilitating conditions. Results demonstrated the significant influences of aging-related characteristics on older adults’ adoption. Similarly, to understand the factors that influence older adults’ adoption of mobile health in China, Deng, Mo and Liu [ 61 ] also considered aging-related characteristics, including perceived physical condition, technology anxiety, self-actualization needs, and resistance to change. Therefore, it is necessary to include aging-related characteristics to better understand older adults’ adoption of VUI.

In summary, this study aims to understand the factors that influence Chinese older adults’ adoption of VUI. Although user adoption of VUI has been investigated [ 40 ], that work targeted the young generation in Western contexts. Limited research attention has been paid to understanding older adults’ adoption of VUI in Chinese contexts. This study aims to fill in this gap. To do so, this study starts from TAM and considers the uniqueness of VUI by including the factor of trust. Next, by referring to the STAM and other studies related to aging characteristics, this study integrates four aging-related characteristics (i.e., mobile self-efficacy, technology anxiety, self-actualization, physical health condition). The research framework is shown in Figure 1 .

Figure 1. The conceptual framework of this study.

3. Hypothesis Development

3.1. VUI and Technology Acceptance Model

According to the TAM [ 25 ], users’ adoption of new technology is predicted by perceived ease of use (PEOU) and perceived usefulness (PU). When encountering a new technological application, users tend to subjectively assess the effort required to use it (PEOU) and the benefits gained from using it (PU). Extensive research demonstrates that users’ intention to adopt new technological applications is positively related to PU and PEOU [ 62 , 63 , 64 , 65 ]. PU largely results from PEOU. In other words, when users perceive a technological application as difficult to use, their perception of its usefulness is also largely discounted. TAM considers technology characteristics as external variables that influence PU and PEOU.

In the context of users’ adoption of VUI, PU refers to the utilitarian benefits of using VUI-driven systems, whereas PEOU reflects users’ perceived difficulty of learning to use VUI. Usage intention (UI) directly measures the extent of users’ intention to use VUI. VUI allows users to complete various interaction tasks by voice control rather than visual interface controls [ 66 ]. Thus, VUI provides multiple benefits, such as remote interaction and multitask interaction. Moreover, compared with the traditional user interface, VUI enables users to interact with smart devices in an intuitive way: talking to smart devices as if talking to a real person. Therefore, considering the benefits brought by VUI, we expect that these utilitarian benefits and convenience will positively influence users’ usage intention. In addition, as interaction with devices through VUI is highly similar to an interpersonal conversation in daily life, it should be intuitive to learn. Effortless learning can further improve users’ perceptions of usefulness. Previous studies have demonstrated the positive relationships between PU, PEOU and UI [ 67 ] as well as the positive links between PEOU and PU [ 68 ]. The following hypotheses are given:

Perceived usefulness positively influences behavioral intention .

Perceived ease of use positively influences behavioral intention .

Perceived ease of use positively influences perceived usefulness .
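
The logic of these three hypotheses can be illustrated with a toy simulation: generate Likert-style scores in which PEOU drives PU and both drive behavioral intention (BI), then recover each path as a bivariate least-squares slope. This is a simplification added purely for illustration; the study itself estimates all paths jointly via structural equation modeling on latent variables, and the coefficients below are invented, not the paper’s results.

```python
import random

random.seed(7)
N = 420  # matches the survey's sample size, purely for flavor

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

# Simulated construct scores with the hypothesized structure.
peou = [random.gauss(3.5, 1.0) for _ in range(N)]
pu = [0.6 * e + random.gauss(1.0, 0.5) for e in peou]   # PEOU -> PU
bi = [0.5 * u + 0.3 * e + random.gauss(0.5, 0.5)        # PU -> BI and PEOU -> BI
      for u, e in zip(pu, peou)]

print(round(ols_slope(peou, pu), 2))  # close to the simulated 0.6 path
```

Note that the bivariate slope of BI on PEOU recovers the total effect (direct 0.3 plus indirect 0.5 × 0.6 ≈ 0.6), which is why joint estimation with SEM is needed to separate direct paths from mediated ones, as the paper does.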

3.2. Trust and Technology Adoption

As discussed earlier, trust is an essential factor in influencing users’ adoption of technology. VUI has to record users’ voice command and their daily speech to be responsive. In other words, users have to share their speech in order to use VUI effectively and efficiently. Users may feel more risks associated with using VUI than using a traditional user interface. In this case, trust means that users believe that their personal information will be protected during the usage of VUI [ 40 ]. In users’ adoption of VUI, trust helps alleviate users’ concern that their personal information has been shared and might be misused [ 40 ]. Therefore, we expect that trust is positively related to older adults’ adoption intention of VUI.

Trust positively influences behavioral intention .

3.3. Perceived Physical Conditions

To better understand older adults’ adoption of VUI, it is necessary to consider the changes caused by aging [ 60 , 61 ]. Specifically, in this study, four aspects related to aging were considered: perceived physical conditions, mobile self-efficacy, technology anxiety and self-actualization.

3.3.1. Perceived Physical Conditions

Perceived physical conditions (PPC) refer to one’s own beliefs about one’s capabilities of vision, hearing, and motion in daily life [ 69 ]. With the increase in age, older adults suffer from the gradual loss of sensory and motor systems [ 70 , 71 ]. The decline of physical health conditions hinders their effective usage of ICT systems [ 72 ]. Past research has demonstrated relationships between older adults’ perceived health conditions, their perceptions of technology, and their intention of technology adoption. For instance, Li, Ma, Chan and Man [ 73 ] found that PPC negatively relates to older adults’ perception of the usefulness of health monitoring devices, which in turn lowers their usage intention. PPC is also found to be positively related to perceived ease of use of health informatics systems, which further facilitates older adults’ adoption intention [ 74 ]. Ryu, Kim and Lee [ 64 ] found that PPC leads to lower intention to participate in video UGC services.

In order to use VUI effectively, users need to have acceptable physical conditions, including visual, auditory, and motor abilities. Physical disabilities, such as hearing or speaking problems, can become obstacles to older adults’ effective usage of VUI. The current study targets Chinese older adults above 55 years old, who are starting to experience a decline in physical health conditions, which can influence their perceptions of VUI. Therefore, we expect positive relationships between PPC and both PU and PEOU.

Perceived physical conditions positively influence perceived usefulness .

Perceived physical conditions positively influence perceived ease of use .

3.3.2. Mobile Self-Efficacy

Mobile self-efficacy refers to one’s subjective evaluation of his or her capability to use mobile devices [ 75 ]. The UTAUT includes the construct of self-efficacy as a factor that influences PEOU, which further influences adoption intention [ 31 ]. Prior research reported that lack of capability is one of the difficulties older adults encounter when learning to use computers [ 76 ]. Higher self-efficacy indicates that users have more expertise and ability in interacting with mobile devices. Self-efficacy is found to be positively related to technology usage [ 77 , 78 ] and to older adults’ perception and adoption of gerontechnology [ 60 ].

In terms of the influence of self-efficacy on users’ adoption of VUI, higher self-efficacy brings about higher adoption intention among young people [ 40 , 79 ]. However, high self-efficacy could also strengthen users’ attachment to traditional interaction methods, leading to resistance to new interaction methods, especially among older adults. In fact, Deng et al. [ 61 ] found that older adults often exhibit resistance to change, which further hinders their intention to adopt health information systems. For Chinese older adults’ adoption of VUI in this study, adopting VUI means that users need to change their habits, invest considerable learning effort, and incur certain switching costs. For older adults who have a high level of mobile self-efficacy, this becomes even more difficult because of the higher sunk costs, which makes their resistance to VUI more serious, leading to lower perceptions and trust. The following hypotheses are posited:

Mobile self-efficacy negatively influences perceived usefulness.

Mobile self-efficacy positively influences perceived ease of use.

Mobile self-efficacy positively influences trust.

3.3.3. Technology Anxiety

Technology anxiety refers to the feeling of discomfort that people experience when using technology [80]; it captures the negative emotions that arise while using technologies. According to UTAUT, technology anxiety hinders users’ adoption intention through PEOU [31]. Burdened by these negative emotions, users readily perceive technologies negatively and resist adopting new ones [60,64,81]. For instance, in the context of computer use, prior research found that technology anxiety makes users afraid of using computers and of making mistakes, leading them to use computers less [82].

In the context of older adults’ adoption of VUI investigated in this study, technology anxiety should likewise negatively influence their perceptions of and trust in VUI. Although users in general may experience technology anxiety to some extent, older adults suffer from it more seriously [83,84,85]. The negative influence of technology anxiety has been found in various contexts, such as older adults’ PU and PEOU of wearable warming systems [86], PEOU of gerontechnology [60], and adoption intention of mobile health services [61]. Consistent with this line of research, this study hypothesizes similar negative influences of technology anxiety on users’ perceptions and trust of VUI.

Technology anxiety negatively influences perceived usefulness.

Technology anxiety negatively influences perceived ease of use.

Technology anxiety negatively influences trust.

3.3.4. Self-Actualization Need

Maslow [87] identifies self-actualization as the highest level of human need. Self-actualization relates to people’s sense of satisfaction, desire for personal growth, and pursuit of actualizing their personal potential [72]. To pursue self-actualization, people need to be tolerant of new changes, new phenomena, and new technologies. People with higher self-actualization needs tend to be more open-minded: they enjoy new adventures through acquiring new skills and making new changes [88], and they regard using new technologies as an opportunity to fulfill their self-actualization needs.

In terms of the adoption of VUI among Chinese older adults, self-actualization could serve as a facilitator. The self-actualization need is important not only in early adulthood but also in older age: according to Erikson [89], a sense of fulfillment is the ultimate purpose a person pursues in the later stages of life. Thus, driven by the intrinsic motivation of self-actualization, older adults could view adopting VUI as a chance for new adventures. Previous studies found that self-actualization positively relates to older adults’ adoption of e-government services [72] and wearable health technology [90]. Therefore, in the context of older adults’ adoption of VUI, similar effects were expected.

Self-actualization need positively influences perceived usefulness.

Self-actualization need positively influences perceived ease of use.

4. Research Methods

4.1. Sampling and Procedure

To test the proposed conceptual framework, a survey was designed and conducted. A web-based questionnaire was administered through the professional online platform ePanel ( http://www.research.epanel.cn/ , accessed on 8 December 2021). Although online sampling carries some limitations, it was considered a proper and valid method for data collection in this study because a connection to the Internet and experience with smart devices are required for effective usage of VUI. In other words, users’ experience with the Internet and digital devices is a precondition for VUI adoption. In fact, previous studies have widely used online sampling to investigate users’ adoption of smart devices, such as healthcare wearable devices [91], smart speakers [34], and smartwatches [92].

Participants were included based on two criteria: age and experience with smart devices, such as smartphones, tablets, smartwatches, and smart speakers. As we target older adults, we recruited participants older than 55 years, the age at which people’s cognitive and physical capabilities start to decline [93]. Experience with smart devices was also used as a selection criterion because it is a prerequisite for effective usage of VUI; participants with no experience with smart devices would have few chances to use VUI.

Participants were first welcomed to the survey and then provided informed consent. Subsequently, they were asked to report their age and experience with smart devices; these two questions served as screening questions. Participants were allowed to continue if they were older than 55 years and had experience with at least one smart device. Next, because participants might be unfamiliar with voice interaction technology, they were shown a short introduction video (around 90 seconds), which briefly exhibited the benefits, usage procedures, and usage scenarios of VUI. The VUI in the video was designed by a professional interaction designer, and the scenarios included using VUI to control smart speakers, smartphones, and smart televisions. After watching the video, participants were asked to indicate their experience with VUI and their perceptions and adoption intention of VUI based on a series of statements. Finally, they were asked for personal information, including demographic information, their perceived physical condition, and their psychological characteristics (see Table 1 in the next section).

Table 1. Constructs and measurements.

4.2. Measurement

Based on an extensive review of current studies, a questionnaire was designed that included three parts: (1) participants’ perceptions of voice interaction technology, (2) aging-related characteristics, and (3) participants’ demographic information. The measures related to participants’ perception of voice interaction and aging-related characteristics can be found in Table 1; they were based on or adapted from existing validated measures. Participants indicated their opinions on a 7-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree). The human protocols used in this work were evaluated and approved by Sichuan University (YJ202203).

4.3. Data Collection

Participants were recruited through an online survey. In total, 420 responses were collected (mean age = 59.67; 50% male); Table 2 shows a detailed description of the sample. We aimed to cover both current and potential users of VUI, which is a commonly used and valid way to investigate users’ adoption intention of specific technologies [91,98]. Thus, we did not select participants based on their experience with VUI; instead, we selected them based on their experience with smart devices, a precondition for VUI adoption. Among participants with smart-device experience, most have at least some experience with VUI. In this way, we captured both current and potential users of VUI. Participants’ experience with VUI can be found in Figure 2.

Figure 2. Frequency table of participants’ experience with VUI.

Table 2. Descriptive analysis of participants.

4.4. Data Analysis

The initial descriptive and reliability analyses were conducted using SPSS 25.0. Next, the data were analyzed by examining the measurement model and the structural model, respectively [99]. AMOS 24.0 was used to conduct confirmatory factor analysis [29] to assess the measurement model and to perform path analysis.
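The reliability analysis above centers on Cronbach’s alpha. As an illustration only (the authors used SPSS; the data below is invented, not the study’s), a minimal sketch of the statistic:

```python
# Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item variances) / variance of totals)
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per item, all of equal length
    (one score per respondent)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]          # per-respondent sums
    item_var = sum(pvariance(scores) for scores in items)     # sum of item variances
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Three perfectly consistent items yield the maximum alpha of 1.0.
print(cronbach_alpha([[1, 4, 7, 5], [1, 4, 7, 5], [1, 4, 7, 5]]))  # 1.0
```

In the paper, alpha above 0.7 was treated as satisfactory, with a relaxed 0.6 threshold accepted for the perceived-physical-conditions scale.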

5. Results

5.1. Reliability and Validity

The measurement model confirms a good fit: χ²/df = 1.979, GFI = 0.926, SRMR = 0.040, RMSEA = 0.048, CFI = 0.965, NFI = 0.931. Cronbach’s alpha was calculated for reliability. Results revealed satisfactory reliability for all measures against a threshold of 0.7, except for the measure of perceived physical conditions, which exceeded 0.6 [100]. Next, CFA was conducted to assess validity, including unidimensionality, convergent validity, and discriminant validity; results showed that the measures exhibited adequate validity (see Table 3 and Table 4 for details). Specifically, the standardized loadings of all items are above 0.5. Although the average variance extracted (AVE) of most measures is above the 0.5 threshold, the AVE for PPC is slightly below 0.5; considering that the composite reliability for PPC is higher than 0.6, the convergent validity of the construct is still adequate [101]. As for discriminant validity, the square root of the AVE should be higher than the inter-construct correlations in the model; however, some square roots are lower than the correlations (e.g., among BI, PEOU, and PU). Thus, it was necessary to further examine whether the model achieved satisfactory discriminant validity. Recent research suggests the HTMT ratio as a powerful criterion for discriminant validity assessment [102]. The HTMT results (see Table 5) showed that all values are below the 0.9 threshold, suggesting acceptable discriminant validity. In addition, the composite reliabilities of all constructs were above 0.7. Taken together, the measures used in this study showed satisfactory validity [103,104].
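The validity statistics used above (AVE, composite reliability, HTMT) are simple functions of the standardized loadings and item correlations. A hedged sketch with illustrative numbers (not the paper’s data):

```python
from math import sqrt

def ave(loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """CR = (sum lambda)^2 / ((sum lambda)^2 + sum(1 - lambda^2))."""
    s = sum(loadings)
    return s * s / (s * s + sum(1 - l * l for l in loadings))

def htmt(corr, a, b):
    """Heterotrait-monotrait ratio for two constructs whose item indices are a, b:
    mean between-construct correlation over the geometric mean of the
    mean within-construct correlations."""
    hetero = sum(corr[i][j] for i in a for j in b) / (len(a) * len(b))
    mono_a = sum(corr[i][j] for i in a for j in a if i < j) / (len(a) * (len(a) - 1) / 2)
    mono_b = sum(corr[i][j] for i in b for j in b if i < j) / (len(b) * (len(b) - 1) / 2)
    return hetero / sqrt(mono_a * mono_b)

# Illustrative case mirroring the PPC discussion: uniform loadings of 0.7
# give AVE just below the 0.5 rule of thumb, while CR stays above 0.6.
print(round(ave([0.7] * 4), 3))                    # 0.49
print(round(composite_reliability([0.7] * 4), 3))  # 0.794
```

HTMT values below 0.9, as reported in Table 5, are conventionally read as acceptable discriminant validity.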

Table 3. Reliability and unidimensionality.

Table 4. Constructs correlation matrix.

Table 5. HTMT analysis of discriminant validity.

5.2. Structural Model Assessment

Structural equation modeling was used to analyze the proposed research model with AMOS 24.0. The results revealed satisfactory absolute and incremental fit indices (see Table 6). All values meet the suggested cutoffs [105], which indicates that the data fit the proposed model well and are adequate for further path analysis.

Table 6. Goodness-of-fit test.

Note: GFI = goodness-of-fit index; SRMR = standardized root mean square residual; RMSEA = root mean square error of approximation; NFI = normed fit index; IFI = incremental fit index; TLI = Tucker–Lewis index; CFI = comparative fit index.
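As a quick illustration, the reported indices can be checked against commonly cited cutoffs (e.g., χ²/df < 3, SRMR < 0.08, RMSEA < 0.06, and GFI/NFI/CFI > 0.90). These cutoffs are conventional rules of thumb from the SEM literature, not values taken from the paper:

```python
# Reported fit indices from the measurement-model assessment.
fit = {"chi2/df": 1.979, "GFI": 0.926, "SRMR": 0.040,
       "RMSEA": 0.048, "CFI": 0.965, "NFI": 0.931}

# Conventional cutoffs (rules of thumb; stricter variants exist).
cutoffs = {"chi2/df": lambda v: v < 3.0,
           "GFI":     lambda v: v > 0.90,
           "SRMR":    lambda v: v < 0.08,
           "RMSEA":   lambda v: v < 0.06,
           "CFI":     lambda v: v > 0.90,
           "NFI":     lambda v: v > 0.90}

for name, value in fit.items():
    print(name, value, "ok" if cutoffs[name](value) else "poor")
```

Every reported index clears its conventional cutoff, consistent with the paper’s conclusion of good model fit.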

5.3. Hypotheses Testing and Path Analysis

Path analysis was conducted through SEM to examine the relationships among the variables; the results can be found in Figure 3 and Table 7. Ten of the fourteen hypotheses were supported or partially supported. Behavioral intention was predicted by perceived usefulness, perceived ease of use, and trust, with 64.34% of its variance explained; perceived usefulness was the strongest predictor, followed by perceived ease of use and trust. Perceived usefulness was in turn predicted by perceived ease of use, mobile self-efficacy, and self-actualization (51.09% of variance explained). Perceived ease of use was explained by self-efficacy, technology anxiety, and self-actualization (30.07%). Trust was influenced by self-efficacy (54.02%).
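The hypothesized structural model behind these paths can be written compactly in lavaan-style syntax (usable with, e.g., R’s lavaan or Python’s semopy). The construct abbreviations follow the paper, but the specification is our reading of the fourteen hypothesized paths, not code from the study:

```python
# lavaan-style specification of the hypothesized structural model (sketch).
# BI = behavioral intention, PU = perceived usefulness, PEOU = perceived ease
# of use, PPC = perceived physical conditions, MSE = mobile self-efficacy,
# TA = technology anxiety, SA = self-actualization need.
model_spec = """
BI    ~ PU + PEOU + Trust
PU    ~ PEOU + PPC + MSE + TA + SA
PEOU  ~ PPC + MSE + TA + SA
Trust ~ MSE + TA
"""

# Count the regression paths: one per predictor on each right-hand side.
n_paths = sum(line.split("~")[1].count("+") + 1
              for line in model_spec.strip().splitlines())
print(n_paths)  # 14
```

The path count matches the fourteen hypotheses tested, of which ten were supported or partially supported.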

Figure 3. Results of SEM. Note: * p < 0.1; ** p < 0.05; *** p < 0.01.

Table 7. Results of hypotheses testing.

Note: * p < 0.1; ** p < 0.05; *** p < 0.01.

In terms of the influence of aging-related characteristics, mobile self-efficacy significantly influences perceived usefulness, trust, and perceived ease of use. Technology anxiety negatively influences perceived ease of use, though only marginally. Self-actualization significantly influences perceived usefulness and perceived ease of use.

6. General Discussion

The current study contributes to the prior literature in several ways. To begin with, although previous research has emphasized the introduction of new media and discussed acceptance of the latest technologies, limited attention has been given to VUI [3], especially in China, home to one of the largest elderly populations in the world [12]. Given the popularity of VUI nowadays, older adults’ adoption intention in China has been largely overlooked. This study addresses this gap by proposing a model that provides insight into the factors influencing older adults’ adoption of VUI in China. In addition, little research has comprehensively examined the characteristics of VUI and older adults by incorporating the construct of trust together with aging-related characteristics (i.e., perceived physical conditions, mobile self-efficacy, technology anxiety, self-actualization). To address this gap, this study started from TAM and extended the model to gain a more thorough insight into the behavior of the elderly in the digital era. Results revealed that three factors determine Chinese older adults’ adoption of VUI: perceived usefulness, perceived ease of use, and trust.

More specifically, the results reveal several important findings. Consistent with previous studies on TAM [2], the findings confirm that perceived usefulness, perceived ease of use, and trust are three important factors explaining Chinese older adults’ adoption of VUI. Results further reveal that aging-related characteristics influence older adults’ perceptions of ease of use, usefulness, and trust. This study finds a positive relationship between trust and the intention to adopt VUI. Trust has been demonstrated to be an important factor in e-commerce, e-government, and technology adoption [48,108]. In the context of VUI, because VUI systems need to perform monitoring functions all the time, users have to share their daily conversations with the systems. The exposure of personal information can make users feel uncomfortable and vulnerable, which hinders adoption of VUI; in this case, trust becomes a crucial factor. Users’ belief that their personal information will be protected can largely alleviate these negative feelings and facilitate their adoption of VUI. Consistent with prior research on the role of trust in young adults’ adoption of VUI in the U.S. [40], the results of this study show a similar pattern: users with a higher degree of trust have a stronger intention to adopt VUI.

This study also reveals the influences of aging-related characteristics. Among them, perceived physical conditions did not show any significant influence on perceived usefulness, perceived ease of use, or trust. This finding is consistent with previous studies [61]. One possible explanation is that healthy conditions serve as a precondition for older adults’ adoption of VUI, but perceived physical condition itself does not naturally lead to stronger adoption intention. In other words, relatively healthy physical conditions give older adults the physical and cognitive capabilities needed to use VUI; for instance, good hearing enables older adults to use VUI, but better hearing does not increase their intention to use it. Most likely, perceived physical conditions are intertwined with other factors, such as technology anxiety.

Contrary to our hypotheses, no significant influence of technology anxiety was found on perceived usefulness or trust. In line with previous studies [61], technology anxiety lowers older adults’ perceived ease of use of VUI, though the negative influence is only marginally significant (p < 0.1). This could reflect the fact that the benefits of VUI are already well acknowledged by older adults, so anxious emotions do not significantly affect their perception of usefulness. Unlike other interaction methods (e.g., GUI) that require considerable effort to acquire, VUI is highly similar to natural speech in daily life. This similarity makes older adults feel that VUI is close to them and easy to learn, so the anxiety triggered by technology may be largely alleviated by the intuitiveness of VUI. Thus, no significant influences of technology anxiety on perceived usefulness or trust were detected.

In terms of mobile self-efficacy, as expected, it positively affects perceived ease of use and trust. Extensive experience with mobile devices gives users a better capability for learning VUI, and thus a more positive perception of its ease of use. Similarly, their experience with other technological applications, such as e-commerce, translates into higher trust in VUI: through previous experience, they understand that technology providers have an obligation to protect users’ personal information and that laws and regulations prohibit its misuse. Therefore, older adults with higher mobile self-efficacy form a higher degree of trust in VUI. However, higher self-efficacy does not bring a higher perception of usefulness; instead, it is found to lower older adults’ perceived usefulness of VUI. This indicates that older adults with higher self-efficacy show more serious resistance to VUI: those skilled in traditional interaction methods may feel that the traditional ways satisfy their needs and that switching to VUI is unnecessary. Consequently, they perceive VUI as less useful.

As for self-actualization, consistent with our hypotheses, it positively relates to perceived usefulness, perceived ease of use, and trust. Self-actualization is an intrinsic motivation to make achievements [87]. In line with previous studies showing that a higher level of self-actualization is associated with older adults’ adoption of new technologies [61,72,109], this study further confirms this notion by revealing a positive relationship between self-actualization and perceptions of VUI. Chinese older adults view using VUI as a chance for personal development.

6.1. Practical Implications for Facilitating VUI Adoption

Chinese older adults’ adoption of smart devices remains relatively low [110]. Complicated interaction is one of the barriers to older adults’ effective usage of smart devices, and using VUI as an interaction method could help them use smart products effectively. This study finds that older adults’ adoption of VUI is predicted by perceived usefulness, perceived ease of use, and trust; these factors also mediate the influences of technology anxiety, mobile self-efficacy, and self-actualization on adoption. These findings have valuable implications for developers and promoters seeking to build better VUIs and plan communication strategies that facilitate adoption by older adults.

Developers should improve the speech recognition and language processing quality of VUI, as older adults show higher adoption intention when they perceive VUI as more useful and easier to use. Both usefulness and ease of use rely on speech recognition accuracy and natural language processing capability: more accurate recognition of users’ voice commands and better comprehension of their intended meanings improve both. For improving perceived usefulness specifically, developers should carefully assess the contexts in which VUIs are used. VUIs can be particularly helpful for complex interaction tasks that require multiple steps, such as searching and navigation, and for tasks that are difficult for older adults due to declining capabilities, such as typing and dialing.

To improve perceived ease of use, developers can make voice interaction simple and intuitive. Incorporating interpersonal communication techniques into VUI can be particularly helpful for older adults. Designers can consider creating a personality for VUIs, which can largely reduce the psychological distance perceived by older adults; they should carefully consider how to craft a desirable personality, including gender, tone, and speaking style. As older adults experience declining cognitive capacity, it is helpful to use short, easy-to-remember phrases, such as “OK” and “got it.” When certain information needs to be highlighted, it is also useful to slow down the speech and increase the volume of voice prompts.

Moreover, it is important to improve trust between older adults and VUI. Developers could explore new technological solutions to improve privacy when using VUI. When promoting VUI, marketers could highlight the sophisticated technologies used to protect privacy as well as the agreements with users for protecting their personal information. Policymakers could also explain the legal regulations protecting users’ information and the serious consequences of misusing it.

This study further shows the influences of mobile self-efficacy, technology anxiety, and self-actualization, which are useful for developers and marketers. Older adults with a higher level of mobile self-efficacy show higher perceived ease of use and trust but lower perceived usefulness of VUI, indicating that high mobile self-efficacy makes older adults more resistant to the benefits of VUI. When promoting VUI, marketers therefore need different communication strategies for older adults with low versus high mobile self-efficacy. It is necessary to highlight the benefits of VUI, especially its relative advantages over previous interaction methods, and it may be sensible to first target older adults with low mobile self-efficacy. Moreover, VUI appears to be an intuitive interaction method, so the influence of technology anxiety is relatively limited: technology anxiety is only marginally and negatively related to perceived ease of use. Therefore, developers and marketers do not need to devote extensive effort to reducing technology anxiety. In addition, self-actualization is found to positively influence perceived ease of use, perceived usefulness, and trust, which indicates that marketers should convey the message that using VUI is a channel for personal development. Marketers could use multiple channels for these messages, such as short videos on social media and graphic posters in public places. These efforts could facilitate older adults’ adoption of VUI.

6.2. Practical Implications for Using VUI in Smart Home Systems

Older adults show resistance to adopting smart home devices even though they could gain huge benefits from smart home systems. Integrating VUI into smart home systems promises to facilitate older adults’ adoption of them. The results of this research therefore provide implications not only for older adults’ adoption of VUI but also for their adoption of smart home systems.

For developers, when integrating VUI into smart home devices, particular attention should be paid to users’ perceptions of ease of use and usefulness. For some smart products, such as smart speakers, integrating VUI can largely improve both perceptions because smart speakers provide various functions that would otherwise require complex interactions; VUI reduces older adults’ learning burden, improving their perceptions of the ease of use and usefulness of smart speakers in general. In contrast, for products that require only simple interactions, integrating VUI may not be an optimal choice because the gains in perceived usefulness and ease of use remain limited. For instance, for a cleaning robot whose function is to clean floors autonomously, users interact with it by pressing a start button, which is direct and simple, and upon completion they still have to physically empty the dust container. Because of the simple interaction and the requirement of physical handling, adding VUI to cleaning robots might not substantially improve perceived ease of use or usefulness. As developing and integrating VUI into smart devices is costly, developers should carefully weigh its appropriateness for each device.

This study also shows the influence of aging-related characteristics on older adults’ adoption of VUI, which could also apply to their adoption of smart home devices. Specifically, mobile self-efficacy may lower users’ perceived usefulness of smart home devices, just as it does for VUI: users who are very familiar with current mobile devices may feel that those devices sufficiently satisfy their needs and that switching to smart devices is unnecessary. Therefore, to promote older adults’ adoption of smart home devices, it would be worthwhile to highlight the benefits they provide and to target users who are less familiar with mobile devices.

In addition, we found a positive relationship between self-actualization and adoption of VUI. It is possible that self-actualization also positively influences older adults’ adoption of smart home devices. When older adults have a higher level of self-actualization, they are more motivated to adopt VUI because they view learning VUI as a chance for personal development. Similarly, for older adults with high self-actualization, learning to use smart home systems could also become an opportunity for them to gain new experiences. Thus, to promote smart home devices, companies should highlight self-actualization messages and target older adults who have a relatively high level of self-actualization.

6.3. Limitations and Future Research

Although this study was carefully prepared, it carries several limitations. First, we collected the data online. According to CNNIC, 70% of older adults in China are frequent users of the Internet and mobile Internet [110], and smartphone adoption exceeds 80%. The high penetration of smartphones and the Internet makes online data collection feasible, and since VUI is often integrated with smart products, online sampling is also suitable. However, older adults who are less active online might not be covered by this sample; whether these results apply to older adults who are not frequent Internet users still requires validation, which could be interesting for future research. Moreover, this study provides evidence on the potential usage of VUI in the target population. Future studies could use field experiments to validate the current findings; in particular, it would be interesting to recruit elderly participants with hands-on VUI experience, which could yield more specific guidelines for developing usable VUI for older adults.

In addition, the average age of participants is 59; such participants are often labeled “young-old” adults. This group accounts for a large proportion of older adults in China, so it is worthwhile to focus on it; however, it could differ from adults older than 65. Future research could therefore replicate this study with older age groups. Moreover, this study focuses on VUI adoption intention and older adults’ general perception of VUI; even when older adults are willing to adopt VUI in daily life, their actual and continued usage remains unknown and might be influenced by other factors, such as usability and usage scenarios. Future research could conduct user studies to identify usability issues with VUI and generate guidelines for VUI development, which could further facilitate adoption.

7. Conclusions

VUI has gained popularity in the past decade. It has been integrated into various smart home devices and developed for many usage scenarios. The benefits of VUI should be available to everyone, including older adults, who make up 25% of the overall population in China. This study investigates the factors that influence older adults’ adoption of VUI in China. On the basis of TAM, it proposes a theoretical model predicting older adults’ adoption of VUI by incorporating the construct of trust and aging-related characteristics (i.e., perceived physical conditions, mobile self-efficacy, technology anxiety, self-actualization). A survey was conducted with 420 participants who are current or potential users of VUI. Data were analyzed through SEM and showed a good fit with the proposed theoretical model. Results revealed that older adults’ adoption is determined by perceived usefulness, perceived ease of use, and trust; these factors also mediate the influences of aging-related characteristics on adoption. Specifically, mobile self-efficacy positively influences trust and perceived ease of use but negatively influences perceived usefulness. Self-actualization positively influences perceived usefulness and perceived ease of use. Technology anxiety exerts only a marginally significant influence on perceived ease of use, and no significant influence of perceived physical conditions was found. These results extend TAM and STAM by incorporating additional variables and provide valuable implications for practice.

Author Contributions

Conceptualization, Y.Y. and P.C.; methodology, Y.Y.; software, Y.Y.; validation, Y.Y.; formal analysis, Y.S. and P.C.; investigation, Y.Y. and P.C.; resources, P.C.; data curation, Y.Y.; writing—original draft preparation, Y.Y.; writing—review and editing, Y.S. and P.C.; visualization, Y.Y.; supervision, P.C.; project administration, P.C.; funding acquisition, Y.S. and P.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Humanities and Social Science Projects of the Ministry of Education in China, grant number 20YJC760009; the Shenzhen Science and Technology Innovation Commission under the Shenzhen Fundamental Research Program, grant number JCYJ20190806142401703; and the Fundamental Research Funds for the Central Universities, grant number YJ202203.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Sound as an interface, methods to evaluate voice user interface (VUI) experiences in various contexts

Research areas.

Health & Bioscience

Human-Computer Interaction and Visualization



International Conference on Human-Computer Interaction

HCI 2018: Human-Computer Interaction. Interaction Technologies, pp. 117–132

Voice User Interface Interaction Design Research Based on User Mental Model in Autonomous Vehicle

Yuemeng Du, Jingyan Qin, Shujing Zhang, Sha Cao & Jinhua Dou

Conference paper; first published online: 01 June 2018


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10903)

With the development of artificial intelligence, the autonomous vehicle has come into public view as a new form of driving. In this new human-machine interaction (HMI) model, the relationship between people and vehicles and the user mental model have changed greatly, affecting information processing, sequencing and sorting, information architecture, navigation, visual thinking, data visualization, and interface and media design. The autonomous vehicle uses artificial intelligence and driving big data from the intelligent transportation system to handle cognition and perception, building a new paradigm for human-machine interaction. A mental model based solely on human perception and cognition no longer fits a system driven by artificial intelligence. To enhance the user experience, this paper presents a hybrid mental model that combines the human mental model with the AI mental model, and analyzes how user experience and interaction design are shaped by mental models driven by human intelligence and by artificial intelligence in the traditional mode and the autonomous vehicle mode. Based on induction and a comparative study of the different mental models, the paper summarizes the advantages and disadvantages of the visual and auditory channels in the autonomous vehicle. Finally, it presents principles for voice user interface design that adapt to the new environment and bridge the gap between users' need for a sense of control and for privacy protection, and defines usability targets and user experience targets for autonomous vehicle voice user interface design.


1 Introduction

Mental model theory attempts to model and explain human understanding of objects and phenomena [1, 2]. People form such models of themselves, others, the environment, and the things with which they interact, and build them through experience, training, and instruction [3]. In interaction design, the mental model is the crucial element behind users' perception and the logic of their interaction behavior. Kenneth Craik, in his book The Nature of Explanation, defined the mental model as a small-scale model used to try out various alternatives, conclude which is best, react to the future, and apply past knowledge and experience to the present and the future [4].

Jay Wright Forrester defined general mental models as representations of the concepts and relationships of a real-world system [5]. In Cognitive Science and Science Education, Susan Carey described the mental model, built on incomplete reality, past experience, and even intuition, as the process of thinking about how something works. Mental models help users form actions and behaviors, influence what people attend to in complex situations, and determine how they solve problems [6]. Donald Norman showed the relevance of mental models to the production and comprehension of discourse from a human-computer interaction and usability perspective. Norman distinguished two types of mental model in the design process: the designer's conceptual model of the product, and the user's mental model formed through the mapping between the user and the product system [3]. The mental model is a dynamic construction that evolves with time and with comprehension of the system [7]. In human-computer interaction, the mental models of users, engineers, and designers bridge the gap between perception and interface, and play an important role in HCI and design refinement [8].

In these views, mental models can be constructed from perception, imagination, or the comprehension of discourse [1]; they also serve as the mapping medium, or translation bridge, between reality and abstract or metaphysical meaning. Artificial intelligence, rooted in big data, algorithms, computing, and context, uses the wisdom of crowds to construct associations, predictions, and imaginations of the objective world and to map them onto the subjective sphere. This deconstruction and reconstruction between the objective and the subjective leads to a new human-machine interaction relationship and a new mental model, and in turn influences interaction design. In the traditional driving environment, human intelligence (HI) is the dominant factor in HMI: the driver, aided by mechatronic and mechanical installations, controls the vehicle. Autonomous vehicles driven by artificial intelligence (AI), however, change this dominant role in a disruptive way. AI embedded in the autonomous vehicle carries out the main tasks and duties in place of human intelligence, including driving planning, workflow and driving commands, traffic identification, navigation, and wayfinding. By liberating humans from the core tasks of driving, autonomous vehicles transform the driver's status into a dual role. The cognitive capacity freed up for drivers and riders flows into entertainment, relaxation, and work, supported by the quantified self, that is, big data collected from driving scenarios together with user-generated, profession-generated, and organization-generated content.

As this new HMI and these new mental models emerge, the paper explores how users' mental models transform in AI scenarios, comparing them with traditional vehicles along the main and adjunct tasks, the human-machine interaction relationships, and the mental models rooted in users' cognition and perception workflow, interaction model, information architecture, information dimensions, information types and processing, and media forms. Finally, the paper takes voice user interface design for the autonomous vehicle as a case study and analyzes user experience and interaction design based on the new mental model.

The main contributions include:

Systematically analyze the mental model transformation brought about by AI-driven autonomous vehicles, and summarize the cognitive process in autonomous driving and riding scenarios.

Compare the information types at which human intelligence and artificial intelligence each excel, and derive suitable interaction design dimensions.

Present voice user interface design suggestions and strategies for autonomous vehicles under the AI mental model.

Propose hybrid-intelligence mental models that merge human perception, cognition, and language with the diverse sensing of other animals and the crowd wisdom of human social networks.

2 Related Work

2.1 Autonomous Vehicle

An autonomous vehicle, or self-driving car, perceives its surrounding environment automatically through artificial intelligence software and various sensing devices, including vehicle sensors, radar, GPS, and 2D or 3D cameras. Artificial intelligence algorithms are used to make correct driving decisions, while hardware, including mechatronic devices and information communication equipment, realizes autonomous and safe driving: reaching the destination safely and efficiently, with the ultimate goal of completely eliminating traffic accidents. SAE International's standard J3016 [9] defines six levels of vehicle automation (see Table 1). This paper focuses on Level 4 and Level 5 autonomous vehicles.
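The six J3016 automation levels and the paper's Level 4/5 scope can be captured in a small lookup. This is an illustrative sketch: the level names follow the standard, but the predicate `is_autonomous` is our own shorthand, not part of J3016.

```python
from enum import IntEnum

class SAELevel(IntEnum):
    """SAE J3016 driving-automation levels (0-5)."""
    NO_AUTOMATION = 0
    DRIVER_ASSISTANCE = 1
    PARTIAL_AUTOMATION = 2
    CONDITIONAL_AUTOMATION = 3
    HIGH_AUTOMATION = 4
    FULL_AUTOMATION = 5

def is_autonomous(level: SAELevel) -> bool:
    """Levels 4 and 5 are the scope of this paper: the system
    performs the entire dynamic driving task."""
    return level >= SAELevel.HIGH_AUTOMATION
```

Using an `IntEnum` keeps the ordering of the levels, so the scope check is a plain comparison.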

As an intelligent system, the autonomous vehicle automatically senses the environment and makes driving decisions through an artificial intelligence system comprising perception, decision, control, and vehicle platform manipulation [10]. Perception sensors and actuators, 3S (GPS, GIS, RS) positioning devices, and the intelligent transportation system acquire data about the driving environment and carry out the interaction among users, vehicles, and traffic. The automated driving system, covering the dynamic driving task, parking needs, and roadway and commuter information, receives and collects big data from the transportation elements and scenarios, then uses AI algorithms to analyze the travel route and control vehicle mobility. Feedforward and feedback among the human-vehicle, vehicle-vehicle, and vehicle-system channels together constitute human-vehicle interaction. Across the whole dynamic driving procedure, AI gives rise to a proactive, or active, driving mode. This paper focuses on the change between driving modes guided by human intelligence and by artificial intelligence, and explores the advantages and disadvantages of the different mental models and information interaction workflows that each kind of intelligence supports.
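The perception-decision-control pipeline described above can be sketched as one pass of a sense-decide-actuate loop. Every name and threshold here (`SensorFrame`, the 30 m obstacle distance, the command strings) is an illustrative assumption, not taken from the paper or from any real driving stack.

```python
from dataclasses import dataclass

@dataclass
class SensorFrame:
    # Hypothetical fused readings from camera, radar, and GPS.
    obstacle_distance_m: float
    speed_kmh: float

def perceive(frame: SensorFrame) -> dict:
    """Perception layer: turn raw sensor data into a scene description."""
    return {"obstacle_near": frame.obstacle_distance_m < 30.0,
            "speed_kmh": frame.speed_kmh}

def decide(scene: dict) -> str:
    """Decision layer: choose a maneuver from the perceived scene."""
    if scene["obstacle_near"] and scene["speed_kmh"] > 0:
        return "brake"
    return "cruise"

def control(action: str) -> str:
    """Control layer: translate the decision into an actuator command."""
    return {"brake": "apply_brakes", "cruise": "hold_speed"}[action]

# One pass of the loop: sense -> perceive -> decide -> actuate.
command = control(decide(perceive(SensorFrame(12.0, 60.0))))
```

In a real system this loop would run continuously, with feedback from the actuators re-entering the perception stage, as the paper's feedforward/feedback description implies.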

2.2 Voice User Interface

As artificial intelligence based on natural language interaction spreads across computing architectures, from multi-core CPU systems to heterogeneous and distributed systems, state-of-the-art autonomous vehicles composed of AI devices gain the ability to hear and understand users' language. Human-machine interaction thus adopts not only planned programming languages but also natural language, supporting the progression from "hands on" through "hands off", "eyes off", and "mind off" to the "steering wheel optional" concept of autonomous driving. Two-way HMI communication and interaction allow the machine to proactively understand the user's needs and reply with feedback. The key technologies of speech interaction are speech recognition, speech synthesis, and semantic understanding: input speech is converted to text or commands, text is converted to machine-synthesized speech, and natural language text is transformed into user intentions, so that the machine can understand the user's needs [11].
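The three key technologies named above — speech recognition, semantic understanding, and speech synthesis — chain together as a pipeline. The sketch below stubs the recognition and synthesis stages (a real system would call ASR and TTS models); the intent schema and the keyword matching are illustrative assumptions, not the paper's method.

```python
def recognize_speech(audio: bytes) -> str:
    """Speech recognition stub: a real system would run an ASR model here."""
    return audio.decode("utf-8")  # pretend the audio is already transcribed

def understand(text: str) -> dict:
    """Semantic understanding: map free text to a user intention."""
    text = text.lower()
    if "navigate" in text or "take me" in text:
        return {"intent": "navigate",
                "slots": {"destination": text.split("to")[-1].strip()}}
    return {"intent": "unknown", "slots": {}}

def synthesize(reply: str) -> str:
    """Speech synthesis stub: a real system would emit audio."""
    return f"<spoken>{reply}</spoken>"

# Full pipeline: speech -> text -> intent -> spoken reply.
intent = understand(recognize_speech(b"Take me to the airport"))
response = synthesize(f"Starting navigation to {intent['slots']['destination']}")
```

The point of the structure is the separation of stages: each technology the paragraph names corresponds to one function boundary.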

In traditional driving scenarios, the voice user interface drastically reduces the driver's distraction from the main driving task, improving safety, and distributes cognition to secondary, non-driving tasks [1] such as taking calls and enjoying music and media entertainment. The voice assistant is the most common form of voice interaction: it lightens the driver's visual burden, reduces the diversion of attention, and helps ensure driving safety.

In today's traditional car market, voice interaction is widely applied to vehicle navigation. Relying on satellite positioning and real-time traffic monitoring, it provides drivers with pre-planned routes and prompts such as speed limits, traffic jams, and evidence photos of traffic violations, helping drivers reach their destinations. As a new form of intelligent voice interaction, the voice assistant is also gradually being applied to the design of in-vehicle voice systems.

The voice user interface (VUI) provides a new HMI interaction mode and contributes several characteristics to autonomous vehicle interaction design.

Hearing wakes up the user's attention faster [2], much like a starting shot for athletes. Compared with visual imagery, voice is lower-dimensional information: it becomes the user's cognitive focus far more simply than a graphical user interface does, and it highlights the theme of the mental model. Visual information, as higher-dimensional information, carries unfocused and diversified information storylines depending on the perspective taken, whether first-person, second-person, or third-person (the "God's-eye" view). Different visual perspectives can cause misunderstanding of the driving tasks and distract the user's attention.

Voice information sets up a personified mental model and eliminates psychological barriers to interaction. Compared with visual information, the VUI is better than the graphical user interface (GUI) at building personified associations between a strange or unfamiliar field and the user's past experience and knowledge. Excessive personification, however, raises users' psychological expectations; when the driverless vehicle is executing tasks, shortfalls against the usability targets of interaction design, such as performance, accuracy, and stability, will damage the user experience in turn.

The VUI supports spatially free, multi-directional interaction, since sound is a longitudinal wave without direction. Unlike GUI multi-user interaction through multi-touch visual information, voice communication with multiple simultaneous users in an AI context produces the cocktail party effect [3]. Obstacles to voice interaction arise from identifying multiple users and executing their tasks without interruption over time; human intelligence, by contrast, can skillfully divert attention through selective auditory attention.

Voice information is an open-structure medium, in the sense of cold versus hot media, and is seldom influenced by the metaphors of visual information. From this point of view, the VUI is less dominated by the interaction designer's mental model.

The VUI flattens the information architecture and hierarchy, and supports skipping and switching between tasks. Visual information in a GUI is restricted by spatial limitations and requires a series of visual steps through the information hierarchy to complete a complex task; voice interaction is not bound to an information space and can form a dialogue or speech task directly, jumping through the information sequence without the constraint of time order.
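The contrast between hierarchical GUI navigation and direct voice dispatch can be shown with two tiny routers. The menu tree, intent phrases, and task names below are invented for the example.

```python
# GUI-style navigation: each task sits at the end of a menu path,
# and reaching it means traversing every level of the hierarchy.
MENU_TREE = {"media": {"music": "play_music", "radio": "play_radio"},
             "comfort": {"climate": "set_temperature"}}

def gui_navigate(path: list) -> str:
    node = MENU_TREE
    for step in path:
        node = node[step]
    return node

# VUI-style dispatch: one utterance jumps straight to the task,
# with no intermediate hierarchy to traverse.
INTENT_MAP = {"play some music": "play_music",
              "make it warmer": "set_temperature"}

def vui_dispatch(utterance: str) -> str:
    return INTENT_MAP.get(utterance.lower(), "clarify_request")
```

Both routes reach the same task; the difference is that the voice route collapses the hierarchy into a single step, which is exactly the flattening the paragraph describes.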

This paper mainly discusses voice user interface interaction design in autonomous vehicles. As artificial intelligence takes over the primary tasks and reshapes the secondary ones, the user of a self-driving car no longer needs active control of dynamic driving behaviors, and the HMI interaction pattern with artificial intelligence gives rise to a design method for the voice user interface that exploits its information processing advantages.

3 Comparative Study Between Traditional Driving and Autonomous Driving

3.1 Main and Secondary Task Changes from Traditional Vehicles to Autonomous Vehicles

In the traditional driving context, the relationship between user and vehicle is manipulated unilaterally by drivers and riders. The driver's main task is to comprehensively analyze information about the vehicle's motion state and traffic conditions, form a correct driving strategy, and take the corresponding driving action [1]. In the context of autonomous dynamic driving and human-vehicle interaction, however, sensors and artificial intelligence algorithms take over the driver's functions, performing context-aware computing for driving and consciousness-aware computing for riders. Most main and secondary driving tasks shift into hybrid AI-HI interaction scenarios that have no fixed task focus or user experience storyline. Figure 1 details the changes in the main and secondary tasks.

Comparison of the primary and secondary tasks in the traditional driving environment and the autonomous driving environment.

3.2 Mental Model in the Mode of Autonomous Vehicles Driving Context

Based on this comparison of main and secondary tasks, the HMI interaction logic between user and vehicle derived from traditional driving cannot fully capture the new form of HMI in the driverless environment. When users interact with unmanned vehicles, new mental models will be generated to adapt to and learn the new system, combining human intelligence mental models (of users, engineers, designers, etc.) with artificial intelligence (patterns, algorithms, and sensing principles drawn from animals or from crowd wisdom). Only by fully understanding the user's mental models can designers build an accurate interaction system model and paradigm, a proper information architecture, workflow, navigation, information hierarchy, and interface and media visual design, so that the design fits the user's mental models, dispels users' doubts about new products, and provides a seamless hybrid user experience.

Scholars have proposed different ways of constructing and analyzing mental models. Indi Young holds that a mental model consists of several parts, each divided into groups, and that the whole model can be expressed as a series of behavioral affinity diagrams [4]. Waern suggested two approaches to constructing mental models, depending on whether the learner has prior knowledge of the system. The bottom-up approach is used by learners who react to incoming bits of information, interact with the system, and gradually build a more consistent and complete mental model from the ground up. Most users take the top-down approach, evoking existing knowledge, then modifying and adjusting the mapping into a new mental model according to the information they perceive as they interact with the system. Expert users and novice programmers or learners prefer different mental model types [5]. Mental models are usually constructed through psychological experiments that identify general patterns statistically. As a new research direction, the autonomous vehicle offers relatively few experienced subjects and experiments, so a user study with a large sample is not yet feasible; this is also a limitation of this paper. The paper therefore works from past experience, analyzing users' perception of the traditional car and its system interaction model. From the conceptual model of the unmanned vehicle, it analyzes the cognitive process to infer how users interact with driverless vehicles, then summarizes the mental model in the new situation, focusing on the various mental models and the processing of external environmental information.

In the traditional driving environment, the conceptual model of the automobile comprises the power system, the driving system, and the braking system. The human-vehicle relationship is driven by the user alone: the cognitive process of human interaction with the vehicle rests on perceiving the external world, and the information acquired through perception forms a goal that then yields dynamic driving or riding decisions and behavior. Instructions are conveyed through the voice and graphical user interfaces, and the traditional car responds with information feedback, such as lighting and voice, allowing users to compare the results of their actions with their goals and adjust their behavior again, forming a cyclical process. The whole process is supervised by human intelligence and driven by the human mental model; consistent with human cognition, the mental model is reconstructed from information feedback until a correct model emerges. Figure 2 shows the cognitive process of the driver's human-car interaction in the traditional driving environment. Taking the cognitive cycle of decision making described by Connolly and Wagner [6] as a reference, the paper constructs the HMI cognitive process. Through the decision cycle, decision makers build cognitive perceptions of reality, accumulate knowledge, and adopt reasonable actions; after these actions feed back into the real-world environment, the reformed or changed environment is recognized and understood by the decision makers again. With the establishment of new knowledge, the new mental model guides the user's behavior once more, constituting a cycle of cognitive decision making.

The cognitive process of the driver's human-car interaction in the traditional driving environment.

The paper then traces how the mental model changes from the cognitive process of the traditional driving environment. As shown in Fig. 3, in the autonomous vehicle's dynamic driving environment, the self-driving car is controlled by artificial intelligence, and the human-machine relationship is transformed into symbiotic human-vehicle co-manipulation. AI differs from HI, and crowd wisdom results from swarm intelligence algorithms. Powered by big data and deep learning over visual and voice computing, AI in HMI leads to a new cognitive process and mental model in which the relationship between sensor-based perception and model-based cognition changes dramatically: cognition can be established without sensory perception. Sensors modeled on the sensing principles of animals, covering vision, sound, and touch, can be tuned to waves that humans cannot sense. This new perception produces a new mental model for cognition, changing the logical sequence between perception and cognition. Diversified mental models in turn yield rich types of human-machine interaction modes and models. In the new cognitive process, the autonomous vehicle is empowered by AI algorithms over high-performance sensors that draw on the intelligent transportation system and the Internet of Things environment. Merging this with information from the human senses and from other living creatures' senses, the driving environment, which combines driving context awareness, consciousness awareness, and emotion awareness, comprehensively forms the artificial intelligence's cognition. Based on big data and this hybrid AI-HI cognitive information, the autonomous vehicle can actively or proactively make driving decisions, control its own driving behavior, and set route strategy, while human intelligence supervises the driving scenario and partially participates in decision making. The HMI exchanges information through voice, image, visual, and tangible user interfaces to provide feedforward and feedback and to adjust the decisions made by AI, finally achieving safe and stable driving behavior.

The cognitive process of HMI interaction in autonomous vehicle dynamic driving environment

In the autonomous driving environment there are two different mental models: the human intelligence mental model and the artificial intelligence mental model. The two interact with each other and act on the external environment simultaneously. Taking the double-loop learning process diagram as a reference, the paper presents the new learning mode of the mental model in Fig. 4, showing how each of the two types of mental model affects strategy and principles after the feedback of information. As AI processes data from every available channel, including simulations of the human environment and animal senses, a mental model built from AI cognition can reach the external environment and cognitive results directly, without human perception. In some situations, human prior knowledge, information-processing styles, and universal common sense play no role in a mental model dominated by AI. Some scholars study AI's simulation of animal intelligence: in some respects the structure of the human brain is inferior to that of other animals, and artificial intelligence simulates not only human brain intelligence but also the intelligence of other animals [7]. In autonomous vehicle image guidance, for example, the compound-eye structure of the fly's visual system has been simulated, enabling a seeker to achieve a 360° search field that the human visual system does not possess [8].

Double-loop learning process and the dual-intelligence mental models.

When AI develops new mental models at an advanced stage, it not only learns from knowledge and extends existing modes, but also has the creative ability to build new mental models through direct cognition, without perception. In the mental model dominated by human intelligence alone, the learning process consists of two translation procedures: external environmental signal data is translated into visual or sound information, and the GUI or VUI is translated into command and interaction behaviors manipulated by humans. In the new mental model developed jointly by human intelligence and artificial intelligence, however, the autonomous vehicle can perceive the world from multi-dimensional perspectives while building its cognitive style, borrowing from other creatures, such as the hawk's eye, the fly's compound eyes, or the mantis shrimp's eyes, from a second-person perspective. The hybrid intelligence mental model also changes perception by omitting the intermediate translations among metadata, humans, vehicles, and the environment: nature can use meta-information and metadata directly for problem solving in the driving context, without translating for human cognition or perception. Yet the mental model for driving and learning depends not only on AI and other information and communication technologies; it must also take non-technical problems into account, such as social networking behaviors, social responsibility, and profit distribution. In particular, people still need informed consent and information feedback in the dynamic driving environment when riding in autonomous vehicles. Taking unmanned vehicle assessment regulations into consideration, human intervention is unnecessary for recognizing traffic law, evaluating executive ability, or handling emergency channels, whereas joint decision making on comprehensive driving ability requires humans and AI vehicles to decide together.

3.3 Comparison of Human Intelligence and Artificial Intelligence

In traditional and self-driving scenarios there are two mental models, guided respectively by human intelligence and artificial intelligence. Human intelligence was gained gradually in the struggle between humans and nature as a result of labor; through continuous practice and millions of years of evolution and experience, human knowledge has accumulated [9]. Human intelligence is the ability to solve problems by using knowledge and experience and to learn new knowledge, concepts, and ideas [10]. People perceive the external world through their eyes, ears, nose, tongue, and conscious mind. Artificial intelligence, in turn, is the science and technology of using machines to simulate human thinking in order to expand and extend the intelligence of the human brain [11]. AI has developed on the basis of brain and cognitive science: from machine recognition to pattern recognition, from natural language processing and understanding to knowledge engineering and expert systems, from knowledge patterns to a new cognitive logic, and from unstructured data to structured smart content [12]. According to the literature, human intelligence is better at fields that require intuition, inspiration, insight, and creative thinking, whereas AI is better at reasoning and computing as its main way of thinking and decision making. Each kind of intelligence needs a corresponding mental model.

In human-vehicle interaction, the inherent advantages of human intelligence allow switching across all eleven information dimensions and achieving cross-channel perception. Artificial intelligence lacks such rich information dimensions: limited to three-dimensional space, it can only switch within finite dimensions. The contrast between human and artificial intelligence in human-vehicle interaction therefore concentrates on one dimension (the voice user interface), two dimensions (the graphical user interface), and three dimensions (the tangible user interface), with the corresponding perceptual channels of hearing, vision, and touch. Table 2 shows the advantages and disadvantages of human intelligence and artificial intelligence in handling information in these three dimensions with their mental models.

This comparison shows that AI has an advantage over human intelligence in processing voice information, that there is little difference between the AI and human mental models in processing graphic information, and that AI's processing of tangible interface information falls far short of human intelligence. In human-vehicle interface design, the interaction design of the voice user interface can therefore be emphasized in the "eyes off, hands off, mind off" autonomous vehicle HMI, letting the special talents of human intelligence and artificial intelligence work together.

4 Suggestions for the Autonomous Vehicle Voice User Interface Interaction Design

4.1 Scenarios for the Autonomous Vehicle VUI

As Section 3 showed, the change in main and secondary tasks leads to a transformation of the mental model. The driver's main task shifts from driving to entertainment, producing considerable cognitive surplus; this liberates people's attention and suits the addition of various forms of entertainment that improve the user experience. As the driver's role changes from driving to riding, user experience and interaction design focus on vehicle data and system services. In human-vehicle interaction, voice interaction is simple to wake up, its interaction hierarchy can be flattened, and it can reach a decision directly, easing the visual load and the user's perceptual effort. The following design recommendations for the voice user interface cover the safe driving state and the state of emergency.

In a safe running state, the user, relieved of active control of the autonomous vehicle, can free their hands for entertainment. Research has shown that a highly automated system reduces the driver's participation: the driver shifts from actively controlling the vehicle to passively monitoring its auxiliary systems, and in this low-load state can easily drop out of the control loop and fail to respond immediately to emergencies [1]. The voice user interface ensures that users can understand the state of the autonomous vehicle without devoting special attention to it, because compared with the visual channel, hearing attracts attention readily at any time, reacts quickly, and is unconstrained by lighting conditions; the VUI is therefore well suited to the emergency dynamic driving context [2]. Since the information processing of the autonomous vehicle rests on cloud computing and AI algorithms, the VUI interaction design need not present all the complex machine language to the end user; it only needs to convey the running state through visual and auditory feedback, keeping communication "informed but not overly burdensome". In an emergency, artificial intelligence can make the most rapid and accurate judgment, but because of how rights and responsibilities are distributed in autonomous driving, the design must inform the user of the current state. At such moments the voice user interface should arouse the attention of users in the entertainment state through alert sounds and lighting changes, then switch to voice and visual forms to inform them of the current driving state.
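The "informed but not overly burdensome" policy above — light cues in the safe state, escalating to attention sounds, lighting changes, and combined voice-plus-visual feedback in an emergency — can be expressed as a simple channel-selection function. The state and channel names are illustrative assumptions, not taken from the paper.

```python
def choose_feedback(state: str, user_attentive: bool) -> list:
    """Pick feedback channels so the rider stays informed but not
    overloaded. Channel names are hypothetical."""
    if state == "emergency":
        # First arouse attention (sound + lighting), then inform
        # through combined voice and visual forms.
        return ["attention_chime", "lighting_change",
                "voice_announcement", "visual_status"]
    if state == "normal":
        # Routine status: a brief spoken cue suffices when the user
        # is absorbed in entertainment; nothing if already attentive.
        return [] if user_attentive else ["voice_announcement"]
    return []
```

The ordering inside the emergency list mirrors the escalation the paper describes: wake the user first, then deliver the driving state.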

4.2 The Future Trend of VUI

Artificial intelligence is becoming ever more powerful, and it is believed that in the near future it will overcome its current restrictions and become more intelligent. At present, artificial intelligence imitates human intelligence, as mentioned above; in practice it can also simulate the intelligence of other animals. In this new mental model, artificial intelligence can also judge through a combination of multiple senses. Abe Davis's research on the visual microphone passively recovers sound from video [13], and some blind people even use sound to enjoy pictures through the translation of images into sound. Autonomous vehicles can obtain metadata directly from the environment and make decisions based on detection through the auditory and visual channels, greatly changing the human-vehicle voice interaction process. When AI learns the thinking mode of animal intelligence, it can skip the recognition channel and the perception process and move directly from concept to recognition, decision, and cognition. In the old mental model, information is translated by machine recognition and then translated again for the user; in the new mental model, the interaction process is simplified. By identifying different driving concepts, the autonomous vehicle need not relay every piece of process information to the user directly. Nevertheless, given the responsibilities and rights at stake in dangerous situations, discussed in the previous section, human-machine interaction still needs a visual and auditory feedback state.

Under the guidance of artificial intelligence, information can be extracted in a variety of sound confounding. Under the guidance of human intelligence, the dimension of sound and vision can be switched. GUI and VUI transforms according to different driving context with the support of HI and AI mental models. The use of artificial intelligence to hear the sound judgment of danger, the propagation of sound than can be used in artificial intelligence as the end, determine the object state by detecting sound or visual. Visual and auditory switching, which way should we take to acquire multiple voices, can also separate voice, and also can analyze the content of current nodes with visual wiretapping. The highest level of speech interaction can translate the language of the deaf and dumb people, VUI interaction design don’t need to collect and compute the users’ behavior pattern, finally understand the users’ intentions which are told or shown by users, but only set up the OD information to go straight forward to the goal.

5 Conclusion

The mental model is shaped by perception and cognition. Artificial intelligence and human intelligence develop distinct mental models, and the mental model in turn determines strategy and principle planning. In the autonomous-vehicle driving context, the AI mental model and the HI mental model combine the functions of perception and cognition to make hybrid decisions, and interaction design must take both the HI and AI interaction modes into account on the basis of these different mental models.

Autonomous vehicles free the human's hands, eyes, and mind from the dynamic driving task, and the voice user interface helps absorb the user's resulting cognitive redundancy. The traditional mental model, rooted in human intelligence, runs from algorithm to data to mode to behavior. The new mental model, based on artificial intelligence, adopts a second-person perspective to simulate not only human-like perception and cognition but also those of other creatures, ultimately creating new thinking modes and mental models for interaction design. This mental model leads to VUI interaction design methods beyond the traditional one and opens up a wide range of possibilities for interaction design.

References

Johnson-Laird, P.N.: Mental Models: Towards a Cognitive Science of Language, Inference, and Consciousness. Cambridge University Press, Cambridge (1986)


Kieras, D.E., Bovair, S.: The role of a mental model in learning to operate a device. Cogn. Sci. 8 (3), 255–273 (1984)


Norman, D.A.: The Design of Everyday Things. Basic Books, New York (2002)

Johnson-Laird, P.N.: The history of mental models. In: Manktelow, K., Chung, M.C. (eds.) Psychology of Reasoning: Theoretical and Historical Perspectives, pp. 179–212. Psychology Press, New York (2014)

Mental model. Wikipedia. https://en.wikipedia.org/wiki/Mental_model. Accessed 7 Feb 2017

Carey, S.: Cognitive science and science education. Am. Psychol. 41 (10), 1123 (1986)

Nakatsu, R.T.: Diagrammatic Reasoning in AI. Wiley, Hoboken (2009)


Jih, H.J.: Mental models: a research focus for interactive learning systems. Educ. Technol. Res. Dev. 40 (3), 39–53 (1992)

Standard SAE: J3016: SAE international taxonomy and definitions for terms related to on-road motor vehicle automated driving systems, levels of driving automation (2014)

Shi, W., Alawieh, M.B., Li, X., et al.: Algorithm and hardware implementation for visual perception system in autonomous vehicle: a survey. Integr. VLSI J. 59 , 148–156 (2017)

Yuan, B., Xiao, B., Hou, Y., et al.: State-of-art and trend of speech interaction for mobile intelligent terminal. Inf. Commun. Technol. 2 , 39–43 (2014)

Duan, L.: Driver mental workload evaluation and application in driver assistance system. Ph.D., Jilin University, China (2013)

Lv, Z.: Human Engineering. China Machine Press, Beijing (2016)

Arons, B.: A review of the cocktail party effect. J. Am. Voice I/O Soc. 12 (7), 35–50 (1992)

Young, I.: Mental Models: Aligning Design Strategy with Human Behavior. Rosenfeld Media, New York (2008)

Waern, Y.: On the dynamics of mental models. In: 6th Interdisciplinary Workshop on Informatics and Psychology: Mental Models and Human-Computer Interaction, vol. 1, pp. 73–93. North-Holland Publishing Co., Amsterdam (1987)

Wang, Y.X., Guenther, R.: The cognitive process of decision making. J. Cogn. Inform. Nat. Intell. 1 (2), 7–85 (2007)

Liu, J., Liu, X.: The relationship between artificial intelligence and human brain intelligence. Research on Noetic Science in China 2011 Album, pp. 850–853 (2012)

Wang, G.: Research on IR imaging guidance using fly's visual system. Ph.D. thesis, College of Astronautics, Northwestern Polytechnical University (2003)

Liu, Q.: Human intelligence and artificial intelligence. Comput. Sci. 19 (2), 55–59 (1992)

Wang, L.: The relationship between artificial intelligence and human intelligence. Technol. Innov. Appl. 31 , 76 (2016)

Tian, Y.: Dictionary of Thinking Science. Zhejiang Education Publishing House, Zhejiang (1996)

Qin, J.: Impaction of artificial intelligence on interaction design. Packaging Eng. 38 (20), 27–31 (2017)

Davis, A., Rubinstein, M., Mysore, G.J., et al.: The visual microphone: passive recovery of sound from video. ACM Trans. Graph. 33 (4), 79 (2014)


Author information

Authors and Affiliations

School of Mechanical Engineering, University of Science & Technology, Beijing, People’s Republic of China

Yuemeng Du, Jingyan Qin, Shujing Zhang, Sha Cao & Jinhua Dou


Corresponding author

Correspondence to Jingyan Qin.

Editor information

Editors and Affiliations

The Open University of Japan, Chiba, Japan

Masaaki Kurosu


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Du, Y., Qin, J., Zhang, S., Cao, S., Dou, J. (2018). Voice User Interface Interaction Design Research Based on User Mental Model in Autonomous Vehicle. In: Kurosu, M. (ed.) Human-Computer Interaction. Interaction Technologies. HCI 2018. Lecture Notes in Computer Science, vol. 10903. Springer, Cham. https://doi.org/10.1007/978-3-319-91250-9_10


DOI: https://doi.org/10.1007/978-3-319-91250-9_10

Published: 01 June 2018

Publisher Name: Springer, Cham

Print ISBN: 978-3-319-91249-3

Online ISBN: 978-3-319-91250-9



