Multi-Modal Interaction Internship

Clinical Assistant

During a 12-week internship with M*Modal, I was fortunate enough to work on improving voice interaction in a multi-modal system, allowing physicians to improve the quality of clinical documentation and patient care using only speech. Some details are excluded in line with NDA requirements.

Role Overview

Research, Design, Testing


Summer 2018

Nov '17


12 weeks

Project Process

The key stages of the research & design work.

Initial Research Aims

Exploring potential for a natural, conversational system.

Having to switch between dictation and the mouse places a burden on physicians, which creates the need for natural, conversational phrasing when interacting with the system's actionable messages.

Research Objectives

  1. Identifying which types of phrases are natural for providers to use with existing actionable messages.

  2. Determining how to make voice-enabled content more discoverable and intuitive, without the need for prior training.

Initial Study Plan

Evaluating the current CAPD system.

The plan centered on understanding how providers instinctively phrase commands to resolve the current CAPD message designs, which differ in how many specifications they request and in the format in which options (if any) are shown.

Preliminary Scenario & Prototype

The initial study plan was to have each participant adopt the role of a physician who had just finished dictating a note from a patient visit and was now reviewing a list of specification requests made by the CAPD system. Participants would interact with a mid-fidelity interactive prototype of the current CAPD system, created using Sketch and InVision.

However, during a UX team critique session, it became clear that this plan would not meet the research aims, largely because participants would need medical knowledge to interact comfortably with the messages, among other context-related issues.

Switch to Analogous Domain

Reducing need for participant medical knowledge.

With this in mind, I redesigned the study to be conducted in an analogous domain, removing the need for medical knowledge, which would otherwise have severely restricted the pool of potential participants. In the new scenario, participants played the role of someone selling their car online who had created an advert listing all known vehicle details and was now resolving system prompts that would make the advert's description of the vehicle more accurate, potentially increasing its sale value.

Paper Prototype & Participant Task

To increase the flexibility of the study, I created a simple paper prototype (as seen below) depicting a user-created advert detailing the vehicle for sale, with a sidebar on the right where the specification requests would appear. Each information request was shown on a cut-out card, which the study moderator laid down one at a time. Participants phrased a command to resolve each message while following a think-aloud protocol, which was used to uncover any hesitancy with particular message types.


Studying phrases and key information.

Sixteen company employees participated in the study. I transcribed the audio recordings (around 15 minutes each) using a semi-automated online service and cleaned them up in preparation for two separate analysis methods aligned with the study's research goals.

Phrasing Structures & Terminology

The key aim was to deliver a corpus of the natural phrases participants used to resolve each message type, such as a single specification with options listed as inline text inside the message, or two requests with options listed as radio buttons or select dropdowns. This would allow the Speech & Natural Language Understanding teams to build a larger set of commands that would register and act upon a user’s intent to interact with each message type.

The phrases used to resolve each message type were extracted from all 16 transcripts and grouped by message. Because the study used an analogous domain, each phrasing structure had to be broken down into individual components and generalized so that it would apply to any message with the same formatting. Below is an example of this phrase restructuring method.


“Specify body type. Type is hatchback.”


“[Specify] [body type]. [Type] is [hatchback].”


“[Activation verb] [message title]. [Specification title] is [option].”
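To make the restructuring step concrete, the following Python sketch maps a two-sentence utterance onto the generic template. It is illustrative only and assumes utterances can be split on full stops; the verb list, the `restructure` function, and the `PhraseStructure` class are hypothetical and do not represent the method used by the Speech & Natural Language Understanding teams.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical verb list for illustration only; the actual verbs were
# identified from the study transcripts and are not reproduced here.
ACTIVATION_VERBS = {"specify", "select", "choose", "set"}


@dataclass
class PhraseStructure:
    activation_verb: str
    message_title: str
    specification_title: str
    option: str

    def bracketed(self) -> str:
        """Render the utterance with each component marked, as in the example."""
        return (f"[{self.activation_verb}] [{self.message_title}]. "
                f"[{self.specification_title}] is [{self.option}].")


def restructure(phrase: str, message_title: str,
                options: List[str]) -> Optional[PhraseStructure]:
    """Map an utterance onto the template
    [Activation verb] [message title]. [Specification title] is [option].
    Returns None if the phrase does not fit it."""
    # Split on full stops and drop empty fragments (e.g. after the final stop).
    sentences = [s.strip() for s in phrase.split(".") if s.strip()]
    if len(sentences) != 2:
        return None
    first, second = sentences

    # First sentence: activation verb followed by the message title.
    verb, _, title = first.partition(" ")
    if verb.lower() not in ACTIVATION_VERBS or title.lower() != message_title.lower():
        return None

    # Second sentence: "<specification title> is <option>".
    match = re.fullmatch(r"(.+?) is (.+)", second, flags=re.IGNORECASE)
    if match is None:
        return None
    spec_title, option = match.groups()
    if option.lower() not in {o.lower() for o in options}:
        return None

    return PhraseStructure(verb, title, spec_title, option)


result = restructure("Specify body type. Type is hatchback.",
                     "Body type", ["Hatchback", "Saloon", "Estate"])
print(result.bracketed())  # [Specify] [body type]. [Type] is [hatchback].
```

A strict template match like this rejects any phrasing that deviates from it, which illustrates why a wide corpus of phrasing structures per message type is useful.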

Over 120 distinct phrasing structures (each paired with its related message design) were handed off to the relevant teams, along with the specific terms participants used to interact with GUI form elements and the common verbs they used to tell the system which message to act upon.

However, the fact that around 10-15 widely differing phrasing structures were used for each message type highlighted participants' uncertainty about exactly how to phrase a command to resolve each message.

Affinity Diagramming

Because the inconsistent phrasing for each message type pointed to potential underlying issues with the current message design, I extracted the units of interest from the 16 transcripts to find commonalities across the sessions in participants' hesitancy with the messages and with the system in general.

With around 130 sticky notes drawn from the transcripts and my own notes, I grouped these units into themes through affinity diagramming, identifying core issues with voice interaction with the current actionable messages.

The extracted units were grouped into 10 distinct themes. After I wrote up a description of each, the themes were used to inform the key study findings.

GUI form elements
Control of document narrative
Not dealing with messages
Lack of obvious next step
Aids for further context
Lack of visual cues from system
Phrasing rhythm and methods
Card "activation" methods
Doubts about using voice
Facilitating natural, conversational phrasing

Key Study Findings

Driving future design and research.

The full findings of the initial research are protected by NDA, but the following summarizes the key information taken forward to inform the subsequent work.

Overview of findings

  1. Potential for Phrasing Influence

    The design and terminology of the voice-actionable messages can heavily influence the phrases users utter when interacting with them.

  2. GUI Form Elements

    Radio buttons, dropdowns, and similar elements depicting possible answers cause hesitancy in voice interaction and, as a result, reduce confidence in resolving the message by voice.

  3. Narrative Control

    In situations requiring more detail, such as diagnosis updates or treatment plans, users wanted to make the change manually in the note so that the narrative of their document would remain consistent.

  4. Visual Feedback

    It is absolutely crucial to inform users of command success or failure, as well as possible next steps, through visual system cues.

  5. Conversational Response

    When a single message appeared in front of users, it invoked a Q&A-style response, and users were less likely to tell the system which message to interact with, since it was the only message displayed.

  6. Missing Context

    If users are unsure why a message has appeared or what they can do about it, they are likely to phrase an unsuitable command, or to take no action at all.

Three Key User Questions for Context

Revisiting the user confusion that arises when a message appears, the team devised three key questions that a user subconsciously needs answered in order to understand a message and act upon it by voice.

Why am I being presented with this message?

What is it about the patient or the note that has led to this prompt?

What can I say to resolve this message?

How exactly do I need to phrase a command so that this message will disappear?

Where will this change appear in my note?

What about my documentation will change if I take action upon this message?

Revised Research Scope

Taking advantage of the time remaining.

After consultation with the UX team, we decided that the findings of the initial study provided a solid foundation on which the CAPD actionable messages could be redesigned to make voice-interaction with them not only more discoverable but also more intuitive.

With five weeks of my internship remaining, I set about providing tangible recommendations the team could take forward in redesigning the CAPD messages to be more intuitively voice-actionable.

Redesigning the Messages

For discoverability and ease of voice-actionability.

The sporadic, inconsistent phrasing participants used to resolve each message type exposed a lack of guidance from the GUI on how to interact confidently with a message by voice. I used the findings from the initial study to redesign the messages so that they more clearly signaled that they could be acted upon by voice, iterating on the designs with continual feedback from the UX team. Throughout, the focus was on how the visual design of a message can influence the phrasing a user will vocalize to resolve it.

Please note: for NDA reasons, the redesigned voice-actionable messages are not included.

Aims for Redesign of Voice-Actionable Messages

  1. Providing Context

    Offering a simple indication of why a request for specification has been made, and also where in the patient note a change will take place if action is taken.

  2. Speech Guidance

    Using design and in-message terminology to guide users on how to phrase a direct interaction with a message.

  3. System Feedback

    Providing visual cues to indicate a successful or unsuccessful command, and offering possible next steps.

Wizard of Oz Study

Testing the redesigned voice-actionable messages.

Given the need to test the performance of the redesigned voice-actionable messages, the second study was more elaborate than the initial phrase-gathering study, in both planning and implementation.


The team decided that a task set within the medical domain would be suitable, provided the need for medical knowledge was significantly reduced. Participants were placed in the role of a physician dictating the note for a patient visit they had just completed. They were given a profile of the patient, an elderly gentleman named Mr. Shankly, with handwritten notes on the back representing memos the physician made during the visit. They were also given a script for dictating the note; although they were not required to stick to it, all 18 participants chose to, which significantly aided the running of the test.


The study room was set up to mimic a typical hospital dictation room in which a physician might dictate a note after a patient visit. The participant sat in front of a monitor, with a dictaphone, mouse, and keyboard. The monitor was connected to a laptop directly behind them, where a researcher manipulated the prototype in response to the participant's vocal commands and dictation. As moderator, I sat next to the participant, from where I briefed and debriefed them and provided assistance during the study when required.

High-Fidelity Prototype

Using HTML, CSS, and jQuery, I developed a prototype depicting the existing CAPD system with the newly redesigned voice-actionable messages, alongside a dummy Electronic Health Record on the left-hand side holding the patient note. As participants dictated the script, the researcher acting as the “computer” pressed a button to simulate each sentence of dictation. Some sentences fired an event that displayed a request for specification in the CAPD sidebar. When participants chose to resolve a message, the “computer” manipulated it to provide visual feedback based on the participant's phrasing. Some participants were completely unaware that the prototype was being controlled by another person rather than by a functional system reacting to their speech.
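The prototype's control flow can be sketched as a small state machine. The original was built with HTML, CSS, and jQuery; this Python version models only the Wizard of Oz logic, and every name in it (`WizardPrototype`, `advance`, `resolve`, the message IDs, the script text) is a hypothetical placeholder rather than the actual prototype code or study script.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ScriptSentence:
    text: str
    # Hypothetical ID of the CAPD request this sentence reveals, if any.
    triggers: Optional[str] = None


@dataclass
class WizardPrototype:
    script: List[ScriptSentence]
    note: List[str] = field(default_factory=list)          # dummy EHR note
    sidebar: Dict[str, str] = field(default_factory=dict)  # message ID -> status
    cursor: int = 0

    def advance(self) -> None:
        """Wizard's button press: simulate one sentence of dictation."""
        if self.cursor >= len(self.script):
            return  # script finished; further presses do nothing
        sentence = self.script[self.cursor]
        self.cursor += 1
        self.note.append(sentence.text)
        if sentence.triggers is not None:
            self.sidebar[sentence.triggers] = "open"  # show request in sidebar

    def resolve(self, message_id: str, understood: bool) -> str:
        """Wizard's judgement of the participant's spoken command.

        Returns the visual feedback shown to the participant."""
        if self.sidebar.get(message_id) != "open":
            return "no open message"
        if understood:
            self.sidebar[message_id] = "resolved"
            return "success"
        return "not understood"  # message stays open so the participant can retry


proto = WizardPrototype([ScriptSentence("First scripted sentence."),
                         ScriptSentence("Second scripted sentence.", triggers="request-1")])
proto.advance()
proto.advance()
print(proto.sidebar)                     # {'request-1': 'open'}
print(proto.resolve("request-1", True))  # success
```

Reducing the human judgement to a single `understood` flag reflects the Wizard of Oz approach: no speech recognition happens in the prototype, and the researcher decides whether each command would have been accepted.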

Study Outcomes

Evaluating the redesign performance.

All 18 sessions were again recorded and transcribed for analysis. However, with little time remaining in my internship, the analysis was afforded considerably less time than in the first study.

Redesigned Message Performance

  1. Reduced Phrasing Distribution

    The phrasing participants used to resolve each actionable message was far more consistent, with the vast majority differing only in conjunctions and other minor details. This is likely because the redesigned messages gave guidance on how to phrase a response to each message.

  2. Greater Contextual Understanding

    Compared with the initial study, participants were far more aware of exactly what they needed to do to resolve the message in front of them. This was in part because they understood what about their note had led to the message being displayed, and how their note would update if they acted upon it.

  3. Voice Discoverability Issues

    Unfortunately, over half of the participants used the mouse to interact with the first message they saw, before being asked to use their voice to resolve the remaining messages. While it is promising how easily all participants were able to phrase a command to resolve each message, and how uniform those phrasings were, the design must be revisited to ensure that physicians without prior training know they can use their voice to interact with these messages.

Other Areas for Consideration

  1. Showing of Specification Options

    Showing potential options for a specification, such as the type of fracture, not only risks taking up large amounts of screen space but may also lead physicians down a particular path by showing only a subset of possible answers.

  2. Greater Message Capabilities

    While the redesigned messages cater predominantly to situations where only short specifications are required, there may be cases where free speech or text entry would preserve narrative control, for example, allowing physicians to provide a diagnosis update or a follow-up treatment plan.

Internship Recap

Summarizing the experience and deliverables.

The greatest challenge was working in a domain as complex and unfamiliar to me as US healthcare, particularly the case-by-case challenges posed by individual medical institutions and physicians with drastically different workflows, a result of the different systems implemented in each hospital.

Internship Deliverables

  1. Phrase Gathering

    Offering short-term support for voice interaction with the current message design by providing the relevant teams with a large corpus of phrases naturally used to resolve each message, increasing the likelihood of successful interaction.

  2. Design Recommendations

    Prototyping, testing, and validating redesigned voice-actionable messages, which will provide components for making voice-actionable messages and other content more discoverable, intuitive, and efficient for physicians.

  3. Future Research

    The findings from both studies will provide a foundation for future research efforts in seeking to build more natural, conversational interaction possibilities for physicians, both in GUI and VUI.

Working as a member of the close-knit UX team at M*Modal was an invaluable experience, particularly the mentorship every member offered in their differing areas of expertise. I cannot thank them enough, nor the countless others from other product teams who offered advice, and the research participants.

Without a doubt, the most rewarding aspect of the internship was the opportunity to be involved in such impactful work, which not only makes physicians' day-to-day lives easier but, in turn, improves the quality of care they provide to their patients.