Shore '00: Student HCI Online Research Experiments

University of Maryland

Comparison of Telephone Menu Interfaces

Introduction

Telephone menu interfaces are becoming increasingly popular in settings where automated caller assistance can reduce the cost of employing human operators. In addition, services such as access to voicemail, email, news, and information are being provided via the telephone. Telephones are the most "ubiquitous, best-networked, and simplest computer terminals available today" [1]. Their limitation, though, is that they provide only sound for output and only sound and twelve buttons for input, so phone-accessible services depend on careful menu design. Experimental results indicate that attention to the efficiency and completeness of prompts, efficiency of navigation, adaptability to user habits, and the differences between novice and expert users improves both user efficiency and satisfaction. We consider these issues in the context of a comparison between two email-by-phone service providers, Shoutmail and Coolemail.

Complete prompts are necessary because users of voice menu systems do not expect that they will need training to use them. But as more complex capabilities are added, development of prompt structures becomes increasingly challenging. Users must make a sequence of decisions based on audio feedback. The information for these decisions should be available at appropriate points. For example, when novice users decide to save a message that they've just heard, prompts for saving messages should be immediately available.

Prompt efficiency is important because users want to hear only the information that helps them achieve their goals. In screen-based systems, users can scan for relevant information. In temporally presented voice interfaces, such scanning is usually not possible. We have investigated some scanning mechanisms, such as Skip and Scan [2], but have also concluded that standard prompts should provide information as efficiently as possible.

Subjective satisfaction depends heavily on efficiency of navigation, since frequent users of voice systems show some consistency in the paths they take through the voice menus. For these users, it is important that the paths require minimal keystrokes and a minimal number of decisions. For example, users should not be required to go through irrelevant menus to reach often-used functions. All users, in addition, have established patterns for phone use. They do not, for example, expect to insert delimiters in telephone messages. These habits should be identified so that the design can take advantage of users' automatic responses.
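The keystroke cost of reaching a function can be made concrete with a small sketch. The menu tree below is entirely hypothetical (it is not the actual structure of Shoutmail or Coolemail), and `keystrokes_to` is an illustrative helper that finds the shortest key sequence leading to a named function.

```python
from collections import deque

# Hypothetical menu tree (an assumption for illustration only).
# Each key press leads either to a submenu (dict) or to a function name (str).
MENU = {
    "1": {  # inbox
        "1": "play_next_message",
        "2": "save_message",
        "3": "delete_message",
    },
    "2": {  # options
        "1": {"1": "change_greeting", "2": "change_password"},
    },
    "3": "compose_message",
}


def keystrokes_to(menu, target):
    """Return the shortest key sequence that reaches `target`, or None."""
    queue = deque([(menu, "")])
    while queue:
        node, path = queue.popleft()
        for key, child in node.items():
            if child == target:
                return path + key
            if isinstance(child, dict):
                # Search submenus breadth-first so shorter paths are found first.
                queue.append((child, path + key))
    return None


if __name__ == "__main__":
    for function in ("save_message", "change_password", "compose_message"):
        print(function, keystrokes_to(MENU, function))
```

In this made-up layout an often-used function such as saving a message costs two key presses, while a rarely used option costs three, which is the kind of trade-off the paragraph above describes.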

The needs of the novice and the expert are very different. The novice does not have the knowledge of procedural details to navigate through the system. Prompts must provide specific instructions for guiding each step. The expert, though, is familiar with the system structure and needs only simple confirmations for frequently-used functions. For less frequent functions, like changing system options, experts do need easily obtained instructions.
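One possible way to serve both groups is to choose the prompt per function according to how often the caller has used it. The sketch below is only an illustration: the prompt wording, the `EXPERT_THRESHOLD` cutoff, and the `choose_prompt` helper are assumptions, not features of either system studied here.

```python
# Assumed cutoff: after this many uses of a function, the caller is treated
# as an expert for that function. The value is arbitrary.
EXPERT_THRESHOLD = 5

# Hypothetical prompts: (full instructions, brief confirmation).
PROMPTS = {
    "save_message": (
        "Message saved. To hear the next message, press 1. "
        "To return to the main menu, press star.",
        "Saved.",
    ),
    "change_options": (
        "To change your greeting, press 1. To change your password, press 2.",
        "Options.",
    ),
}


def choose_prompt(function, use_count, help_requested=False):
    """Pick full instructions for unfamiliar functions, a brief prompt otherwise."""
    full, brief = PROMPTS[function]
    if help_requested or use_count < EXPERT_THRESHOLD:
        return full
    return brief


if __name__ == "__main__":
    print(choose_prompt("save_message", use_count=40))    # frequent function
    print(choose_prompt("change_options", use_count=1))   # infrequent function
```

Counting uses per function rather than per caller means that an experienced caller still hears full instructions for an infrequent function, and can always ask for them explicitly.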

In addition to the issues mentioned above, our involvement with email-by-phone services introduces the problem of optimal speech synthesis techniques. In order to retrieve email, the user must enter an ID and password, then proceed to the inbox where subject headers are read by a computer-generated voice. The quality of this synthesized speech in terms of pronunciation, playback speed, and clarity strongly influences user satisfaction. Michaelis and Wiggins [3] suggest that speech generation is preferable when the

1. message is simple.
2. message is short.
3. message will not be referred to later.
4. message deals with events in time.
5. message requires an immediate response.
6. visual channels of communication are overloaded.
7. environment is too brightly lit, too poorly lit, subject to severe vibration,
    or otherwise unsuitable for transmission of visual information.
8. user must be free to move around.
9. user is subjected to high G forces or anoxia.

These ideal situations suggest that accessing email by phone is useful only occasionally since, compared with computer access, the number of steps required to retrieve a message, the load on the user's short-term memory, and the difficulty of comprehending the message are all increased. The quality of voice output is improved when pre-recorded human speech is played back at appropriate times, but this is not possible when dynamic content such as email must be read aloud.

Synthesized speech, due to its machine-like character, requires more of the user in terms of cognitive processing. The well-known paper by George Miller, "The magical number seven, plus or minus two," pointed out the limits people have for retaining information [4]. People can recognize about seven chunks of information and maintain them in short-term memory for 15 to 30 seconds. The size of the chunks depends on an individual's familiarity with the information. Applying this theory to the Shoutmail and Coolemail systems, we recognize the difficulties new users may have in learning the navigational commands of a telephone menu system: they must listen to unfamiliar prompts, interpret synthesized speech, retain the output in short-term memory, and possibly perform some unrelated task at the same time.

The telephone menu interface, and specifically its use in voice messaging systems, is not new. Its limitations are being overcome with the use of speech recognition for content-based navigation, intelligent filtering and prioritization, and efficient presentation of options [5]. This is evidenced by related work in the field.

Phoneshell [6] offers telephone-based access via touch-tones to incoming messages as well as rolodex, calendar, news, weather, and traffic. Since many users receive dozens of messages a day, Phoneshell supports rule-based filtering to group messages into categories such as "important" and "mass mailings." The user is principally restricted to sequential navigation, however (reading either the next message or the previous one), and many find it tedious to process a long list of messages.
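Rule-based filtering of this kind can be sketched as an ordered list of predicates, where the first matching rule decides the category. The rules, field names, and addresses below are hypothetical and do not reproduce Phoneshell's actual rule format.

```python
def categorize(message, rules, default="other"):
    """Return the category of the first rule whose predicate matches."""
    for category, predicate in rules:
        if predicate(message):
            return category
    return default


# Hypothetical rules, checked in order (assumptions for illustration).
IMPORTANT_SENDERS = {"advisor@example.edu"}
RULES = [
    ("important", lambda m: m["sender"] in IMPORTANT_SENDERS),
    ("mass mailings", lambda m: len(m["recipients"]) > 20),
]

if __name__ == "__main__":
    inbox = [
        {"sender": "advisor@example.edu", "recipients": ["me@example.edu"]},
        {"sender": "list@example.org", "recipients": ["x@example.org"] * 50},
        {"sender": "friend@example.com", "recipients": ["me@example.edu"]},
    ]
    for message in inbox:
        print(message["sender"], "->", categorize(message, RULES))
```

Grouping messages this way lets a caller hear the "important" group first, although navigation within each group remains sequential.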

Chatter [7] uses speech recognition to allow the user to retrieve email messages, send voice messages, look up information in a personal rolodex, place outgoing calls, and ask the location of other Chatter users. Messages are presented in order of relevance based on the user's past usage. Chatter uses a sophisticated model to track the conversation, although it does not handle recognition errors.

SpeechActs [8] combines the conversational style of Chatter with the broad functionality of Phoneshell, offering a speech interface to mail, calendar, stock quotes, and weather forecasts. SpeechActs improves upon sequential navigation by allowing the user to access messages by number (e.g., "read message 17"). Recognition errors are addressed by SpeechActs, which explicitly verifies requests that are irreversible (e.g., "delete message") and offers progressively more detailed assistance when the system fails to understand the user.
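The two error-handling ideas described here, confirming irreversible requests and escalating help after repeated recognition failures, can be sketched as follows. The command set, prompt wording, and function names are assumptions for illustration and are not taken from SpeechActs itself.

```python
# Hypothetical set of requests that cannot be undone.
IRREVERSIBLE = {"delete message"}

# Hypothetical help prompts, from terse to detailed.
HELP_LEVELS = [
    "Sorry?",
    "Sorry, I did not understand. Please say a command.",
    "You can say: read message, next message, or delete message.",
]


def needs_confirmation(request):
    """Irreversible requests are verified before they are carried out."""
    return request in IRREVERSIBLE


def help_prompt(consecutive_failures):
    """Return more detailed help as recognition failures accumulate."""
    index = min(consecutive_failures, len(HELP_LEVELS)) - 1
    return HELP_LEVELS[max(index, 0)]


if __name__ == "__main__":
    print(needs_confirmation("delete message"))
    for failures in (1, 2, 3, 4):
        print(failures, help_prompt(failures))
```

Capping the index at the last entry means the most detailed help is repeated once the caller has failed more times than there are help levels.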

The Wildfire electronic assistant is a commercial system that screens and routes incoming calls, schedules reminders, and retrieves voice mail (but not email). Wildfire allows the user to sort messages, but navigation is chiefly sequential.

Phoneshell, Chatter, SpeechActs, and Wildfire all provide remote access to messages, but none of them offers interaction comparable to what users enjoy with a GUI mail reader. Phoneshell and Chatter prioritize messages but neither summarize them nor allow random access. SpeechActs scans message headers and allows the user to pick out a message by number, but remembering the number of a message adds to the user's cognitive load. Wildfire allows the user to ask if there are messages from people in the rolodex but does not summarize incoming messages or allow users to pick from them at random.

The rising popularity of voice messaging systems justifies additional research into methods of helping users navigate these services efficiently. Previous research provides suggestions for optimizing many areas of these systems. This study attempts to quantify the navigational efficiency of two telephone menu systems, measured as the time users require to retrieve a specified email message. Furthermore, we analyze user satisfaction with these systems in terms of navigational efficiency and quality of computer-generated speech output.


Department of Computer Science: Direct questions and comments to the student editorial team
