Student HCI Online Research Experiments
 
   
 
SHORE 2001: Handheld Devices: Which is Faster and More Accurate on a Handheld: Graffiti or Keyboard Tapping?

Authors
 
 
Necip Fazil Ayan nfa@umiacs.umd.edu
Burcu Karagol-Ayan burcu@cs.umd.edu
David Kuehnert davknrt@wam.umd.edu
Amit Thakkar athakkar@wam.umd.edu

Abstract

The number of users of handheld computers has been increasing rapidly in recent years. The most common use of a handheld is entering new addresses, to-do items, or other short notes, so text entry has become an important issue for these devices. The two major entry methods are handwriting with the Graffiti alphabet, a stroke-based handwriting recognition system, and tapping on a soft keyboard. In this project, we compared Graffiti and keyboard tapping on a common task that involves entering alphanumeric characters and special symbols and, for keyboard tapping, switching between keyboards. The purpose of the experiment was to determine which entry method is faster and more accurate and to observe the pattern of learning for both methods. Experiments conducted with 15 subjects produced statistically significant results. The analysis showed that keyboard tapping was faster and more accurate in both initial and later use. Despite its poor initial performance and high number of errors, Graffiti became much faster as usage time increased. The learning curve suggests that experienced users may be faster with Graffiti than with keyboard tapping. 

Introduction

With the rise in technology over recent years, users have become accustomed to having all of their data readily available at all times. To meet this need, handheld computers were introduced in the early 1990s and have become very popular in recent years. They are very convenient for applications that only require pointing and selecting, which can be done with a stylus (a computer pen). However, the most common use of a handheld is entering new addresses, to-do items, or other short notes. In applications that require data input, such as entering address information, the small size of the handheld makes input a problem. Although full-size keyboards can be connected to handhelds, doing so is not very practical, and such keyboards are not available to everybody. 

To address this issue, the manufacturers of handheld computers have created two main data entry methods: 

  • Soft keyboard, which is displayed on a small portion of the screen of the device and works by tapping the stylus on the characters
  • Graffiti, which is a handwriting recognition method that accepts characters handwritten in a designated area of the device and converts them into ASCII characters. 
The figures below (taken from Palm Handbook) present screen shots from a Palm Pilot IIIxe, showing keyboard and Graffiti features.

Both input methods have their own unique advantages. Unfortunately both have their drawbacks as well. The main advantage of soft keyboard tapping is its similarity to an external keyboard: Users with a strong familiarity with keyboards usually feel comfortable when they use keyboard tapping.  On the other hand, it has three major disadvantages: 

  • Users can easily tap an incorrect character because of their close proximity to each other. 
  • The keyboard covers nearly 40% of the screen space, which is already small, and requires extra scrolling.
  • It lacks kinesthetic feedback and offers no physical reference point [13]. Hence, visual contact with the on-screen keypad must be maintained during entry. 
To overcome the difficulties posed by soft keyboard tapping, handwriting recognition methods have become popular. Blickenstorfer [2] analyzes 18 handwriting recognition systems and Gibbs [3] compares 13 handwriting recognition methods extensively. One of the most popular handwriting recognition systems is Graffiti [1]. It resembles a keyboard in that input is entered character by character and special modes are required for uppercase letters and special characters. As stated in [6], the major advantage of Graffiti is that it mimics the Roman alphabet as closely as possible while preserving its single-stroke philosophy. This allows users to write quickly once they are familiar with the alphabet. Figure 1 presents the alphabetic characters in Graffiti, where the black dot marks the starting point of each stroke. Another advantage is that users do not need to look at the screen while writing, which also speeds up writing. However, although Graffiti resembles ordinary handwriting, it is still an alphabet that must be learned. Its other problems are recognition accuracy and how well users retain what they have learned, especially for special characters. 

Figure 1: The Graffiti alphabet [MacKenzie97] 

Overview of Previous Experiments 

There has been a great deal of research conducted in the area of comparison of different input entry methods on handheld devices. 

In [4], three methods of character entry on pen-based computers (hand printing, ABC keyboard tapping, and QWERTY keyboard tapping) were compared for speed, accuracy, and user satisfaction. ABC tapping had the lowest error rate of the group, at 0.6%, and also the slowest entry rate, at 12.9 wpm. QWERTY tapping had the fastest entry rate, at 22.9 wpm, with an error rate of 1.1%. The hand printing method had an input rate of 16.3 wpm, and an error rate. The user satisfaction surveys showed that QWERTY tapping was the most preferred method and ABC tapping the least preferred. 

In [8], six different soft keyboard entry methods were tested to find the speed of those input entry methods. The average number of words per minute was 20.2 for QWERTY, 10.7 for ABC, 8.5 for DVORAK, 8.0 for Fitaly, 8.0 for telephone and 7.0 for JustType. 

In [6], the learning speed of Graffiti was measured. The subjects were tested after one minute of studying the Graffiti reference chart, after five minutes of practicing with Graffiti, and after one week without practicing the Graffiti input method. The accuracy rates were 86%, 97%, and 97%, respectively, which showed that Graffiti is learned quite quickly. 

In [9], two methods of numeric entry on pen based computers were tested, namely handwriting and pie pad. The study attempted to measure the learning speed and accuracy of two methods. Error rates did not significantly change, but the entry speed did: Speed by handwriting increased by 11%, while the speed by the pie pad increased by 52%. Initially, handwriting was the faster entry method; however at the end of experiments, the pie pad method was 24% faster than handwriting. The subjective surveys showed that the pie pad method was preferred to the handwriting. 

In [7], two different handwriting recognizers, Microsoft character recognizer and CIC's Handwriter 3.3, were tested. The two methods were tested based upon an input text of only lower case letters and also an input text of both upper and lower case letters. The results of the study showed that certain characters were misinterpreted significantly more often than others, and also that the observed accuracy was lower than that claimed by the developers of the products. 

In [5], several methods for entering alphanumeric data on pen-based computers were examined. For numeric entry, the methods were hand printing, tapping on a soft keypad, stroking a moving pie menu, and stroking a pie pad; for text entry, they were hand printing, tapping on a soft QWERTY keyboard, and tapping on a soft ABC keyboard. For numeric entry, the soft keypad yielded the fastest and most accurate results (30 wpm, 1.2% errors), while the moving pie menu was the slowest and most error prone (12.4 wpm, 16.4% errors). For text entry, soft QWERTY keyboard tapping was the quickest, at 23 wpm, with 1.1% errors. Hand printing was slower, at 16 wpm, and had a higher error rate, at 8.1%. Finally, tapping on the soft ABC keyboard was the most accurate, at 0.6% errors, but also the slowest, at 13 wpm. 

In [11], a theoretical model was presented to predict the upper and lower bounds for text entry rates using a soft QWERTY keyboard on a pen based computer. The model was based on the Hick-Hyman law for choice reaction time, Fitts' law for rapid aimed movements, and linguistic tables for the relative frequencies of letter pairs, or digrams, in common English. The model predicted that a typing rate of 8.9 wpm can be achieved for novice users and 30.1 wpm for expert users of the soft QWERTY keyboard. 
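
The Fitts' law component of such a model can be illustrated with a minimal Python sketch. This is not the model from [11]: the coefficients a and b, the digram probabilities, and the key distances below are made-up placeholder values, and the Hick-Hyman (visual search) component used for novices is left out. The only conventions assumed are the Shannon formulation of Fitts' law and the common definition of one word as five characters.

```python
from math import log2

def fitts_movement_time(distance, width, a=0.0, b=0.2):
    """Fitts' law, Shannon formulation: MT = a + b * log2(distance / width + 1).

    Returns seconds per movement. a and b are illustrative placeholders,
    not coefficients taken from [11].
    """
    return a + b * log2(distance / width + 1)

def mean_movement_time(digrams, key_width=1.0):
    """Average movement time over digrams, weighted by digram probability.

    digrams: list of (probability, distance between the two keys),
    with distances expressed in the same unit as key_width.
    """
    total_p = sum(p for p, _ in digrams)
    weighted = sum(p * fitts_movement_time(d, key_width) for p, d in digrams)
    return weighted / total_p

def words_per_minute(seconds_per_char, chars_per_word=5):
    """Convert time per character into words per minute."""
    return 60.0 / (seconds_per_char * chars_per_word)

# Hypothetical digram table: (probability, distance in key widths).
example_digrams = [(0.5, 1.0), (0.3, 3.0), (0.2, 7.0)]
mt = mean_movement_time(example_digrams)
print(f"mean time per character: {mt:.2f} s, rate: {words_per_minute(mt):.1f} wpm")
```

A real prediction would replace the placeholder table with measured key distances on the soft keyboard layout and digram frequencies from an English corpus.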

Experiment

Overview and Variables

There have been many studies comparing Graffiti handwriting and keyboard tapping in terms of speed and accuracy. However, most of these studies focused on either alphabetic characters only or numeric values only. Moreover, they assumed that users enter all of the information on one screen. In most applications, however, the input combines letters and digits with punctuation symbols, and users also switch between screens or scroll to locate other information. In this project, we aim at comparing Graffiti and keyboard tapping while doing a common task, which involves entering alphanumeric characters and special symbols and switching between keyboards during keyboard tapping. We will concentrate on entering the address information of a person (name, phone number and home address, each in different fields). 

The purpose of this experiment is to see which input entry method is faster and more accurate than the other and to observe the pattern of learning for both methods. We are mostly interested in how the time spent entering text enhances or reduces the speed and accuracy of writing. 

There are two independent variables in our experiment. The first is the input entry type, which has two treatments: soft keyboard tapping (QWERTY keyboard) and Graffiti handwriting. The second is the trial block, which has 4 treatments representing the degree of learning. In each trial block, the subjects are asked to write a specific number of addresses. 

There are three dependent variables: the time for correct completion of the task, the percentage of errors encountered, and subjective satisfaction as measured by a survey. 

Our hypothesis is that for novice users, keyboard tapping is faster and more accurate than the Graffiti handwriting. However, as the experience of the users increases, using Graffiti will lead to a faster entry of text while better accuracy is still achieved by keyboard tapping. 

Pilot Test Results

After we conducted pilot tests on 4 subjects, we decided to make the following changes in our experiment. 
  • Reduce the number of addresses in each trial block from 5 to 3: The first two experiments showed clearly that the time required for entering 5 addresses for each trial block is too much for the subjects. Writing with Graffiti took about 1.5 - 4.5 minutes and using internal keyboard tapping took 1.5 - 3 minutes. 
  • Reduce the size of each address: We decided to remove some of the fields in the address to shorten the length of the experiment. The original addresses included e-mail addresses and two phone numbers; we decided to drop the e-mail address and keep only one phone number.
  • Change the addresses to have a uniform distribution of characters in each trial block: We decided to keep the number of characters, digits, and punctuation symbols the same in each trial block for a more accurate comparison. 
  • Give one address at a time, instead of all addresses in a trial block at once, to avoid confusion: Initially, we planned to give all addresses in each trial block on the same sheet of paper. The pilot tests clearly indicated that this may confuse some of the subjects. Instead of concentrating on entering data on the Palm Pilot, they spent time locating what to write on the paper, which in turn distorts the comparison of the data entry methods.
  • Change the format of the addresses: The pilot tests showed that the format of the addresses we proposed previously is inappropriate for this experiment. The order of the entries for each address (first name - last name, address, city - state - zip code, e-mail and phone number in each line) distracted the users more than we expected. For example, they mostly entered the first name in the field for last name. To reduce the effects of the time spent in locating which information to put on each field on the total data entry time, we decided to give the addresses in the exact format which they will be entered on a Palm Pilot (i.e., in the same order as on the Palm Pilot and specifying the field name for each information).
  • Count the errors and measure the time for correct completion: The number of errors should be counted so that fast but error-ridden entry is not mistaken for good performance. For a better analysis, we decided to categorize the errors into capitalization errors, alphabetic characters instead of digits or vice versa, replacement, insertion, deletion, and transposition errors. We will also try to count the number of errors for each character (what to write vs. what is written). The time to enter each address will be measured after all errors in the entry have been corrected, i.e. correct completion is a requirement. We also decided to point out errors to the subjects whenever they make them but to leave the decision of when to correct the errors to them.
  • Reduce the effects of tiredness and boredom: After the second block, we observed that the subjects may get tired and bored of the task. So, after completing each trial block, each subject will be allowed to rest for some time (probably 1 minute).

Subjects

We conducted our experiments with 15 subjects. All but one of them are students at the University of Maryland. The main criterion in selecting subjects was that they had not used a handheld before and did not know Graffiti at all. 

The demographic distribution of the subjects is as follows: Of the fifteen subjects, 9 are male and 6 are female. 11 are between ages 20 and 30, 3 are below 20, and 1 is between ages 40 and 50. 10 are students or professionals in computer science while 5 are not. 9 wear glasses or contact lenses while 6 do not. The average rating for level of keyboard usage is 6.2 out of 9 and the average rating for Graffiti knowledge is 1.2 out of 9 (1 represents no knowledge and 9 represents a strong familiarity). 

Conducting Experiments

We asked the subjects to enter 4 sets of addresses into the address book of a Palm Pilot. Each set consisted of 3 addresses. To eliminate a bias toward either method, some subjects entered all addresses first using Graffiti and then using keyboard tapping, and the others used the opposite order. One example address is as follows: 
Last name Maxfield
First Name Paul
Home (240) 698-3571
Address 594 Lovers Ln. Apt 16
City Bethesda
State MD
Zip Code 20378
While preparing the address set for each trial block, we ensured a uniform distribution of characters across trial blocks. All 4 trial blocks contained the same number of total characters (190), capital letters (25), digits (60), and punctuation symbols and spaces (28). The total set of addresses and the distribution of characters in each trial block can be found in the Appendices. 
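
Balancing trial blocks in this way amounts to counting character classes in each address. The Python sketch below shows one possible way to do that count for the example address above. The counting rules are an assumption (field labels are excluded and no separators between fields are counted); the paper does not state exactly how the totals were obtained.

```python
def character_profile(fields):
    """Count character classes over the values of one address.

    Assumption: only the field values are counted, not the field labels
    and not any separator between fields.
    """
    text = "".join(fields)
    capitals = sum(ch.isupper() for ch in text)
    lowercase = sum(ch.islower() for ch in text)
    digits = sum(ch.isdigit() for ch in text)
    return {
        "total": len(text),
        "capitals": capitals,
        "lowercase": lowercase,
        "digits": digits,
        # Everything that is neither a letter nor a digit.
        "punctuation_and_spaces": len(text) - capitals - lowercase - digits,
    }

# Field values of the example address shown above.
example_address = [
    "Maxfield", "Paul", "(240) 698-3571",
    "594 Lovers Ln. Apt 16", "Bethesda", "MD", "20378",
]
profile = character_profile(example_address)
print(profile)
```

Summing such profiles over the three addresses of a block gives totals that can be compared across blocks.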

The subjects were chosen on the assumption that they had not used a Palm Pilot before and had no knowledge of Graffiti or keyboard tapping. Therefore, we trained the subjects on 

  • how to enter an address in the address book
  • how to write using Graffiti
  • how to write using soft keyboard tapping
This training session was meant to be introductory. The subjects were allowed to enter each of the characters in the addresses once. Because the purpose of the experiment is to observe the learning curve for both input entry methods, we kept the training session as short as possible. 

The subjects were asked to sign an informed consent form and to fill out a background survey, used for statistical purposes only, which can be found in the Appendices. Then, they were asked to enter 12 addresses correctly and completely as they appear on the address sheet given to them. In other words, every item had to be entered into the correct field of the address book, and all spelling errors had to be corrected, before an address counted as finished. Once they had finished writing all addresses using one input entry method, they repeated the same process with the other method. While using one input entry method, they were not allowed to use the other one. 

In the first two trial blocks, subjects had considerable difficulty remembering how to write each character using Graffiti. Thus, they were allowed to use a quick reference guide to Graffiti whenever they needed it. 

We measured the time for completing each address correctly and the number and type of errors the subjects made. To finish the task, they were expected to write the addresses exactly as on the address sheet given to them. We classified the errors as replacement errors, capitalization errors, character vs. digit errors, missing letters, insertions, and transpositions. Subjects were alerted to mistakes they had not noticed so that they could correct them. 
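
The paper does not say how the errors were tallied, and they were presumably counted by hand. As an illustration only, the following Python sketch shows one way such a classification could be automated: it aligns the intended text with the entered text using the optimal string alignment variant of edit distance and then labels each edit. The function name and category labels are hypothetical, and "deletion" here corresponds to a missing letter.

```python
from collections import Counter

def classify_errors(target, typed):
    """Label the edits that separate `typed` from `target` by error type."""
    n, m = len(target), len(typed)

    def swapped(i, j):
        # True if the last two characters appear in exchanged order.
        return (i > 1 and j > 1 and target[i - 1] == typed[j - 2]
                and target[i - 2] == typed[j - 1]
                and target[i - 1] != target[i - 2])

    # dist[i][j] = edit distance between target[:i] and typed[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if target[i - 1] == typed[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # missing character
                             dist[i][j - 1] + 1,         # extra character
                             dist[i - 1][j - 1] + cost)  # match or substitution
            if swapped(i, j):
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)

    # Walk back through the table and label each edit on the chosen path.
    errors = Counter()
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and target[i - 1] == typed[j - 1]
                and dist[i][j] == dist[i - 1][j - 1]):
            i, j = i - 1, j - 1
        elif swapped(i, j) and dist[i][j] == dist[i - 2][j - 2] + 1:
            errors["transposition"] += 1
            i, j = i - 2, j - 2
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            a, b = target[i - 1], typed[j - 1]
            if a.lower() == b.lower():
                errors["capitalization"] += 1
            elif (a.isalpha() and b.isdigit()) or (a.isdigit() and b.isalpha()):
                errors["character_vs_digit"] += 1
            else:
                errors["replacement"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            errors["deletion"] += 1
            i -= 1
        else:
            errors["insertion"] += 1
            j -= 1
    return errors

print(classify_errors("Bethesda", "Betehsda"))
```

Because several alignments can have the same distance, the labels reflect one minimal explanation of the differences, not necessarily the sequence of mistakes the subject actually made.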

After writing all addresses, the subjects were asked to complete a subjective satisfaction survey, which can also be found in the Appendices. 

Because we observed the learning curve of two data entry methods, our experiments took much longer than comparable experiments. Completing all tasks took each subject about 1 to 1.5 hours, which is why we could not conduct the experiments with more subjects. Another major problem we encountered was the difficulty of counting and classifying the errors. 

Results

The raw data, which can be found in Appendices, lists the time for correct completion and the number of errors for both input entry methods for all subjects. We, therefore, have four sets of numbers: Graffiti speed, keyboard speed, Graffiti errors and keyboard errors. 

The mean and standard deviation for the speed and number of errors in each trial block for both methods are given below: 

The mean speed and mean number of errors present the learning curve of both methods. This can be seen clearly on the following graphs, which show the distribution of speed and number of errors in each trial block for both data entry methods. 

The Entry Speed graph displays the average entry time for a single address within a given trial block.  This average was obtained by taking the subjects' mean entry times, within a trial block, and then averaging them.  The black bars at each point on the graph display the standard error.  This error is very small on the Keyboard graphs, due to low variance, making these bars difficult to see. 

Similarly, the Entry Error graph displays the average number of errors occurring within a single address entry, within a given trial block.  Once again, the average was obtained by taking the subjects' mean number of errors, within a trial block, and then averaging them.  Black error bars display the standard error. 
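
The averaging described above, and the standard error shown by the error bars, can be sketched in a few lines of Python. The times below are hypothetical and are not the study's data; the standard error is taken as the sample standard deviation of the subject means divided by the square root of the number of subjects, which is an assumption about how the bars were computed.

```python
from math import sqrt
from statistics import mean, stdev

def block_summary(values_by_subject):
    """Summarize one trial block for one entry method.

    values_by_subject: one list per subject holding that subject's
    per-address measurements (entry times or error counts) in the block.
    Returns (mean of the subject means, standard error of that mean).
    """
    subject_means = [mean(values) for values in values_by_subject]
    grand_mean = mean(subject_means)
    std_error = stdev(subject_means) / sqrt(len(subject_means))
    return grand_mean, std_error

# Hypothetical entry times in seconds: 3 subjects, 3 addresses each.
hypothetical_times = [[90, 100, 110], [190, 200, 210], [290, 300, 310]]
block_mean, block_se = block_summary(hypothetical_times)
print(f"mean = {block_mean:.1f} s, standard error = {block_se:.1f} s")
```

Applying the same function to error counts instead of times gives the points and bars of the Entry Error graph.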

While the Graffiti entry and error numbers take a dramatic drop, the keyboard numbers are fairly constant.  Overall, the graphs display a convergence, in both the number of errors made and the number of seconds taken to enter an address, over time. 

The gulf in entry times in the first trial block is approximately 135 seconds.  This reduces to a difference of only 31 seconds in trial block 4. The standard deviation of the Graffiti entry times also decreases at a higher rate. These same patterns are also apparent in the number of errors made. 

To determine the statistical significance of this data, a two-way analysis of variance was employed since we have two independent variables having 2 and 4 treatments, respectively.  Two two-way ANOVAs were performed using Microsoft Excel 2000, one on the entry rates and another on the error rates. 

The important statistic in the ANOVAs is the P-value for Columns. This value indicates whether the Graffiti and Keyboard numbers differ by more than chance variation would explain. A P-value of less than .05 means that, if the two methods did not actually differ, a difference this large would be observed less than 5% of the time. As shown in the following figures, both P-values are below this threshold, which indicates that the results are statistically significant. The results of these ANOVA tests are as follows: 
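
For readers who want to see what lies behind the Columns row, the Python sketch below computes the F statistics of a two-factor ANOVA without replication, with trial blocks as rows and entry methods as columns. This layout is an assumption, since the paper does not state which Excel two-factor option was used, and the table holds made-up numbers, not the study's data. The P-value is then obtained from the F distribution with the corresponding degrees of freedom, which this sketch does not compute.

```python
def two_way_anova_f(table):
    """F statistics for a two-factor ANOVA without replication.

    table[i][j] is the single observation for row level i (trial block)
    and column level j (entry method).
    Returns (f_rows, f_cols).
    """
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    # Degrees of freedom: r - 1 for rows, c - 1 for columns,
    # (r - 1) * (c - 1) for the error term.
    ms_error = ss_error / ((r - 1) * (c - 1))
    f_rows = (ss_rows / (r - 1)) / ms_error
    f_cols = (ss_cols / (c - 1)) / ms_error
    return f_rows, f_cols

# Hypothetical values: 4 trial blocks (rows) x 2 entry methods (columns).
hypothetical_table = [[10, 4], [8, 4], [6, 2], [4, 2]]
f_rows, f_cols = two_way_anova_f(hypothetical_table)
print(f"F(rows) = {f_rows:.2f}, F(columns) = {f_cols:.2f}")
```

A large F for columns relative to the F distribution with (c - 1, (r - 1)(c - 1)) degrees of freedom corresponds to a small P-value for Columns.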

Discussion

Both P-values in the Columns row are far below 5%, indicating that the differences between the two entry methods are statistically significant. With the significance of the results established, we now turn to an interpretation of these numbers. 

The previously noted convergence of the graphs is indicative of the learning curve involved in using Graffiti for data entry.  It is our theory that, though Graffiti takes longer to learn, it is eventually faster than the keyboard for data entry.  Though our data does not definitively support this, it certainly shows a trend toward this model.  The rapid descent in entry times, and the decrease in the corresponding standard deviation, supports our theory that the speed of Graffiti usage increases rapidly over time, for the majority of users.  The trend indicates that, given more trial blocks, tests would show that the entry speed of Graffiti surpasses that of the keyboard.  This hypothesis is further supported by the relatively stagnant entry times reported for keyboard usage. The growing overlap on data ranges, over time, is also of note.

The error rates, as expected, are consistently higher for Graffiti. There is a substantially higher probability of errors while using Graffiti, due to natural variance in hand motion and the software's limited ability to interpret those motions. The keyboard interface eliminates these variables and provides a much more structured interface, making it much less error prone. Though the number of errors for Graffiti decreases, it does not approach the keyboard numbers as quickly as Graffiti entry times close the speed gap. For these reasons, as supported by the data, we expect that Graffiti will always involve more errors than keyboard entry. 

As previously stated, our results do not show Graffiti entry times crossing below keyboard times on the graph. Our graphs do, however, show a clear trend toward such a crossing, which further testing could confirm. We were unable to perform further testing due to the length of time involved in performing these tests. Further experimentation is suggested over a longer time period, with testing performed on a daily basis instead of in a single contiguous 1-2 hour session. 

In addition to these statistical results, the user satisfaction surveys highlighted a number of notable items.  On a 1 to 10 scale, the difficulty of using Graffiti was rated 4.4 on average, as opposed to a 1.5 for keyboard usage.  All subjects found the keyboard easier in the first trial block, but 43% of subjects rated Graffiti as easier by the last trial block.  Subjects stated that they achieved comfort with Graffiti and Keyboard usage at approximately the same trial block.  50% achieved comfort with either interface by the 2nd trial block, with the majority of the rest attaining comfort level in the 3rd trial block.  Ease of error correction was evenly split.  In the end, entry method preference was evenly divided between the two methods. 

Subject satisfaction relies on the features and usage experience of an interface.  Subjects enjoyed the simplicity of the keyboard interface and its low error rates.  The "computer like" keyboard appeared to be very intuitive to subjects, offering a small or no learning curve.  This accessibility is a great advantage to new users of the Palm Pilot. Additionally, the keyboard interface limits user input to a strongly defined set.  This reduces the error rate and associated frustrations. 

There were also several complaints about the keyboard interface. These centered primarily on the small size of the keyboard and the need to constantly switch back and forth between displays. The keyboard method involves a lot of switching between entry areas and different keyboards. Subjects found this switching distracting and noted that the keyboard interface seemed slower because of it. Two problem spots were commonly noted: the 'P' key and the scroll arrow keys, which had to be pressed in a precise manner to register correctly. 

The features that most attracted subjects to Graffiti were its continuity and similarity to handwriting. Subjects found that being able to "write" quickly using Graffiti made data entry feel faster. Also, because all characters were always accessible on the same screen as the data entry area, subjects did not have to switch context constantly. Another commonly stated advantage of Graffiti is that there is no need to keep the eyes fixed on the screen while writing, which makes writing with Graffiti easier. 

There were a number of complaints about the Graffiti interface. Most dealt with the high error rate caused by characters being recognized incorrectly. A major issue is the counter-intuitive shape of some of the characters. Some, like the 'T', do not resemble their standard handwritten counterparts. Variations in handwriting style can also make character entry difficult. Many subjects noted difficulties with capitalization and punctuation. All of these contribute to the high error rates observed with Graffiti and to the irritation experienced while learning it. Characters commonly entered in error are V, T, 4, L, E, Q, K, N, Y, 9, P, G, X, the parentheses, and capitalized letters. We observed that during the initial use of Graffiti, characters that are ordinarily handwritten with two or more strokes (such as B, D, P) frustrated the users because writing them that way caused two letters to appear on the screen. This happens because the Graffiti recognition system treats each stroke as one character: whenever the users lift the stylus off the screen, the stroke is interpreted as a complete character. A few subjects took a long time to discover this and made many mistakes until they got used to it. 

Conclusions

In this project, we conducted an experiment to compare the speed and accuracy, and to observe the learning curves, of Graffiti and keyboard tapping during a common task on handhelds. With 15 subjects who had no knowledge of Graffiti or keyboard tapping, we measured the time and the number of errors while they used the two data entry methods to enter 4 sets of addresses into the address book of a Palm Pilot. During the first trial block, the keyboard led to faster entry and fewer errors. Although Graffiti took much more time than keyboard tapping in the initial trial blocks, the later results showed that the time to complete each trial block using Graffiti decreased rapidly, nearly catching up with the times for keyboard tapping. The number of errors made by the subjects remained higher for Graffiti than for keyboard tapping. The two-way ANOVA tests showed that the results are statistically significant. 

Impact for Practitioners

Keyboard tapping is easier to learn, less error prone, and, within the practice period we tested, faster than Graffiti. It is also easier to correct errors with the keyboard. The main problem with keyboard tapping is the excessive number of switches between screens during a common application. To overcome this problem, manufacturers can try to avoid these switches by employing some other mechanism, such as making the keyboard accessible without switching to another screen. As an example, Silkyboard [10] attempts to use the same area for both data entry types and to recognize which of them is being used at any time. Thus, it avoids switches between screens and allows both data entry types to be used simultaneously. 

Another suggestion for practitioners seeking faster data entry is to combine the advantages of Graffiti and keyboard tapping in a new data entry method. This may include a mechanism for writing some characters using Graffiti and others using keyboard tapping, while minimizing the switches between screens. As an example, TapPad [12] expects Graffiti on the alpha side (alphabetical character side) of the Graffiti area, as usual, but on the numeric side the users can use both data entry methods.

Finally, the Graffiti character set may be revised by replacing some difficult-to-write characters with others. The best approach would be to develop a recognition system whose alphabet is exactly the same as the Roman alphabet. This may be a difficult task, but it seems to be the ultimate solution. 

Suggestions for Future Researchers

We believe that using Graffiti for more time (i.e. more trial blocks) will make it faster than the keyboard, because of the long delays incurred when switching screens with the keyboard. To explore this, a similar experiment should be conducted among experienced Palm Pilot users. 

We also believe that an experiment suggesting alternatives to the difficult Graffiti characters would be useful. Our experimental results clearly indicate that some characters are too different from most users' handwriting, and ways to improve their recognition are worth exploring.

Refinement of the Theory

Our hypothesis about the learning curve was supported by the experimental results, which are statistically significant. However, contrary to our initial hypothesis, the time for Graffiti in the last trial block was still greater than the time for the soft keyboard. We believe that this is because:
  • We did not use enough subjects.
  • The total amount of time spent on Graffiti was not sufficient for subjects to become experts.
If a similar experiment is conducted that takes these considerations into account, we believe that our hypothesis will be verified with statistically significant results.