Student HCI Online Research Experiments
SHORE 2001 : Handheld Devices : A Comparison of Graffiti vs. the On-Screen Keyboard for Experienced Palm Users

Authors

Daniel Giambalvo - dgiambal@wam.umd.edu
Ann Frolov - frolova@wam.umd.edu
Navid Norouzi - norouzi@wam.umd.edu

Abstract

Two Palm Pilot input methods, Graffiti and the on-screen keyboard, were each studied for speed and subject preference on two tasks: a memo field task and a multiple-field address task. The experiment tested speed to correct completion of the four task/input method combinations. Twenty experienced Palm Pilot users participated in the study. This was a within-subject 2 x 2 design. The subjects were timed on all four tasks and were given a satisfaction survey with questions to be rated on a scale of 1-9. A two-way ANOVA was performed on the raw data. The test showed no statistically significant difference in mean time to correct completion between Graffiti and the on-screen keyboard. A t-test performed on the survey responses showed a statistically significant preference for Graffiti on the memo field task. A t-test for the address field task did not yield a statistically significant difference in preference.

Introduction

Overview

A major trend in the recent past has been a move to ever smaller, ever more portable computing devices. Computers have become smaller over the years, shrinking from the mainframe of 30 years ago to the laptop of today. Technological and market forces have pushed today's computing devices even beyond the size of a laptop. Mobile phones have begun to encompass much more than simply phone calls, allowing email, access to the web, and text communication. A new market has also emerged for so-called personal digital assistants, or PDAs. These PDAs, first popularized by the Apple Newton and later driven by other, even smaller devices, offer organizational features such as storing telephone numbers, appointment management, and basic note taking. They are also capable of running third-party applications, including graphics programs, games, and Web browsers.
PDAs have become extremely popular due to their small size and quick access. Compact and light enough to be carried in a pocket or a purse, they allow users to take their lives with them, offering access to all the important people, dates, and other information wherever they are. Furthermore, the ability to 'sync' with other sources of information (such as email on the user's computer) has broadened their usefulness.

The PDA, however, represents more than simply another step on the road to infinitely small computing. Unlike past miniaturizations, such as the creation of the laptop, the PDA requires a new paradigm of user-device interaction. The PDA's focus on small size and mobility means that standard equipment such as the keyboard is no longer practical: the device is too small to include a built-in keyboard. Furthermore, the PDA is often used with one hand, while standing, or on the move. In these situations, using a keyboard would be a definite burden. To overcome this, alternative methods of input had to be devised.

The Palm Pilot (henceforth the Palm) by Palm Computing is an example of a PDA. The Palm fits the description of a PDA stated above. It comes with numerous built-in applications, as well as a standardized OS (PalmOS) on top of which other applications can be written. PalmOS is probably the most popular platform available today for PDAs. PalmOS is designed to support a touch screen interface with the use of a stylus. This stylus, which is essentially a pen-based input method, is used for clicking buttons and selecting menu items. To handle text input without a standard keyboard, the makers of PalmOS equipped the device with two alternative forms of input. The first form of input is an on-screen keyboard. PalmOS is capable of displaying on the screen a small keyboard whose keys are tapped with the stylus. By switching between a few basic key-sets (letters, numbers, etc.) users can enter any combination of characters. The on-screen keyboard is displayed in parallel to any application the user is currently working with. When the keyboard is closed, the entered text appears in the application.

The second form of input is a handwriting recognition scheme known as Graffiti. Graffiti is a unistroke-type input language in which the user draws symbols on the screen, which are then interpreted by the Palm as characters. A unistroke language is a symbolic language where every symbol in the language consists of one continuous stroke. Once familiar with the symbol set, users can enter text with these Graffiti strokes as an alternative to the on-screen keyboard.

The purpose of our experiment is to compare the speed and effectiveness of these two methods of user input for the Palm Pilot. Our goal has been to study the speed with which users, experienced with both methods of input, could enter text. In order to get an idea of how task differences affected the viability of these input methods, we tested two different forms of input -- memo and address. We sought to determine if there was a statistically significant performance difference for either of these methods, and if that difference was linked to the task performed.

Previous Research Material

Several pieces of prior research have built the case for our study. A study conducted in 1997 at the University of California at Berkeley looked into usage patterns of the Newton and the Palm Pilot, as well as the effectiveness of Graffiti for the Palm Pilot [2]. The study found that 74% of Palm users surveyed used the Palm more than 5 times a day. That was the highest range they allowed for, so the average rate is potentially greater than 5 [2]. This figure is relevant for two reasons. Firstly, it shows that there is a large group of regular Palm users. Secondly, such frequent use makes an optimal input method more important. If the Palm were used once a week, the amount of time potentially wasted on sub-optimal input methods would be far less weighty. Based on the contents of this study, we feel that there is a definite target group of users who would benefit from research and improvement of the input device.
A second study, conducted in 1994 by Scott MacKenzie of the University of Guelph with several other researchers, looked at numeric input with four input methods: handwriting recognition, a pen-based keypad, and two types of "pie pad" input. The study tested 16 computer-savvy volunteers, asking them to enter a set of numbers with all four input methods. The subjects' results were analyzed both for speed and number of errors. The study found that the pen-based keypad was both the fastest and most accurate form of input, with 98.8% accuracy and a speed of 30.4 wpm [3]. The second best input method was handwriting recognition, with 89.6% accuracy and a speed of 18.5 wpm. Subjectively speaking, the keypad input was preferred only slightly over the handwriting recognition. In further questioning, the study found that most users actually preferred the handwriting recognition, but simultaneously recognized that the keypad was a more efficient way of entering data [3].

There are several very interesting ramifications of this study. Firstly, the study found that the keypad was the most effective method of input. The keypad tested corresponds directly to the on-screen keyboard on the Palm Pilot. However, it was compared to a full-fledged handwriting recognition system. Graffiti, which is a unistroke symbolic language as mentioned above, stands to benefit from being both quicker to write and easier to recognize. This may improve both speed and accuracy. Even more interesting is the fact that users preferred to use the handwriting recognition even though it was sub-optimal.

A third study, which was conducted in 1995 at the University of Bristol Research Laboratories, looked into how handwriting recognition was differently suited for different tasks. The experiment took subjects through 3 separate tasks: a diary task where entries were made in a diary, a name lookup task from a database, and a task where subjects had to compose a fax, requiring use of both name and paragraph entry. Subjectively, it was found that users felt handwriting recognition was far more effective for the name lookup task than the diary entry, with the fax task falling somewhere in the middle [4]. There was also a correlation between the accuracy of the users and the effectiveness ratings they gave, showing that as accuracy drops, users become less content with handwriting recognition. The conclusion of the researchers is that as the ratio of errors dealt with to payoff of completing the task decreases, users become less and less satisfied with handwriting recognition. The paper also suggests a required accuracy rate of 97% - 99% for general user acceptance of handwriting recognition [4]. Again, it is worthwhile to note two things about this study. First, it showed users may have different levels of satisfaction depending on the task at hand. Secondly, this study tested true handwriting recognition, as opposed to the simpler and more accurate Graffiti used on the Palm Pilot.

Scott MacKenzie conducted the fourth cited study in 1995 at the University of Guelph. The experiment studied the immediate usability of Graffiti for new users. Subjects were asked to use Graffiti after one minute of studying the Graffiti reference chart, again after five minutes of practice, and were finally retested one week after the initial tests. The results from this study were quite impressive. MacKenzie found that after only one minute with the reference chart, users on average showed an 85.5% weighted accuracy (weighted by the prevalence of different letters in the English language). This rate increased to 96.9% after only 5 minutes of practice. Even more startling was the fact that one week later, the weighted accuracy rate for users was 97.2% (with an unweighted accuracy of 95.8%) [1]. These results seem to add support to the effectiveness of Graffiti as a feasible input method for the Palm Pilot. By achieving the 97% target accuracy found in earlier studies, this report would seem to suggest that even for novice users Graffiti could be very effective. The results also suggest the improvement in usability that Graffiti offers over older, more complex handwriting recognition systems. Finally, MacKenzie suggests that with practice, user accuracy might increase to as high as 99% [1]. This rate would match even the accuracy found for the keypad in Dr. MacKenzie's earlier study [3].
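
The distinction between weighted and unweighted accuracy can be made concrete with a short sketch. It assumes the weighting is a frequency-weighted mean of per-letter accuracy; the function names and the three-letter figures are hypothetical and are not taken from [1].

```python
def weighted_accuracy(accuracy, frequency):
    """Frequency-weighted mean of per-letter accuracy.

    accuracy:  letter -> fraction of attempts recognized correctly
    frequency: letter -> relative prevalence of the letter in English
    """
    total_weight = sum(frequency[letter] for letter in accuracy)
    weighted_sum = sum(accuracy[letter] * frequency[letter] for letter in accuracy)
    return weighted_sum / total_weight


def unweighted_accuracy(accuracy):
    """Plain mean over letters, each letter counting equally."""
    return sum(accuracy.values()) / len(accuracy)


# Hypothetical values for three letters only, for illustration.
accuracy = {"e": 1.0, "t": 0.9, "q": 0.5}
frequency = {"e": 12.0, "t": 9.0, "q": 1.0}

weighted = weighted_accuracy(accuracy, frequency)
unweighted = unweighted_accuracy(accuracy)
```

In this toy data the rare letter is the one recognized worst, so the weighted figure comes out higher than the unweighted one. That is one possible reading of why the reported weighted accuracy exceeds the unweighted accuracy.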

In conclusion, we think that the available prior research motivates many of the questions we are targeting with our study. While earlier methods of handwriting recognition seemed to lack the accuracy necessary to satisfy the user, Graffiti is well on the way to solving this problem. According to the above studies, there is a definite difference in user satisfaction related to the task being performed, and users may have differing subjective preferences for tasks. In addition, users seem to have chosen handwriting recognition over keypad entry even when keypad entry provided a better objective solution. Also, users stand to benefit from improved data entry due to the Palm's heavy use. Finally, the same high usage statistics create an environment where experienced users can feasibly choose between the keypad and Graffiti, possessing sufficient skill in both.

Relevant Theories and Observations

There are also some observations and theories which may prove helpful in understanding the advantages and disadvantages of the two input methods being tested. The on-screen keyboard is assumed to require little or no actual training. While Graffiti requires the user to know special strokes which symbolize characters, the keyboard requires only the ability to recognize the Roman alphabet and tap on the appropriate characters. The keyboard is in a standard QWERTY layout. Even those not familiar with the QWERTY layout can fairly easily find the characters they are looking for by scanning. Indeed, even for typists, the ability to automatically hit a key does not necessarily translate to improved speed when tapping on the keypad.

The keypad also has its disadvantages. Many of these are related to standard difficulties associated with pen-based input. One disadvantage is that the user's hand blocks their view of the keypad [5]. This may require users to move their hands away to find the next character, slowing their progress. Also, with the small size of the keyboard, the user must be careful to tap the correct key. Missing by as much as a quarter of an inch can result in entering the wrong character. It has also been noted that the on-screen keyboard requires the user to look at the keyboard, distracting them from looking at what they've typed.

Graffiti is seemingly the more radical input method for the Palm. Unlike the on-screen keyboard, which requires little to no training, Graffiti users have to learn a new set of strokes to be able to use it effectively. While studies have shown this learning curve to be relatively shallow, it is still steeper than that of the keypad. Despite these differences, MacKenzie is quick to point out there are some similarities between Graffiti and standard keyboard inputs [1]:

  1. Graffiti's input is character by character.
  2. Graffiti allows the user's eyes to fixate on the application's insertion point rather than on the input device.
  3. Graffiti uses modes to access uppercase characters and special symbols.

These features, which offer advantages to the keyboard user, may similarly enhance the Graffiti user's experience.

Graffiti represents a merger between the standard Roman character set and the rules of a unistroke symbolic language. By mimicking the Roman alphabet, Graffiti gains the dual benefits of being easier to learn and easier to remember. This is important since Graffiti is based on recall, rather than recognition [2]. The recall requirement is actually one of the disadvantages of Graffiti because it puts a strain on the users' memory (at least initially).

By adhering to the unistroke philosophy, Graffiti possesses two advantages over basic handwriting recognition. Firstly, it eliminates the "segmentation problem" which arises from the difficulty in recognizing when one multi-stroke character ends and a second begins [1]. Secondly, it renders spatial relationships between characters irrelevant [1]. With handwritten text, each letter is generally written to the right of the preceding character. With Graffiti, however, characters can be written on top of each other without confusion because each character is one stroke, and one stroke only. These two factors are instrumental in making Graffiti faster and more accurate to use than previous handwriting techniques.
At the same time, there are some definite disadvantages to Graffiti as pointed out by MacKenzie [1]. Firstly, there is the problem of misinterpretation of input. While this is a potential problem with a standard keyboard as well, it is made worse by the fact that the range of misinterpretation is much larger than simply adjacent keys. MacKenzie highlights that this is especially true of gestures like delete, carriage return, or cut/paste. Also, while most characters closely resemble their Roman counterparts, certain letters do not, and users are likely to encounter more trouble with them. This, however, may be less of a burden to the experienced user.

Experiments

Introduction

Our experiment tested 20 experienced Palm and Graffiti users on their ability to complete two tasks with both the on-screen keyboard and Graffiti. We used a within-subjects design, with each subject performing a memo and an address entry task first with one input method, and then performing the same tasks with similar input using the alternative input method. For each task/input method combination we recorded the time to correct completion of the data entry. Timing was tracked by a custom Palm application which automatically compared entered text to a predefined input set. After all four tasks were completed, the subjects were given a survey to assess subject satisfaction.

Our hypothesis was that Graffiti would be a faster method of data input than the onscreen keyboard. Graffiti was shown to be relatively easy to learn, and experienced Graffiti users were likely to be quite fast with it. Also, Graffiti offered advantages when it came to error detection and the ability to focus visually on the input, rather than the onscreen keyboard. We also believed Graffiti would achieve a higher advantage over the keyboard on the address entry task, as it required moving to different fields. With the onscreen keyboard, such field movement required multiple entries and exits of the keyboard.

As described above, our project utilized a two by two cell design. Our independent variables were as follows:

  1. Input Method
    Treatments: onscreen keyboard, Graffiti
  2. Input Task
    Treatments: single-field memo entry, multiple-field address entry

Our dependent variables were the time to correct completion and subject preference.

Pilot Study Results

Two subjects were tested in the pilot study. Each was timed to determine how long each component of the test took. The average total time for subjects was 18:47. Approximately 6 minutes of that was devoted to entering the memo field. It was decided that this task should be shortened to reduce the users' test time.

It was also determined that more elaborate instructions had to be written. The previous method of a brief explanation followed by questions left the subjects too uncertain of their purpose. It was also noted that the pilot subjects had a lot of questions which we did not anticipate. These included questions about the experimental steps, how the timing program would track the subject, and how error correction should be handled by the user. The more elaborate instructions sought to answer some of these questions.

We also found that we had to more clearly emphasize certain aspects of the process. During the timed phases, subjects stopped to ask questions. This obviously skewed the timing. The practice phase was also found to contain unnecessary characters and these were eliminated. Finally, we decided to provide a chart of Graffiti characters as a reference, similar to that included with the Palm.

Subjects

As stated above, our subjects consisted of 20 experienced Palm Pilot users. Our subjects were all male, and primarily from the computer science department of the University of Maryland. Finding experienced Graffiti users was a difficult task, and despite the element of homogeneity that our test group contains, it was the best alternative available. All subjects claimed to be "experienced Graffiti users", and had used Graffiti for at least one month prior to being subjects.

Materials
The materials for this experiment consisted of:

  • A permission form
  • 2 Palm Pilots (one IIIxe, one V)
  • Written instructions
  • Printed input for each task
    • Graffiti first
    • Keyboard first
  • The subject survey
  • Graffiti chart

The permission form was a standard subject authorization form which all subjects signed prior to participating in the experiment. The two Palm Pilots were each loaded with our timing software. A different set of test input was loaded into each Palm so that subjects could rapidly move between input methods. The timing software started with a begin button which the subject had to click. Upon clicking, the test began. The subjects were then presented with a series of lines on which to enter the memo field. When the subjects finished the memo field (that is, the input was correctly entered) they were shown a time in seconds, which was recorded. They were then shown three smaller fields in which to enter address information. After the address information was correctly entered, a second time was displayed. The program then returned to the begin button.
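
The timing behavior described above can be sketched as follows. This is an illustration in Python rather than the Palm application itself; the class name, method names, and the injectable clock are hypothetical and serve only to show the begin/compare/report cycle.

```python
import time


class TimedEntryTask:
    """Times one entry task until the entered text matches the expected text."""

    def __init__(self, expected_text, clock=time.monotonic):
        self.expected_text = expected_text
        self.clock = clock          # injectable so the sketch can be tested
        self.start_time = None

    def begin(self):
        """Corresponds to the subject tapping the begin button."""
        self.start_time = self.clock()

    def check(self, entered_text):
        """Return elapsed seconds if the entry is correct, otherwise None."""
        if self.start_time is None:
            raise RuntimeError("begin() must be called before check()")
        if entered_text != self.expected_text:   # exact, whitespace-sensitive match
            return None
        return self.clock() - self.start_time
```

A session would call `begin()` once and then `check()` each time the subject believes the entry is complete, so that only a fully correct entry yields a time.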

The written instructions were read to each subject prior to being tested. The printed input was used by the subjects during testing. Each subject had to enter two memo fields as well as two sets of address fields. Because half of the subjects would use the keyboard first, and half Graffiti first, two versions of the input were produced. Both versions also contained the warm-up input which the users entered prior to being tested. In addition, a Graffiti chart was on hand for the subjects to use while conducting the experiment. Finally, we had a set of surveys which each subject filled out when finished with the experiment.
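
The split between keyboard-first and Graffiti-first subjects is a simple counterbalancing scheme. The sketch below assumes subjects were alternated between the two orders; the text does not say how the assignment was actually made, and the function name is hypothetical.

```python
def assign_orders(subject_ids):
    """Alternate the starting input method so each order gets half the subjects."""
    orders = {}
    for index, subject_id in enumerate(subject_ids):
        if index % 2 == 0:
            orders[subject_id] = ("keyboard", "graffiti")
        else:
            orders[subject_id] = ("graffiti", "keyboard")
    return orders


# Twenty subjects, numbered 1 to 20 for illustration.
orders = assign_orders(range(1, 21))
keyboard_first = sum(1 for order in orders.values() if order[0] == "keyboard")
```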

Procedures

Before we tested subjects, we sat them at a table with the two Palm Pilots in front of them. These Palm Pilots were prepared to test the subjects, loaded with the proper input to expect. Also displayed were a Graffiti reference chart and the top of the input data sheet.

We first read the instructions to each subject. Following the instructions, any questions the subject had were answered. Next, the subject performed a warm-up exercise. The warm-up exercise consisted of entering a string of characters, numbers, and punctuation into a memo field. The subject was instructed to enter the string both with the onscreen keyboard, as well as Graffiti.

After the warm-up, the subject was again given an opportunity to ask questions. After any additional questions were answered, the actual experiment was conducted. The subject would enter the memo input with their first input method. When correct completion was achieved, a time in seconds was displayed and recorded. Following the recording, the subject would be taken to a separate screen and prompted to enter the three-field address information. After correct completion, a time would again be displayed and recorded. This would complete the first half of testing with one of the two inputs. The test would then be repeated on the other Palm Pilot with the other input method.

After all testing was complete the subjects were given a survey to fill out.

Problems

There are a couple of problems the pilot study did not reveal, but which became evident during our testing. One of the primary sources of problems was in users detecting their errors. With both keyboard and Graffiti, subjects often spent a large amount of time finding their mistakes. It could be argued that this is not ideal, as most of the tasks that a PDA is used for do not necessarily require perfection. Also, it would have been helpful if the software had offered some indication to the user of where mistakes were. Often, even though there were mistakes, the subject couldn't find them and began to doubt the program. This situation was further aggravated by the fact that the program was white space sensitive.
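
One form the missing error indication could take is a report of where the entry first departs from the expected text. The sketch below is a hypothetical helper, not part of the program used in the experiment; it keeps the same whitespace-sensitive comparison.

```python
def first_mismatch(entered, expected):
    """Return (line, column) of the first differing character, or None if equal.

    Line and column are 1-based. Whitespace counts as a difference.
    """
    line, column = 1, 1
    for entered_char, expected_char in zip(entered, expected):
        if entered_char != expected_char:
            return line, column
        if expected_char == "\n":
            line, column = line + 1, 1
        else:
            column += 1
    if len(entered) != len(expected):
        return line, column   # one text ends before the other
    return None
```

Showing the subject only this position, rather than the correct character, would still require them to fix the entry themselves while removing most of the search time.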

It was also difficult to find subjects, and difficult to accurately determine whether they were experienced or not. At least one or two people claimed to be experienced Palm users when in fact they did not know Graffiti well. Most users did not need the reference chart and flew right through the Graffiti test. A more extensive survey would have helped uncover these differences.

Thirdly, some users were so unfamiliar with the on-screen keyboard that they did not understand they had to exit it for different fields. This caused confusion with some users. Also, the address entry was somewhat confusing because there were 3 sets of 3 lines, and 3 lines in the address. People often tended to put the address on the first three lines. This also led to questions about the allowance of things like cut and paste. We decided to allow cut and paste.

Results

The raw data can be viewed in Appendix A.

The experiment measured time to correct completion of four tasks - Memo field using Graffiti, Address Fields using Graffiti, Memo Field using Onscreen Keyboard and Address Fields using Onscreen Keyboard. The table with mean and standard deviation for the 4 tasks is shown in [figure 1]. The graphical representation of this data is shown in [figure 2].

Since there are 2 independent variables (input method and form of input), a two-way Analysis of Variance (ANOVA) with replication was calculated. The Excel spreadsheet with the calculations can be seen in [figure 3].

            Memo Field    Address Field
Keyboard
Graffiti
Example of 2x2 cell design

H0 input method: There is no difference in average times to correct completion with the different input methods.

The critical value for testing H0 at the .05 level of significance is 3.9667. Since F of 1.389811 < F critical of 3.9667, we do not reject the null hypothesis that there is no difference in average times to correct completion between the input methods (Graffiti v. Keyboard). The test showed no statistically significant difference between Graffiti and Keyboard.

H0 input form: There is no difference in average times to correct completion with the different form inputs (Memo Field v. Address Fields). The critical value for testing H0 at the .05 level of significance is 3.9667. Since 22.79983 > 3.9667, we reject the null hypothesis in favor of the assertion that time to completion varies with the type of input form. There is a statistically significant difference between performing a memo field task and an address field task.
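
For readers who want to see how F ratios of this kind are formed, the following is a minimal sketch of a balanced two-way ANOVA with replication. The function name and the toy data are hypothetical (the raw data are in Appendix A and are not reproduced here). Following the with-replication layout, the sketch treats the observations in each cell as independent.

```python
def two_way_anova(cells):
    """Balanced two-way ANOVA with replication.

    cells[(a, b)] is the list of observations for level a of the first factor
    and level b of the second; every cell must hold the same number of values.
    Returns the three F ratios and the within-cell degrees of freedom.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))          # replicates per cell

    def mean(values):
        return sum(values) / len(values)

    grand = mean([x for values in cells.values() for x in values])
    a_mean = {a: mean([x for b in b_levels for x in cells[(a, b)]]) for a in a_levels}
    b_mean = {b: mean([x for a in a_levels for x in cells[(a, b)]]) for b in b_levels}

    # Sums of squares for each main effect, the interaction, and the error term.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (mean(cells[(a, b)]) - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    ss_within = sum(
        (x - mean(values)) ** 2 for values in cells.values() for x in values
    )

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_within = len(a_levels) * len(b_levels) * (n - 1)
    ms_within = ss_within / df_within
    return {
        "F_a": (ss_a / df_a) / ms_within,
        "F_b": (ss_b / df_b) / ms_within,
        "F_ab": (ss_ab / (df_a * df_b)) / ms_within,
        "df_within": df_within,
    }


# Hypothetical times in seconds, two observations per cell, for illustration only.
toy = {
    ("graffiti", "memo"): [10.0, 12.0],
    ("graffiti", "address"): [20.0, 22.0],
    ("keyboard", "memo"): [11.0, 13.0],
    ("keyboard", "address"): [21.0, 23.0],
}
result = two_way_anova(toy)
```

Each F ratio is then compared against the critical value for its degrees of freedom. With 20 observations in each of the four cells, the within-cell degrees of freedom would be 2 x 2 x 19 = 76.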


Subjective Satisfaction
In order to assess subjective satisfaction with the two input methods, 2 t-tests were performed: one on question 1 (effectiveness of Graffiti in the memo field task) versus question 2 (effectiveness of Keyboard in the memo field task), and one on question 3 (effectiveness of Graffiti in the address fields task) versus question 4 (effectiveness of Keyboard in the address fields task). The results from the Excel calculation can be seen in figure [4a,b] below. The user preference for Graffiti in the memo field was statistically significant (t = -3.4, p = 0.00088) at the alpha = 0.05 significance level. User preference for Graffiti in the address fields, however, was not statistically significant (t = -1.95, p = .06579).
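
Because every subject answered all four questions, one natural way to compare two questions is a paired t-test on the per-subject differences. The text does not state which t-test variant was run in Excel, so the paired form below is an assumption, and the function name and ratings are hypothetical.

```python
import math


def paired_t(first, second):
    """t statistic and degrees of freedom for paired ratings (first - second)."""
    if len(first) != len(second):
        raise ValueError("both lists must rate the same subjects")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean_diff / math.sqrt(variance / n), n - 1


# Hypothetical 1-9 ratings from four subjects, for illustration only.
keyboard_ratings = [5, 6, 4, 5]
graffiti_ratings = [7, 7, 6, 8]
t_value, degrees_of_freedom = paired_t(keyboard_ratings, graffiti_ratings)
```

The sign of t depends only on the order of subtraction; here a negative value means the second list (Graffiti) was rated higher.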


The table with the mean and standard deviation for the 4 questions can be seen in [figure 5]. The graphical representation of the data is shown in [figure 6].

Discussion

In the analysis of our experiment, the two-way ANOVA revealed no statistically significant difference between the Graffiti and Keyboard input methods. Therefore, we found no statistically significant advantage for either the on-screen keyboard or Graffiti as a method of input.

The test did yield a statistically significant difference between the form inputs. This second result was anticipated because the two kinds of input forms required different amounts of time and were not equivalent in character length. However, the lack of statistical significance for the different input methods conflicts with our original hypothesis.

We believe that there were several key factors in the experiment that led to such results. The lack of statistical significance can be mainly attributed to the fairly small number of subjects tested (n=20) and the high variance of the results. In order to accurately record "correct completion," the experiment utilized a program, as described in the above section of the paper. The program gave no indication of where an error might be; it simply did not allow the user to exit until the problem was fixed. Several subjects were able to enter all the input without any glitches, showing impressive speeds with Graffiti. However, some equally skilled subjects lost substantial amounts of time trying to track down an error they may have made early on.

By opening our experiment to "expert Palm users" we inadvertently left things up to interpretation. One month's use of Graffiti was set as a minimal requirement, but this may have been an insufficient criterion for what an expert user was. We discovered that there were complex differences in the way an "expert" may utilize the Palm. We found that the 1-6 month "experts" tended to favor the keyboard and had not utilized Graffiti as much as others. As the time of use increased, we encountered the opposite. The more experienced "experts" mainly used Graffiti and were unaware of some relatively simple concepts related to the keyboard (such as exiting the keyboard at each address field). This further skewed the data. The survey results revealed an overall trend in preference towards Graffiti for the memo field, which the timings did not reflect due to the time subjects lost in tracking down their mistakes. We were able to quantify some of this information only at the conclusion of the experiment; see the table below for subjects' primary input method and the length of time they had used the Palm.

As shown in [figure 7], 80% (16/20) of subjects use Graffiti as their primary method of input. This further supports the statistically significant preference toward Graffiti in the memo task. As mentioned above, our requirement of one month's familiarity with Graffiti proved to be insufficient.

Subject comments
The subjects were very cooperative in participating in the experiment. Many of them were frustrated after not being able to track down their mistakes and losing time. They felt under pressure when they could not easily find the mistakes. The program also tracked white space, which made it even more difficult to find mistakes such as missing carriage returns and tabs.

Interestingly, subjects also tended to comment on their likes or dislikes of Graffiti or the on-screen keyboard. Many of these comments matched points discovered in background research. For example, several subjects commented on how Graffiti allowed them to look at the text they were writing, rather than the PDA itself. It was also mentioned that the on-screen keyboard offered the advantage of immediate feedback: when you tap a key, it immediately shows you which key you tapped.

Conclusions

Impact for Practitioners

Our test revealed no statistically significant difference in speed between the Graffiti and Keyboard input methods. Hence, one can say that practitioners have no need to learn Graffiti if they are used to the on-screen keyboard, or vice versa. However, a user who learns both methods may find that they prefer different input methods for specific tasks.

Suggestions for future researchers

The first suggestion for future researchers is to confirm that subjects are experienced Palm Pilot users. To make sure that all subjects have a common level of skill with the Palm on-screen keyboard and Graffiti, they should be administered a test to determine if they can participate in the experiment. To pass the test, the subjects must complete a mini-experiment within a certain time. The mini-experiment will make sure that all subjects have the minimum amount of experience required. Also, more subjects should be tested; instead of 20 subjects, 40 subjects seems more appropriate because our data from subjects was so varied.

The second suggestion has to do with the Palm Pilot program which we used to test our subjects. The idea of using a Palm Pilot program which records the completion time automatically is very useful. However, our program was too rigid with spacing when validating the data entered, and subjects' times were affected by this. Also, our program tested a large memo field and 3 small fields. This is a good representation of the type of forms people fill out using Palm Pilots, but it is not nearly comprehensive. Many Palm applications have more than one input field. This is important because the more input fields there are, the longer it takes to enter data using the on-screen keyboard: the keyboard covers the screen once loaded, and must be unloaded to move to another field. Future researchers should consider this fact and design an experiment that tests more input fields.
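
A less rigid validation could normalize spacing before comparing the entry with the expected text. The sketch below is one hypothetical way to do this; the function names are illustrative and it is not the validation used in our program. It still treats line breaks as significant.

```python
def normalize_whitespace(text):
    """Collapse runs of spaces and tabs and trim each line."""
    lines = [" ".join(line.split()) for line in text.strip().splitlines()]
    return "\n".join(lines)


def entries_match(entered, expected):
    """Compare two entries while ignoring differences in spacing only."""
    return normalize_whitespace(entered) == normalize_whitespace(expected)
```

With this comparison a doubled space or a trailing tab would no longer block completion, while a misspelled word still would.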

Refine the theory or develop a new one

Originally we had proposed that Graffiti would be faster than the on-screen keyboard for the experienced user. However, our results did not yield a statistically significant difference in time to correct completion. The definition of an experienced Palm user is extremely broad, and practitioners may specialize in specific tasks. Depending on the nature of the task and the user's preferences, the user may become more proficient with either Graffiti or the on-screen keyboard.

Other suggestions

Perhaps it would be worthwhile to recruit novice users for a similar experiment and guide them through a period of training in both Graffiti and the on-screen keyboard, devoting the same amount of time to both. With that approach, the variability in the subjects' skills could be reduced. To learn more about the situational preferences for each input method, a variety of surveys may be conducted and experiment tasks set up accordingly. This may be useful in improving the input methods to better fit customer needs.