Shore '00: Student HCI Online Research Experiments

University of Maryland


The Effect of Direct Annotation on Speed and Satisfaction

Experiment

Introduction and Hypothesis

We conducted an experiment to determine which method of annotating a photo on a display is the most efficient. To run it, we modified the source code of the existing Photofinder prototype. We compared three annotation methods: direct annotation, click & type, and text box.


In our 1 x 3 experiment, the independent variable was the annotation method used for each picture. The three methods were the treatments of the independent variable, and each subject was exposed to all three (a within-subjects design). Briefly, the direct annotation method is similar to what is commonly known as "drag & drop": the user clicks on a name in a list of names and drags it to its proper location on the photo. In the click & type method, the user clicks on a place on the photo and types in a name. Lastly, in the text box method, the user types all the names into a text box provided, in the order in which the people appear in the photo from left to right.


The dependent variables were completion time and subjective preference for each annotation method. Our group's hypothesis was that direct annotation would have the best overall result: a faster time and higher subjective satisfaction than the click & type and text box methods. We expected the click & type method to rank second in subjective satisfaction but to be the slowest, because it combines keyboard and mouse use. Lastly, we expected the text box method to have the lowest subjective satisfaction due to its dull nature; however, since it requires only the keyboard, its completion time should be comparable to that of direct annotation.

Pilot Study Results

We used five subjects for our pilot study. The results were mixed: we could not tell which of the three methods was fastest, and the subjective preferences showed no definite pattern either. We concluded that, with so few subjects, the results were inconclusive, so the pilot gave us little indication of whether our hypothesis would hold.


Following the pilot, we made some design changes to the user interface to reduce subject errors, and we shortened the instructions so that subjects would not have to read as much text. Overall, the pilot study was successful: the experiment ran as we had hoped, and we found some weaknesses that we could work on.

Subjects

For our subjects, we asked for volunteers from computer science classes here at the University of Maryland. We were fortunate to get a large group: by the end of the experiment we had a total of 48 subjects. We had some concerns that they were predominantly male (~80%) and in about the same age group (18 to 21). This may have biased our experiment, since an 80% share of males is not representative of the general population. The fact that the subjects came from computer science classes may add to the bias, since most of them would be experienced computer users. One way to improve the validity of the experiment would be to sample a random group of people from the entire campus or, even better, from the general population.

Materials

The experiment was run on a modified version of the Photofinder prototype. Everything was done on a computer, from administering the survey to measuring time. Before the experiment began, each subject completed a background survey followed by training. The background survey contained the following questions: "How often do you use a computer?", "How familiar are you with the windows interface?", whether the subject was male or female, and which age group the subject belonged to, selected from a list of age ranges.


For the first two questions, the subject answered on an ordinal scale from 1 to 9. For the question on frequency of computer use, 1 was "less than once a month", 5 was "couple of times a week", and 9 was "several times a day". For the question on familiarity with the windows interface, 1 was "completely new", 5 was "I know my way around", and 9 was "I've mastered it".
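The two scale questions could be represented as a small data structure. This is only a sketch: the class name `ScaleQuestion` and its fields are hypothetical and are not taken from the Photofinder source; only the prompts, the 1 to 9 range, and the anchor labels come from the description above.

```python
from dataclasses import dataclass


@dataclass
class ScaleQuestion:
    """Hypothetical model of one 1-to-9 ordinal survey question."""
    prompt: str
    anchors: dict  # maps a labeled scale point to its label text

    def is_valid(self, answer: int) -> bool:
        # Answers are restricted to the 1..9 range described in the text.
        return 1 <= answer <= 9


COMPUTER_USE = ScaleQuestion(
    prompt="How often do you use a computer?",
    anchors={
        1: "less than once a month",
        5: "couple of times a week",
        9: "several times a day",
    },
)

WINDOWS_FAMILIARITY = ScaleQuestion(
    prompt="How familiar are you with the windows interface?",
    anchors={
        1: "completely new",
        5: "I know my way around",
        9: "I've mastered it",
    },
)
```

Only points 1, 5, and 9 carry labels, so the intermediate points are left unlabeled here rather than given invented wording.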


After the background survey, the subject proceeded to the training part, where they could spend time trying out the three methods of annotating a photo: direct annotation, click & type, and text box. For each method, we provided a sheet with some example photos to annotate. In addition, we provided the instruction sheet and a sheet showing the people in the demo photos with their names captioned. On-line help was also available by clicking on a help button. When the subject felt ready to begin the real experiment, they were asked to enter an ID number that determined the order in which they would do the three methods.


For the timed portion of the experiment, we gave out a new set of sheets of named photos of people. We made three sheets, titled "direct annotation", "click & type", and "text box". The order of the methods was pre-set for each subject: with 1 being direct annotation, 2 being click & type, and 3 being text box, each subject received one of the six permutations of the three numbers (123, 132, 213, 231, 312, 321). Subjects were distributed evenly across the permutations. This was done to prevent the bias that might arise from a fixed ordering. Also, to prevent name length from affecting the results, the total number of characters that subjects had to type was exactly the same on each sheet.
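The ordering scheme can be sketched in Python. This is a minimal illustration, not the Photofinder code: the function name `order_for_subject` and the rule that maps an ID number to a permutation (ID modulo 6) are assumptions. The text states only that the ID determined the order and that the six permutations were used equally.

```python
from collections import Counter
from itertools import permutations

# Method codes as given in the text.
METHODS = {1: "direct annotation", 2: "click & type", 3: "text box"}

# The six orderings: 123, 132, 213, 231, 312, 321.
ORDERS = list(permutations(sorted(METHODS)))


def order_for_subject(subject_id: int) -> tuple:
    """Assumed rule: cycle through the six orderings by ID number."""
    return ORDERS[subject_id % len(ORDERS)]


# With 48 subjects numbered 0..47, every ordering is used equally often.
counts = Counter(order_for_subject(i) for i in range(48))
```

Under this assumed rule, 48 subjects divide evenly into six groups of eight, so each method appears in each position (first, second, third) equally often.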


During the experiment, the computer kept track of time and recorded any mistakes made by the subject. Subjects had no way of knowing how fast they were going, however, since the time was not displayed. The program also did not tell subjects when they had annotated incorrectly. Every subject did all three methods, albeit in different orders. After the experiment was over, the subject was given a questionnaire to rate each of the three annotation methods on a scale of 1 to 9, with 1 being the worst subjective preference and 9 the best.
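One way to picture the silent timing and error logging is the sketch below. It is an illustration under stated assumptions, not the modified Photofinder implementation: the class `TrialLog` and its methods are hypothetical names, and the use of a monotonic clock is a choice made here for the example.

```python
import time
from dataclasses import dataclass


@dataclass
class TrialLog:
    """Hypothetical record of one subject performing one method."""
    method: str
    errors: int = 0
    elapsed: float = 0.0
    started_at: float = 0.0

    def start(self) -> None:
        # Begin timing; nothing is shown to the subject.
        self.started_at = time.monotonic()

    def record_error(self) -> None:
        # Mistakes are counted silently; the subject is not notified.
        self.errors += 1

    def stop(self) -> float:
        # Store and return the completion time in seconds.
        self.elapsed = time.monotonic() - self.started_at
        return self.elapsed
```

A session would create one `TrialLog` per method, call `start()` when the subject begins annotating, and call `stop()` when the last name has been entered.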

Procedures and Problems

The administration of the experiment is detailed in the section above. One problem we encountered was subjects who signed up but did not appear: over 70 subjects signed up, but only 48 came. Also, each session was supposed to last no more than 20 minutes, but some subjects spent a long time in the training environment, so some ran over their allotted time and we fell behind schedule.


There was also a problem with subjects not reading all of the instructions, perhaps because of their length. Fortunately, the demo/training period handled most of the problems caused by the length of the instructions; the subjects, for the most part, asked for help if they did not know how to proceed. Since many of the instructions were necessary, this problem could not easily be solved.


