Shore '00: Student HCI Online Research Experiments

University of Maryland

Image Size vs. Scrolling in Photo Thumbnail Browsers

Experiment

Introduction and Hypothesis

In this experiment, we tested hypotheses related to the design of the thumbnail image browser in the University of Maryland PhotoFinder prototype [5]. This browser loads thumbnail images from a collection of photos into a fixed-size window so that users can search and browse through them. By default, the thumbnails are loaded into the window at the largest size that allows all the images to be displayed without having to scroll the window. There is currently a fixed maximum thumbnail size, so when there are not enough images to fill the entire window, empty space is left. However, there is no minimum thumbnail size, so large collections of photographs result in very small thumbnails. This design has the advantage of consistency in window design, always presenting the user with a single screen of images with no scrolling required. However, it has the disadvantage of presenting very small images for large collections of photographs. An alternative design would trade consistency in window design for consistency in thumbnail size. Such a design would always present the user with thumbnails of the same, relatively large size, so that thumbnails would never shrink to become too small to see. This design has the disadvantage of requiring scrolling or paging when the number of photographs exceeds the maximum number of thumbnails that fit in the window.

Our experiment compared these two design choices for image search tasks with three different sized collections of photos, with the goal of determining which design was best for each collection size. Each search task consisted of a fixed-size screen window with a collection of thumbnail images arranged in a grid. Different tasks contained different numbers and sizes of images, some with vertical scrollbars and some without. Subjects were given a written description of a photograph in the collection and asked to find it and click on it. Our belief was that larger collection sizes would favor the scrolling design because the thumbnails in the non-scrolling design would be too small, while smaller collection sizes would favor the non-scrolling design. Designers of thumbnail browsers could use this result to change the presentation of images in the thumbnail browser depending on the size of the collection being viewed. Alternatively, they could choose the smallest thumbnail size that was still useful to most users and present all collections using this size.

The experiment was a 2 by 3 design with 6 different treatments. We had two independent variables: image size and number of images. The image size had two treatments: a variable size that always shrank to fit all the images on the screen without requiring scrolling, and a fixed size that required scrolling for more than 16 images. The number of images had three treatments: 24, 36, and 48, corresponding to common film quantities. We measured three dependent variables: time to correct completion of a search task, number of errors made in a search task, and subjective satisfaction. We hypothesized that regardless of the image size treatment, both performance time and errors would increase as collection size increased from 24 to 36 to 48.

We also had a number of hypotheses about image size treatment in each collection size. Our hypothesis for collections of size 24 was that there would be no significant difference in time or errors between the fixed and variable treatments since the image sizes were not that different and not much scrolling was required. However, we thought that there would be a significant preference for the variable size since it would not require scrolling at all. Our hypothesis for collections of size 36 was that users would perform significantly faster and have a significant preference for the variable treatment. Again, the variable treatment would not require scrolling and the thumbnails, while smaller than in the collection of size 24, would still be easy to see for most images. However, we thought users might make a few more errors with the variable treatment because a few of the smaller images might be harder to see. Our hypothesis for collections of size 48 was that users would make significantly more errors with the variable size treatment since the images would be small and difficult to see, and would thus prefer the fixed size treatment. We were not sure if we would see a significant time effect since scrolling in the fixed treatment and image differentiation in the variable treatment would both likely increase the time to locate an image.

Pilot Study Results

We ran four pilot subjects using our initial experiment design, which revealed a number of problems that we then fixed. Our subjects, three of whom were experienced experiment designers, noted that our instructions were too long. Subjects were asked to read the instructions on screen, so we replaced long paragraphs with short, bulleted items. One of our background questions asked subjects if they wore corrective lenses. We clarified this question to ask if they required corrective lenses to use a computer. Since the entire experiment was carried out on a computer, we changed all of our windows to be non-resizable and made all of our error and message boxes modal to prevent users from accidentally closing or ignoring a window. Our pilots found a few spelling errors and awkwardly worded search tasks, which we corrected. Some of the pilots did not initially see the scrollbars on the windows that required scrolling, so we made sure to point this out to subjects during their practice tasks.

Two larger problems that the pilots discovered required more major modifications. First, the tasks required subjects to move a mouse and point at a target, so undesirable effects from Fitts' Law were possible if subjects were not making relatively similar movements in all of the tasks. In the original experiment design, we did not control for the amount of scrolling and mouse movement required in each task. We changed the final design to require subjects to start each task with the mouse at the same point on the screen. Subjects performed 3 tasks in the fixed treatment and 3 tasks in the variable treatment for each collection size. There were two different collections of images for each collection size, and subjects were counterbalanced equally according to which collection of images they used for each image size treatment. Thus, for each collection size, we made sure that the combined mouse movements for each set of 3 tasks were about the same in the two different collections of that size.

The second problem involved the post-trial questionnaire that we used to measure subjective satisfaction. We wanted to assess which image size treatment (fixed or variable) subjects preferred for each of the three collection sizes. However, to prevent any biasing, we did not indicate the collection sizes or image treatments to subjects during the experiment. After completing the experiment, subjects knew they had seen different sized collections and different sized images, but had no idea exactly what the treatments were. Thus, the original single-screen questionnaire, which asked subjects about their preference among the different treatments, was too hard to answer. We changed the questionnaire to a three-screen version, one screen for each of the collection sizes, with illustrative screen shots to make the distinction clearer.

Subjects

We recruited 24 subjects to participate in the experiment. The pre-screening requirements were that subjects were familiar with using a computer and a mouse, wore their corrective lenses if necessary during the experiment, were not severely color-blind, and were not members of the Human-Computer Interaction Lab (HCIL). The last requirement was necessary because the pictures used in the experiment were mostly of HCIL people and events. Roughly half of the subjects were computer science graduate students recruited from our building. The other half were undergraduate and graduate students that we recruited from photography, digital photography, and computer graphics classes. We believe these two populations represent a reasonable sampling of the early users of digital photography hardware and software: computer scientists familiar with the technology and people interested in the areas of photography and graphics. A larger, more diverse group of subjects would be needed to verify our results on a more general population.

Six women and eighteen men agreed to participate, a disproportion that is not unreasonable given the male-dominated computer science field. All but three of the subjects were between 20 and 30; two were between 31 and 40 and one was between 41 and 50. Eighteen subjects used a computer more than 20 hours a week, three used one 16 to 20 hours a week, two used one 11 to 15 hours a week, and one used one less than five hours a week. One subject indicated that he was colorblind, but did not report any problems seeing the images or performing the tasks. One of the authors is colorblind, so the experiment was designed carefully to prevent any confusion. Half the subjects wore corrective lenses during the experiment. Subjects were split relatively evenly according to these characteristics across the experiment groups described below.

We used a within-subjects design, so all subjects saw all 6 treatments. We used a different collection of photographs, numbered 1 through 6, for each treatment. We counterbalanced the collections so that half of the subjects used collections 1 through 3 in the fixed treatment and 4 through 6 in the variable treatment and the other half used the opposite arrangement. We also counterbalanced the subjects so that half saw the three fixed treatments first and half saw the three variable treatments first. Subjects were randomly assigned to one of the four resulting groups:

Group 1           Group 2           Group 3           Group 4
24 (1) Fixed      24 (1) Variable   24 (4) Fixed      24 (4) Variable
36 (2) Fixed      36 (2) Variable   36 (5) Fixed      36 (5) Variable
48 (3) Fixed      48 (3) Variable   48 (6) Fixed      48 (6) Variable
24 (4) Variable   24 (4) Fixed      24 (1) Variable   24 (1) Fixed
36 (5) Variable   36 (5) Fixed      36 (2) Variable   36 (2) Fixed
48 (6) Variable   48 (6) Fixed      48 (3) Variable   48 (3) Fixed


Finally, we divided the six subjects assigned to each group into six subgroups, one subject per subgroup, to counterbalance the order in which they saw the collection sizes. Each subject saw the collection sizes in the same order for both the fixed and variable image treatments:

Subgroup 1   Subgroup 2   Subgroup 3   Subgroup 4   Subgroup 5   Subgroup 6
24           24           36           36           48           48
36           48           24           48           24           36
48           36           48           24           36           24
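
The two counterbalancing tables can be combined into a single task schedule for any subject. The following Python sketch is only an illustration of that combination, not the original Java 1.2 program; the names `COLLECTION_SETS`, `GROUPS`, `SUBGROUP_ORDERS`, and `schedule` are hypothetical, and the set labels "A" and "B" are our own shorthand for collections 1 through 3 and 4 through 6.

```python
from itertools import permutations

COLLECTION_SIZES = (24, 36, 48)

# Collection number for each collection size, as listed in the group table.
# "A" and "B" are labels invented for this sketch.
COLLECTION_SETS = {
    "A": {24: 1, 36: 2, 48: 3},
    "B": {24: 4, 36: 5, 48: 6},
}

# Group -> (collection set used in the fixed treatment, treatment seen first),
# read column by column from the group table.
GROUPS = {
    1: ("A", "Fixed"),
    2: ("B", "Variable"),
    3: ("B", "Fixed"),
    4: ("A", "Variable"),
}

# The six orderings of the three collection sizes. Lexicographic order
# happens to match subgroups 1 through 6 in the subgroup table.
SUBGROUP_ORDERS = list(permutations(COLLECTION_SIZES))


def schedule(group, subgroup):
    """Return the six (size, collection number, treatment) blocks in order."""
    fixed_set, first = GROUPS[group]
    variable_set = "B" if fixed_set == "A" else "A"
    set_for = {"Fixed": fixed_set, "Variable": variable_set}
    second = "Variable" if first == "Fixed" else "Fixed"
    order = SUBGROUP_ORDERS[subgroup - 1]
    return [
        (size, COLLECTION_SETS[set_for[treatment]][size], treatment)
        for treatment in (first, second)
        for size in order
    ]


if __name__ == "__main__":
    for block in schedule(group=1, subgroup=1):
        print(block)
```

Calling `schedule(1, 1)` reproduces the Group 1 column of the group table; other subgroups reorder the collection sizes within each half of the session.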

Materials

The entire experiment was written as a Java 1.2 program and administered on two Windows NT computers using similar 17 inch, 1280 x 1024 resolution monitors with similar refresh rates and brightness and contrast settings. The initial screen in the program allowed the experimenter to select the group number and subgroup number for each subject. The program then used these numbers to administer the image search tasks in the appropriate order for each subject. The experiment was a "self-guided" series of windows and tasks and consisted of four parts: introduction and background, training, tasks, and post-experiment questionnaire. The instructions (see Appendix A.1) briefly explained the experiment and provided a motivation in the form of a prize for the subject with the fastest performance and the fewest errors. The background information questions (see Appendix A.2) asked users about their age, gender, and computer usage, and whether or not they were colorblind or wore corrective lenses. The training and experiment search tasks all involved finding a particular image in a collection of thumbnails based on a written description. We created the collections of images from a library of color digital photographs available in the HCIL. Each collection consisted of different images, and there were no duplicates in any single collection. We tried to include a similar variety of images in each collection. The most common types of pictures were headshots of individuals, outdoor scenes, and conference speakers and audiences.

The thumbnails were created by the program on the fly from the original photographs for each of the six treatments. They were displayed in a fixed size window (560 x 560 pixels) in various sized grids. For the fixed size thumbnail treatments, the grid cells were 140 x 140 pixels for all three collection sizes and were displayed so that only a 4 x 4 grid was visible at a time. The rest of the thumbnails were visible via vertical scrolling (see Appendices A.4 - A.6). For the variable size thumbnail treatments, the grid cells were 112 x 112 pixels in a 5 x 5 grid, 93 x 93 pixels in a 6 x 6 grid, and 80 x 80 pixels in a 7 x 7 grid for the collections of size 24, 36, and 48, respectively (see Appendices A.7 - A.9). Our photographs consisted of both horizontal and vertical images. To preserve the aspect ratio when we created each thumbnail, the larger of the width and height dimensions of the original photograph was shrunk to the grid cell size and the smaller dimension was shrunk to preserve the aspect ratio. We chose the thumbnail sizes based on the results of the experiment in [4], which indicated that images of about 90 x 60 pixels might be optimal, while smaller images were harder to see.
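
The sizing arithmetic in this paragraph can be checked with a short sketch. The Python below is an illustration under stated assumptions, not the code of the original Java 1.2 program: the function names are hypothetical, the variable treatment is assumed to use the smallest square grid that holds the whole collection, and rounding of the shorter thumbnail dimension is our own choice.

```python
import math

WINDOW = 560      # window width and height in pixels, from the text
FIXED_CELL = 140  # grid cell size in the fixed treatment, from the text


def variable_grid(n_images, window=WINDOW):
    """Smallest square grid that holds n_images, and its cell size in pixels."""
    side = math.isqrt(n_images - 1) + 1  # integer ceiling of sqrt(n_images)
    return side, window // side


def fixed_rows(n_images, window=WINDOW, cell=FIXED_CELL):
    """Total grid rows and rows visible without scrolling (fixed treatment)."""
    columns = window // cell
    total_rows = math.ceil(n_images / columns)
    visible_rows = window // cell
    return total_rows, visible_rows


def thumbnail_size(width, height, cell):
    """Scale the longer side to the cell size, preserving the aspect ratio."""
    longest = max(width, height)
    return (max(1, round(width * cell / longest)),
            max(1, round(height * cell / longest)))


if __name__ == "__main__":
    for n in (24, 36, 48):
        print(n, variable_grid(n), fixed_rows(n))
```

For collections of 24, 36, and 48 images this yields 5 x 5, 6 x 6, and 7 x 7 grids with 112, 93, and 80 pixel cells, matching the figures above, while the fixed treatment needs 6, 9, and 12 rows with only 4 visible at a time.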

For both the training and experiment tasks, subjects were presented with a window that contained a written description of a photograph and a "Start" button with none of the thumbnails visible (see Appendix A.3). After reading the description, subjects had to press the button to display the thumbnails and begin the timing of the task (see Appendices A.4 - A.9). Because the "Start" button was always located in the same place, subjects always started a task with the mouse in the same location. Starting the timer only when the button was pressed also allowed us to control for different reading speeds and allowed subjects to ask questions without being penalized with longer times if they did not understand the task. If the subject clicked on the correct image after searching through the grid, the timer was stopped, the elapsed time recorded, and the next task window was presented. If they clicked on an incorrect image, an error message box appeared with a "Please try again" message and the error count for the task was incremented. After closing the box, subjects could then try again. Subjects performed 6 training tasks, 3 on a collection of size 15 using the fixed size image treatment followed by 3 on a collection of size 38 using the variable image treatment. Subjects then performed 18 timed tasks, 3 in each of the 6 treatments using a different collection for each treatment.
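
The timing and error-counting rules for a single task can be summarized in a small sketch. This is a hedged Python illustration of the logic as described, not the original Java 1.2 code; the class name `SearchTask`, its methods, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time


class SearchTask:
    """One search task: timed from the Start press to the correct click."""

    def __init__(self, target_id, clock=time.monotonic):
        self.target_id = target_id  # identifier of the described photograph
        self.clock = clock          # injectable so the logic can be tested
        self.started_at = None
        self.elapsed = None         # seconds to correct completion
        self.errors = 0             # number of incorrect clicks

    def press_start(self):
        """Thumbnails become visible and timing begins."""
        self.started_at = self.clock()

    def click(self, image_id):
        """Return True if the task is completed by this click."""
        if self.started_at is None or self.elapsed is not None:
            return False  # ignore clicks before Start or after completion
        if image_id == self.target_id:
            self.elapsed = self.clock() - self.started_at
            return True
        self.errors += 1  # the "Please try again" case
        return False


if __name__ == "__main__":
    task = SearchTask(target_id=7)
    task.press_start()
    task.click(3)   # wrong image, counted as an error
    task.click(7)   # correct image, stops the timer
    print(task.errors, task.elapsed)
```

Because reading the description happens before `press_start`, reading time and clarifying questions never count toward `elapsed`, which is the property the design relies on.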

At the end of the experiment, subjects filled out a questionnaire to assess their subjective satisfaction for image size treatment for each of the three collection sizes (see Appendix A.10). Each of the three questionnaire screens explained that subjects had seen two sets of collections of a particular size (24, 36, or 48), one with large images in a scrolling window and one with small images in a non-scrolling window. Below the explanation were sample screen shots of the two collections as they appeared to subjects. Subjects were then asked to indicate which arrangement they preferred on a 1 to 9 scale, with 1 indicating strong preference for the fixed treatment, 5 indicating no preference, and 9 indicating strong preference for the variable treatment. The program recorded the preferences for each collection size.

Procedures and Problems

The administration of the experiment was quite simple. The experimenter launched the program and selected a group and subgroup for a subject. Subjects from different classes and computer science graduate students all signed up to participate in the experiment at relatively random times, so subjects were assigned to a group and a subgroup in the order in which they had signed up. The subject read and signed an experiment consent form and was then seated at the computer where the instruction screen was displayed. No other windows or icons were visible on the screen, to prevent distractions. Our experienced pilot subjects warned us that subjects didn't always read instructions carefully, so the experimenter gave a quick summary of the instructions before the subject began reading. The subject was then guided through the experiment, either by pressing "Next" buttons to move between windows or by clicking the correct thumbnail in a task. After subjects completed the practice trials, a window advised them that the practice session was over and the real tasks were next. A window also advised subjects when they were halfway through the real tasks. The experimenters were on hand to observe and answer questions throughout the process.

We encountered three problems in the experiment, only one of which may have affected our results. First, a number of subjects, in fact probably a majority, did not notice the vertical scrollbar in the window for the fixed size image treatment during the practice trials. We were careful to require scrolling in the practice session so that we could point out the scrollbar to these subjects then, rather than during the timed tasks where they might be slowed down. After being shown the scrollbar, none of the subjects had a problem in the timed tasks. Second, three subjects, for whom English was a second language, were confused by the description in one task because they did not know what an easel was. However, all three asked the experimenter for clarification before clicking the "Start" button to begin the timing of the task, so their timing results were not affected.

The third problem we encountered was more serious. Two or three subjects indicated that when they were completing the questionnaire, they did not realize that there would be three different questions for the three different collection sizes. They initially thought that there was only one question, asking them to choose between a scrolling or non-scrolling design no matter what the size of the collection. Once they realized there was more than one question, they understood that they could have a different choice for each collection size and they wanted to go back and change their answer to the first question. Unfortunately, our program did not allow this, so the responses for some of the first questions, which asked about collections of size 24, may not be accurate.



Department of Computer Science: Direct questions and comments to the student editorial team
