2. Experiment

2.1 Independent Variables

The independent variable in this experiment is the size of thumbnails presented. Five treatments of this variable are studied:

  • Treatment 1: 50 x 35 pixels (1750 pixels²)
  • Treatment 2: 70 x 39 pixels (2730 pixels²)
  • Treatment 3: 90 x 63 pixels (5670 pixels²)
  • Treatment 4: 110 x 78 pixels (8580 pixels²)
  • Treatment 5: 130 x 92 pixels (11960 pixels²)

2.2 Dependent Variables

    The experiment has two measured dependent variables: time to recognition and accuracy of identification. The time to recognition is measured from the point when a thumbnail appears to the point when the participant clicks a button to indicate that he or she recognizes (or does not recognize) the image. The accuracy of identification is whether or not the participant correctly identified an image (judged subjectively after the experiment). If, for example, a participant correctly identified three of the four images at a given image size, the accuracy of identification for that subject at that image size would be 75%.
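
    As a small illustration of the accuracy measure, the sketch below computes the percentage from a list of post-experiment judgments. The function name and the list-of-booleans representation are assumptions made for this example; the study's own scoring was done by hand.

```python
def accuracy_percent(correct_flags):
    """Return the percentage of images judged correctly identified.

    correct_flags is a list of booleans, one per image shown at a given
    size (a hypothetical representation chosen for this sketch).
    """
    if not correct_flags:
        raise ValueError("need at least one judged image")
    correct = sum(1 for flag in correct_flags if flag)
    return 100.0 * correct / len(correct_flags)


# Three of the four images shown at one size were judged correct.
print(accuracy_percent([True, True, True, False]))  # 75.0
```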

    Fitts's Law allows us to remove the time subjects use to point at the button with the mouse, thus giving a more accurate reflection of the subjects' recognition times.

    Time for precision pointing = C1 + C2 (index of difficulty) + C3 log2(C4 / W)

    Time to point = C1 + C2 (index of difficulty)

    Index of difficulty = log2(2 D/W)

    D = distance to a specific target

    W = the width of the target
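
    The two simpler formulas above can be sketched in a few lines. The constants C1 and C2 and the distances used here are made-up values for illustration only; the study did not report measured constants.

```python
import math


def index_of_difficulty(distance, width):
    """Index of difficulty = log2(2D / W), as defined above."""
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    return math.log2(2.0 * distance / width)


def time_to_point(distance, width, c1, c2):
    """Time to point = C1 + C2 * (index of difficulty)."""
    return c1 + c2 * index_of_difficulty(distance, width)


# Hypothetical target: a button 40 pixels wide, 160 pixels away.
difficulty = index_of_difficulty(160, 40)  # log2(8) = 3.0

# C1 = 0.1 s and C2 = 0.1 s per bit are assumed constants, not measured ones.
pointing_time = time_to_point(160, 40, c1=0.1, c2=0.1)
print(difficulty, pointing_time)
```

    Subtracting an estimate like pointing_time from each measured response would isolate recognition time; the study instead treats pointing time as part of the measurement.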

    We assert that for the purposes of this study, only the relative time taken to complete the given task (recognition) is important. This fact obviates any need to calculate the actual time of "recognition." Henceforth, we refer to the time to recognition as including the time taken to manipulate the pointing device without any loss of generality.

    2.3 Experimental Design

    In the experiment, the subject is asked to recognize images. An instruction screen shown to participants explains what we mean by recognition. Recognition takes into account the purpose of thumbnails, that is, to provide a convenient way to get an idea of what the picture is about. If, say, the picture the participant is considering is of a leopard in a tree, we want the participant to say the thumbnail shows an animal (or feline) in a tree. We do not require that the subject be able to discern whether the animal is a cheetah or a cougar, etc.; being able to get the idea of an animal in a tree is enough to be considered a successful identification. In choosing the picture content, we attempted to remove any cultural, gender, and locale biases by choosing neutral pictures that almost all people can recognize.

    The computer program begins with a form accepting the participant's basic demographic data. Then the program presents the participant with a sequence of thumbnails. There are twenty different pictures in the database, and each picture has five thumbnails, one for each treatment. The order of picture content and size is randomized in order to minimize order effects. For each thumbnail, the participant is asked whether they believe they can infer the contents of the image from the thumbnail. The program saves all pertinent information regarding the results, which includes the image's index number, the response, and the response time.

    2.4 Hypothesis

    We hypothesize that larger image sizes will show an increase in correct recognition, which plateaus at a certain level. Further, the data might indicate a "sweet spot" where the time to recognize is short and the accuracy of identification is high. This level is what we hope to propose as the standard for the size of thumbnails.

    2.5 Pilot Study Results

    A pilot study for the experiment was run between April 1 and April 5. During this period, six participants ran through the experiment using prototype materials. Through the course of the pilot study, a major error and several places for improvement were discovered in the test administration program.

    The major error was in how the test administration program randomly selected the image sizes. The program is supposed to randomly select image sizes while ensuring that four images of each size are presented in each test session. This was not working correctly but was corrected immediately after the pilot studies.

    The design of the actual testing screen was refined. In the prototype version of the program, when the participant clicked "Yes" or "No" to the "Do you know what this image is?" question, the image remained visible. It was noticed that the participants would click on "Yes" as soon as a new image appeared and then take time to analyze the image. This essentially made the timing data that was collected useless. To correct this problem, the image was made to disappear as soon as the "Yes" or "No" option was clicked.

    It was also noticed that significant mouse movement was required to operate the form in the prototype version. The participant would have to move the mouse to the "Yes" or "No" option, then to the user response text box, then to the "OK" button, and then repeat this loop. This allowed the confounding variable of mouse movement time to affect our timing measurements. It also made the system more difficult for the participant to use than it needed to be. To correct this problem, the interface for the screen was redesigned. Now, immediately after the "Yes" or "No" option is selected, focus switches to the text entry field, so keystrokes are automatically entered into that field. Pressing the enter key then automatically activates the "OK" button. This design keeps mouse movement to a minimum: the participant need only move the mouse between the "Yes" and "No" options.

    Also as a result of the pilot study, the terminology used in the introduction screen was improved.

    2.6 Subjects

    We used sixteen college students from the Dorchester Hall of the University of Maryland as subjects. The participants had varying degrees of computer experience. Some of the students were computer science majors or from fields with heavy computer usage, and some had little experience with computers. All the students were familiar with the use of the mouse; the sole learning curve involved was following the directions of the automated system. All the subjects expressed enthusiasm while performing the study and were curious about the findings.

    2.7 Materials

    The materials generated for this experiment were fairly comprehensive. The materials were all electronic and are available online in a zip format. The materials consist of an image collection, a test administration program written in Microsoft Visual Basic, and a Microsoft Access Database for compiling and analyzing the test results.

    2.7.1 Test Environment

    The testing was run in a small room with a mix of natural and halogen light. The participant was seated in front of a desk with a 15" monitor, keyboard, and mouse. The system display was set to 800 x 600 pixels resolution with Hi-Color (16 bit) color depth. The participant was seated in a chair that they could move nearer to or farther from the monitor to achieve a comfortable viewing distance. The room was kept quiet and a test proctor was in the room with the participant.

    2.7.2 The Image Collection

    One hundred thumbnail images were generated for use with this experiment. This image set includes twenty different photographic images each reduced to five different thumbnail sizes. The original images were all over 300 pixels X 200 pixels in size. The original images were also all taken from current web content. The image reduction was done using Adobe Photoshop using bicubic resampling. The five different thumbnail sizes were:

    Image Size   1         2         3         4          5
    Pixels       50 x 35   70 x 49   90 x 63   110 x 78   130 x 92

    This image set is included in thumbnail.zip. The naming convention used for these images is "i_#M_#N.bmp," where #M is the image number (1-20) and #N is the size (1-5). For example, "i-12-2.bmp" is the filename for the thumbnail containing image number 12 at size 2.

    2.7.3 The Test Administration Program

    The test administration for this experiment is completely computerized. It is performed by a Visual Basic program written specifically for this experiment. An executable version of this program is included in thumbnail.zip. The program consists of four main forms: an introduction screen, a demographic entry screen, a testing screen, and a test-completed notification screen.

    Introduction Screen

    This screen appears at the beginning of each test session. Its purpose is to prepare the participants for the tasks ahead and to inform them of their right to exit the experiment at will.

    Screen capture of intro

    Demographics Screen

    The demographics screen collects demographic information about each test participant. Participants are asked to enter their age, gender, hours of weekly computer use, whether or not they use glasses or contacts, and if so whether or not they are currently using them. The data collected from this form immediately populates the "Demo Table" of the database. Also entered in the "Demo Table" are the date and time of the test. These fields are taken from the system clock on the testing system.

    Screen capture of demographics

    Testing Screen

    After the participant presses "OK" on the Demographics Screen, the test begins. The first image will appear on the Testing Screen. In the test, the participant is presented with twenty images. For each image, a screen appears with the image centered near the top of the form (the timer begins). Below the image is a box containing the question "Do you know what the image is?" Next to the question are two radio buttons, "Yes" and "No", for the response to the question. The image of the Testing Screen at this point:

    Screen capture of main testing form

    If "Yes" is chosen, the image disappears (the timer stops) and another box appears asking the question "What is it?" Next to the question is a text box for the participant’s response. At the bottom of the screen an "OK" button also appears. Selecting the button will cause the next image to be presented. Below is an image of the Testing Screen at this point.

    The interface for this Testing Screen was specially designed to allow the mouse to stay located near the radio buttons. Once a radio button is selected, the cursor is automatically moved to the text box for the participant’s response. On pressing enter, the "OK" button is automatically activated. This allows the participant’s mouse movement to be minimized, thus reducing the confounding variable of mouse movement time.

    The Testing Screen also captured data that was fed directly into the "Results" and "Ids" tables of the database. The "Ids" table stores a record of each image’s number and size, and the participant’s description of the image. The "Results" table stores, for each image size, the sum time to evaluate and the number believed correct. The time to evaluate is measured in milliseconds from the instant the image is shown until the instant the participant clicks on a "Yes" or "No" radio button. The "Results" table stores the sum of the times to evaluate the four images presented at each size. The number believed correct is simply the number of times the participant selected the "Yes" radio button for each image size.

    There was also code in the Testing Screen form that was responsible for determining which images to present in what order. It was decided that all participants would see the same image set. It was also decided that all the participants would see the same images in the same order. What would vary would be the sizes of the images. The test administration program would select a random size (image size 1, 2, 3, 4, or 5) for each image, while ensuring that exactly four images of each size were presented in the series. This randomization was done to reduce any order biases in the results. It was also done to ensure that a nearly even number of participants got to see any given image at any given size.
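
    One simple way to meet the constraint of exactly four images per size is to build a pool holding each size four times and shuffle it. The sketch below is written in Python for illustration; it is not the Visual Basic code used in the study, and the function name and parameters are assumptions.

```python
import random


def assign_sizes(num_images=20, sizes=(1, 2, 3, 4, 5), rng=None):
    """Return one size per image, in presentation order.

    Every size appears equally often (four times for twenty images and
    five sizes), but which image receives which size is random.
    """
    if rng is None:
        rng = random.Random()
    if num_images % len(sizes) != 0:
        raise ValueError("num_images must be a multiple of the number of sizes")
    repeats = num_images // len(sizes)
    # Pool with each size repeated the same number of times.
    pool = [size for size in sizes for _ in range(repeats)]
    rng.shuffle(pool)
    return pool


# Image k (1-based, fixed order) is shown at size session_sizes[k - 1].
session_sizes = assign_sizes()
print(session_sizes)
```

    Shuffling a balanced pool guarantees the per-size counts by construction, whereas drawing each size independently at random would only balance them on average.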

    Test Done Screen

    Once the twentieth image has been presented and the participant clicks on the "OK" button, a small Test Done Screen appears. The purpose of this screen is basically to inform the participant that the test is complete and to thank them for participating.

    2.7.4 The Database

    Also included in thumbnail.zip is the database used to collect and analyze all the data captured using the Test Administration program. The database has three tables: "Demo", "Ids", and "Results". The "Demo" table stores the demographic information for each test participant. The fields in this table are: Date, Time, Gender, Hours of Computer Use, Needs to use Glasses, and Is Using Glasses Now. The Date and Time are entered into the database table as soon as the demographic screen of the test administration program appears. The other fields are entered upon completion of the demographic screen. The "Ids" table stores, for each image presented to a participant, the image number, the image size, and the participant’s text description of the image. If the participant did not recognize an image, then "I don’t know" is put into the description. There is an additional field in this table, a Yes/No checkbox called Correct. The data for this field must be entered later by the test administrator by manually examining the text descriptions. The final table in the database system is the "Results" table. This table stores, for each participant, the sum time to recognition and the number of images believed to be correctly identified for each image size.

    Also included in the database are two queries: the "Ids Query" and the "Timing Query". The "Ids Query" analyzes the "Ids" table and sums, for each image size, the number of correct recognitions. The "Timing Query" analyzes the "Results" table and returns the average sum time to recognition for each image size. Of course, other queries could also be generated, if desired.
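
    The logic of the two queries can be sketched outside Access as follows. This is an illustration in Python, not the queries shipped in the database; the row layouts (tuple order and field names) are assumptions based on the table descriptions above, and the sample rows are invented.

```python
from collections import defaultdict


def ids_query(ids_rows):
    """Count correct recognitions per image size.

    Each row is assumed to be (image_number, image_size, description, correct).
    """
    totals = defaultdict(int)
    for _image, size, _description, correct in ids_rows:
        totals[size] += 1 if correct else 0
    return dict(totals)


def timing_query(results_rows):
    """Average the per-participant summed time for each image size.

    Each row is assumed to be
    (participant, image_size, sum_time_ms, number_believed_correct).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _participant, size, sum_time_ms, _believed in results_rows:
        sums[size] += sum_time_ms
        counts[size] += 1
    return {size: sums[size] / counts[size] for size in sums}


# Invented sample rows, for illustration only.
sample_ids = [
    (1, 1, "cat", True),
    (2, 1, "I don't know", False),
    (3, 2, "tree", True),
]
sample_results = [(1, 3, 8000, 4), (2, 3, 9000, 3)]
print(ids_query(sample_ids))
print(timing_query(sample_results))
```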

    2.8 Procedure

    The experiment is highly automated. The procedure is as follows: first, the participant is briefed about the experimental procedures and definitions via an instructions screen. Each experiment run is done with the same machine specifications and monitor specifications. Since 50.47% of Web users have a 14" or 15" screen (Pitkow and Kehoe, 1996), we test on 14" and 15" screens. Due to a number of factors, larger screens with more negative space do not permit the recognition of smaller pictures as readily. The participants are run through a program we have created and tailored specifically for this experiment. The program is stand-alone and was written in Visual Basic.

    2.9 Results

    The following chart shows the local minimum of the mean time to recognition residing near size 3.

    ChartObject Time to Recognition

    Statistics Summary

                         Size 1   Size 2   Size 3   Size 4   Size 5
    Mean                 2.68     2.55     2.15     2.42     2.58
    Standard Deviation   1.54     1.61     0.99     1.54     1.43

    The standard deviation for the experiment was too high to draw many conclusions from this data. Note, however, that size 3 has the smallest deviation.

    The following chart shows the overall percentage of correctly identified thumbnails over each size. Note the dramatic increase of correctly identified thumbnails between size 2 and size 3.

    ChartObject Percent Correctly Identified

    Since we have small samples with unknown population variances, we employ a two-sample t-test to test the null hypothesis (locally). The following assumptions are made with a t-test:

    1. Both populations are normal, so that X1, X2, ..., Xm is a random sample from a normal distribution and so is Y1, ..., Yn (with all the X's and Y's independent of one another).
    2. The values of the two population variances are equal, and their common value is unknown (this assumption works when the variances are roughly the same order of magnitude).

    t-Test: Size 3 vs. Size 4

                                   Size 3   Size 4
    Mean                           2.15     2.42
    Variance                       0.99     2.37
    Observations                   16.00    16.00
    Pearson Correlation            0.77
    Hypothesized Mean Difference   0.50
    df                             15.00
    t Stat                         -3.05
    P(T<=t) one-tail               0.00
    t Critical one-tail            1.75
    P(T<=t) two-tail               0.01
    t Critical two-tail            2.13

    At the hypothesized mean difference of 0.5 seconds (recognition speed), the null hypothesis is rejected at the .05 significance level for size 3 vs. size 4.

    t-Test: Size 3 vs. Size 2

                                   Size 3   Size 2
    Mean                           2.15     2.55
    Variance                       0.99     2.60
    Observations                   16.00    16.00
    Pearson Correlation            0.81
    Hypothesized Mean Difference   0.25
    df                             15.00
    t Stat                         -2.59
    P(T<=t) one-tail               0.01
    t Critical one-tail            1.75
    P(T<=t) two-tail               0.02
    t Critical two-tail            2.13

    At the hypothesized mean difference of 0.25 seconds (recognition speed), the null hypothesis is rejected at the .05 significance level for size 3 vs. size 2.
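
    The t statistics in the two tables can be approximately reproduced from the summary figures alone. The presence of a Pearson correlation and df = 15 for sixteen observations is consistent with a paired test on the same sixteen participants, so the sketch below assumes that form; this is an inference from the tables, not something the tables state. Because the inputs are rounded to two decimals, the results only approximate the tabulated values.

```python
import math


def paired_t_from_summary(mean_x, mean_y, var_x, var_y, corr, n, hypothesized_diff):
    """t statistic and degrees of freedom for a paired test, from summary data.

    The variance of the paired differences X - Y is
    var_x + var_y - 2 * corr * sqrt(var_x * var_y).
    """
    var_diff = var_x + var_y - 2.0 * corr * math.sqrt(var_x * var_y)
    std_error = math.sqrt(var_diff / n)
    t_stat = (mean_x - mean_y - hypothesized_diff) / std_error
    return t_stat, n - 1


# Size 3 vs. size 4, hypothesized mean difference 0.50.
t_34, df_34 = paired_t_from_summary(2.15, 2.42, 0.99, 2.37, 0.77, 16, 0.50)

# Size 3 vs. size 2, hypothesized mean difference 0.25.
t_32, df_32 = paired_t_from_summary(2.15, 2.55, 0.99, 2.60, 0.81, 16, 0.25)

print(round(t_34, 2), df_34)
print(round(t_32, 2), df_32)
```

    Both computed statistics land within a few hundredths of the tabulated -3.05 and -2.59, and both exceed the two-tail critical value of 2.13 in magnitude.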

    An ANOVA test was performed on the data. The single-factor ANOVA shows whether there are differences in true averages associated with the different treatments (sizes) of the factor. The null hypothesis states that there are no differences between any of the population means, and the alternative hypothesis says that at least two means differ from one another.

    Single-Factor ANOVA

    Source of Variation   SS       df      MS     F      P-value   F crit
    Between Groups        2.68     4.00    0.67   0.32   0.86      2.49
    Within Groups         156.02   75.00   2.08
    Total                 158.69   79.00

    Because F = 0.32 is less than the critical value F(.05; 4, 75) = 2.49, the null hypothesis is not rejected at significance level .05. The sizes appear to be indistinguishable with respect to recognition time.

    ANOVA Data Summary

    Groups   Count   Sum     Average   Variance
    Size 1   16.00   42.86   2.68      2.39
    Size 2   16.00   40.82   2.55      2.60
    Size 3   16.00   34.42   2.15      0.99
    Size 4   16.00   38.67   2.42      2.37
    Size 5   16.00   41.33   2.58      2.05
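
    The single-factor ANOVA figures can be rebuilt from this summary alone, which makes the arithmetic behind the F statistic explicit. The sketch below uses only the counts, averages, and variances listed above; the function name is an assumption, and small differences from the reported table come from rounding in the summary.

```python
def anova_from_summary(groups):
    """Single-factor ANOVA from (count, mean, variance) tuples, one per group.

    Returns (ss_between, ss_within, df_between, df_within, f_stat).
    """
    total_n = sum(n for n, _mean, _var in groups)
    grand_mean = sum(n * mean for n, mean, _var in groups) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _var in groups)
    # Within-group sum of squares: each sample variance times (n - 1).
    ss_within = sum((n - 1) * var for n, _mean, var in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


# (count, average, variance) for sizes 1 through 5, from the summary above.
summary = [
    (16, 2.68, 2.39),
    (16, 2.55, 2.60),
    (16, 2.15, 0.99),
    (16, 2.42, 2.37),
    (16, 2.58, 2.05),
]
ss_b, ss_w, df_b, df_w, f_stat = anova_from_summary(summary)
print(round(ss_b, 2), round(ss_w, 2), df_b, df_w, round(f_stat, 2))
```

    The within-group mean square (about 2.08) dwarfs the between-group mean square (about 0.67), which is why F stays far below the critical value.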

    The following are Histograms for each of the thumbnail sizes.

    ChartObject Size 1 Histogram

    ChartObject Size 2 Histogram

    ChartObject Size 3 Histogram

    ChartObject Size 4 Histogram

    ChartObject Size 5 Histogram

    Some distributions conform more closely than others to a normal distribution (e.g., size 5 more than size 2), which might cast doubt on the preliminary assumption stated above for the ANOVA test. On further analysis, a scatter plot of the data suggested a valley near size 3, which calls for a polynomial regression model. This analysis is too complex, however, for the current scope of this project, and calls for more data.

    Histogram Distribution, Size 1-5

    Stacked Histogram

    2.10 Problems

    The problem of recognition is addressed lightly in this study. The notion of understanding content from a single picture is not addressed. We look for the minimum thumbnail size needed to "imagine" the content. Our visual system can correct for many anomalies in image content, but without the basic geometrical information afforded by sequences of images (i.e., an active observer rather than a still picture), much of the understanding derived from a thumbnail will come from a top-down (model-fitting) approach to image recognition. That is, subjects will look for what they have seen before, not build a concept of the picture shown from scratch. The time to recognition, then, is the quantitative measurement of this process of model-fitting.

    We pursue the notion of an optimal size for thumbnails. What we are really asking is, "Is there an optimally sized thumbnail that can elicit the quickest model-fitting thought process?" The intrinsic answer is no, because the participant's prior experience with the image content directly affects the time to recognition.

    The Visual Basic program we designed for the study removes the thumbnail from the participant's screen upon a "Yes/No" response. The question immediately following asks the participant to describe what they saw, without any further visual cue. This design therefore requires active recall of the information presented in the thumbnail (see Pilot Study, above). The time between the presentation of the information and the request for recall fits within the short-term memory model. The 7 +/- 2 memory chunks (Miller, 1956) that a person can easily remember without special associations or techniques could normally be considered a factor in our subjective evaluation of the participant's recall ability. However, we accepted almost anything that came close to the description of the picture presented, and therefore alleviated any memory recall concerns.

    All assumptions have been stated in the experimental design, and results.
