4. Discussion

The goal of this experiment was to determine the effects of two different video browsing designs, storyboard (SB) and slide show (SS), and two different information seeking tasks, gist determination (GD) and object recognition (OR), on user performance and subjective satisfaction. It was hypothesized that performance with the SS interface would be better than SB for the GD task because SS retained the temporal component of the original video, a potentially important factor in understanding gist. It was also hypothesized that the SB interface would boost performance for the OR task over SS because users could rescan each of the stills for target objects. Furthermore, based on previous studies, it was hypothesized that users would derive greater satisfaction from SB over SS overall.

4.1. Task and Interface Design -- Performance Measures

Neither interface design nor task type had a statistically significant main effect on user performance, and there was no significant interaction between the two variables. The mean performance accuracy for each treatment was in the mid-70% range. To test whether covariates such as time to completion or spatial visual ability might be masking the effects, ANCOVAs were conducted. Because there was no upper limit on the amount of time that users could spend carrying out the assigned task (i.e., using a video browsing interface), time to completion was considered a potential covariate. For example, subjects who spent more time viewing the video browser would be expected to score better. Spatial visual ability (SVA) is a measure of people's ability to form mental models of images in three-dimensional space. SVA might also apply to images arranged along the temporal dimension. It was possible that subjects with higher SVA would perform better than those with lower SVA on the interface or task that requires "mental manipulation of time." Hence, SVA was also considered a possible covariate. However, controlling for time to completion and SVA did not explain any additional variability.

One reason for the lack of statistically significant differences may be the small sample size actually used for the data analysis. Of the 34 subjects who participated in the experiment, the electronically collected data were complete for only 20 (i.e., results were available for all four task x interface design treatments). One data file was damaged and could not be recovered (subject 004). Twelve subjects' transaction logs could not be used due to a programming error in the randomization module of the test system. Finally, the record of subject 017 could not be used because of missing data (no answer was recorded for the object recognition task using the storyboard interface design). The fact that about 41% of the data (14 of 34 subjects) could not be used severely decreased the power of this portion of the study. In general, a larger sample size would have provided more statistical power and would have helped to detect even a small effect.
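The sample accounting in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below uses only the subject counts reported above; the variable and function names are illustrative and are not part of the original test system.

```python
# Subject counts as reported in the paragraph above.
TOTAL_SUBJECTS = 34
EXCLUDED = {
    "damaged data file (subject 004)": 1,
    "randomization-module programming error": 12,
    "missing answer (subject 017)": 1,
}


def summarize_exclusions(total, excluded):
    """Return (usable count, excluded count, excluded fraction of total)."""
    n_excluded = sum(excluded.values())
    usable = total - n_excluded
    return usable, n_excluded, n_excluded / total


usable, n_excluded, fraction = summarize_exclusions(TOTAL_SUBJECTS, EXCLUDED)
print(f"usable={usable}, excluded={n_excluded}, fraction={fraction:.1%}")
# prints: usable=20, excluded=14, fraction=41.2%
```

The three exclusion categories account for all 14 unusable records, leaving the 20 complete records used in the analysis.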

Another potential problem was the level of difficulty of the tasks. Ideally, the tasks should have produced a wide range of scores to help reveal any true differences between the interface designs. However, as implemented, accuracy scores in the mid-70% range seem to indicate that the tasks were too simplistic and not truly representative of the variable to be measured. For example, only eight people were consulted in creating "concept statements" for the GD task. A greater number of people in the "control group" would have resulted in more "representative" concept statements. More thorough validity testing of the tasks (GD and OR questions) might have helped increase the resolution of the test instruments.

4.2. Immediate Subjective Satisfaction

An advantage of capturing subjective satisfaction immediately after each experimental trial is that the experience is fresh in the subjects' minds and reflects their unbiased impressions. The results showed a total of three statistically significant differences.

The first, for the question "Completing this task was...", showed that users found the slide show (SS) design (overall mean = 6.1) more difficult than the storyboard (SB) design (overall mean = 5.0) across both tasks. [Note: the overall scale was 1 (easy) to 9 (difficult), with 5 being the midpoint.] This result is similar to the finding of Ding et al. (1997) that user satisfaction drops considerably as key frame rate increases, even though user performance decreases by a smaller amount at the corresponding key frame rates. Many users commented in the questionnaire that the display rate (3 key frames per second) was "too fast," in spite of the lack of difference in performance between the two display types.

The other two statistically significant differences were in response to the question "The display technique for the given task was..." For the display designs, SS (overall mean = 6.35) was perceived to be easier to use than SB (overall mean = 4.15). This result appears to contradict the result for question 1, where SS was perceived to be more difficult than SB. The explanation may rest with differences in the perception of the tasks: OR (overall mean = 5.6) was rated better than GD (overall mean = 5.0). Unlike the first question, which asked about the task only, this question asked how well a particular display design worked for a specific task. Thus, the two results are not necessarily contradictory. This would also be consistent with the hypothesis that GD is facilitated by retaining some of the temporal relationships in the SS display. A simpler explanation is that the results are anomalous, due to the way the question was structured: the lower numbers on the Likert scale corresponded to "hard to use." In question 1, which respondents most likely answered first, the scale ran in the opposite direction -- "difficult" corresponded to the higher numbers. It is possible that users were influenced by question 1 and answered question 3 intending the higher numbers to indicate greater difficulty. This second explanation is consistent with the results in the User Satisfaction Analysis (Post-Test) section below, where SS was rated "more difficult" than SB in all six of the questions.
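To make the scale-direction explanation concrete, the sketch below reverse-codes the two reported display-design means. It assumes that this question used the same 1-to-9 scale as question 1 (the text gives only the direction of the scale, not its range), and the function name is hypothetical. It illustrates only how the means would read if respondents had in fact used question 1's direction; it is not a reanalysis of the data.

```python
# Assumed scale bounds: same 1-to-9 range as question 1.
SCALE_MIN, SCALE_MAX = 1, 9


def reverse_code(score, lo=SCALE_MIN, hi=SCALE_MAX):
    """Flip a score on a lo..hi scale; the midpoint (5 on 1..9) is unchanged."""
    return lo + hi - score


# Reported means for the display-technique question
# (as printed, lower values correspond to "hard to use").
reported = {"SS": 6.35, "SB": 4.15}

# If respondents actually meant "higher = more difficult", mapping their
# answers back onto the printed scale flips each mean around the midpoint.
recoded = {design: round(reverse_code(mean), 2) for design, mean in reported.items()}
print(recoded)
```

Under this reading SS falls below the midpoint (toward "hard to use") and SB rises above it, which matches the direction of the question 1 result.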

4.3. User Satisfaction Analysis (Post-Test)

A questionnaire covering general demographic information and subjective satisfaction with the different interface types was given at the end of the experiment to capture subjects' overall reactions after experiencing all four treatment conditions. For each of the six questions, subjects found the SB interface statistically significantly "better" (e.g., wonderful, satisfying, easy, flexible, easy-to-learn, and straightforward) than the SS design. The comments that were elicited support these results. One subject (007), however, pointed out differences in the usefulness of the designs for the different tasks: "Knowing that I had to answer specific questions made the storyboard option more appealing; whereas, simply just browsing around I would prefer the slideshow interface." Consistent with the findings of Ding et al. (1997), in spite of no performance differences between tasks for each interface, subjects were less satisfied with the slide show interface.