4. Discussion
The goal of this experiment was to determine the effects of two different
video browsing designs, storyboard (SB) and slide show (SS), and two
different information seeking tasks, gist determination (GD) and object
recognition (OR), on user performance and subjective satisfaction.
It was hypothesized that performance with the SS
interface would be better than with SB for the GD task because SS retained the
temporal component of the original video, a potentially important factor in understanding
gist. It was also hypothesized that the SB interface would boost performance
for the OR task over SS because users could rescan
each of the stills for target objects. Furthermore, based on
previous studies, it was hypothesized that users would derive greater satisfaction
from SB than from SS overall.
4.1. Task and Interface Design -- Performance Measures
The analysis of user performance showed no statistically
significant main effects of interface design or task type, and no
significant interaction between the two variables. The mean performance accuracy for each treatment
was in the mid-70% range. In order to test whether covariates such as time to
completion or spatial visual ability might be masking the effects, ANCOVAs
were conducted. Because there was no upper limit to the amount of time that
could be spent by users in carrying out the assigned task (i.e., in using a
video browsing interface), time to completion was considered a potential
covariate. For example, subjects who spent more time
viewing the video browser might be expected to score higher.
Spatial visual ability (SVA) is a measure of people's ability to form mental
models of images in three-dimensional space. SVA might also extend to images ordered along
the temporal dimension. It was possible that subjects with higher SVA would
perform better with the interface or task that requires "mental manipulation
of time" than those with lower SVA. Hence, SVA was also considered a possible
covariate. However, controlling for time to completion and SVA did not
explain any additional variability.
One reason for the lack of statistically significant differences
may be the small sample size actually used for the data analysis.
Of the 34 subjects who participated in the experiment, data collected
electronically from only 20 of the subjects were complete (i.e., results were
available for all four task x interface design treatments). One data file was
damaged and could not be recovered (subject 004). Twelve subjects' transaction
logs could not be used due to a programming error in the randomization module
of the test system. Finally, the record of subject 017 could not be used
because of missing data (no answer was recorded for the object recognition task using the storyboard
interface design). Because the data from 14 of the 34 subjects (about 41%) could not
be used, the power of this portion of the study was severely reduced. In general,
a larger sample size would have provided more statistical power and would
have made it easier to detect even a small effect.
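The bookkeeping behind the reduced sample, and its likely effect on power, can be illustrated with a short sketch. The subject counts come from the text above. Everything else is an assumption made purely for illustration: the standardized effect size of 0.5, the 0.05 significance level, and the use of a two-sided one-sample z approximation on per-subject difference scores in place of the ANOVA actually used in the study.

```python
from statistics import NormalDist

# Subject counts reported in the text.
participants = 34
damaged_file = 1          # subject 004
randomization_error = 12  # unusable transaction logs
missing_answer = 1        # subject 017

excluded = damaged_file + randomization_error + missing_answer
usable = participants - excluded
excluded_fraction = excluded / participants


def approx_power(n, effect_size, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect_size is the standardized mean difference (hypothetical here);
    this normal approximation is a simplification, not the study's analysis.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    print(f"usable subjects: {usable} ({excluded_fraction:.0%} excluded)")
    for n in (usable, participants):
        # 0.5 is an assumed effect size, not a value from the study.
        print(f"n = {n}: approximate power = {approx_power(n, 0.5):.2f}")
```

Under these assumptions the approximate power is noticeably lower with the usable subjects than it would have been with all participants, which is the qualitative point made in the paragraph above; the specific values depend entirely on the assumed effect size.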
Another potential problem was the level of difficulty of the tasks. Ideally
the tasks should have provided a wide range of scores to help differentiate
any true differences in interface design. However, as implemented, accuracy
scores in the mid-70% range seem to indicate that the tasks were too simplistic
and not truly representative of the variable to be measured. For example, only
eight people were consulted in creating "concept statements" for the GD task.
A greater number of people in the "control group" would have resulted in more
"representative" concept statements. More extensive validity testing of the tasks
(GD and OR questions) might have increased the resolution of the test
instruments.
4.2. Immediate Subjective Satisfaction
An advantage of capturing subjective satisfaction immediately after each
experimental trial is that the experience is still fresh in the subjects' minds,
so the ratings reflect their unbiased impressions. The results showed a total of
three statistically significant differences.
The first, for the question "Completing this task was...", showed that
users felt that the slide show (SS) design (overall mean = 6.1) was more
difficult than the storyboard (SB) interface (overall mean = 5.0) across
both tasks. [Note: the overall scale was 1 (easy) to 9 (difficult), with 5
being the midpoint.] This result is similar to the finding of
Ding et al. (1997) that user satisfaction
drops considerably as key frame rate increases, even though user performance
decreases less at the corresponding key frame rates. Many users
commented in the questionnaire that the display rate (3 key frames per second)
was "too fast," even though performance did not differ
between the two display types.
The other two statistically significant differences were in response to the
question "The display technique for the given task was..." (question 3). For the display
designs, SS (overall mean = 6.35) was perceived to be easier to use than SB
(overall mean = 4.15). This result appears to contradict the result of
question 1, where SS was perceived to be more difficult than SB.
One explanation may rest with differences in the perception of tasks:
OR (overall mean = 5.6) was rated better than GD (overall mean = 5.0).
Unlike the first question, which asked about the task only, this
question asked how well a particular display design worked for a
specific task. Thus, the two results do not necessarily contradict each other. This reading would also
be consistent with the hypothesis that GD is facilitated by retaining some of
the temporal relationships in the SS display. A simpler
explanation is that the results are anomalous, due to
the way the question was structured: the lower numbers on the Likert scale
corresponded to "hard to use." In question 1, which respondents
most likely answered first, the scale ran in the opposite direction --
"difficult" corresponded to the higher numbers. It is possible that users
were influenced by question 1 and answered question 3 intending the higher
numbers to indicate greater difficulty. This second explanation is consistent
with the results in the Overall User Satisfaction section below, where SS
was rated "more difficult" than SB in all six of the questions.
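The two readings of the display-technique question can be made concrete with a small reverse-coding sketch. The means are those reported above. The assumption that this question used the same 1 to 9 range as question 1 is not stated in the text and is made here only for illustration, as are the function and variable names.

```python
# Assumed scale range for the display-technique question (the text states
# 1 to 9 only for question 1; reusing it here is an assumption).
SCALE_MIN, SCALE_MAX = 1, 9


def reverse_code(score, lo=SCALE_MIN, hi=SCALE_MAX):
    """Flip a rating so that it runs in the opposite direction of the scale."""
    return lo + hi - score


# Reported means for the display-technique question.
raw_means = {"SS": 6.35, "SB": 4.15}

# Reading 1: the scale was used as designed (low = "hard to use").
# Reverse-coding expresses the ratings as difficulty, as in question 1.
difficulty_as_designed = {
    design: round(reverse_code(mean), 2) for design, mean in raw_means.items()
}

# Reading 2: respondents carried over the direction of question 1
# (high = difficult), so the raw means already express difficulty.
difficulty_if_misread = dict(raw_means)

if __name__ == "__main__":
    print("as designed:", difficulty_as_designed)
    print("if misread: ", difficulty_if_misread)
```

Under the first reading SS comes out as less difficult than SB, which conflicts with question 1; under the second reading SS comes out as more difficult than SB, which matches the direction of the question 1 means (6.1 versus 5.0).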
4.3. Overall User Satisfaction
A questionnaire covering general demographic information and subjective
satisfaction with the different interface types was given at the
end of the experiment to capture subjects' overall reactions
after experiencing all four treatment conditions. For each of the six
questions, subjects rated the SB interface statistically significantly "better"
(e.g., wonderful, satisfying, easy, flexible, easy-to-learn, and
straightforward) than the SS design. Comments that were elicited support these results:
- "In the slide show interface, the speed of changing a frame was too fast
and the screen was too small and unclear."
- "The slide show was too fast."
- "The slide show is too inflexible -- too little time for each image. The
storyboard is much better, but image close-ups would be nice..."
- "I prefer to see more at once, as in the storyboard UI. I can
look at the parts that I want to. It was hard to wait for the same image to
come around again when I wanted to study a single picture in the slide show
UI."
- "Storyboard interface is a lot easier to use."
- "The slide show made me feel rushed, plus I couldn't relate the pictures
as well as with the storyboard. Felt pressured by the slide show as well."
- "Storyboard allows user to conceptualize the topic in a meaningful way.
The slide show interface seems random without providing the user enough time
to get a firm 'fix' of the topic."
- "The slide show went so fast -- I could not really see all of the slides --
even when looking at them over and over, they did not register. I thought
the storyboard is more effective in that it was easier to remember -- had
time for images to register."
- "The storyboard would probably allow me to more quickly assess the video
because I do not have to pay such close attention. With the slide show I also
had to keep looking down to see which slide was first; with the storyboard I
knew the exact order."
One subject (007) did point out differences in the usefulness of the designs
for the different tasks: "Knowing that I had to answer specific questions
made the storyboard option more appealing; whereas, simply just browsing
around I would prefer the slideshow interface." Consistent with the findings
of Ding et al. (1997), subjects were less satisfied with the slide show interface
even though there were no performance differences between tasks for each interface.