4. Discussion

4.1 Object Recognition

The main objective of this study is to define elements that may be used in building user-oriented interfaces that employ key frame surrogates for browsing video data. The results show that simultaneous viewing of multiple screens is a possible design consideration, and that it may be used both for browsing for specific objects and for gist comprehension of the original video.

One interesting finding of this study was that subjects in the conditions with two and three simultaneously displayed video screens performed about equally well on the object recognition task. Although they had lower accuracy scores than subjects who viewed only one video, subjects in both of these conditions identified objects with much greater accuracy than subjects who viewed four video screens at once. Overall, their performance (especially for three videos at once) was not as poor as originally expected. It may therefore be postulated that human capabilities for dividing attention among video screens, especially for object recognition, do not limit viewing to only one screen. However, since performance degrades so dramatically when four videos are displayed at once, it can be assumed that attention resources are insufficient to meet the demand of identifying objects in this case.

In the Ding (1996) study, it was found that slower speeds were required for identifying individual objects. Comparing the evaluation results with those of the object recognition task reveals a contradiction between subjects' perception of the video display and their actual abilities. In the evaluation data, subjects in conditions 2, 3, and 4 perceived the videos as very much "faster" than did those in condition 1. Increasing the number of videos seems to increase the perceived speed; however, subjects still performed better in the "faster" conditions 2 and 3 than in condition 4. From this we may deduce that although subjects perceive "faster" speeds, this perception does not affect actual performance in dividing attention between the video displays.

The low accuracy scores obtained by subjects in condition 4 on the object recognition task indicate either that distractors are being selected more often or that subjects are not identifying many objects. In the first case, it can be said that subjects are relying on scripts or schemas to identify objects in the film. That is, they view mainly for comprehension and then build a schema for what they believe to be the story behind the video. This schema is used to identify objects that fit this "story". In the second case, subjects simply cannot attend to all videos at once well enough to identify objects in the videos or to grasp enough meaning to construct a schema. The sentence analysis indicates that both are probably occurring: subjects complain of not being able to identify the meaning of the video, and those who do identify a meaning give an incorrect and incomplete account of the actual video content.

Interestingly, subjects did not improve on the object recognition task after viewing the video key frames a second time. Because these results are so close to those found after one viewing, it can be assumed that this is the best human performance that can be achieved for each type of display (1 through 4 simultaneous videos). We may be able to say that subjects reached an "upper limit" for divided attention, or that the maximum amount of resources (Kahneman, 1973) was allocated for each task.


4.2 Multiple Choice Comprehension Questions

The comprehension questions used for this study did not yield significant differences between conditions. One reason for this result is that the comprehension question, given as multiple choice, may not really be a good measure of subjects' ability to "get the gist" of the video. Instead, subjects are choosing from several predetermined selections, which does not indicate what they themselves believe to be the meaning of the video. The fact that no differences were found between the groups after the second viewing confirms that the questions produced equally poor performance in all conditions. The sentence analysis, however, paints a different picture of subject comprehension, which may further indicate that these questions are not a good measure of comprehension. We therefore use the content analysis of the written sentences to provide more insight into subjects' comprehension of the videos. It is important to note that the multiple choice question was displayed to subjects after they wrote sentences describing what they believed to be the meaning of the videos. This eliminated any bias that would otherwise have been introduced.


4.3 Sentence Content Analysis

One of the most striking aspects of the sentence analysis is the large number of incorrect people, places, actions/concepts, and objects "identified" or talked about by subjects in the conditions with three and four simultaneous video clips. Because attention is divided among so many screens, subjects in these conditions may have constructed erroneous schemas relating to the gist of the video. Comprehension may actually "use up" more of the available attention resources than object recognition does, since object recognition performance was not found to be as diminished in these conditions. Subjects in the conditions with one and two video screens produced more correct analyses of the content of the videos.

