The main objective of this study is to define elements that may be used in building user-oriented interfaces that employ key frame surrogates for browsing video data. The results show that displaying several screens simultaneously is a viable design option, and that it may be used both for browsing for specific objects and for gist comprehension of the original video.
One interesting finding of this study was that subjects in the conditions with two and three simultaneously displayed video screens had essentially the same performance on the object recognition task. Although they had lower accuracy scores than subjects who viewed only one video, subjects in both conditions identified objects with much greater accuracy than subjects who viewed four video screens at once. Overall, their performance (especially for three videos at once) was not as poor as originally expected. It may therefore be postulated that human capabilities for dividing attention among video screens, especially for object recognition, do not limit viewing to only one screen. However, since performance degrades so dramatically when four videos are displayed at once, it can be assumed that attentional resources are insufficient to meet the demand of identifying objects in this case.
In the Ding (1996) study, it was found that slower speeds were required for identifying individual objects. Comparing the evaluation results with those of the object recognition task revealed a contradiction between subjects' perception of the video display and their actual abilities. In the evaluation data, subjects in conditions 2, 3, and 4 perceived the videos as very much "faster" than did those in condition 1. Increasing the number of videos seems to increase the perceived speed; however, subjects still performed better in the "faster" conditions 2 and 3 than in condition 4. From this we may deduce that although the displays are perceived as "faster", this perception does not affect actual performance in dividing attention between the video displays.
The low accuracy scores obtained by subjects in condition 4 on the object recognition task indicate either that distractors were selected more often or that subjects did not identify many objects. In the first case, it can be said that subjects rely on scripts or schemas to identify objects in the film. That is, they view mainly for comprehension, build a schema for what they believe to be the story behind the video, and then use this schema to identify objects that fit this "story". In the second case, subjects simply cannot attend to all videos at once well enough to identify objects in the videos or to grasp enough meaning to construct a schema. The sentence analysis indicates that both are probably occurring: subjects complained of not being able to identify the meaning of the video, and those who did identify a meaning gave an incorrect and incomplete account of the actual video content.
Interestingly, after viewing the video key frames a second time, subjects did not improve on the object recognition task. Because the results were so close to those obtained after a single viewing, it can be assumed that this is the best human performance that can be achieved for each type of display (one through four simultaneous videos). We may be able to say that subjects reached an "upper limit" for divided attention, or that the maximum amount of resources (Kahneman, 1973) was allocated for each task.