Experiment

2.1  Introduction and Hypothesis

The experiment explores how different coding schemes in Spotfire affect the performance of human test subjects.

The database used in this experiment is a collection of movies and films, with each data item having identifiable characteristics such as year of release, length, title, subject, and popularity. It is similar to the one used in FilmFinder [2], an earlier dynamic query interface for movie/film searches.

We wanted to see how user performance (time to complete a search query) would be affected by changing which characteristics are assigned to the starfield display's Y- and X-axes. On the starfield display itself, data items are marked with a distinct color according to a user-specified characteristic (e.g., subject): Action movies may be marked red and Westerns blue. Alternatively, data items may be marked by a shading scheme rather than distinct colors. Suppose length is the shading characteristic: the shortest movies may be marked dark blue, and the longer the movie, the lighter the shade of blue.
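The shading scheme described above can be sketched as a linear interpolation between two blues. This is only an illustration: the function name, the RGB endpoints, and the use of minutes as the unit are assumptions, and Spotfire's actual color computation is not documented here.

```python
def length_to_blue(length_min, shortest, longest):
    """Map a film length to an RGB shade of blue (hypothetical helper).

    The shortest film gets the dark endpoint and the longest film the
    light endpoint; lengths in between are interpolated linearly.
    """
    if longest == shortest:
        t = 0.0  # degenerate range: everything gets the dark shade
    else:
        t = (length_min - shortest) / (longest - shortest)
    t = min(1.0, max(0.0, t))  # clamp lengths outside the range

    # Endpoint colors are arbitrary choices made for this sketch.
    dark, light = (0, 0, 139), (173, 216, 230)
    return tuple(round(d + t * (l - d)) for d, l in zip(dark, light))
```

For example, with films ranging from 60 to 180 minutes, a 60-minute film maps to the dark endpoint and a 180-minute film to the light one.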

Before we discuss the hypothesis, we need to describe the test cases and variables. We performed six test cases using the Spotfire 'Film' sample data set.

The first set of cases (1-3) involves variations of the characteristics (Subject, Length, and Popularity) assigned to the Y- and X-axes. Each data item is marked with a distinct color according to subject, as described above. The last set of cases (4-6) uses the same characteristic variations as the first set, respectively (i.e., 1 & 4, 2 & 5, 3 & 6). The difference in this last set is that each data item uses a graded shading instead of distinct colors. In addition, the shading represents the one characteristic that is not displayed on either axis (e.g., if Subject and Popularity are on the Y- and X-axes, then the shading of the data items is graded according to Length).

The cases and their respective starfield displays are shown below:


Cases 1 & 4 - Starfield display:
  • Y-axis: Popularity
  • X-axis: Length
    Relevant query devices for filtering: Length, Subject, Popularity, and Zoom-in
    Case 1: Color coding by Subject
    Case 4: Graded shading by Subject

Cases 2 & 5 - Starfield display:
  • Y-axis: Subject
  • X-axis: Length
    Relevant query devices for filtering: Length, Popularity, and Zoom-in
    Case 2: Color coding by Subject
    Case 5: Graded shading by Popularity

Cases 3 & 6 - Starfield display:
  • Y-axis: Subject
  • X-axis: Popularity
    Relevant query devices for filtering: Length, Popularity, and Zoom-in
    Case 3: Color coding by Subject
    Case 6: Graded shading by Length
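The six cases follow a simple rule: three axis pairings, each run once with color coding by Subject and once with graded shading by the attribute on neither axis. A minimal sketch of that rule follows; the dictionary layout and names such as `build_cases` are assumptions made for illustration and are not part of Spotfire.

```python
ATTRIBUTES = {"Subject", "Length", "Popularity"}

# (Y-axis, X-axis) pairs in the order given in the case list.
AXIS_PAIRS = [
    ("Popularity", "Length"),    # cases 1 and 4
    ("Subject", "Length"),       # cases 2 and 5
    ("Subject", "Popularity"),   # cases 3 and 6
]


def build_cases():
    """Return a dict mapping case number (1-6) to its display setup."""
    cases = {}
    for i, (y, x) in enumerate(AXIS_PAIRS, start=1):
        # Cases 1-3: distinct colors keyed on Subject.
        cases[i] = {"y": y, "x": x, "coding": "color", "coded_by": "Subject"}
        # Cases 4-6: graded shading keyed on the one attribute
        # that appears on neither axis.
        (off_axis,) = ATTRIBUTES - {y, x}
        cases[i + 3] = {"y": y, "x": x, "coding": "shading", "coded_by": off_axis}
    return cases
```

Calling `build_cases()` reproduces the list above: case 4 is shaded by Subject, case 5 by Popularity, and case 6 by Length.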

Independent variables: Starfield display setup, i.e., the three possible pairs of axis characteristics (Popularity & Length, Subject & Length, Subject & Popularity).

Dependent variables: Time to accomplish the task (measured with a stopwatch) and subjective satisfaction, collected through an informal post-experiment survey.

Hypothesis: We hypothesize that Case 3 will yield the quickest results. The task is to find the most popular drama between 1 1/2 and 2 hours long, so it requires an exact subject (Drama) and popularity (most popular). To meet the length requirement, users simply have to set the length slider to the 1 1/2 - 2 hour range. Then, since all Dramas lie along one horizontal line and are clearly marked yellow, a movie that meets the query requirements lies at the rightmost side of the starfield display.

Test subjects must make use of at least two query devices in each case in order to positively identify the most popular drama film between 1 1/2 and 2 hours long. Subjects will not be allowed to randomly 'click-and-compare' on film icons to find the desired film. Note that Case 1 has four relevant query devices rather than three as in the others, which may seem like more work. However, Cases 2 and 3 place all eligible data along one axis, which makes it harder for users to see the data clearly but eliminates the need to filter by subject ('Drama' in this case).
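Stated as a plain filter, the task that subjects carry out with the query devices looks like the sketch below. The field names, the sample rows, and the inclusive 90-120 minute bounds are all assumptions made for illustration; none of them come from the Spotfire 'Film' data set.

```python
# Invented sample rows; titles and values are placeholders only.
films = [
    {"title": "Film A", "subject": "Drama", "length": 105, "popularity": 72},
    {"title": "Film B", "subject": "Drama", "length": 118, "popularity": 88},
    {"title": "Film C", "subject": "Drama", "length": 140, "popularity": 95},    # too long
    {"title": "Film D", "subject": "Western", "length": 100, "popularity": 99},  # wrong subject
]


def most_popular_drama(films, min_len=90, max_len=120):
    """Return the most popular Drama within the length range, or None."""
    candidates = [
        f for f in films
        if f["subject"] == "Drama" and min_len <= f["length"] <= max_len
    ]
    # default=None covers the case where no film passes the filters.
    return max(candidates, key=lambda f: f["popularity"], default=None)
```

With the sample rows above, `most_popular_drama(films)` returns the entry for "Film B": the subject and length filters correspond to the query devices, and picking the maximum popularity corresponds to reading off the rightmost point in Case 3.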

2.2  Pilot Study Results

Pilot studies involving two subjects showed that we needed to spend more time teaching subjects the Spotfire interface and to clarify the tasks further. Both pilot subjects had to stop and ask questions during the first case.

2.3  Experiment Subjects

Thirteen students from a junior-level undergraduate Computer Science course volunteered to participate in this experiment. All had considerable experience with personal computers and graphical user interfaces.

2.4  Materials

2.5  Procedures

2.6  Problems

One concern is that the data may be skewed because subjects become more adept with the Spotfire interface by the time they perform the last case. To counter this, we could have randomized the order in which the cases were presented to each subject. With only 13 subjects, however, we were reluctant to randomize the test cases. Ideally, we would have had many trial runs, each with a different case order.
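One way to vary the case order with a small subject pool is a cyclic rotation, in which each subject starts one case later than the previous subject. The sketch below is a hypothetical illustration, not the procedure used in this experiment; `rotated_orders` is an invented name. Rotation balances how often each case appears in each position, but it does not balance which case precedes which.

```python
def rotated_orders(cases, n_subjects):
    """Give subject s the case list rotated left by s positions."""
    k = len(cases)
    return [
        [cases[(s + p) % k] for p in range(k)]
        for s in range(n_subjects)
    ]


# Six cases and 13 subjects, matching the numbers in this experiment.
orders = rotated_orders([1, 2, 3, 4, 5, 6], 13)
```

With 13 subjects and six cases, every case lands in every position either two or three times, so the learning effect is spread across cases rather than concentrated on the last one.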

Our other concern is the relevance and usefulness of the results. Although the specific results of this experiment may be of little practical use in the real world, the findings may say something about the nature of data in an application such as Spotfire. For example, the attribute with the greatest frequency of data may be best placed on one axis of a 2-D starfield display; or it may be better to offer it only as a range slider. More details are provided in section 5.1.


    Return to SHORE '98