Discussion
Our results do not allow us to make conclusive remarks about optimal
coding schemes in Spotfire. Several factors contributed to this,
including subjects' learning of the query tasks (creating a possible
bias toward faster performance at the end of the experiment) and large
variances (standard deviations) in our statistical data. In addition,
our results are based on only 13 subjects, which may or may not reflect
the results we would have gotten with a larger subject pool.
Some subjects said that they preferred to look only at the query devices, ignoring the starfield display completely. Because the task was the same in each treatment (case), subjects most likely memorized or became familiar with the sequence of mouse movements, clicks, etc. needed to accomplish the task. We attribute the improvement in performance to this learning phenomenon.
The summaries below discuss the interactions between the rows and columns
of the 2x3 Anova.
Additionally, our MS Powerpoint Discussion slides are available.
Click here to see the [t-test] interaction
between the two treatments of Popularity & Length on the 2x3 Anova.
| Cases w/ starfield display: Popularity & Length | Mean | Variance |
| Case 1 (color-coded) | 70.23 | 609.86 (Standard Deviation = 24.70) |
| Case 4 (graded-shading) | 53.31 | 321.90 (Standard Deviation = 17.94) |
The fact that the data points are closely packed together on the starfield display may negate any benefit of either the color or the shading scheme. The large differences in both means and variances between the two cases make it difficult to draw significant conclusions. We feel that the faster time for case 4 can be attributed to subjects learning the interface and task.
Click here to see the [t-test] interaction
between the two treatments of Subject & Length on the 2x3 Anova.
| Cases w/ starfield display: Subject & Length | Mean | Variance |
| Case 2 (color-coded) | 57.77 | 732.03 (Standard Deviation = 27.06) |
| Case 5 (graded-shading) | 50.00 | 576.83 (Standard Deviation = 24.02) |
The small differences in both means and variances suggest that color coding and graded shading have minimal effect on user performance.
Click here to see the [t-test] interaction
between the two treatments of Subject & Popularity on the 2x3 Anova.
| Cases w/ starfield display: Subject & Popularity | Mean | Variance |
| Case 3 (color-coded) | 50.31 | 326.90 (Standard Deviation = 18.08) |
| Case 6 (graded-shading) | 37.62 | 148.92 (Standard Deviation = 12.20) |
Ideally, this setup and the previous setup (cases #2 & #5) should lead to similar results. However, the significantly faster performance of case #6 can be attributed largely to subjects learning the tasks through repetition.
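As a rough illustration of how the three color-coded vs. graded-shading comparisons can be checked from the reported summary statistics alone, the sketch below computes an unpaired Welch t statistic for each pair. It assumes 13 observations per case and treats the two cases as independent samples. That is an assumption made only for this sketch: the same subjects appear to have run every case, so a paired test on the raw times would be the better fit, and this is not necessarily the test behind the linked t-test pages. The names `CASES` and `welch_t` are hypothetical.

```python
import math

N = 13  # subjects per case (assumed equal for every case)

# (mean, variance) per case, copied from the tables above
CASES = {
    1: (70.23, 609.86), 4: (53.31, 321.90),
    2: (57.77, 732.03), 5: (50.00, 576.83),
    3: (50.31, 326.90), 6: (37.62, 148.92),
}


def welch_t(mean_a, var_a, mean_b, var_b, n=N):
    """Return (t, df) for an unpaired Welch t-test from summary statistics."""
    se_sq_a = var_a / n  # squared standard error of mean a
    se_sq_b = var_b / n  # squared standard error of mean b
    t = (mean_a - mean_b) / math.sqrt(se_sq_a + se_sq_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se_sq_a + se_sq_b) ** 2 / ((se_sq_a ** 2 + se_sq_b ** 2) / (n - 1))
    return t, df


for color, shade in [(1, 4), (2, 5), (3, 6)]:
    t, df = welch_t(*CASES[color], *CASES[shade])
    print(f"case {color} vs case {shade}: t = {t:.2f}, df = {df:.1f}")
```

Under these assumptions the statistic comes out near 2.0 for cases 1 vs. 4, about 0.8 for cases 2 vs. 5, and about 2.1 for cases 3 vs. 6. The values would still need to be compared against a t table at the computed degrees of freedom before calling any difference significant.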
| Case | Mean | Variance |
| Case 1 | 70.23 | 609.86 (Standard Deviation = 24.70) |
| Case 2 | 57.77 | 732.03 (Standard Deviation = 27.06) |
| Case 3 | 50.31 | 326.90 (Standard Deviation = 18.08) |
Case 1 (Popularity & Length) yields a significantly higher mean value (slower performance) than case 2 (Subject & Length) -- 70.23 & 57.77. The variances for the 2 cases appear to be close (Standard Deviation 24.70 & 27.06). The large mean difference and small variance difference allow us to attribute the difference between the 2 means to the difference in query setup (Popularity & Length vs. Subject & Length).
Case 2 (Subject & Length) yields a slightly higher mean value (slower performance) than case 3 (Subject & Popularity) -- 57.77 & 50.31. The variances for the 2 cases, however, appear to differ significantly (Standard Deviation 27.06 & 18.08). The small mean difference and the large variance difference make it plausible that the difference between the means is due to chance variation.
| Case | Mean | Variance |
| Case 4 | 53.31 | 321.90 (Standard Deviation = 17.94) |
| Case 5 | 50.00 | 576.83 (Standard Deviation = 24.02) |
| Case 6 | 37.62 | 148.92 (Standard Deviation = 12.20) |
Case 4 (Popularity & Length) yields a slightly higher mean value (slower performance) than case 5 (Subject & Length) -- 53.31 & 50.00. The variances for the 2 cases appear to differ significantly (Standard Deviation 17.94 & 24.02). The small mean difference and large variance difference suggest that the difference between the 2 means could be affected by chance variation.
Case 5 (Subject & Length) yields a significantly higher mean value (slower performance) than case 6 (Subject & Popularity) -- 50.00 & 37.62. The variances for the 2 cases also appear to differ significantly (Standard Deviation 24.02 & 12.20). The large mean difference and the large variance difference suggest there is a chance that variance greatly affects the means in this particular case.
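The comparisons above judge by eye whether two variances are "close" or differ. One rough way to put a number on that judgment is the ratio of the larger variance to the smaller one for each pair of adjacent cases. The sketch below is only a descriptive aid under that assumption; it is not a formal test and not part of the original analysis, and the names `VARIANCES` and `variance_ratio` are hypothetical.

```python
# Variance per case, copied from the tables above
VARIANCES = {1: 609.86, 2: 732.03, 3: 326.90, 4: 321.90, 5: 576.83, 6: 148.92}


def variance_ratio(var_a, var_b):
    """Larger variance divided by the smaller one, so the ratio is always >= 1."""
    return max(var_a, var_b) / min(var_a, var_b)


# Adjacent query setups within each coding scheme
for a, b in [(1, 2), (2, 3), (4, 5), (5, 6)]:
    ratio = variance_ratio(VARIANCES[a], VARIANCES[b])
    print(f"case {a} vs case {b}: variance ratio = {ratio:.2f}")
```

A ratio near 1 matches the "close" wording used for cases 1 and 2, while larger ratios correspond to the pairs described as differing.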
The basic 2 x 3 analysis up to now makes 2 assumptions: the three query setups have the same magnitude of effect on each of the 2 color/shade schemes, and the same additive effect holds vice versa.
However, it is also possible that there is interaction within the 2 x 3; that is, each individual query setup has a different magnitude of effect on each individual color/shade scheme, and vice versa. In that case, each item in the 2 x 3 combination does not necessarily correlate with the others.
Click here to see the [row-wise] interaction analysis in our 2x3 Anova.
The table below contains the means of our test results in 2 x 3 format.
| | Popularity & Length | Subject & Length | Subject & Popularity |
| Color-coded | 70.231(mean) | 57.769(mean) | 50.308(mean) |
| Graded-shading | 53.308(mean) | 50.000(mean) | 37.615(mean) |
To simplify the explanation, assume the numbers (1) through (6) label the cells of the 2 x 3 matrix above, as follows.
| | Popularity & Length | Subject & Length | Subject & Popularity |
| Color-coded | (1) | (2) | (3) |
| Graded-shading | (4) | (5) | (6) |
Ideally, the trends of the curves through [(1)(2)(3)] and [(4)(5)(6)] should be similar because the query setups are similar. However, a graph of our actual data shows (1) much higher (slower performance) than the "hypothetical" trend and (6) much lower (faster performance) than the "hypothetical" trend. The results of these 2 cases may be due to individualized interaction between the instances in the 2 x 3 matrix above.
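To make the idea of a "hypothetical" additive trend concrete, the sketch below fits a purely additive model (grand mean plus a row effect plus a column effect) to the six cell means reported above and prints how far each cell deviates from it. Residuals near zero would be consistent with the additive assumption, while larger residuals point toward interaction. This is an illustrative calculation on the cell means only. It ignores the within-cell variances, so it cannot say whether a deviation is statistically significant, and the names `MEANS` and `interaction_residuals` are hypothetical.

```python
# Cell means from the 2 x 3 table: rows are coding schemes, columns are query setups
MEANS = [
    [70.231, 57.769, 50.308],  # color-coded: cells (1), (2), (3)
    [53.308, 50.000, 37.615],  # graded-shading: cells (4), (5), (6)
]


def interaction_residuals(cells):
    """Return cell minus additive prediction (row mean + column mean - grand mean)."""
    n_rows, n_cols = len(cells), len(cells[0])
    grand = sum(sum(row) for row in cells) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in cells]
    col_means = [sum(cells[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    return [
        [cells[i][j] - (row_means[i] + col_means[j] - grand) for j in range(n_cols)]
        for i in range(n_rows)
    ]


residuals = interaction_residuals(MEANS)
for label, row in zip(["Color-coded", "Graded-shading"], residuals):
    print(label, [round(value, 2) for value in row])

# Under pure additivity the color-minus-shading gap would be the same in every column
gaps = [MEANS[0][j] - MEANS[1][j] for j in range(3)]
print("Color minus shading per query setup:", [round(g, 2) for g in gaps])
```

The residuals in each row and each column sum to zero by construction, so with only two rows the second row is the mirror image of the first.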
The closely packed data points on the starfield display in (1) may account for the slow performance time. On the other hand, it may be because this was the first task, and the users were simply unfamiliar with it.
Case (6) yielded significantly faster performance, since the task requires an exact subject (drama) and popularity (most popular); to meet the requirement for length, users simply have to manipulate the length slider within the 1 1/2 - 2 hour range. On the other hand, the results could also be attributed largely to subjects learning the tasks through repetition.