SHORE 2001 : Layout and Readability :
The "Degree Navigator" Nightmare: Taming an Overly Graphical User Interface

Experiment

Introduction and Hypothesis

In this experiment, we investigated ways of improving the layout and use of color in the user interface of the Degree Navigator application. We explored the idea that a suitable layout and judicious use of color in a user interface can improve user performance. To test this, we developed four alternate versions of the Degree Navigator interface, varying the type of layout and the use of color. Each subject was asked to use one version of the interface and answer questions about the information presented. The questions pertained to how much progress the student had made in completing various requirements, which courses were taken to complete various requirements, whether or not certain requirements had been fulfilled, and which courses the student was currently taking. A complete list of questions is available in the Appendix.

Null Hypothesis: There will be no statistically significant difference in measured efficiency among the four versions.
Hypothesis I: Users will answer questions more rapidly and with fewer errors when using the progress-bar interface than when using the original island-based interface.
Hypothesis II: Users will answer questions more rapidly and with fewer errors when using a 3-color interface than when using an 8-color interface.
Hypothesis III: Retention of information will be better with the 3-color progress-bar layout than with the other layouts.
Hypothesis IV: Users will prefer the 3-color progress-bar interface to the other three interfaces.

Independent and Dependent Variables

Independent variables

  1. Layout. This variable has two treatments: the original island layout and the progress-bar layout.
  2. Amount of Color. This variable also has two treatments. One version uses eight colors, as in the original application. The other version reduces the number of colors used to three.

Thus, this is a 2x2 experiment design, where the four treatments are as follows:

  1. The original Degree Navigator interface using eight colors.
  2. The original Degree Navigator interface using three colors.
  3. The bar layout interface using eight colors.
  4. The bar layout interface using three colors.
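The 2x2 design can be written out as the cross product of the two factors. The following is only an illustrative sketch; the names `LAYOUTS`, `COLOR_COUNTS` and `treatments` are hypothetical and are not taken from the experiment software.

```python
from itertools import product

# Hypothetical labels for the two independent variables.
LAYOUTS = ["island", "progress bar"]
COLOR_COUNTS = [8, 3]

# Crossing the two factors yields the four treatments,
# numbered 1-4 in the same order as the list above.
treatments = {
    number: {"layout": layout, "colors": colors}
    for number, (layout, colors) in enumerate(product(LAYOUTS, COLOR_COUNTS), start=1)
}

for number, treatment in treatments.items():
    print(number, treatment["layout"], treatment["colors"])
```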

Dependent variables

  1. Performance time: time to correct completion of task questions
  2. Error rate: error rate in answering task questions
  3. Retention time and accuracy: time to correct completion, and accuracy rate of retention questions.
  4. Subjective Preference

This was a between-groups experiment: each subject used only one of the four versions.

Pilot Study Results

For our pilot test, we used four test subjects, one per treatment. We used a 1024x768 screen resolution. We gave the subjects detailed instructions about the experiment, which took two minutes. Each subject took about five minutes to answer the seven retention questions based on one set of data, six minutes to answer the eight main task questions based on another set of data, and three minutes to complete the subjective satisfaction questionnaire. Preliminary results indicated that the design of the experiment was effective and that our measures were valid. We found that the 8-color/3-color dimension was implemented correctly, using similar color schemes for the island and progress-bar conditions, so this dimension did not need to be changed. Subjects expressed frustration at the island layout, as we had hypothesized they would. However, the pilot revealed several features that needed to be redesigned, and we made the following changes:

  1. Resized the test question program window so that it no longer overlapped the experiment window.
  2. Edited the graphics for each island so that the class names were visible. (This is a shortcoming of the original Degree Navigator that we corrected in our version, and we will certainly recommend that the same change be made to the Degree Navigator itself.)
  3. Made the central pie chart of the island version a clickable icon, to show that the CS degree requires 120 credits in order to graduate. This information is also not available in the original Degree Navigator, yet the total credit requirement is among the most important facts a student needs when viewing a transcript.
  4. Capitalized "credits" and "courses" in the questions to highlight them, because all of the pilot subjects confused the two terms while performing the tasks.

Subjects

Subjects were drawn from a deliberately narrow population in order to limit the variability that a broader subject base would introduce. The subjects were junior or senior University of Maryland computer science majors. We tested 32 subjects, eight per treatment. All subjects participated on a voluntary basis. The subjects were between the ages of 19 and 24; ten were female and twenty-two were male. Subjects were first asked if they had used the Degree Navigator application, and if they had, they were assigned to treatments 3 and 4 (the progress bar layout), so that their previous use of the application would not affect their performance. Each subject completed the tasks using only one version of the interface, as in the pilot study. They were presented with different sets of data for the retention questions and the main questions.

Materials

Training

Since the subjects were junior or senior University of Maryland computer science majors, they were familiar with the various requirements of a degree in Computer Science. Our experiment required the subjects to answer questions about information displayed in the interface. Thus no additional training was necessary for our experiment.

Tasks

Subjects used the version of the application given to them loaded with a sample student's transcript information. They were given a few minutes to look at the information, and after a three-minute delay, were asked questions about the information that was presented. Next, the application was loaded with another sample transcript, and the subjects answered another set of questions. All questionnaires were administered by a separate Visual Basic application, which calculated the time to answer each question, as well as the error rate.

Screenshots and executables of the four treatment applications and the questionnaire application are available in the Appendix. The testing was conducted in the AV Williams Microsoft lab, due to the availability of large, high-resolution monitors. The experiment software was installed on four computers so that up to four subjects could be tested at one time. All of the computers had the same type of hardware.

Two paper-based forms were prepared for the experiment: the consent form and the subjective satisfaction questionnaire. These are available in the Appendix.

Procedures and Problems

A typical session of the experiment consisted of the following steps:

  1. The subject read and signed the consent form.
  2. The subject was given instructions about the tasks to be performed in the experiment.
  3. The subject was presented with one version of the application loaded with a student's transcript information. After a three-minute delay, the questionnaire window popped up and the subject was asked to close the application.
  4. The subject answered seven retention questions about the information that had been displayed in the application.
  5. The subject was presented with the same version of the application, loaded with another student's transcript information. The subject used the application and answered eight questions while it was displayed.
  6. The subject filled out the subjective satisfaction questionnaire.

We used two sample student transcripts as data for our experiment. One student had halfway completed his degree while the other was near completion. We alternated use of the data sets for the retention questions and the main questions; half of the subjects used the first data set for the retention questions and the second data set for main questions, the other half used the second data set for the retention questions and the first data set for the main questions. This was done to prevent any bias that may have resulted from the sample data that was used.
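The alternation of data sets described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the labels "A" and "B" for the two transcripts are hypothetical, and the sketch assumes that the alternation was balanced within each treatment, which the text does not state.

```python
from collections import Counter

TREATMENTS = [1, 2, 3, 4]
SUBJECTS_PER_TREATMENT = 8

def data_set_order(position_in_treatment):
    """Alternate which transcript is used for which question set.

    Even positions see transcript A for retention and B for the main
    questions; odd positions see the reverse.
    """
    if position_in_treatment % 2 == 0:
        return {"retention": "A", "main": "B"}
    return {"retention": "B", "main": "A"}

# One entry per subject: (treatment, position within treatment, data-set order).
schedule = [
    (treatment, position, data_set_order(position))
    for treatment in TREATMENTS
    for position in range(SUBJECTS_PER_TREATMENT)
]

# Count how often each transcript is used for the retention questions.
retention_counts = Counter(order["retention"] for _, _, order in schedule)
print(retention_counts)
```

Under these assumptions each transcript serves as the retention data for half of the subjects overall and for half of the subjects in every treatment, so neither transcript is confounded with layout or color.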

Problems

We encountered one problem in running the experiment. We had provided tooltips in our application telling the user to click for more information about the requirement. However, tooltips are only displayed on the application that has the input focus, and the question application always had the input focus. So users did not realize they should click for more information, and had difficulty answering the questions. Once we realized this problem, we discarded all the data that had been collected up to that point, and informed all future subjects of the information that was displayed in the tooltip before they began their task.

 

Results

Statistical analysis of the raw data (see Appendix) was performed using Microsoft Excel. Here we present our 2-way ANOVA results. The ANOVA indicates whether the differences among the four treatment groups are statistically significant. We also present graphs of various averages. The tables are divided into five sections: Main Questions, Main Question Errors, Retention Questions, Retention Question Errors and Subjective Satisfaction. Each section was analyzed with the same set of tools.
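For readers who want to see what a two-way ANOVA with replication computes, the following is a minimal pure-Python sketch of the standard balanced-design calculation. It is not the spreadsheet used in this study; the function names and the example numbers are hypothetical. The "rows" factor corresponds to the row labelled "Sample" in the tables of this section.

```python
def mean(values):
    return sum(values) / len(values)

def degrees_of_freedom(rows, cols, per_cell):
    """df for (rows, columns, interaction, within) in a balanced design."""
    return (rows - 1, cols - 1, (rows - 1) * (cols - 1), rows * cols * (per_cell - 1))

def two_way_anova(cells):
    """cells[i][j] lists the observations for row level i and column level j.

    Assumes every cell holds the same number of observations.
    """
    rows, cols, per_cell = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean([x for row in cells for cell in row for x in cell])
    row_means = [mean([x for cell in row for x in cell]) for row in cells]
    col_means = [mean([x for row in cells for x in row[j]]) for j in range(cols)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    # Partition the total sum of squares into four components.
    ss = {
        "rows": cols * per_cell * sum((m - grand) ** 2 for m in row_means),
        "columns": rows * per_cell * sum((m - grand) ** 2 for m in col_means),
        "interaction": per_cell * sum(
            (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
            for i in range(rows) for j in range(cols)),
        "within": sum(
            (x - cell_means[i][j]) ** 2
            for i in range(rows) for j in range(cols) for x in cells[i][j]),
    }
    df = dict(zip(ss, degrees_of_freedom(rows, cols, per_cell)))
    ms = {name: ss[name] / df[name] for name in ss}
    # Each effect is tested against the within-cell mean square.
    f = {name: ms[name] / ms["within"] for name in ("rows", "columns", "interaction")}
    return {"ss": ss, "df": df, "ms": ms, "f": f}

# Made-up numbers purely to exercise the function; NOT data from this study.
example = [[[1, 3], [5, 7]],
           [[2, 4], [10, 12]]]
result = two_way_anova(example)
print(result["df"], result["f"])
```

As a cautious observation, `degrees_of_freedom(2, 2, 8)` gives 1, 1, 1 and 28 for a 2x2 layout with eight observations per cell, whereas the tables in this section report 7, 1, 7 and 16, which is what `degrees_of_freedom(8, 2, 2)` gives. The grouping used in the original spreadsheet may therefore have differed from the 2x2 design described earlier.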

Main Questions: Time to Correct Completion

ANOVA:

Source of Variation           SS    df          MS      F    P-value    F crit
Sample                 239370.91     7    34195.84   2.37       0.07      2.66
Columns                 21653.85     1    21653.85   1.50       0.24      4.49
Interaction             60672.50     7     8667.50   0.60       0.75      2.66
Within                 231288.36    16    14455.52
Total                  552985.62    31

The ANOVA shows that every computed F-value is less than the corresponding F-critical value, so there was no statistically significant difference between the groups in question. Hence, even though the means differ, the difference is not statistically significant. In this instance we therefore cannot reject the Null Hypothesis.
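The comparison described above can be reproduced from the tabulated mean squares: each F-value is the effect's mean square divided by the within-group mean square, and an effect is significant at the 0.05 level only if F exceeds F-critical. The sketch below copies its numbers from the Main Questions table; the variable and function names are hypothetical.

```python
# Values copied from the "Main Questions: Time to Correct Completion" table.
MS_WITHIN = 14455.52

effects = {
    # name: (mean square, critical F at the 0.05 level)
    "Sample": (34195.84, 2.66),
    "Columns": (21653.85, 4.49),
    "Interaction": (8667.50, 2.66),
}

def f_ratio(ms_effect, ms_within):
    """F statistic: effect mean square over within-group mean square."""
    return ms_effect / ms_within

results = {}
for name, (ms_effect, f_crit) in effects.items():
    f_value = f_ratio(ms_effect, MS_WITHIN)
    # Store the rounded F-value and whether it exceeds the critical value.
    results[name] = (round(f_value, 2), f_value > f_crit)
    print(name, results[name])
```

The rounded ratios agree with the F column of the table (2.37, 1.50 and 0.60), and none exceeds its critical value.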

Main Question Error Rate

ANOVA:

Source of Variation        SS    df      MS      F    P-value    F crit
Sample                  42.50     7    6.07   0.71       0.66      2.66
Columns                  0.50     1    0.50   0.06       0.81      4.49
Interaction              8.50     7    1.21   0.14       0.99      2.66
Within                 136.00    16    8.50
Total                  187.50    31

Examining the ANOVA results, we can see that in every test the F-value was once again less than F-critical. Here, too, we cannot reject the Null Hypothesis.

Retention Questions: Time to Correct Completion

ANOVA:

Source of Variation           SS    df         MS      F    P-value    F crit
Sample                  28081.56     7    4011.65   1.13       0.40      2.66
Columns                  1354.86     1    1354.86   0.38       0.55      4.49
Interaction             21733.20     7    3104.74   0.87       0.55      2.66
Within                  57037.31    16    3564.83
Total                  108206.93    31

The same holds here: all ANOVA F-values are less than the corresponding F-critical values, so the Null Hypothesis cannot be rejected.

Retention Question Error Rate

ANOVA:

Source of Variation        SS    df      MS      F    P-value    F crit
Sample                  44.47     7    6.35   1.53       0.23      2.66
Columns                  0.03     1    0.03   0.01       0.93      4.49
Interaction             32.22     7    4.60   1.11       0.40      2.66
Within                  66.50    16    4.16
Total                  143.22    31

Again, all ANOVA F-values are less than the corresponding F-critical values, so the Null Hypothesis cannot be rejected.

Subjective Satisfaction

ANOVA:

Source of Variation         SS    df        MS      F    P-value    F crit
Sample                  548.50     7     78.36   0.62       0.73      2.66
Columns                 722.00     1    722.00   5.70       0.03      4.49
Interaction             817.00     7    116.71   0.92       0.52      2.66
Within                 2028.00    16    126.75
Total                  4115.50    31

Here the ANOVA produces an F-value for the columns that is greater than F-critical (5.70 > 4.49). The corresponding P-value is also less than our significance level, which for our experiment was taken to be 0.05 (0.03 < 0.05). This tells us that there is a statistically significant difference between the columns, that is, between the 3-color and 8-color schemes in this part of the experiment. Examining the Subjective Satisfaction Averages graph, we see that for the 3-color treatment the bar layout was rated higher, by a margin of ~10%, whereas for the 8-color treatment the island layout was preferred, by a margin of ~13%.

 

Discussion

User Performance

We had hypothesized that the progress bar layout versions of the Degree Navigator would result in users answering questions faster and with fewer errors than the island layout version, in both the task questions and the retention questions. Though the mean time to completion and the error rate for this treatment were lower than for the others, the results were not statistically significant, due to the high variance of the data.

There are likely several reasons why we were unable to show any statistically significant improvement in user performance. First, after the experiment was over, we realized that we had neglected to screen subjects for color blindness. One subject commented in his subjective satisfaction questionnaire that he was color blind, but since the questionnaires were not linked to the task data, we were unable to discard his data. Another issue was that non-native speakers of English seemed to perform the tasks noticeably more slowly than native speakers. This is quite understandable, as there was a great deal of terse requirement information to read in the user interface (the wording of requirements in the original Degree Navigator was retained). Thus, it is likely that the reading speed of these subjects affected their performance.

Yet another issue that may have influenced our results was that even the final progress bar layout version did not have, in our opinion, a perfect user interface. Because we wished to isolate the variables of layout and color, we made no other changes between the original Degree Navigator and our progress bar layout version. One change that we would have liked to make, for example, would be to label each requirement box with a short descriptive title. Another good change would be to provide a help button, and to make sub-requirements within a requirement clickable to provide more information.

In addition, the subject base consisted of junior and senior computer science majors. We chose this group of subjects because juniors and seniors are more familiar with degree requirements than are freshmen and sophomores. However, had we included freshmen and sophomores, we might have been able to support our hypothesis. It is likely that juniors and seniors who used the island layout were able to answer questions as quickly as those who used the bar layout, because the degree program was already so familiar to them that they were able to make sense of the islands. Finally, it would be interesting to see whether students of other majors would perform the same way. Computer science majors are typically expert computer users, and can, in general, learn a new application more quickly than other users. The difference in layout and color in the different treatments might have been more noticeable had less computer-experienced users been chosen.

Subjective Preference

We had predicted greater user preference for the 3-color, progress bar layout over all the other treatments, but the results show only a preference for the 3-color progress bar layout over the 3-color island layout. In fact, for the 8-color treatments, the island layout was preferred over the progress bar layout. The reason for this is likely the fact that the colors in the 8-color bar layout version seem even more obtrusive than in the island layout because the layout is less graphical in nature. Moreover, a progress bar is a familiar user interface element, and users expect multiple progress bars in an application to have the same color. Islands are not a familiar user interface element, thus users did not have any expectations about how they should look.

In general, it is difficult to collect meaningful subjective preference data in a between-groups experiment, because subjects do not have anything to compare the interface to. Different users have widely varying expectations, as evidenced by subject comments. One subject gave especially high preference scores to treatment 1 (island layout, 8 colors), which we had hypothesized would receive low scores. When we asked him informally why he thought the interface was so good, he replied that it was better than the paper form outlining requirements given by the department. When we showed him treatment 4 (bar layout, 3 colors), he agreed that it was far superior to the version he had initially been shown. At the other end of the spectrum were subjects with high expectations. One subject, who had been assigned to a progress bar layout treatment, wrote a page's worth of comments, suggesting a much more complex application with more functionality. This subject gave very low scores to the application.

While the subjective preference data did not confirm our hypothesis, the user comments were quite helpful. Almost all users who received an 8-color treatment commented that the colors were unnatural and straining. They also commented that although some colors were duplicated, there was no logic behind the duplication, a major flaw that we had also noticed in the color scheme of the original Degree Navigator. Many subjects who used the island layout indicated their frustration with the user interface.

Comments about island layout:

  • I think the format would be considered very confusing to most people
  • It was difficult to determine requirements for each "island"
  • I had trouble figuring out what each island block represented
  • Circular layout unclear, I'd prefer a chart/linear graph. Headings should stand out more.

Comments about 8-color versions:

  • Good application, but colors are garish
  • Colors are very distracting
  • Too many colors

Comments about bar layout:

  • The blocks are a nice way to see when something has been completed
  • Clear, easy to understand layout

In addition, several subjects commented that in the 3-color bar layout versions, the difference in color (dark blue vs. light blue) between current and completed courses was not great enough.

 

Conclusions

User interface metaphors can be employed to increase users' initial familiarity with the target domain, but they also play a more important role: metaphors aid users in understanding the target domain by enabling them to develop a mental model for it.

As noted in the discussion of the results, there was no statistically significant difference in user performance between the Degree Navigator with a new metaphor and the original Degree Navigator. Color also had no statistically significant effect on performance. However, we did find that when the number of colors was reduced, users preferred the new layout. Individual user comments also confirmed that the progress bar layout with less color was preferred.

Since there are no tools that can mechanically generate effective metaphors for user interfaces, metaphors must be created, analyzed and evaluated on a case-by-case basis. Though we did not show statistically significant performance improvements in comparing two metaphors, performance improvements might have emerged had more metaphors been tested.

Advice for Practitioners

Though there was not a statistically significant improvement in user performance when the user interface metaphor was changed, users did prefer the progress bar layout when the number of colors was reduced. Since there are a number of widely accepted reasons for limiting the number of colors in GUIs, this result suggests that designers should consider progress bar indicators in applications similar to this one.

Suggestions For Future Researchers

A primary consideration for future researchers is to increase the number of subjects used in the experiment. A further suggestion is to make the subject group either more heterogeneous, by including subjects from all departments, or more completely representative of the student body at large. The same experiment could also be repeated with different tasks. Future researchers may want to examine the performance of users after a longer learning time; these experienced subjects may perform differently than first-time users.

This experiment serves as a good basis for future studies. Future researchers should build on this experiment and explore the many issues that accompany it:

  1. Explore using other metaphors. We tested only one metaphor, progress bars, to display transcript information.
  2. Compare a graphical layout to text. Due to a limited number of subjects, we were unable to compare graphical interfaces with a textual tabular layout.
  3. Redesign further. The Degree Navigator violates many accepted design principles. We evaluated it only by isolating the design metaphor and use of color as variables. However, other shortcomings, such as poor labeling and lack of online help may have influenced our results.
  4. Conduct a within-subjects study. We were unable to show a statistically significant subjective preference for the modified layout, but this was probably due to the fact that each subject only used one version of the application. We are confident that testing subjective preference in a within-subjects experiment would yield statistically significant preference for the progress bar layout interface with reduced color, over all other treatments.
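In a within-subjects study each subject would see all four treatments, so the order of presentation would need to be counterbalanced. One common way to do this, offered here only as a sketch and not as part of the original study, is a balanced Latin square, in which every treatment appears once in each position and follows every other treatment exactly once. The function name is hypothetical.

```python
def balanced_latin_square(n):
    """Return n presentation orders over treatments 0..n-1.

    Uses the standard zig-zag construction (0, 1, n-1, 2, n-2, ...) for the
    first order and shifts it by one for each later order. The first-order
    carryover balance holds only for an even number of treatments.
    """
    if n % 2 != 0:
        raise ValueError("this construction requires an even number of treatments")
    first = [0]
    low, high = 1, n - 1
    take_low = True
    while len(first) < n:
        if take_low:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
        take_low = not take_low
    return [[(t + shift) % n for t in first] for shift in range(n)]

# Treatments 1-4 from the 2x2 design, shown in 1-based numbering.
orders = [[t + 1 for t in row] for row in balanced_latin_square(4)]
for order in orders:
    print(order)
```

With 32 subjects, eight could be assigned to each of the four orders.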

Several subjects commented that they felt the colors chosen for current and completed courses were too similar to one another and were hard to differentiate. This indicates that more research should be done to see which color choices would be optimal. We believed that the colors for current and completed courses should be similar in hue but vary in brightness, but based on these user comments, perhaps a difference in hue would be preferable.

In addition, the Degree Navigator interface is meant to give a quick overview of a student's transcript; for actual graduation audits, a full text-based printout consisting of several pages of data is used. The Degree Navigator software also offers several report format options, with differing levels of detail. Future research could be directed toward determining which report formats are appropriate for which tasks, and tracking the preferences of different levels of users (students, advisors, administrators, and so on). Also, the multiple colors and non-geometric shapes used in the graphical layout may lead to fatigue and eyestrain for users such as advisors, who must look at many different transcripts during advising office hours, which can last for two hours or more. Future research in this area may prove helpful to these users.

Refined Theory

Though subjective satisfaction results show that users preferred the progress bar layout over the island layout when fewer colors were used, user performance did not improve when colors were reduced or when the metaphor was improved. The Degree Navigator's interface seems garish to many users, but its design does not seem to interfere with the function of the program: to allow users to quickly absorb information about their transcript. According to the results of this study, the Degree Navigator's unconventional interface design seems to defy the conventional logic about the interaction between interface design and user performance.

 

References

  1. Shneiderman, Ben. Designing the User Interface: Strategies for Effective Human-Computer Interaction. Reading, MA: Addison-Wesley, 1998.

  2. Elissa D. Smilowitz, "Do Metaphors Make Web Browsers Easier to Use?" (http://www.baddesigns.com/mswebcnf.htm). Accessed on April 24, 2001.

  3. Mayer, R. "Different problem-solving competencies established in learning computer programming with and without meaningful models." Journal of Educational Psychology, 67, 725-734.

  4. Halasz, F. and Moran, T. "Mental Models and problem solving in using a calculator." Proceedings of the CHI'83 Conference on Human Factors in Computer Systems, 212-216.

  5. Foss, D., Rosson, M. B., and Smith, P. "Reducing manual labor: An experimental analysis of learning aids for a text editor." Proceedings of Human Factors in Computer Systems Conference, National Bureau of Standards, Gaithersburg, Maryland, March.

  6. T. D. Erickson, "What Metaphors Mean", in On Metaphor, S. Sacks (ed.), University of Chicago, Illinois, 1979, pp. 29-45.

  7. Brad A. Myers: "The Importance of Percent-Done Progress Indicators for Computer-Human Interfaces" in Proceedings CHI'85 Human Factors in Computing Systems (San Francisco, April 14-18, 1985), ACM, New York, pp.11-17.

  8. Aaron Marcus, Graphical User Interfaces, Chapter 19, Handbook of Human-Computer Interaction, 2nd Edition.

  9. Thomas S. Tullis, Screen Design, Chapter 23, Handbook of Human-Computer Interaction, 2nd Edition.

  10. Lynda Weinman, "Color Aesthetics for the Web". (http://www.webtechniques.com/archives/1998/03/desi/) Accessed on April 19, 2001.

  11. Carroll, J., Mack, R., and Kellogg, W. Interface Metaphors and User Interface Design. In Helander, M., ed., Handbook of Human Computer Interaction. Amsterdam: Elsevier Science Publisher, 1988, pp. 67-85.