Experiment
Introduction and Hypothesis
In this experiment, we investigated
ways of improving the layout and use of color in the user interface
of the Degree Navigator application. Our experiment explored the
idea that a suitable layout and judicious use of color in a user
interface can improve user performance. To test this, we developed
four alternate versions of the Degree Navigator interface, varying
the type of layout and the use of color. Subjects were asked to use one
version of the interface and answer questions about the information
presented to them. The questions pertained to how much progress
the student had made in completing various requirements, which courses
were taken to complete various requirements, whether or not certain
requirements had been fulfilled, and which courses the student was
currently taking. A complete list of questions is available in the
Appendix.
Null Hypothesis: There will not
be any statistically significant difference in the measured efficiency
between the four versions.
Hypothesis I: Users will answer questions more rapidly and with
fewer errors when using the progress-bar interface as compared to
the original island-based interface.
Hypothesis II: Users will answer questions more rapidly and with
fewer errors when using a 3-color interface as compared to an 8-color
interface.
Hypothesis III: Retention of information will be better with the
progress-bar layout using 3 colors than with the other layouts.
Hypothesis IV: Users will prefer the 3-color progress-bar interface
to the other 3 interfaces.
Independent and Dependent Variables
Independent variables
- Layout.
This variable has two treatments: the original island layout and
the progress-bar layout
- Amount
of Color. This variable also has two treatments. One version uses
eight colors, as in the original application. The other version
reduces the number of colors used to three.
Thus, this is
a 2x2 experiment design, where the four treatments are as follows:
- The original
degree navigator interface using eight colors.
- The original
degree navigator interface using three colors.
- The bar
layout interface using eight colors.
- The bar
layout interface using three colors.
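The four treatments listed above are the cross product of the two independent variables. The following is a minimal sketch of that enumeration; the names `LAYOUTS`, `COLOR_COUNTS`, and `treatments` are illustrative and do not come from the experiment software.

```python
from itertools import product

# Levels of the two independent variables, in the order used in the list above.
LAYOUTS = ["island", "bar"]
COLOR_COUNTS = [8, 3]


def treatments():
    """Return the 2x2 treatment table as {treatment number: (layout, colors)}."""
    cells = product(LAYOUTS, COLOR_COUNTS)
    return {number: cell for number, cell in enumerate(cells, start=1)}


if __name__ == "__main__":
    for number, (layout, colors) in treatments().items():
        print(f"Treatment {number}: {layout} layout, {colors} colors")
```

Numbering the cells in this order reproduces the treatment numbers used later in the report (for example, treatment 4 is the bar layout with three colors).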
Dependent Variables
- Performance
time: time to correct completion of task questions
- Error
rate: error rate in answering task questions
- Retention
time and accuracy: time to correct completion, and accuracy rate
of retention questions.
- Subjective
Preference
This will be a between-groups experiment.
Pilot Study
Results
For our pilot test, we used four
test subjects, one per treatment. We used a 1024x768 screen resolution.
We gave two-minute long detailed instructions about the experiment
to the subjects. Each subject took about five minutes to answer
the seven retention questions based on one set of data, six minutes
to answer eight main task questions based on another set of data
and three minutes to complete the subjective satisfaction questionnaire.
Preliminary results suggested that the design of the experiment was
workable and that its measures were appropriate. We found that the dimension of
8-color/3-color was implemented correctly, using similar color schemes
for the island and no-island conditions, so this dimension does
not need to be changed. Subjects expressed frustration at the island
layout, as we had hypothesized that they would. However, we found
a need for the redesign of several features in the experiment:
- Resized the test question program
window so that it did not overlap the window for the experiment
- Edited the graphics for each
island so that the class names were visible. (This is a shortcoming
of the original Degree Navigator that we will correct in our version,
and we will certainly recommend that the Degree Navigator developers
change their version, too.)
- Made the central pie chart of
the island version a clickable icon, to show that the CS degree
has a requirement of 120 credits in order to graduate. This information
is also not available in the implementation of the Degree Navigator;
however, it answers one of the most important questions that a student
must ask when viewing a transcript.
- Used capitalization for "credits"
and "courses" to highlight them in the questions asked
since all the subjects confused the two terms while performing
the tasks.
Subjects
The subjects for this experiment
were selected so as to keep the subject pool narrow and limit the statistical
noise that a broader pool would introduce. The subjects
were junior or senior University of Maryland computer science majors.
We tested 32 subjects, resulting in 8 subjects per treatment. All subjects
participated on a voluntary basis. The subjects were between
the ages of 19 and 24; ten were female
and twenty-two were male. Subjects were first asked if they had
used the Degree Navigator application, and if they had, they were
assigned to treatments 3 and 4 (the progress bar layout), so that
their previous use of the application would not affect their performance.
Each subject completed the tasks using only one version of the interface,
as in the pilot study. They were presented with different sets of
data for retention and main questions.
Materials
Training
Since the subjects were junior
or senior University of Maryland computer science majors, they were
familiar with the various requirements of a degree in Computer Science.
Our experiment required the subjects to answer questions about information
displayed in the interface. Thus no additional training was necessary
for our experiment.
Tasks
Subjects used the version of the
application given to them loaded with a sample student's transcript
information. They were given a few minutes to look at the information,
and after a three-minute delay, were asked questions about the information
that was presented. Next, the application was loaded with another
sample transcript, and the subjects answered another set of questions.
All questionnaires were administered by a separate Visual Basic
application, which calculated the time to answer each question,
as well as the error rate.
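The timing and error bookkeeping described above can be sketched as follows. This is not the original Visual Basic questionnaire application, whose source is not reproduced here; it is a minimal Python illustration, and the class name `QuestionLog` and its methods are hypothetical.

```python
import time


class QuestionLog:
    """Record time-to-answer and correctness per question (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock    # injectable so the logic can be checked without waiting
        self._started = None
        self.records = []      # list of (seconds_taken, was_correct)

    def show_question(self):
        """Call when a question is displayed to the subject."""
        self._started = self._clock()

    def submit(self, answer, expected):
        """Call when the subject answers; stores elapsed time and correctness."""
        if self._started is None:
            raise RuntimeError("submit() called before show_question()")
        elapsed = self._clock() - self._started
        self._started = None
        self.records.append((elapsed, answer == expected))

    def total_time(self):
        """Total seconds spent across all recorded questions."""
        return sum(seconds for seconds, _ in self.records)

    def error_rate(self):
        """Fraction of questions answered incorrectly (0.0 if none were asked)."""
        if not self.records:
            return 0.0
        wrong = sum(1 for _, correct in self.records if not correct)
        return wrong / len(self.records)
```

A monotonic clock is used in the sketch so that the measured intervals are unaffected by system clock adjustments during a session.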
Screenshots and executables of
the four treatment applications and the questionnaire application
are available in the Appendix. The testing
was conducted in the AV Williams Microsoft lab, due to the availability
of large, high-resolution monitors. The experiment software was
installed on four computers so that up to four subjects could be
tested at one time. All of the computers had the same type of hardware.
Two paper-based forms were prepared
for the experiment: the consent form and the subjective satisfaction
questionnaire. These are available in the Appendix.
Procedures and
Problems
A typical
session of the experiment consisted of the following steps:
- The subject read and signed
the consent form.
- The subject
was given instructions about the tasks to be performed in the
experiment.
- The subject
was presented with one version of the application loaded with
a student's transcript information. After a three-minute delay,
the questionnaire window popped up and the subject was asked to
close the application.
- The subject
answered seven retention questions about the information that
had been displayed in the application.
- The subject
was presented with the same version of the application, loaded
with another student's transcript information. The subject used
the application and answered eight questions while it was displayed.
- The subject filled out the subjective
satisfaction questionnaire.
We used
two sample student transcripts as data for our experiment. One student
was halfway through his degree while the other was near completion.
We alternated use of the data sets for the retention questions and
the main questions: half of the subjects used the first data set
for the retention questions and the second data set for the main questions;
the other half used the second data set for the retention questions
and the first data set for the main questions. This was done to
prevent any bias that may have resulted from the sample data that
was used.
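The counterbalancing of the two data sets can be sketched as below. The report states only that half of the subjects received each ordering; alternating by subject index is an assumption made for illustration, and `dataset_order` is a hypothetical name.

```python
def dataset_order(subject_index):
    """Return (retention_data_set, main_data_set) for a 0-based subject index.

    Assumption: orderings alternate from one subject to the next, so that
    each ordering is used by exactly half of an even-sized group.
    """
    if subject_index % 2 == 0:
        return ("set 1", "set 2")
    return ("set 2", "set 1")


if __name__ == "__main__":
    orders = [dataset_order(i) for i in range(32)]
    print(orders.count(("set 1", "set 2")), "subjects: set 1 for retention")
    print(orders.count(("set 2", "set 1")), "subjects: set 2 for retention")
```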
Problems
We encountered one problem in running
the experiment. We had provided tooltips in our application telling
the user to click for more information about the requirement. However,
tooltips are only displayed on the application that has the input
focus, and the question application always had the input focus.
So users did not realize they should click for more information,
and had difficulty answering the questions. Once we realized this
problem, we discarded all the data that had been collected up to
that point, and informed all future subjects of the information
that was displayed in the tooltip before they began their task.
Results
Statistical
analysis of the raw data (see Appendix)
was performed using Microsoft Excel. Here we present our 2-way ANOVA
results. The ANOVA indicates whether the differences among the means
of the four treatment groups are statistically significant.
We also present graphs of various averages. The tables are separated
into five sections: Main Questions, Main Question Errors, Retention Questions,
Retention Question Errors and Subjective Satisfaction. Each section
has been analyzed with the same set of tools.
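As a reading aid for the df column of the tables in this section, the degrees of freedom of a balanced two-way ANOVA with replication follow directly from the number of row levels, column levels, and replicates per cell. The sketch below uses the standard formulas; the function name is illustrative. The report does not state how the data were arranged in the spreadsheet, so the comparison in the usage note is only an arithmetic observation.

```python
def two_way_anova_df(rows, columns, replicates):
    """Degrees of freedom for a balanced two-way ANOVA with replication."""
    df_rows = rows - 1
    df_columns = columns - 1
    return {
        "Sample": df_rows,                          # row factor
        "Columns": df_columns,                      # column factor
        "Interaction": df_rows * df_columns,
        "Within": rows * columns * (replicates - 1),
        "Total": rows * columns * replicates - 1,
    }


if __name__ == "__main__":
    print("2 x 2 cells, 8 replicates:", two_way_anova_df(2, 2, 8))
    print("8 x 2 cells, 2 replicates:", two_way_anova_df(8, 2, 2))
```

A 2x2 arrangement with 8 subjects per cell yields 1, 1, 1, and 28 degrees of freedom, whereas the values 7, 1, 7, and 16 that appear in the tables match an arrangement of 8 rows by 2 columns with 2 replicates. Both total 31 for the 32 subjects.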
Main
Questions: Time to Correct Completion
ANOVA:
| Source of Variation | SS | df | MS | F | P-value | F crit |
| Sample | 239370.91 | 7 | 34195.84 | 2.37 | 0.07 | 2.66 |
| Columns | 21653.85 | 1 | 21653.85 | 1.50 | 0.24 | 4.49 |
| Interaction | 60672.50 | 7 | 8667.50 | 0.60 | 0.75 | 2.66 |
| Within | 231288.36 | 16 | 14455.52 | | | |
| Total | 552985.62 | 31 | | | | |

The
ANOVA shows that every computed F-value
is less than the corresponding F-critical value. This tells us that
there was no statistically significant difference between the groups
in question. Hence, even though the means show that there was a
difference, it was not statistically significant. Thus, in this instance
we cannot reject the Null Hypothesis: there was no statistically
significant difference between the samples.
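The F-values in the Main Questions table can be rechecked from its SS and df columns, since each mean square is SS divided by df and each F is the effect's mean square divided by the Within mean square. The sketch below copies the figures from that table; the variable and function names are illustrative.

```python
# SS and df copied from the Main Questions: Time to Correct Completion table.
TABLE = {
    "Sample": {"ss": 239370.91, "df": 7},
    "Columns": {"ss": 21653.85, "df": 1},
    "Interaction": {"ss": 60672.50, "df": 7},
    "Within": {"ss": 231288.36, "df": 16},
}
F_CRIT = {"Sample": 2.66, "Columns": 4.49, "Interaction": 2.66}


def f_ratios(table):
    """Return {effect: F}, where F = MS_effect / MS_within and MS = SS / df."""
    ms_within = table["Within"]["ss"] / table["Within"]["df"]
    return {
        name: (row["ss"] / row["df"]) / ms_within
        for name, row in table.items()
        if name != "Within"
    }


if __name__ == "__main__":
    for name, f_value in f_ratios(TABLE).items():
        verdict = "significant" if f_value > F_CRIT[name] else "not significant"
        print(f"{name}: F = {f_value:.2f}, F crit = {F_CRIT[name]} ({verdict})")
```

Rounded to two decimals, the recomputed ratios agree with the F column of the table, and none exceeds its critical value.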
Main
Question Error Rate
ANOVA:
| Source of Variation | SS | df | MS | F | P-value | F crit |
| Sample | 42.50 | 7 | 6.07 | 0.71 | 0.66 | 2.66 |
| Columns | 0.50 | 1 | 0.50 | 0.06 | 0.81 | 4.49 |
| Interaction | 8.50 | 7 | 1.21 | 0.14 | 0.99 | 2.66 |
| Within | 136.00 | 16 | 8.50 | | | |
| Total | 187.50 | 31 | | | | |

Examining the ANOVA results, we can see that for
all the tests, the F-value was once again less than F-critical. Here,
too, we cannot reject the Null Hypothesis.
Retention
Questions: Time to Correct Completion
ANOVA:
| Source of Variation | SS | df | MS | F | P-value | F crit |
| Sample | 28081.56 | 7 | 4011.65 | 1.13 | 0.40 | 2.66 |
| Columns | 1354.86 | 1 | 1354.86 | 0.38 | 0.55 | 4.49 |
| Interaction | 21733.20 | 7 | 3104.74 | 0.87 | 0.55 | 2.66 |
| Within | 57037.31 | 16 | 3564.83 | | | |
| Total | 108206.93 | 31 | | | | |

Here, the same case occurs. All
ANOVA F-values are less than the corresponding F-critical value, so
the Null Hypothesis cannot be rejected.
Retention
Question Error Rate
ANOVA:
| Source of Variation | SS | df | MS | F | P-value | F crit |
| Sample | 44.47 | 7 | 6.35 | 1.53 | 0.23 | 2.66 |
| Columns | 0.03 | 1 | 0.03 | 0.01 | 0.93 | 4.49 |
| Interaction | 32.22 | 7 | 4.60 | 1.11 | 0.40 | 2.66 |
| Within | 66.50 | 16 | 4.16 | | | |
| Total | 143.22 | 31 | | | | |

Again, all ANOVA F-values
are less than the corresponding F-critical value, so the Null Hypothesis
cannot be rejected.
Subjective Satisfaction
ANOVA:
| Source of Variation | SS | df | MS | F | P-value | F crit |
| Sample | 548.50 | 7 | 78.36 | 0.62 | 0.73 | 2.66 |
| Columns | 722.00 | 1 | 722.00 | 5.70 | 0.03 | 4.49 |
| Interaction | 817.00 | 7 | 116.71 | 0.92 | 0.52 | 2.66 |
| Within | 2028.00 | 16 | 126.75 | | | |
| Total | 4115.50 | 31 | | | | |

At
this point, ANOVA produces an F-value that is greater than the F-critical
(5.70 > 4.49). The corresponding P-value is also less than our
significance level, which for our experiment was taken to be 0.05
(0.03 < 0.05). This tells us that there is a statistically significant
variation between the columns. This means that for this part of
the experiment, there was a difference between the 3-color and 8-color
schemes. Examining the Subjective Satisfaction Averages graph, we
can see that for the 3-color treatment, the bar layout was
preferred by a margin of ~10%. In the case of the 8-color treatment,
the island layout was preferred by a margin of ~13%.
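The decision rule applied throughout this section (an effect is significant when F exceeds F crit) can be run over all five tables at once. The sketch below copies the F and F crit values from the tables in this section; the names `RESULTS` and `significant_effects` are illustrative.

```python
# (F, F crit) pairs copied from the five ANOVA tables in this section.
RESULTS = {
    "Main Questions: Time": {
        "Sample": (2.37, 2.66), "Columns": (1.50, 4.49), "Interaction": (0.60, 2.66)},
    "Main Question Error Rate": {
        "Sample": (0.71, 2.66), "Columns": (0.06, 4.49), "Interaction": (0.14, 2.66)},
    "Retention Questions: Time": {
        "Sample": (1.13, 2.66), "Columns": (0.38, 4.49), "Interaction": (0.87, 2.66)},
    "Retention Question Error Rate": {
        "Sample": (1.53, 2.66), "Columns": (0.01, 4.49), "Interaction": (1.11, 2.66)},
    "Subjective Satisfaction": {
        "Sample": (0.62, 2.66), "Columns": (5.70, 4.49), "Interaction": (0.92, 2.66)},
}


def significant_effects(results):
    """Return (section, source) pairs whose F-value exceeds the critical value."""
    return [
        (section, source)
        for section, rows in results.items()
        for source, (f_value, f_crit) in rows.items()
        if f_value > f_crit
    ]


if __name__ == "__main__":
    for section, source in significant_effects(RESULTS):
        print(f"Significant: {section} / {source}")
```

Under this rule, the Columns effect for Subjective Satisfaction is the only entry flagged among the fifteen tests.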
Discussion
User Performance
We had hypothesized that the progress
bar layout versions of the degree navigator would result in users'
answering questions faster and with fewer errors than the island
layout version, in both the task questions and the retention questions.
Though the mean time to completion and error rate for this treatment
were less than for the others, the results were not statistically
significant, due to the high variance of the data.
There are likely several reasons
why we were unable to show any statistically significant improvement
in user performance. First, after the experiment was over, we realized
that we had neglected to choose only subjects who did not have color
blindness. One subject commented in his subjective satisfaction
questionnaire that he was color blind, but since the questionnaires
were not linked to the task data, we were unable to discard his
data. Another issue was that it seemed that non-native speakers
of English performed the tasks noticeably slower than native speakers
of English. This is quite understandable, as there was a great deal
of terse requirement information to read in the user interface (the
wording of requirements in the original Degree Navigator was retained).
Thus, it was likely that the reading speed of these subjects affected
their performance.
Yet another issue that may have
influenced our results was that even the final progress bar layout
version did not have, in our opinion, a perfect user interface.
Due to the fact that we wished to isolate the variables of layout
and color, we made no other changes between the original Degree
Navigator and our progress bar layout version. One change that we
would have liked to make, for example, would be to label each requirement
box with a short descriptive title. Another good change would be
to provide a help button, and to make sub-requirements within a
requirement clickable to provide more information.
In addition, the subject base was
junior and senior computer science majors. We chose this group of
subjects because juniors and seniors are more familiar with degree
requirements than are freshmen and sophomores. However, had we included
freshmen and sophomores, we might have been able to support our hypothesis.
It is likely that juniors and seniors who used the island layout
were able to answer questions as quickly as those who used the bar
layout, because the degree program was already so familiar to them
that they were able to make sense of the islands. Finally, it may
be interesting to see whether the performance of students of other
majors would be the same. Computer science majors are typically
expert computer users, and can, in general, learn a new application
more quickly than other users. The difference in layout and color
in the different treatments may have been more noticeable had less
computer-experienced users been chosen.
Subjective Preference
We had predicted greater user preference
for the 3-color, progress bar layout over all the other treatments,
but the results show only a preference for the 3-color progress
bar layout over the 3-color island layout. In fact, for the 8-color
treatments, the island layout was preferred over the progress bar
layout. The reason for this is likely the fact that the colors in
the 8-color bar layout version seem even more obtrusive than in
the island layout because the layout is less graphical in nature.
Moreover, a progress bar is a familiar user interface element, and
users expect multiple progress bars in an application to have the
same color. Islands are not a familiar user interface element, thus
users did not have any expectations about how they should look.
In general, it is difficult to
collect meaningful subjective preference data in a between-group
experiment, because subjects do not have anything to compare the
interface to. Different users have widely varying expectations,
as evidenced by subject comments. One subject gave especially high
preference scores to treatment 1 (island layout, 8 colors), which
we had hypothesized would receive low scores. When we asked him
informally why he thought the interface was so good, he replied
that it was better than the paper form outlining requirements given
by the department. When we showed him treatment 4 (bar layout, 3
colors), he agreed that he thought it was far superior to the version
he had initially been shown. At the other end of the spectrum were
subjects with high expectations. One subject, who had been assigned
to a progress bar layout treatment, wrote a page's worth of comments,
suggesting a much more complex application with more functionality.
This subject gave very low scores to the application.
While the subjective preference
data did not confirm our hypothesis, the user comments were quite
helpful. Almost all users who received an 8-color treatment commented
that the colors were unnatural and straining. They also commented
that though there were color duplications, there was no logic behind
them, a major flaw that we also noticed in the color scheme of
the original Degree Navigator. Many subjects who used the island
layout indicated their frustration with the user interface.
Comments about island layout:
- I think the format would be considered very confusing
to most people
- It was difficult to determine requirements for
each "island"
- I had trouble figuring out what each island block
represented
- Circular layout unclear, I'd prefer a chart/linear
graph. Headings should stand out more.
Comments about 8-color versions:
- Good application, but colors are garish
- Colors are very distracting
- Too many colors
Comments about bar layout:
- The blocks are a nice way to see when something
has been completed
- Clear, easy to understand layout
In addition, several subjects commented that in
the 3-color bar layout versions, the difference in color (dark blue
vs. light blue) between current and completed courses was not great
enough.
Conclusions
User interface metaphors can be
employed to increase users' initial familiarity with the target domain,
but they also play a more important role: metaphors aid users in
understanding the target domain by enabling them to develop a mental
model for it.
As noted in the discussion of the
results, there was no statistically significant difference in user
performance between the degree navigator with a new metaphor and
the original degree navigator. Color also had no statistically significant
effect. However, we did find that when the number of colors was
reduced, users preferred the new layout. Individual user comments
also confirmed that the progress bar layout with less color was
preferred.
Since there are no tools that can
mechanically generate effective metaphors for user interfaces, metaphors
must be created, analyzed and evaluated on a case by case basis.
Though we did not show statistically significant performance improvements
in comparing two metaphors, performance improvements might have
resulted had more metaphors been tested.
Advice for Practitioners
Though there was not a statistically
significant improvement in user performance when the user interface
metaphor was changed, users did prefer the progress bar layout when
the number of colors was reduced. Since there are a number of widely
accepted reasons for limiting the number of colors in GUIs, this result
suggests that designers should consider progress bar indicators in applications
similar to this one.
Suggestions
For Future Researchers
A primary consideration for future
researchers is to increase the number of subjects used in the experiment.
A further suggestion is to make the subject group either more heterogeneous
by including subjects from all departments, or more completely representative
of the students at large. Also, the same experiment could be repeated
with different tasks assigned. Future researchers may want to
examine the performance of users after a longer learning time; these
experienced subjects may perform differently from first-time users.
This experiment serves as a good
basis for future studies. Future researchers should build on this
experiment and explore the many issues that accompany it:
- Explore using other metaphors. We tested
only one metaphor, progress bars, to display transcript information.
- Compare a graphical layout to text. Due
to a limited number of subjects, we were unable to compare graphical
interfaces with a textual tabular layout.
- Redesign further.
The Degree Navigator violates many accepted design principles.
We evaluated it only by isolating the design metaphor and use
of color as variables. However, other shortcomings, such as poor
labeling and lack of online help may have influenced our results.
- Conduct a within-subjects study. We were
unable to show a statistically significant subjective preference
for the modified layout, but this was probably due to the fact
that each subject only used one version of the application. We
are confident that testing subjective preference in a within-subjects
experiment would yield statistically significant preference for
the progress bar layout interface with reduced color, over all
other treatments.
Several subjects commented that
they felt the colors chosen for current and completed courses were
too similar to one another and were hard to differentiate. This
indicates that more research should be done to see which color choices
would be optimal. We believed that the colors for current and completed
courses should be similar in hue but vary in brightness, but based
on these user comments, perhaps a difference in hue would be preferable.
In addition, the Degree Navigator
interface is meant to give a quick overview of a student's transcript;
for actual graduation audits, a full text-based printout consisting
of several pages of data is used. The Degree Navigator software
also offers several report format options, with differing levels
of detail. Future research could be directed toward determining
which report formats are appropriate for which tasks, and tracking
the preferences of different levels of users (students, advisors,
administrators, and so on). Also, the multiple colors and non-geometric
shapes used in the graphical layout may lead to fatigue and eyestrain
for users such as advisors, who must look at many different transcripts
during advising office hours, which can last for two hours or more.
Future research in this area may prove helpful to these users.
Refined Theory
Though subjective satisfaction
results show that users preferred the progress bar layout over the
island layout when fewer colors were used, user performance did
not improve when colors were reduced or when the metaphor was improved.
The Degree Navigator's interface seems garish to many users, but
its design does not seem to interfere with the function of the program:
to allow users to quickly absorb information about their transcript.
According to the results of this study, the Degree Navigator's unconventional
interface design seems to defy the conventional logic about the
interaction between interface design and user performance.