Human-Computer Interaction: Principles of Interface Design
Student No. 12397901
Project Supervisor: Andrew Marriott
Table of Contents
1.2 What is Human-Computer Interaction?
1.3 Communicating with Users: The importance of interaction design
1.3.1 Mental and Conceptual Models
1.3.2 Interaction paradigms, idioms and metaphors
1.4.1 Graphical User Interfaces
1.4.3 Multi-modal User Interfaces
2.0 Issues in Human-Computer Interaction
2.1.1 Understanding Accessibility Barriers
2.1.3 Accessibility Design Guidelines
2.1.4 Personal Assistive Technologies
2.2.2 Internationalisation and Localisation
2.3.2 Usability Inspection Methods
3.0 Graphical User Interface Design
3.1 Graphic Design Principles for Computers
3.1.2 Visual Variables: Scale, Contrast & Proportion
3.1.3 Perceptual Organisation and Visual Structure
3.1.4 Module and Program: Grid-based Design
3.1.5 Semiotics: Image and Representation
3.2 Widgets: the Building Blocks of Graphical User Interfaces
3.2.4 Smartening Up Applications
Figure 1. Some of the disciplines involved in the field of Human-Computer Interaction
Figure 3. Mental model vs Implementation model.
Figure 4. Conceptual model and mental models.
Figure 6. A simple example of a Persona for Betty the Warehouse Manager
Figure 8. 1973 Xerox Alto (left).
Figure 9. 1984 Apple Macintosh (right).
Figure 10. Early Apple Macintosh advertisement
Figure 12. Avatar-Conference, a multi-modal interface using avatars to represent humans
Figure 13. The Ishihara plate is commonly used to test for red-green colour blindness.
Figure 14. The differences between Deuteranopia, Protanopia and Tritanopia.
Figure 15. The effects of tunnel vision
Figure 16. Pictures showing the effects of eye conditions commonly found in older people
Figure 17. Chromostereopsis. The red should appear closer to the eye than the blue
Figure 18. Lightness differences between foreground and background colours.
Figure 19. Contrast differences between hues in hemispheres of the colour wheel
Figure 20. Contrast differences between adjacent colours on the colour wheel
Figure 21. A sample of some international keyboard layouts from Microsoft
Figure 22. Simplicity and Elegance.
Figure 23. Microsoft's Bob interface.
Figure 24. Bertin's retinal variables.
Figure 31. Two Complementary colour schemes based on red and green.
Figure 32. Effects of desaturation and greyscale conversion of colour images.
Figure 33. VisiBone’s web-safe palette of 216 web-safe colours
Figure 34. Categories of type (after Williams 1994)
Figure 35. A Multiple Document Interface application in Microsoft Word (left).
Figure 36. A multipaned window and Tabbed Document Interface in Microsoft Excel (right).
Figure 37. A tabbed document interface employed in Mozilla.
Figure 38. A cascading menu in Microsoft Word.
Figure 39. Microsoft Word’s expanding menu system,
Figure 40. Icons and accelerator keys in Paint Shop Pro v7.
Figure 41. Accelerator keys in Microsoft Word.
Figure 42. Some of GTK’s predefined dialog boxes.
Figure 43. A sample of button layouts from the TestGTK suite.
Figure 46. The drop-down list used in Microsoft Word.
Figure 47. Bounded entry controls.
Figure 48. A collapsible pane structure in Adobe’s Acrobat Reader.
Figure 51. Mozilla Firefox’s “About” box.
Figure 52. Splash screen from Lavasoft’s Ad-Aware.
Figure 53. The KDE desktop environment provides an excellent example of intelligent programming.
Figure 54. A selection of icons commonly found on toolbars.
Figure 55. A customised toolbar in Microsoft Word.
Humans interact with computers in many ways, and the interface between humans and the computers they use is crucial to facilitating this interaction. Desktop applications, internet browsers, handheld computers, and computer kiosks make use of the prevalent Graphical User Interfaces (GUI) of today. Voice User Interfaces (VUI) are used for speech recognition and synthesising systems, and the emerging multi-modal and gestalt User Interfaces (gUI) allow humans to engage with embodied character agents in a way that cannot be achieved with other interface paradigms. This project broadly investigates these paradigms, and the importance of, and issues associated with, interaction design, and then focuses on the GUI design of desktop applications and websites.
Human-Computer Interaction (HCI) is both an art and a science. The interdependence of a software system’s functionality and its interface means that software designers cannot afford to favour one over the other. If the interface is well designed, it will allow the system’s functionality to support the user’s task. However, if the interface is poor, the functionality is obscured and users will have trouble accomplishing their task (Dix et al. 2004:p.110). The IEEE/ACM curriculum council also include HCI as one of the core knowledge focus groups in their computing curriculum (IEEE/ACM 2001).
This project is a broad investigation into the issues surrounding Human-Computer Interaction. It will:
· Identify the ways in which humans interact with computers, and the roles of different types of user interfaces within these contexts.
· Examine some current issues in HCI and their impact on interface and interaction design.
· Investigate fundamental principles for effective interface design.
· Investigate, in further detail, the principles for good graphical user interface design.
The theoretical overview will be accompanied by a tutorial outline (Section 5.0) that investigates one of today’s prominent interaction paradigms – the world wide web. It investigates graphical user interface design in websites, and looks at some issues associated with accessibility and usability. It also explores the use and importance of Cascading Style Sheets in web design.
The desire to build more effective weapons during World War II inspired a heightened interest in the study of interaction between humans and machines, a challenge that was readily taken up by researchers of the day. The Ergonomics Research Society, founded in 1949, was primarily concerned with the physical characteristics of machines and systems, and their effect on user performance. The field of Ergonomics (or Human Factors) is concerned with user performance in the context of any mechanical, computer, or manual system. With use of the computer becoming more widespread, more researchers began to specialise in studying the interaction between people and computers, concentrating on the physical, psychological and theoretical aspects of this interaction (Dix et al. 2004). Thus, Human-computer interaction was born.
The field of information science has also influenced the development of the field of HCI. Information science is an old discipline that came into existence before technology, and is concerned with the management and manipulation of information within an organisation. The introduction of technology has greatly influenced the way in which information can be stored, accessed and utilised, and has led to the rise of systems analysis, which matches the technology in the workplace to the requirements and constraints of the task (Dix et al. 2004).
Although many other disciplines are concerned with HCI, it is especially important in computer science and systems design, because it involves the design, implementation and evaluation of interactive computer systems in the context of the user’s task and work (Dix et al. 2004). The ideal designer of an interactive system would have expertise in such diverse fields as (Dix et al. 2004):
· Psychology and cognitive science for knowledge of the user’s perceptual, cognitive and problem-solving skills.
· Ergonomics for the user’s physical capabilities.
· Sociology to understand the wider context of the interaction.
· Computer science and engineering to be able to build the necessary technology.
· Graphic design to produce a pleasing visual interface.
· Technical writing to produce the manuals.
· Business to be able to market the product.
HCI is clearly a multi-disciplinary subject (see Figure 1), and designing an effective interactive system from a single discipline in isolation is almost impossible. Computer scientists, however, are particularly interested in the practicalities of how they can use the principles and methods from each HCI discipline to assist them in designing better systems. Acquiring an understanding of the theory is important, but knowing how to apply the theory to the problem at hand is equally valuable (Dix et al. 2004).

Figure 1. Some of the disciplines involved in the field of Human-Computer Interaction[1]
The first section of this document covers the more theoretical aspects of human-computer interaction. The document then proceeds to illustrate the application of these theories to interface design principles, in order to enhance the usability of a software product.
One of the greatest challenges facing a software designer is understanding what a user requires from a product. To do this, the designer must have at least a basic understanding of mental models and other psychological theories and their application to software design. Since the user is interacting with the computer in order to accomplish something, the software interface is crucial to facilitating the user’s goals and tasks. The interface, which typically comprises nearly half of the lines of code of a software product (Milewski 2004), is where the designer must consider the implications of how the software influences and anticipates the user’s thought processes during their interaction.

Figure 2. Software designers design products for humans to use, yet knowledge of users and their mental processes is often elusive[2].
Mental models are psychological representations of real or imaginary situations. The mind constructs “small-scale models” of reality in order to reason, to anticipate events, and to underlie explanation (Craik, cited in Hudson 2004). The structure of the mental model corresponds to what it represents, and users acquire their mental models through interaction and explanation. In particular, a user’s mental model of a software product, and their interaction with it, is defined by the way in which users perceive the jobs they want to do and how the program helps them to do those jobs (Cooper & Reimann 2003).
Mental models have the following characteristics (Dix et al. 2004):
· They are often partial
· They are unstable and subject to change
· They can be internally inconsistent
· They are often unscientific and may be based on superstition rather than evidence
· They are often based on incorrect interpretation of the evidence
Mental models often take into consideration existing conventions that humans commonly use to interpret the world. Ideally, these conventions should be followed in design. However, if they are to be contravened, explicit support should be provided to enable people to form the correct mental models for the product (Dix et al. 2004).
The example of dining out at a restaurant is often used to describe how someone might form a mental model. For instance, if a person mentions that they went to a restaurant for lunch, the listener may assume a certain sequence of events occurred during that interaction. The mental model may go something like this:
1. The person walked into the restaurant through a door, and was greeted by the Maitre d’.
2. The Maitre d’ showed the person to a table.
3. Another waiter presented the person with a menu.
4. The person ordered their food and drink.
This example illustrates the subjectivity of mental models, and their basis in personal experience rather than scientific method. If someone is accustomed to a formal, sit-down restaurant, then a visit to a buffet restaurant may be confusing at best, or unpleasant at worst. This indicates a mismatched mental model, which the person must then alter to incorporate their new experience.
Most software reflects the implementation model of its design (i.e. the logical structure of the program), instead of the user’s goals and the tasks required to accomplish them. However, the user’s mental model of a software product is often distinctly different from the software’s implementation model, because the complexity of the software’s implementation can obscure its functionality from the user’s perspective. This gives rise to a third model in the digital world: the conceptual model (Hudson 2004), which is also known as the represented model (Cooper & Reimann 2003). These terms have been used to refer to the mental model a designer intends their user to follow when using their product. That is, they reflect the way designers choose to represent the workings of the program to the user. Conceptual design techniques aim to specifically assist a user in understanding a system (Newman & Lamming 1995). If the conceptual model of the system is substantially different from the user’s mental model, the user may find the system difficult to use. User interfaces that are based on the software’s conceptual model and assist the user to form a matching mental model, make a software product easier to use (Cooper & Reimann 2003).
Figure 3 illustrates the difference between the implementation model and a user’s mental model. The user forms a mental model of how they expect a software product to work. In an ideal situation, the user will be able to approach a new program and complete their desired task in exactly the manner that they expect the program to work. The conceptual (or represented) model reflects the way software designers choose to represent to the user how the program works. This is an aspect of design that developers have great control over, and the closer it matches the mental model of the user, the easier the user will find the program to use (Cooper & Reimann 2003).

Figure 3. Mental model vs Implementation model.
The closer the represented (or conceptual) model of a software product is to the user’s mental model, the easier the program will be to use (Cooper & Reimann 2003:p.23).
Figure 4 illustrates how Jasc’s Paint Shop Pro allows the user to see, both in preview and in real-time, how changing the values of certain parameters actually affects the picture they are manipulating. The user is more likely to be thinking in terms of how the final picture will look, rather than the numerical values that need to be manipulated to achieve the desired look.

Figure 4. Conceptual model and mental models.
Jasc’s Paint Shop Pro illustrates how the product’s conceptual model can be made to more closely match a user’s mental model.
Training, documentation and interaction are all ways in which a user can acquire an appropriate conceptual model of a product; however, for general-purpose systems, interaction is by far the most realistic and effective (Hudson 2004). To create an easy-to-use system, the conceptual model must be:
· Deliberately designed.
· Simple enough to be understood through interaction.
· Appropriate to the users’ tasks.
It should also use familiar terms, provide adequate feedback, and be consistent with users’ expectations (Hudson 2004).
Conceptual models provide many benefits to software design (Hudson 2004). They:
· Provide an opportunity for simplification and innovation.
· Define concepts and terms for the user interface.
· Provide a framework for implementation – the ‘core’ model is elaborated and views and other interface components are added.
· Provide a basis for object oriented development.
· Provide control over “feature bloat”.
Reverse-engineering of conceptual models can also be used as a basis for user testing of a system, to evaluate whether or not the users’ mental model matches the designer’s conceptual model of a product. Techniques for developing a conceptual model are beyond the scope of this project, but provide an interesting possibility for future study.
Interaction paradigms serve as illustrations of the ways in which humans interact with computers, and successful paradigms are ones commonly believed to enhance the usability of computer systems. New paradigms often arise through exploring current idioms, and pushing those boundaries to create innovative products (Dix et al. 2004).
Metaphors make use of existing conceptual models (Hudson 2004), and are used to teach new concepts in terms of those that are already understood. They have been used successfully to describe the functionality of many interaction widgets, and have contributed greatly to commercial successes in computing. The GUI desktop metaphor links computer file manipulation tasks with filing tasks in a typical office environment, which initially makes the computerised tasks easier to understand. The spreadsheet metaphor for accounting has also been a resounding success (Dix et al. 2004).
There has recently been an upsurge in the use of anthropomorphic metaphors, fuelled by increasing interest in multi-modal interfaces (see section 1.4.3). For example, MetaFace (Marriott & Beard 2004) uses the anthropomorphic metaphor of the virtual “friend” to assist a user with web searching. Anthropomorphic metaphors can reduce a user’s cognitive load[3] in using a product (Marriott & Beard 2004), but they must be carefully implemented because a personality clash with the user, as reported by Marriott (2003) in his evaluation of the Mentor system (Figure 5), can also be detrimental to their effectiveness.

Figure 5. The
(Marriott & Beard 2004).
However, metaphors can give rise to complications when they are taken too literally (Dix et al. 2004). For example, the desktop metaphor, common to today’s personal computer systems, allows a user to delete a file by dragging it to the wastebasket or recycle bin. If the user wants to “shred” a file for security reasons, though, the metaphor breaks down, since it doesn’t make sense to drag the file to a bin for recycling if one wants to permanently delete it. It may then appear to the user that the program does not provide the facility for secure disposal of sensitive documents. Similarly, if a user wants to eject a DVD from one of the Apple operating systems, they are required to drag the icon to the wastebasket. Again, this does not make sense in terms of the desktop metaphor, and the user may infer that they are unable to eject their disk. Perhaps even more distressing for the user would be dragging rewritable media, such as a floppy disk, to the wastebasket. The implication here is that the action would result in the contents of the disk being deleted, rather than the disk being ejected. Even a more experienced user may want to think twice before dragging such an item to the wastebasket.
On the other hand, the inaccuracy of the desktop metaphor does not detract from its usefulness. The importance of the metaphor is to empower the user to work with the abstraction of a computer by providing an interface that improves on the implementation model of the system (Marriott & Beard 2004).
Another problem with the metaphor is that it relies heavily on cultural bias. It should not be assumed that a metaphor can be successfully used in all cultures, particularly with the increasing internationalisation of software (see section 2.2). Metaphors need to be chosen with care, as a metaphor that has no meaning, or the wrong meaning, to a certain group of people, adds an unnecessary layer of complexity to the software (Dix et al. 2004).
A model of user behaviour is necessary to understand the principles of designing for cognitive control (Norman 1991:p.77). The use of personas as a practical interaction design tool was introduced in 1998, and they swiftly gained popularity in the software industry due to their power and effectiveness in modelling users (Cooper 2003).
A persona is a precise descriptive model of the users of a product, what they wish to accomplish, and why. Personas are based on the behaviours and motivations of real people and represent them throughout the design process, thereby providing a way to match the behavioural patterns, mental models, and goals of users (Cooper & Reimann 2003).
The use of personas moves the design process away from discussions that may be personal in nature or vague, to a series of questions and answers based on a concrete example from which the team can work (Hourihan 2002). For example, vague and personal comments such as, “I’d want it to work this way” or “users like to see all the options on the home page” would become more factual data representative of the end users, e.g. “Mary, the primary persona, works from home via dialup four days a week, therefore downloading an Access database isn’t an option” (Hourihan 2002).
Personas assist designers to (Cooper & Reimann 2003):
1. Determine what a product should do and how it should behave, by basing the design effort on persona goals and tasks.
2. Communicate with stakeholders, developers and other designers by providing a common language for discussing design decisions, and to keep the design centered on users throughout product development.
3. Build consensus and commitment to the design by reducing the need for elaborate diagrammatic models, due to the ability to understand the nuances of user behaviour through the personas.
4. Measure the design’s effectiveness by providing a tool that designers can use to address problems and allowing iteration to occur rapidly and inexpensively.
Personas are developed through research and real-world observation, including interviews, contextual enquiries and other dialogues with, and observations of, actual and potential end-users. They are represented as individuals and are designed to engage the empathy of the development team towards the human target of the design. However, a persona also encapsulates a distinct set of usage patterns of the final product (Cooper & Reimann 2003).
Personas should not be reused, because they are context-specific and the focus of behaviours may differ between products. Stereotyping a persona is a danger that should be avoided, as stereotypes are based on bias rather than on factual data. Personas explore ranges of behaviour rather than seeking to establish an “average user” (Cooper & Reimann 2003).
Olsen (2004) constructed a toolkit for practical development of personas. His toolkit suggests that a persona profile should contain the following:
1. Persona type – primary, secondary, unimportant.
2. Biographic background of persona – humanises the persona by providing a backstory and matches it to market segments.
3. Persona’s relationship to the business – how valuable the persona is to the business.
4. Product/business’ relationship to the persona – may suggest emotional aspects that the product needs to address.
5. Specific goals, needs and attitudes of the persona – addressing the behavioural aspects of the product by focusing on user goals and restructuring tasks to meet those goals.
6. Specific knowledge and proficiency of the persona – persona’s overall knowledge and skills in the context of how the product is to be used.
7. Context of usage of the product – context in which a persona uses the product, focusing on the wider context surrounding the task.
8. Interaction characteristics of product usage – specific details about the task.
9. Information characteristics of product usage – how to present the appropriate content.
10. Sensory/immersive characteristics of usage – aesthetics are increasingly becoming critical to a product’s success.
11. Accessibility issues – it may be useful to build these into a persona as long as it doesn’t compromise the main purpose of that persona.
12. Design issues – think through the issues the persona is addressing.
13. Relationships between personas – similarities, generalisations and other relational issues.
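A profile of this kind can be held as a simple record that the design team shares. The following is a minimal sketch in Python, assuming only a handful of the thirteen items are recorded; the class and field names (`Persona`, `PersonaType`, `usage_context` and so on) are hypothetical and are not part of Olsen's toolkit.

```python
from dataclasses import dataclass, field
from enum import Enum


class PersonaType(Enum):
    """Item 1 of the profile: how much weight the persona carries."""
    PRIMARY = "primary"
    SECONDARY = "secondary"
    UNIMPORTANT = "unimportant"


@dataclass
class Persona:
    """A reduced persona profile (hypothetical field names)."""
    name: str
    persona_type: PersonaType                        # item 1
    background: str                                  # item 2
    goals: list[str] = field(default_factory=list)   # item 5
    usage_context: str = ""                          # item 7
    accessibility_issues: list[str] = field(default_factory=list)  # item 11


def primary_personas(personas: list[Persona]) -> list[Persona]:
    """Return the personas the design should be centred on."""
    return [p for p in personas if p.persona_type is PersonaType.PRIMARY]
```

A team might record the "Mary" persona quoted earlier as `Persona("Mary", PersonaType.PRIMARY, "Works from home via dialup four days a week")` and filter a list of personas with `primary_personas` when deciding whose goals take priority.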
Personas are still a relatively new tool for interaction design. Whilst Olsen’s toolkit provides a fairly comprehensive method for modelling a persona, often a simple paragraph, as illustrated in Figure 6, is enough to provide an effective tool for system design. Figure 7 provides an example of a more in-depth persona analysis.

Figure 6. A simple example of a Persona for Betty the Warehouse Manager (Dix et al. 2004:p.201).

Figure 7. An example of an in-depth primary Persona for Sara Locke, user of the Daisy Bead Company website (sic) (Robinson 2003).
Paradigms, metaphors, mental models and personas are driving forces behind the user interface and design employed in a particular system. There are three commonly recognised user interfaces in use today: the Graphical User Interface, which is possibly the most familiar to most users; the Voice User Interface, which is rapidly being deployed in many aspects of business; and the Multi-Modal Interface, a relatively new area of research that combines several methods of user input into a system.
Graphical user interfaces make computing easier by separating the logical threads of computing from the presentation of those threads to the user, through visual content on the display device. This is commonly done through a window system that is controlled by an operating system’s window manager. The WIMP (Windows, Icons, Menus, and Pointers) interface is the most common implementation of graphical user interfaces today, and will be examined in detail later in this document. The appeal of graphical user interfaces lies in the rapid feedback provided by the direct manipulation[5] that a GUI offers (Dix et al. 2004). Direct manipulation interfaces provide the following features (Dix et al. 2004:p.171):
· Visibility of the objects of interest.
· Replacement of complex command languages with actions to directly manipulate the visible objects (hence the name direct manipulation).
· Incremental action at the interface, with rapid feedback on all actions.
· Syntactic correctness of all actions, so that every user action is a legal operation.
· Reversibility of all actions, so that users are encouraged to explore the product without severe penalty.
The robustness of the direct manipulation interface for the desktop metaphor is demonstrated by the documents and folders being visible to the user as icons that represent the underlying files and directories (Dix et al. 2004:p.171). With a drag-and-drop style command, it is impossible to make a syntactically incorrect operation. For example, if a user wants to move a file to a different folder, the move command itself is guaranteed to be syntactically correct; and even though the user may make a mistake in placing the file in the wrong place, it is relatively easy to detect and recover from those errors. While the document is being dragged, continual visual feedback is provided to the user, creating the illusion that the user is actually working in the desktop world (Dix et al. 2004).
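Two of the features listed above, syntactic correctness and reversibility, can be illustrated with a small sketch. The Python below models a desktop as folders containing documents; the `Desktop` class and its method names are hypothetical, and a real window system would of course work with files and visual feedback rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Desktop:
    """Hypothetical in-memory desktop: folder name -> set of document names."""
    folders: dict[str, set[str]] = field(default_factory=dict)
    # Each completed move is remembered as (document, source, target).
    history: list[tuple[str, str, str]] = field(default_factory=list)

    def locate(self, doc: str):
        """Return the folder currently holding doc, or None if it is absent."""
        for name, docs in self.folders.items():
            if doc in docs:
                return name
        return None

    def drag_and_drop(self, doc: str, target: str) -> bool:
        """Move doc into target; a drop that makes no sense is simply ignored."""
        source = self.locate(doc)
        if source is None or target not in self.folders or source == target:
            return False
        self.folders[source].remove(doc)
        self.folders[target].add(doc)
        self.history.append((doc, source, target))
        return True

    def undo(self) -> bool:
        """Reverse the most recent move, if there is one."""
        if not self.history:
            return False
        doc, source, target = self.history.pop()
        self.folders[target].remove(doc)
        self.folders[source].add(doc)
        return True
```

Because the only operation offered is a drop onto an existing folder, the user cannot issue a malformed command; a misplaced document is a semantic slip that `undo` recovers from without penalty.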
In February 2004, the National Academy of Engineering awarded the Draper Prize to the Xerox PARC research team who, in 1973, unveiled the Xerox Alto, the first computer to use a direct manipulation graphical user interface (Figure 8). However, it wasn’t until 1984 that the GUI was popularised by Apple with its Macintosh personal computer (Figure 9), which became the first to commercially demonstrate the intrinsic usability of direct manipulation interfaces (Dix et al. 2004:p.171). Its relative ease of use was a major selling point in Apple’s advertising campaign with the slogan, “The computer for the bemused, confused and intimidated” (Figure 10). The immense success of these early advertising campaigns highlights an issue that is just as relevant to the computer industry today.

Figure 8. 1973 Xerox Alto[6] (left).
Figure 9. 1984 Apple Macintosh[7] (right).

Figure 10. Early Apple Macintosh advertisement[8]
Voice User Interfaces (VUIs) use speech technology to provide people with access to information and to allow them to perform transactions. VUI development was driven by customer dissatisfaction with touchtone telephony interactions, the need for cheaper and more effective systems to meet customer needs, and the advancement of speech technology to the stage where it was robust and reliable enough to deliver effective interaction. With the technology finally at the stage where it can be effectively and reliably used, the greatest challenge remains in the design of the user interface (Cohen, Giangola & Balogh 2004).
A Voice User Interface is what a person interacts with when using a spoken language application. Auditory interfaces interact with the user purely through sound. Speech is input by the user, and speech or nonverbal audio is output by the system (Cohen, Giangola & Balogh 2004).
People learn spoken language implicitly at a very young age, rather than explicitly through an education system. Therefore, speakers generally do not explicitly think about the technical means of getting a message across; they simply need to focus on the meaning of the message they wish to convey. Consequently, VUI designers must focus on understanding the conventions, assumptions and expectations of conversation instead of its underlying constructs (Cohen, Giangola & Balogh 2004).
VUIs comprise three main elements (Cohen, Giangola & Balogh 2004):
1. Prompts, also known as system messages, are the recorded or synthesised speech played to the user during the interaction.
2. Grammars are the possible responses users can make in relation to each prompt. The system cannot understand anything outside of this range of possibilities.
3. Dialog logic determines the actions the system can take following a user’s response to a prompt.
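The relationship between these three elements can be sketched as a small state table. The Python below is an illustration only: the banking prompts, state names and the `next_state` function are hypothetical, and typed strings stand in for the output of a speech recogniser.

```python
from dataclasses import dataclass


@dataclass
class DialogState:
    prompt: str               # system message played to the user
    grammar: dict[str, str]   # recognised response -> name of the next state


# Hypothetical telephone-banking dialog used purely for illustration.
STATES = {
    "main": DialogState(
        prompt="Would you like your balance, or to make a transfer?",
        grammar={"balance": "balance", "transfer": "transfer"},
    ),
    "balance": DialogState(prompt="Your balance is being retrieved.", grammar={}),
    "transfer": DialogState(prompt="How much would you like to transfer?", grammar={}),
}


def next_state(current: str, utterance: str) -> str:
    """Dialog logic: choose the next state from the user's response.

    A response outside the current grammar is not understood, so the
    dialog stays in the same state and its prompt would be played again.
    """
    heard = utterance.strip().lower()
    return STATES[current].grammar.get(heard, current)
```

Calling `next_state("main", "balance")` moves the dialog to the balance state, whereas an out-of-grammar response such as "weather" leaves it in the main state.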
Additionally, auditory interfaces provide the opportunity to use nonverbal audio such as background music and earcons (sounds that are used to represent a specific event) to create an auditory environment for the user, thereby creating a unique “sound and feel” for a business or application (Cohen, Giangola & Balogh 2004).
Auditory interfaces provide new ways of supplying information, enabling transactions, and facilitating communication. They have benefits for both the company and the end user. Speech systems normally save companies significant amounts of money by lowering call abandonment rates and increasing automation rates, as well as reducing toll call charges through shorter call durations. They can enable companies to extend their reach to customers who don’t have web access and those who desire mobility whilst retaining the same level of service. Additionally, auditory interfaces can solve problems or offer services that were not available or possible in the past, such as automated personal agents that use speech technology to act as a personal assistant (Cohen, Giangola & Balogh 2004).
For the end user, voice systems can be more efficient and intuitive by drawing on the user’s innate language skills and simplifying input sequences. Telephones are ubiquitous, which makes speech systems mobile. This also means that users can employ them anywhere and without occupying the user’s hands and eyes, which makes speech systems easily accessible. A well-designed speech system can thus free up the user’s time by more efficiently meeting their needs (Cohen, Giangola & Balogh 2004).
Aside from speech recognition systems, other speech technologies include Text-to-Speech (TTS) Synthesis and Speaker Verification. Speaker Verification involves collecting a small sample of a person’s voice to create a voice template, which is used to enrol the person into a system and is then compared against their speech in future interactions. The system can be used, for example, to replace personal identification numbers (PINs) and monitor home incarceration (Cohen, Giangola & Balogh 2004).
Text-to-Speech technology, on the other hand, synthesises text into speech. The technology has improved significantly in recent times, and although it does not yet duplicate the quality of recorded human speech, it is still a good option for creating messages from text that cannot be predicted, such as translating web pages for blind users (Cohen, Giangola & Balogh 2004). Some examples of text-to-speech implementations can be found at www.vhml.org/examples.
Understanding the nature of speech systems, as well as their benefits and limitations, is necessary in order to design effective Voice User Interfaces (VUIs). The unique design challenges presented by VUIs are due to the transitory and non-persistent nature of their messages (Cohen, Giangola & Balogh 2004). Messages are invisible: once the user hears the message, it is gone, with no screen to display the information for further analysis. The user must usually decide then and there how they will respond to a prompt. The cognitive limitations of the user necessarily restrict the quantity of information that can be presented by the auditory interface at any one time. Thus voice interface designs should not unnecessarily challenge the user’s short-term memory, and they should provide a mechanism for the user to adjust the pacing of the interaction to better suit their own needs (Cohen, Giangola & Balogh 2004).
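One way to respect these memory and pacing constraints is to break a long spoken menu into short prompts and let the user ask for more or for a repeat. The following sketch illustrates the idea; the limit of three options per prompt, the function name, and the wording of the prompts are assumptions made for illustration rather than recommendations from the cited source.

```python
def build_prompts(options, max_per_prompt=3):
    """Split a list of menu options into short spoken prompts.

    Each prompt offers at most max_per_prompt options and ends with
    pacing commands ('more', 'repeat') so the user controls the flow.
    """
    if max_per_prompt < 1:
        raise ValueError("max_per_prompt must be at least 1")
    prompts = []
    for start in range(0, len(options), max_per_prompt):
        chunk = options[start:start + max_per_prompt]
        has_more = start + max_per_prompt < len(options)
        if has_more:
            tail = " Say 'more' to hear further options, or 'repeat' to hear these again."
        else:
            tail = " Say 'repeat' to hear these again."
        prompts.append("You can say: " + ", ".join(chunk) + "." + tail)
    return prompts

# Hypothetical banking menu, for illustration only.
for prompt in build_prompts(["balance", "transfer", "payments", "statements", "help"]):
    print(prompt)
```

Keeping each prompt short trades a longer overall dialogue for a lighter load on the listener at each step.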
Multi-modal interfaces attempt to address the problems associated with purely auditory and purely visual interfaces by providing a more immersive environment for human-computer interaction. A multi-modal interactive system is one that relies on the use of multiple human communication channels to manipulate the computer. These communication channels translate to a computer’s input and output devices. A genuine multi-modal system relies on simultaneous use of multiple communication channels for both input and output, which more closely resembles the way in which humans process information (Dix et al. 2004).
In the field of psychology, Gestalt Theory is used to describe a relationship where the whole is something other than the sum of its parts[9]. This theory has recently been used to describe a new paradigm for human-computer interaction, where the interface reacts to and perceives the desires of the user via the user’s emotions and gestures (Marriott & Beard 2004). This paradigm is called the gestalt User Interface (gUI) and paves the way for a truly personalised user experience.

Figure 11. MetaFace.
MetaFace is an embodied character agent for a multi-modal interface that uses the “virtual friend” as a metaphor to help with web searches (Marriott & Beard 2004).
Marriott & Beard (2004) write that “a gUI is an interface where input to the application can be a combined stimulus of text, button clicks, and analysed facial, vocal, and emotional gestures”. The user’s emotions, captured through input devices such as video and audio, are translated into input for a gUI, and the program’s response is rendered using a markup language.
An interface that can interpret a user’s emotions has applications in fields ranging from business management, safety, and productivity to entertainment and education. For example, if a program could recognise that the user was getting frustrated, it could modify its behaviour to compensate. When evaluating the text-and-GUI-based Mentor System application used by students from Curtin University to assist with their assignments, Marriott (2003) found that “personality conflict” occurred between the system and some users. For example, one user became intensely annoyed at the beeping sound the program made, while another user found the program to be discourteous. Marriott suggests that incorporating a dynamically adjusting module into the user interface could eliminate or reduce some of these problems.
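A dynamically adjusting module of this kind might be sketched as follows. The sketch assumes that some emotion analyser, not shown here, supplies an estimated frustration level between 0.0 and 1.0; the class names, the settings, and the threshold of 0.6 are hypothetical and do not describe how the Mentor System is implemented.

```python
from dataclasses import dataclass

@dataclass
class InterfaceSettings:
    """Hypothetical interface settings that the module may adjust."""
    beep_enabled: bool = True
    tone: str = "neutral"

class AdaptiveModule:
    """Adjusts interface behaviour when the user appears frustrated."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold  # assumed cut-off for "frustrated"
        self.settings = InterfaceSettings()

    def update(self, frustration):
        """Apply an estimated frustration level (0.0 to 1.0) and return the settings."""
        if not 0.0 <= frustration <= 1.0:
            raise ValueError("frustration must be between 0.0 and 1.0")
        if frustration >= self.threshold:
            # Compensate for the complaints reported above: silence the
            # beep and switch to more courteous wording.
            self.settings.beep_enabled = False
            self.settings.tone = "courteous"
        return self.settings

module = AdaptiveModule()
print(module.update(0.2))  # below the threshold: settings unchanged
print(module.update(0.8))  # above the threshold: beep off, courteous tone
```

In this sketch the adjustments persist once triggered; a fuller design would also need a policy for relaxing them as the user's mood improves.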
Marriott and Beard (2004) suggest that an anthropomorphic metaphor should be used to specify complete interaction with an Embodied Character Agent[10] (ECA) that incorporates emotion and gesture analysis. The Mentor System, even though it does not incorporate an ECA, also uses an anthropomorphic metaphor of a virtual lecturer (Marriott & Beard 2004).
One of the better-known commercial examples of an ECA is the virtual newscaster at www.ananova.com. Although the technology behind Ananova has yet to meet the empathy requirements of a gestalt user interface, such as realistic facial animation, its speech synthesis is quite good, and it nevertheless provides a good example of what could be achieved once the technology is fully realised.
Another product that makes use of multi-modal interfaces is Avatar-Conference, an alternative to video-conferencing that uses avatars to represent the conference participants in a virtual environment. Although it does not incorporate dynamically adjusting modules, it nevertheless provides more than one mode of user interaction.

Figure 12. Avatar-Conference, a multi-modal interface using avatars to represent humans