TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

ABSTRACT

PREFACE

1 INTRODUCTION

1.1 The Problem

1.2 Objectives

1.2.1 Realism

1.2.2 Performance

1.2.3 Complete System

1.2.4 Integrate with the FAE

1.3 Supplemental CD

1.4 Summary

2 LITERATURE REVIEW

2.1 Facial Animation

2.1.1 Overview

2.1.2 MPEG-4

2.2 The Facial Animation Engine (FAE)

2.3 Hair Simulation

2.4 Image Based Rendering

2.5 Model-Based Face Reconstruction

2.6 OpenGL

2.6.1 Overview

2.6.2 Geometry

2.6.3 Texture

2.6.4 Drivers

2.7 Summary

3 RESEARCH METHODOLOGY

3.1 Types of Research

3.2 Limitations and Delimitations

4 IMPLEMENTATION

4.1 Overview

4.2 The Face Styler Application

4.2.1 Acquire Images

4.2.2 Prepare Images

4.2.3 Place FDP

4.2.4 Calibrate Model

4.2.5 Generate Head

4.2.6 Load into FAE

4.3 The Face

4.3.1 Views

4.3.2 Texture Coordinates

4.3.3 Texture Weights

4.3.4 Cylindrical Texture Map

4.3.5 FDPs

4.4 The Hair

4.5 The Eyes

4.6 The Environment

4.7 FAE Integration

4.7.1 The Environment

4.7.2 The Hair

4.7.3 The Face

4.7.4 Texture Mapping

4.8 Summary

5 RESULTS

5.1 Overview

5.2 Realism

5.2.1 The Initial Head

5.2.2 The Face

5.2.3 The Hair

5.2.4 The Eyes

5.2.5 The Environment

5.2.6 Miscellaneous Heads

5.3 FAE Integration

5.4 Performance

6 FUTURE WORK

6.1 Arbitrary Camera Locations

6.2 Automatic Camera Calibration

6.3 Hair

6.4 Ears

7 CONCLUSION

BIBLIOGRAPHY

APPENDIX A FDP TABLE

APPENDIX B FDP DIAGRAM

APPENDIX C FSP FILE FORMAT

APPENDIX D HAIR FILE FORMAT

APPENDIX E BERNIE'S IMAGE LIBRARY (BIL)

LIST OF FIGURES

Figure 1.1: The FAQbot

Figure 2.1: FDP Points

Figure 2.2: Model Calibrated by FDP

Figure 2.3: FAE Block Diagram

Figure 2.4: Mike and Oscar

Figure 2.5: OpenGL Block Diagram

Figure 4.1: Face Styler Block Diagram

Figure 4.2: The Input Images

Figure 4.3: Down-Sampling using a Box Filter

Figure 4.4: Storing Image inside Texture Map

Figure 4.5: Example of Image Alignment

Figure 4.6: MPEG-4 FDP Points

Figure 4.7: The Face Styler User Interface

Figure 4.8: Select by Region

Figure 4.9: Select by Name

Figure 4.10: Image does not fit the FDP points

Figure 4.11: Face Styler showing Calibrated Model

Figure 4.12: Mapping Orthogonal Textures

Figure 4.13: Cylindrical Projection

Figure 4.14: Cylinder Map Rendering Passes

Figure 4.15: Cylinder Map showing FDPs and Geometry

Figure 4.16: Hair FDPs

Figure 4.17: Hair Texture

Figure 4.18: FDP Points of the Left Eye

Figure 5.1: The Initial Head Model - Oscar

Figure 5.2: Head supplied with FAE - Loris

Figure 5.3: Head supplied with FAE - Chen

Figure 5.4: Bernie - Input Images

Figure 5.5: Artefacts caused by Poorly Fitting Model

Figure 5.6: Bernie - Cylindrical Face Map

Figure 5.7: Bernie - Textured Head with Face Mapping

Figure 5.8: John – Input Image

Figure 5.9: John – Cylindrical Face Map

Figure 5.10: John – Textured Head with Face Mapping

Figure 5.11: Bernie – Textured Head with Hair

Figure 5.12: John - Textured Head with Hair

Figure 5.13: Cluttered Hair Modelling Interface

Figure 5.14: Bernie – Textured Head with Eyes

Figure 5.15: Close Up showing Realistic Eyes

Figure 5.16: John – Textured Head with Eyes

Figure 5.17: Bernie - The Newsreader

Figure 5.18: Bernie – In the City

Figure 5.19: Groucho Marx

Figure 5.20: Michelle and Sally - Female Models

Figure 5.21: Lilly the Family Cat

Figure 5.22: Bill Gates

Figure 5.23: Bernie - Facial Expressions

Figure 5.24: John - Facial Expressions

LIST OF TABLES

Table 1: Video Memory Usage

Table 2: Hair FDPs by Slice

Table 3: Hair FDPs by Strip

Table 4: Available Texture Memory

Table 5: Standard FAE Results (FPS)

Table 6: Modified FAE Results (FPS)

Table 7: Full Screen Results (FPS)

ABSTRACT

This research shows that it is possible to create a realistic MPEG-4 head model by texture mapping the face (nose, mouth, chin, cheeks and ears), the hair, the eyes, and the environment.

The Face Styler is an easy-to-use tool that can create such realistic heads. It imports images of a person's head acquired from a variety of sources, such as digital cameras, scanned books and photographs, or public archives on the Internet. Using these images, an MPEG-4 compliant head described using Facial Definition Parameters (FDPs) can be interactively created in 15 minutes.

The Face Styler generates data that can be directly imported by the Facial Animation Engine (FAE), which is an implementation of an MPEG-4 facial animation system. The FAE does not have the ability to display realistic hair or an environment. However, a loosely coupled integration layer provides this functionality without modifying the internals of the FAE.

The realistic head can be rendered in realtime on low-end consumer graphics cards that support OpenGL. Combined with realistic speech and human personality, this interactive talking head is a believable human-computer interface.

PREFACE

I would like to thank my honours supervisor, Andrew Marriott, for his assistance and patience over the last year. I know that my insistence on working at home nearly drove him insane. However, I hope that this paper shows that I have been productive, despite never showing anything at the meetings.

I would also like to acknowledge all fellow honours students, especially John Stallo, Russell Shepherdson and Quoc Huynh. Your work is invaluable in achieving a realistic and believable talking head.

Lastly, thank you very much to all the people who posed as face models. This includes John Stallo, Sally Male, Michelle D'Cunha, Bill Smyth and Simon Beard. The lack of an Asian face model can be entirely blamed on Quoc, who refused to have his picture taken. :-)

  1 INTRODUCTION

    1.1 The Problem

      Curtin University of Technology has joined with the University of Genoa (Italy) to produce an interactive talking head that can answer natural language queries: the FAQbot (Beard et al., 1999). Genoa has supplied the Facial Animation Engine (FAE), while Curtin honours students are working on integrating it with a web browser (Yuda Levy), adding personality to the head (Russell Shepherdson), adding emotion to the synthesised speech (John Stallo), and implementing a gesture markup language (Quoc Huynh).

      The FAQbot makes it easy for novice computer users to interact with the computer. This is achieved by modelling the interface on human-to-human interaction. That is, the user can simply type in a natural language question and the FAQbot will respond both verbally and visually. For this interface to be effective it needs to act, sound and look like a real person. It needs to hide the fact that there is a computer behind the talking head.

      Russell Shepherdson and Quoc Huynh are responsible for making the FAQbot act like a real person. This is achieved via a personality module (Shepherdson, 2000) and a gesture markup language. John Stallo is responsible for making the talking head sound like a real person by adding emotion to the speech. The research presented in this paper addresses the remaining issue: making the head look like a real person. Together, all these sub-systems combine to form a believable talking head.

    1.2 Objectives

      1.2.1 Realism

        Figure 1.1: The FAQbot

        Figure 1.1 shows the talking head as it appears in Curtin's FAQbot (Beard et al., 1999). While this head has human qualities, it does not look like a person. It is clear that it is just a computer model. The aim of this research is to make this head indistinguishable from a real person. If the head model cannot be distinguished from a real head, then the model is realistic. This is essentially an extension of the Turing Test for artificial intelligence (Turing, 1950). The viewer is asked to determine which of two faces, one being real and the other a computer simulation, is the computer model. If the viewer cannot make a decision, then the computer simulation is sufficiently realistic.

        The Turing Test for Facial Animation has never been passed. However, this research attempts to go some way towards this goal. To achieve this, four key areas of the head were targeted: the face, the hair, the eyes and the environment. Each of these areas significantly impacts on the appearance of the head. Therefore, by increasing the realism of each of these areas, the realism of the overall head is improved.

        The face is composed of the nose, mouth, chin, cheeks and ears. It is visually the most important part of the head because it determines the identity of the person. The face must have a realistic shape. That is, the placement and proportions of the facial features (eyes, nose, cheeks, mouth, chin, jaw and ears) must be correct, otherwise the face looks like a cartoon character (Flemming and Dobbs, 1999). An effective way to get the correct placement and proportions is to use a photograph of a person's head as reference.

        A realistic face must also be composed of a realistic texture. This can be seen in Figure 1.1, which shows a correctly shaped head that does not look realistic. To generate a realistic face texture, a synthesis technique could be used. However, it is much easier to use a photograph to acquire the texture map. Ultimately, a photograph provides more realism because it can capture subtleties of the face that a synthesised texture never could.

        Hair is visually one of the most important elements of the head (Ando and Morishima, 1995). Without hair, it is evident to the viewer that part of the head is missing. Adding hair significantly improves the visual appeal of the head. Currently, most facial animation systems do not provide adequate tools to model and display realistic hair. This is especially true for realtime systems. Therefore, this research is concerned with the generation of realistic hair. The same photographs used to model the face will be used to create the hair shape and texture.

        The eyes are paramount to effective human to human interaction. Therefore, if the talking head is used as a communication tool, then realistic eyes are vital. The eyes are uniquely personal to every human. Therefore, this research needs to provide a mechanism to extract the eyes from the input photographs.

        The environment provides a background to the talking head. The current head of the FAQbot is set on a solid blue background. This makes it look like the head is floating in space, which impedes the realism of the face. The environment provides a context (a setting) and a reason (a purpose) for the talking head. For example, a newsreader requires a studio in the background to appear believable. The environment does not directly improve the realism of the talking head. However, it improves the realism and believability of the entire scene.

      1.2.2 Performance

        The FAQbot is targeted at the consumer level. That is, it aims to provide an interactive talking head on a standard personal computer. Therefore, the realistic talking head must respect the limitations of such a consumer-level system. The head must display and animate in realtime, otherwise it is not believable. If the head takes minutes to render, then the realtime human-computer interaction is lost.

        The biggest concern is the tradeoff between realism and speed. This is a fundamental computer graphics problem. To create a realistic simulation, the system needs to process a lot of data. The more data, the more realistic the simulation. In terms of the talking head, this data is in the form of polygons and texture maps. A realistic face must be composed of many polygons and high-resolution texture maps. However, the more data that needs to be processed, the slower the simulation. A realistic head may take many minutes to render using radiosity techniques. A realtime interactive system, such as the FAQbot, has at most 1/15th of a second to render the model.
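The tradeoff above can be made concrete with some back-of-envelope arithmetic. The following sketch computes the time available per frame at 15 frames per second, and the uncompressed size of a texture map. The 512×512 RGB texture is an illustrative assumption only, not a figure taken from this research.

```python
def frame_budget_ms(fps):
    """Time available to render a single frame, in milliseconds."""
    return 1000.0 / fps


def texture_bytes(width, height, bytes_per_texel=3):
    """Uncompressed size of a texture map (3 bytes per texel = 24-bit RGB)."""
    return width * height * bytes_per_texel


# 15 frames per second is the lower bound quoted in the text.
budget = frame_budget_ms(15)

# Hypothetical texture size, chosen only to illustrate the calculation.
size = texture_bytes(512, 512)

print(f"Frame budget: {budget:.1f} ms")
print(f"Texture size: {size / 1024:.0f} KiB")
```

Doubling the texture resolution in each dimension quadruples the memory required, which is why texture size has to be weighed against the memory of a low-end graphics card.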

        To achieve believability, a balance between execution speed and realism must be found. Our research investigates this balance by looking at the limits imposed by a personal computer equipped with a low-end consumer-level graphics card.

      1.2.3 Complete System

        The realistic head must be quick and easy to create. Therefore, this research aims to produce an application that allows creation of a realistic head given only a set of input images. If it takes many hours to create the realistic head model, then the system will never be used. Therefore, it is important that it takes only several minutes to create a head from scratch.

        This application must accept the input images directly from the source, such as a digital camera, a scanner, etc. Then it must provide all the tools needed to transform the input images into a realistic three-dimensional head model that is fully compliant with the FAE. That is, the application must be a complete system that encapsulates the entire workflow, from the beginning of image acquisition to the end of head generation. This is paramount to making the system easy to use. If the system is not easy to use then no one will ever use it.

      1.2.4 Integrate with the FAE

      The Facial Animation Engine (FAE) allows the display and animation of a virtual head. Curtin's facial animation work, including the FAQbot, is based on the FAE. Therefore, the realistic head generated by this research must also be compatible with the FAE.

      The FAE was developed by DIST at the University of Genoa in Italy. It is described in detail in section 2.2. However, in terms of how it works internally, the FAE must be considered a "black box". It is not practical to make major modifications to it, because any changes to the FAE could be invalidated as soon as DIST releases the next version.

      The FAE does not have the ability to display a realistic talking head without any modification. Therefore, this research seeks to provide a solution that ties in with the FAE via a loosely coupled API. That is, only minor code changes are required to attach the code produced by this research to the FAE. Whenever DIST releases a new version of the FAE, this additional layer of code can be integrated quickly and easily.

      In the situation where it may not be possible to attach the integration layer to the FAE, the head must still be displayable with reduced realism. To achieve this, the head must be represented in an FAE compatible format. The FAE is an implementation of MPEG-4 facial animation. Therefore, the head must be MPEG-4 compliant. MPEG-4 defines strict guidelines on how a talking head must be represented. This severely limits the direction of the research. That is, the MPEG-4 FAE influences the way a head is described, and subsequently, how it is created.

    1.3 Supplemental CD

      This research deals with the realism and believability of a talking head. These are visual qualities, and as such, this paper contains many figures. These figures are not accurately represented in print-form (especially in black-and-white). Therefore, this dissertation is supplemented with a CD that contains all the images. It is strongly recommended that the reader refer to the images on the CD, especially in the results section.

      The supplemental CD also contains the Face Styler application. This application allows the creation of the MPEG-4 compliant heads given a set of input images. To evaluate the ease-of-use objective, it may be beneficial to try out this software. The CD also includes all the data required to create the heads presented in this paper.

      The root directory of the CD contains a file named index.html that points to the location of all relevant files.

    1.4 Summary

    Curtin University of Technology has created an interactive talking head that can answer natural language questions both verbally and visually. This FAQbot is based on the Facial Animation Engine (FAE) developed by the University of Genoa. Our research aims to improve the realism of this talking head by targeting the face, the hair, the eyes and the environment.

    This research will produce an easy to use application that encapsulates the entire head modelling process. This application will allow the creation of MPEG-4 compliant heads that are fully compatible with the FAE.

    To ensure the believability of the realistic talking head, it must render in realtime. However, achieving both realistic appearance and fast execution is a fundamental computer graphics problem. Therefore, a balance between realism and performance must be found.

    This dissertation is accompanied by a supplemental CD that contains all the figures in this paper. It also contains the Face Styler application, along with all the data used to create the realistic head models.

  2 LITERATURE REVIEW

    2.1 Facial Animation

      2.1.1 Overview

        Realistic facial animation is one of the most fundamental problems in computer graphics. Ever since its inception by Parke (1974), many dozens of research papers have been published on the subject. However, synthesising a realistic face remains a difficult problem - no facial animation Turing test has ever been passed. This is because "there is no landscape that we know as well as the human face" (Faigin, 1990). Even the slightest fault in a synthesised face can be perceived by anyone watching.

        The applications of facial animation are very diverse, and include fields that range from purely recreational to life enhancing. Perhaps the best known application of facial animation is in the film industry. Their systems are traditionally based on key-frame animation, with many parameters that influence the appearance of the face. For example, the models used in Pixar's Toy Story had several thousand control points each (Pighin, 1998).

        Another application of facial animation is computer games, where titles such as Full Throttle and The Curse of Monkey Island used facial animation for their 2D cartoon characters. This trend continued into 3D titles, where games such as Tomb Raider and Grim Fandango used facial animation as the key tool to communicate the story to the player (Lander, 1999).

        Facial Animation is expected to play a large role in user interface design (Morishima and Harashima, 1991). Most novice users feel very intimidated when sitting in front of a keyboard. Utilising everyday human-to-human communication as a computer interface would reduce this initial alienation. However, hardware and software still need further development before this can become practical.

        Facial animation is also being applied in medical fields like facial surgery planning (Vannier et al., 1983; Koch et al., 1996) and previewing the effects of dental surgery (Parke, 1982). However, these pre-operative applications would require a very accurate anatomical model of the patient's face. This is not very practical, because each face varies enormously from the next, and acquiring face data can be tedious (Waters, 1987).

        Facial Animation can also be used as a teaching aid. Talking Tiles is an application of HyperAnimation (Gasper, 1988) that aids with the teaching of language skills. Facial animation could also be used to teach the hearing impaired (Hall, 1992). A face model could demonstrate how certain words are pronounced, while cut-away views show where the tongue needs to be positioned to create the desired sounds.

        The important issue to remember is that all these varied applications (film, computer games, medicine and teaching) use facial animation as a communications medium. That is, they utilise a computer simulation of a human face in order to reach the audience more convincingly.

        Communication via a virtual human could also have applications for virtual shopkeepers (e-commerce), virtual lecturers (distance education) and virtual guides (information brokering). Instead of presenting the user with textual responses to questions, a face could respond by answering verbally. This would allow factors like speech-intonation and emotion to further enhance the response.

      2.1.2 MPEG-4

MPEG-4 goes beyond the conventional concept of the audio/visual scene being composed of a sequence of rectangular video frames and an associated audio track. Instead, the scene is composed of a set of Audio-Visual Objects (AVOs). The sender encodes these AVOs into elementary streams, and transmits them via a single multiplexed communications channel. The decoder is responsible for extracting the elementary streams and compositing the decoded AVOs to form the scene (MPEG, 1999).

MPEG-4 specified two AVOs that represent synthetic faces, a Simple Face Object and a Calibration Face Object. The following parameters are used by these objects:

In this research, we are particularly interested in the Facial Definition Parameters (FDPs), also known as Feature Points. An FDP simply describes an important point on the face. MPEG-4 specifies 84 such FDPs. Together they can be used to define the appearance of virtually any face. Figure 2.1 shows the location of these points. Please refer to APPENDIX A for a complete list of MPEG-4 FDPs, while APPENDIX B contains a larger version of the following diagram.

Figure 2.1: FDP Points

An FDP is composed of the following:

FDPs must describe a face in the neutral position. All facial animation is performed based on the neutral face. This neutral face is defined as follows (Ambrosini et al., 1998):

It is up to the MPEG-4 client (the receiver) to deform the head model such that the face aligns with the given FDP points. It is important to mention that the nature of the receiver's face model is irrelevant. That is, the model could be composed of a low number of polygons to allow realtime display. However, the model could also be composed of a large number of polygons to make it look more detailed and realistic. The model need not even be composed of polygons. For example, it could be modelled using NURBS or Bezier patches. The head is a valid MPEG-4 model as long as the receiver can calibrate it so that the features coincide with the transmitted FDP points.

Along with the FDPs, the sender may also transmit a texture map to further enhance the appearance of the face. Since the FDP points must describe a neutral face, the texture map must also depict a neutral face. The receiver must then assign texture coordinates to the face model vertices using only the FDP texture coordinates as a guide.
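As a rough illustration of the calibration requirement described above, the sketch below stores each feature point as a small record and measures how closely a receiver's model coincides with the transmitted points. The field names, the labels and the `max_fit_error` helper are all hypothetical; they are inferred only from the fact that the text refers to FDP positions and FDP texture coordinates, and are not taken from the MPEG-4 specification.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class FeaturePoint:
    """One feature point of the neutral face (hypothetical layout)."""
    label: str                                        # illustrative identifier
    position: Vec3                                    # 3D location on the neutral face
    tex_coord: Optional[Tuple[float, float]] = None   # present only if a texture is sent


def max_fit_error(fdps: List[FeaturePoint], model_features: Dict[str, Vec3]) -> float:
    """Largest distance between a transmitted point and the model's matching feature."""
    return max(dist(f.position, model_features[f.label]) for f in fdps)


# Two transmitted points and the corresponding features of a calibrated model.
sent = [
    FeaturePoint("nose_tip", (0.0, 0.0, 1.0), (0.5, 0.5)),
    FeaturePoint("chin", (0.0, -1.0, 0.8)),
]
calibrated = {"nose_tip": (0.0, 0.0, 1.0), "chin": (0.0, -1.0, 0.75)}

print(max_fit_error(sent, calibrated))
```

A receiver could treat its model as calibrated once this error falls below some tolerance of its own choosing, regardless of whether the model is built from polygons or patches.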

Figure 2.2 shows the same model before and after being calibrated by a set of FDPs.

Figure 2.2: Model Calibrated by FDP

    2.2 The Facial Animation Engine (FAE)

      The Facial Animation Engine (FAE) is an implementation of the MPEG-4 specification on Facial Animation. It was developed in the DIST laboratory at the University of Genoa, Italy. Currently, the FAE is compliant with the Simple Face Object (defined in MPEG-4, Version 1) and with part of the Calibration Face Object (under definition for MPEG-4, Version 2) (Lavagetto and Pockaj, 1999).

      Figure 2.3: FAE Block Diagram

      Figure 2.3 shows the FAE as a block diagram. The Wireframe Geometry is a VRML 2.0 file that contains the geometrical information of a face: vertices and triangle topology. The Wireframe Semantic file contains information on the meaning of the model vertices. This information is used by the FAE to build the animation rules, because it indicates which vertices are affected by each FDP/FAP.

      DIST currently supplies two models with the FAE, Mike and Oscar (see Figure 2.4). Mike is a very simple model that was fully designed at DIST. It is composed of 750 polygons and 408 vertices. Oscar is a more sophisticated model which is derived from the GeoFace model by Keith Waters, available as a demo in the OpenGL Utility Toolkit (GLUT). Oscar is composed of 2444 polygons and 1253 vertices.

      Figure 2.4: Mike and Oscar

      The FAE is composed of two major sub-systems: the Calibration Block and the Animation Block. The Calibration Block is fed with the FDP stream and is responsible for the deformation of the face model. It is also responsible for assigning the texture maps to the calibrated face model. Calibration is based on Radial Basis Functions (RBFs), which are used for the reshaping of most parts of the face. The eyeballs, teeth, tongue, and mouth are exceptions, and are reshaped with simple ad hoc algorithms. The texture coordinates are also calculated using the same RBFs. However, some semantic information is used to selectively apply the texture on the model, preventing undesired effects like mapping skin onto the teeth (Ambrosini et al., 1998).
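The idea behind RBF-based calibration can be sketched as scattered-data interpolation: the feature points of the generic model are moved exactly onto the transmitted FDPs, and every other vertex follows smoothly. The sketch below is a minimal illustration only. The Gaussian kernel, the `sigma` parameter and the absence of a polynomial term are assumptions; the formulation actually used inside the FAE is not described here.

```python
from math import dist, exp


def gaussian(r, sigma=1.0):
    """Assumed radial kernel: influence falls off smoothly with distance."""
    return exp(-(r / sigma) ** 2)


def solve(a, b):
    """Solve a*x = b (b holds one row of right-hand sides per equation)
    using Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + rhs[:] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, len(m[r])):
                m[r][k] -= f * m[col][k]
    x = [None] * n
    for r in range(n - 1, -1, -1):
        x[r] = [
            (m[r][n + k] - sum(m[r][j] * x[j][k] for j in range(r + 1, n))) / m[r][r]
            for k in range(len(b[0]))
        ]
    return x


def fit_rbf(sources, targets, sigma=1.0):
    """Find one weight vector per feature point so that each source
    feature is displaced exactly onto its target FDP."""
    phi = [[gaussian(dist(p, q), sigma) for q in sources] for p in sources]
    displacements = [[t - s for s, t in zip(p, q)] for p, q in zip(sources, targets)]
    return solve(phi, displacements)


def deform(vertex, sources, weights, sigma=1.0):
    """Displace any model vertex by the weighted sum of kernel responses."""
    out = list(vertex)
    for centre, w in zip(sources, weights):
        g = gaussian(dist(vertex, centre), sigma)
        for k in range(len(out)):
            out[k] += g * w[k]
    return out


# Hypothetical feature points on a generic model and their target FDPs.
model_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
fdp_points = [(0.0, 0.0, 0.1), (1.2, 0.0, 0.0), (0.0, 1.0, 0.0)]

weights = fit_rbf(model_points, fdp_points)
print(deform((0.5, 0.5, 0.0), model_points, weights))
```

The same weights could be fitted to texture coordinates instead of positions, which corresponds to the observation that the texture coordinates are calculated using the same RBFs.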

      The Animation Block generates the animation rules that animate the face in response to the FAP stream. These rules are computed using the information contained in the semantic file. The animation rules are then applied to the calibrated face model to generate the facial animation. The Animation Block also deals with timing and audio-synchronisation issues.

      The realistic head model created by our research must be compliant with this FAE architecture. Therefore, the model must be represented by FDP points and a single texture map of the face. The FDP points must be stored in the format dictated by the FAE. The texture map must be stored as a PPM file, as the FAE can only load images in that format.
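PPM is a very simple image format, which makes it easy to generate directly. The sketch below encodes an RGB image in the binary "P6" variant of PPM with 8 bits per channel. Which PPM variant the FAE expects is not stated here, so the choice of P6 is an assumption, and the function name is illustrative.

```python
def encode_ppm(width, height, pixels):
    """Encode a row-major list of (r, g, b) tuples (0..255) as binary PPM (P6)."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match image dimensions")
    # Header: magic number, dimensions, maximum channel value.
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    # Body: raw bytes, three per pixel, top row first.
    body = bytes(channel for pixel in pixels for channel in pixel)
    return header + body


# A 2x1 image: one red pixel followed by one blue pixel.
data = encode_ppm(2, 1, [(255, 0, 0), (0, 0, 255)])
print(len(data), "bytes")
```

The resulting bytes can be written to disk by opening the destination file in binary mode and calling its `write` method.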

    2.3 Hair Simulation

      Much work has been done on human modelling and facial animation. However, one area has been relatively neglected - simulation of hair. Despite hair being "visually one of the most important elements" (Ando and Morishima, 1995), researchers have avoided this topic because of the many difficulties involved.

      The first obstacle involves the sheer number of hairs on a typical human head, which can range from 100,000 to 150,000. Another difficulty is the hair's width, which is tiny compared to the rest of the face. This tiny fibrous nature of hair causes complex light behaviour. For example, hair exhibits anisotropy, where the reflected and refracted light scatters with a preferred direction. In addition, hair is partially transparent. Light is reflected from the convex front of the hair cylinder, but also from the concave back. It is also important to mention that hair is self-shadowing. Strands on top of the hair cause the underlying hair to be in shadow.

      A further problem is that of hair styling. Once hair can be accurately modelled, it must then be accurately styled. Hair is cut at different lengths in different places. Hair also grows (and is combed) in different directions all over the head. Once hair is styled, it is in constant motion. Each strand collides and rubs against neighbouring hairs.

      In 1985, Kajiya investigated the anisotropic lighting model of the hair. A few years later, together with Kay, he created the first realistic rendering algorithm for fur (Kajiya and Kay, 1989). They realised that the anisotropic lighting models could never be accurately reproduced at the geometry level. Therefore, they used a three-dimensional texture, a texel, to represent the fur. The paper describes an algorithm that successfully renders very realistic fur. Unfortunately, the algorithm is computationally expensive and is based on raytracing.
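To illustrate what anisotropic lighting means for a thin fibre, the sketch below evaluates the diffuse term commonly attributed to the Kajiya and Kay model, in which the brightness of a strand depends on the angle between the strand's tangent and the light direction rather than on a surface normal. Only the diffuse term is shown, the function names are illustrative, and this is not the texel-based renderer described in the paper.

```python
from math import sqrt


def normalise(v):
    """Scale a 3D vector to unit length."""
    length = sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def strand_diffuse(tangent, to_light, k_d=1.0):
    """Diffuse intensity k_d * sin(angle between tangent and light).

    A strand lit side-on is brightest; a strand pointing straight at
    the light receives no diffuse illumination.
    """
    t = normalise(tangent)
    l = normalise(to_light)
    cos_tl = dot(t, l)
    return k_d * sqrt(max(0.0, 1.0 - cos_tl * cos_tl))


# A strand running along the x-axis, lit from above and then end-on.
print(strand_diffuse((1, 0, 0), (0, 1, 0)))
print(strand_diffuse((1, 0, 0), (1, 0, 0)))
```

Because the result depends only on the tangent, rotating the light around the strand's axis leaves the diffuse intensity unchanged, which is the directional behaviour that ordinary polygon shading does not capture.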

      Watanabe and Suenaga (1989) were the first to simulate hair by modelling the individual strands. They used trigonal prisms to represent a hair segment. Each hair strand was composed of multiple such segments. These strands were organised into wisps, groups of hair with similar properties. In 1992, they updated their system to include a better rendering algorithm that could reproduce the backlighting effect, the effect that causes hair to shine when placed in front of a light source. The main problem with this approach was that each hair wisp had to be manually placed onto the head. As can be imagined, this was a painstakingly slow process.

      Rosenblum et al. (1991) automated the growth of hair by simulating each strand using a spring-hinge-mass system. The dynamics of each hair were recreated using external forces to shape the hair. For example, the system could handle forces such as gravity, wind and inertia. It also allowed realistic hair motion to be calculated. Unfortunately, the equations were complex and slow to compute. The paper also brushed over the issue of collision detection, which is an important factor when simulating hair.

      Anjyo et al. (1992) provided a simpler method of simulating hair. They approximated the motion of each hair strand using cantilever beams. This approach was significantly simpler than Rosenblum's spring-hinge-mass systems, while also allowing external forces to affect the motion of the hair. Curtin honours student Mark Sheridan (1994) used this approach to add hair to Andrew Marriott's FAX facial animation system. John Usher (1997) improved this system by adding collision detection. He also extended the simulator by allowing hair to be planted and grown out of any polygonal object.

      Unfortunately, the systems created by these students suffer from the same problem as all other strand based hair simulators - geometry overload. The graphics system cannot render the many hundreds (or even thousands) of strands in realtime. This research seeks to find a different solution to simulating hair by investigating image based rendering.

    2.4 Image Based Rendering

Image Based Rendering (IBR) describes a set of techniques that allow three-dimensional interaction with objects and scenes that originated as two-dimensional images. Instead of using polygonal geometry to describe an object, multiple photographs are analysed and used to render the object from various points of view.

IBR has its roots in architectural visualisation. In order to speed up the interactive rendering process, several viewpoints were pre-rendered. These pre-rendered images were then used to replace geometry at runtime (Aliaga and Lastra, 1997). This evolved into using photographs instead of pre-rendered images. Debevec et al. (1996) introduced View-Dependent Texture Mapping (VDTM) to render architecture that was predominantly modelled using photographs. Two years later, Debevec showed that his VDTM algorithm could be efficiently implemented using projective texture mapping (Debevec et al., 1998). Projective texture mapping was introduced by Segal et al. (1992), and is now part of the OpenGL standard. Using this technique, Debevec created the famous Campanile Movie. This movie, which can be rendered in realtime using OpenGL on consumer graphics cards, shows a swooping fly-around of Berkeley's bell tower. The final effect was a computer rendering that is "at a glance indistinguishable from the reality from which it was built" (Debevec, 1999).

IBR is rapidly gaining popularity amongst the computer graphics community. McMillan and Gortler (1999) attribute this enthusiasm to the following three points:

The major advantage of IBR is the realism that can be achieved. IBR is still largely constrained to architectural visualisation, but it is quickly finding other uses. For example, the Keanu Reeves film The Matrix used IBR techniques developed by Debevec to produce virtual camera moves.

Our research intends to build both the head model and the hair from two-dimensional images (photographs) using image based rendering techniques. In particular, projective texture mapping will be used to assign the images onto the head model. This technique will also be used to create the appearance of realistic hair.
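The core of projective texture mapping is that a vertex receives the texture coordinate of the point where it appears in the photograph. The sketch below projects a vertex through a simple pinhole camera to obtain that coordinate. The camera model (looking down the negative z-axis, with a `focal` scale factor) and the function name are assumptions made for illustration; in OpenGL the equivalent transformation would normally be set up through the texture matrix rather than computed per vertex in application code.

```python
def project_to_texture(vertex, camera_pos, focal=1.0):
    """Project a 3D vertex into a photograph and return (u, v) in [0, 1].

    The hypothetical camera sits at camera_pos and looks down the
    negative z-axis. Returns None when the vertex is behind the camera
    or falls outside the image.
    """
    x = vertex[0] - camera_pos[0]
    y = vertex[1] - camera_pos[1]
    z = vertex[2] - camera_pos[2]
    depth = -z
    if depth <= 0.0:
        return None  # behind the camera, so not visible in this photograph
    # Perspective divide, then remap from [-1, 1] to [0, 1].
    u = 0.5 + 0.5 * focal * x / depth
    v = 0.5 + 0.5 * focal * y / depth
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        return None  # projects outside the photograph
    return (u, v)


# A vertex straight ahead of the camera maps to the image centre.
print(project_to_texture((0.0, 0.0, -2.0), (0.0, 0.0, 0.0)))
```

Vertices that return None for one photograph would need to take their texture from another view, which is one reason several input images are useful.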

    2.5 Model-Based Face Reconstruction

      A number of approaches have been developed to reconstruct a 3D face. Automated systems use data obtained from 3D scanners, such as those produced by Cyberware (1990). These cylindrical laser scanners acquire both range and colour data. This data can then be used to generate a face-model and a cylindrical texture map (Lee et al., 1995). This approach, however, has many disadvantages. The scanner cannot accurately acquire the geometry of complex areas such as the ears and hair. The texture map is also typically limited to 512×256 samples, which is too low a resolution to capture a realistic face. Generating Cyberware scans is also time-consuming and expensive. Not many places have the facilities to perform such a scan.

      One of the earliest techniques for reconstructing a face model was based on photogrammetric techniques (Moffitt and Mikhail, 1980). Photogrammetry tries to reconstruct precise geometry from a set of input images. Parke (1974) employed grids that were drawn directly on the subject's face to reconstruct a facial model. However, due to these grids, the images used to construct the face model could no longer be used as texture maps.

      More recently, methods have been proposed that allow model reconstruction without these grid-lines. Ip and Yin (1996) used two orthogonal views of a head to reconstruct a face model. Kurihara and Arai (1991) also based their model reconstruction on photographs. However, both systems used a small set of predetermined features to deform the model. This approach often led to models that poorly fit the input images. Another disadvantage was that both systems required a fixed pair of precisely calibrated cameras.

      Perhaps the most successful system for creating realistic face models was created by Pighin et al. (1998). They allow several camera angles (rather than just orthogonal) to improve the model fitting process. Unlike Cyberware scanning, their system is not automatic and requires user intervention to place feature points on the photographs. Using these feature points, an accurate face model can then be reconstructed. To apply the texture maps, they use a system based on Debevec's work on architectural visualisation and image-based rendering. They first texture the face using projective texture mapping, and then generate a cylinder map from that textured face. The cylinder map can then be used for realtime rendering.

      The engine presented in this paper draws heavily on the work by Pighin et al. (1998). It uses the same principles to create realistic realtime facial animation using the FAE.

    2.6 OpenGL

      2.6.1 Overview

        The goal of the FAE project is to deliver a talking head to the user's computer via the world-wide-web. Therefore, its primary target is not a workstation computer, but a consumer-level machine. It must be careful not to exceed the limited resources available on such a machine. This restriction extends to our research as well. The realistic talking head must render at interactive rates on a consumer-level computer using an off-the-shelf consumer-level graphics card.

        The average user will be running the Windows platform (Windows 95, Windows 98, Windows NT or Windows 2000). Since the FAE uses OpenGL to display the talking head, it is important to consider the state of OpenGL support on this platform.

        Perhaps the most important issue to mention is that consumer-level OpenGL support for Windows exists purely for computer games. The game Quake 2 by ID Software brought OpenGL to Windows. Prior to that, only workstation manufacturers (such as Intergraph) supplied OpenGL support for Windows NT. nVidia was the first company to supply an entire OpenGL 1.1 driver for their line of consumer graphics cards. Since then, the demand for OpenGL has grown so quickly that all manufacturers now provide OpenGL 1.1 compliant drivers. However, some drivers are not particularly stable, especially for uses other than games.

        Figure 2.5: OpenGL Block Diagram

        Figure 2.5 shows the OpenGL architecture as a block diagram (Segal et al., 1999). The earliest OpenGL compatible graphics cards, such as nVidia's Riva-TNT, only accelerated the rasterisation phase (texture memory, rasterisation, per-pixel operations and framebuffer). Newer cards, such as the nVidia geForce-256 and ATI Radeon, accelerate the majority of the OpenGL architecture. This includes display lists, per-vertex operations, primitive assembly and pixel operations.

        Since the talking head must be interactive on even the lowest level graphics card, this research must respect the limitations imposed by these accelerators. These limitations come in the form of geometry, texture and drivers.

      2.6.2 Geometry

        A system equipped with an nVidia Riva TNT2 can render 9 million Gouraud shaded, texture-mapped, z-buffered triangles per second (nVidia, 1999). The geForce-256 can handle up to 15 million triangles per second, while the latest generation geForce2-GTS can handle up to 20 million triangles per second (nVidia, 2000). However, these numbers were obtained in optimal laboratory conditions that do not reflect real-world performance.

        Measuring rendering performance per second is very misleading. For animation, the screen needs to be redrawn at least 15 times per second. To achieve realistic and smooth movement, a frame rate of 30fps (frames per second) is desirable. The human eye starts to see motion blur when the frame rate rises above 60fps. At this rate, the latest geForce2-GTS can only handle 333,000 triangles per frame. This is still only a theoretical maximum that can never be achieved in an actual application.
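        The per-frame figure quoted above is simply the per-second peak divided by the frame rate. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the result is a theoretical ceiling only, since it is derived from the manufacturer's peak figures.

```c
#include <assert.h>

/* Per-frame triangle budget: the quoted per-second peak divided by the
 * target frame rate. Integer division truncates towards zero. */
long triangles_per_frame(long triangles_per_second, int frames_per_second)
{
    assert(frames_per_second > 0);
    return triangles_per_second / frames_per_second;
}
```

        For example, triangles_per_frame(20000000L, 60) yields 333333, which is the geForce2-GTS figure given in the text (rounded there to 333,000).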

        An application that must run on consumer-level graphics cards can only use up to 10,000 triangles per frame. Fast paced games, such as ID Software's Quake 3, have set their geometry budget at that level. Therefore, rendering no more than 10,000 triangles will ensure an application will run on even the lowest level graphics cards at reasonable frame rates. Newer games and technology demos are starting to use up to 100,000 triangles per frame. This is obviously only attainable on the very latest graphics hardware.

      2.6.3 Texture

        Consumer-level graphics cards have a very limited amount of memory. The number of textures that can be used in a scene is directly related to the amount of video memory. However, it is easy to overestimate the amount of texture memory available. The amount of video ram stated in the graphics card's specifications must be shared amongst the front buffer, the back buffers, the z-buffer, the stencil buffer and an accumulation buffer.

        Table 1 shows the graphics card memory usage at various configurations. It assumes that an 8bit stencil buffer is always present. Additionally, an accumulation buffer is never present, because current consumer-level graphics cards do not support the accumulation buffer in hardware. There are potentially two back-buffers, because many graphics cards can switch between double and triple buffering.

        Table 1: Video Memory Usage

        Consumer graphics cards also impose a limit on the dimensions of each texture map. The earliest graphics cards limited texture size to 256×256 pixels. However, most OpenGL cards support texture sizes of 1024×1024 pixels, while the latest generation raises this limit to 2048×2048. At these resolutions, video memory becomes the limiting factor. A 1024×1024 pixel image holding RGB components requires 3MB of texture memory. As Table 1 shows, not all configurations leave that much memory available for textures.
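        The memory accounting described in this section can be sketched as below. This is an illustration under stated assumptions rather than the calculation behind Table 1: the function names are hypothetical, every buffer is assumed to be stored uncompressed at the full screen resolution, the depth and stencil bits are assumed to be packed together, and no accumulation buffer is allocated.

```c
#include <stddef.h>

/* Bytes needed for one uncompressed texture without mipmaps. */
size_t texture_bytes(size_t width, size_t height, size_t bytes_per_texel)
{
    return width * height * bytes_per_texel;
}

/* Bytes consumed by the framebuffer: one front buffer, `back_buffers`
 * back buffers, plus a packed depth/stencil buffer (assumed layout). */
size_t framebuffer_bytes(size_t width, size_t height,
                         size_t colour_bits, size_t depth_bits,
                         size_t stencil_bits, size_t back_buffers)
{
    size_t pixels = width * height;
    size_t colour = pixels * (colour_bits / 8) * (1 + back_buffers);
    size_t depth_stencil = pixels * ((depth_bits + stencil_bits) / 8);
    return colour + depth_stencil;
}

/* Video memory left over for textures; zero when the framebuffer alone
 * already exceeds the card's memory. */
size_t texture_memory_left(size_t video_ram_bytes, size_t framebuffer)
{
    return framebuffer >= video_ram_bytes ? 0 : video_ram_bytes - framebuffer;
}
```

        Under these assumptions, a double-buffered 1024×768 display with 32-bit colour, a 24-bit z-buffer and an 8-bit stencil buffer occupies 9MB, which leaves 7MB for textures on a 16MB card. texture_bytes(1024, 1024, 3) reproduces the 3MB figure quoted above.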

        If the texture does not fit into video memory, then the graphics card shuffles part of the image back-and-forth between system ram and video ram. Obviously, this significantly slows rendering performance, as the system bus becomes a huge bottleneck.

      2.6.4 Drivers

Driver writers optimise the most heavily used parts of their graphics drivers. In terms of OpenGL, driver writers optimise a subset of the API for certain GL states. These optimised states and drawing commands are then referred to as fast paths. Huge performance gains can be obtained by utilising these fast paths.

It is important to utilise a fast path that is heavily used, and therefore optimised by the majority of graphics card manufacturers. ID Software documented the path used by their OpenGL game Quake 3. Due to their influence in the industry, graphics card manufacturers immediately began to optimise that particular path. Most computer games now use that path, because it is not only the fastest, it is also the most stable and bug free.

The path described by ID Software's document Optimizing OpenGL drivers for Quake3 (1999) is summarised below:

OpenGL 1.1 introduced texture objects to overcome this problem. A texture object allows the application to name all the textures that will be used by the upcoming frames. Then, during rendering, the application simply needs to activate the texture object instead of submitting the entire texture image. This allows the graphics card to cache the most used texture maps in local high-speed memory, significantly improving rendering performance.

Texture objects can be created using the following functions:

void glGenTextures(GLsizei n, GLuint *textureNames);


void glBindTexture(GLenum target, GLuint textureName);
void glTexImage2D (GLenum target, GLint level, ...);

During rendering, texture objects can be activated by calling:

glBindTexture(target, textureName);

glColorPointer (4, GL_UNSIGNED_BYTE, 0, colors);
glTexCoordPointer(2, GL_FLOAT, 0, texCoord);
glVertexPointer (3, GL_FLOAT, 16, vertices);

glLockArraysEXT(0, numVertices);

glDrawElements(GL_TRIANGLES, numIndices, GL_UNSIGNED_INT, indices);

nVidia has released a document (nVidia, 2000) which contains extensive information about the fast paths in their latest drivers. However, some of the techniques described in that document have yet to be optimised by other manufacturers. Therefore, it is best to use the ID Software path for the moment.

    2.7 Summary

Facial Animation has many varied applications, such as film, computer games, medicine and teaching. These applications rely on realistic animated human heads to improve communication with the viewer.

MPEG-4 specifies Audio-Visual-Objects (AVOs) that provide support for facial animation. These objects use Facial Definition Parameters (FDPs) and Facial Animation Parameters (FAPs) to synthesise a human head. FDPs describe the appearance of a head by pinpointing the location of key feature points on the face. A generic head model is then deformed to fit these feature points. FAPs animate this calibrated head model.

The Facial Animation Engine (FAE) is an implementation of an MPEG-4 facial animation system. The realistic head model produced by our research must be compatible with the FAE. Therefore, the head must be described using the MPEG-4 FDPs and a cylindrical texture map of the face.

Simulating individual strands of hair is not possible in realtime. However, hair is an essential part of a realistic head. Therefore, Image Based Rendering (IBR) techniques will be used to create the appearance of realistic hair. IBR replaces the strands of hair with a texture map by projecting photographs onto the head model. This technique is also used to create the face.

To allow the realistically textured head to be interactively displayed on a consumer-level computer system, the level of OpenGL support on Windows must be evaluated. Consumer-level graphics cards limit the amount of geometry that can be rendered each frame, as well as the size and number of texture maps that can be used. To push the boundaries of these limits, OpenGL fast paths need to be utilised.

  3 RESEARCH METHODOLOGY

    3.1 Types of Research

The research methodology depends on the nature of the data. If the data is numerical, then a quantitative methodology can be applied. However, if the data is of a verbal nature, then a qualitative methodology must be used.

Realism cannot be measured numerically, therefore it requires a qualitative methodology. However, the objectives also state the face must be simulated in realtime. This can be measured numerically in terms of how long it takes to render a single frame. Therefore, a quantitative approach can be taken to evaluate this aspect of the research.

The following methodologies, taken from Mauch and Birch (1983), were applied to the research:

    3.2 Limitations and Delimitations

The primary delimitation is that the talking head must be compatible with the FAE. Therefore, all data generated by this research must be in a format that can be directly used by the FAE. Existing programs, such as the FAQbot, must be able to make use of the realistic heads with only minor modification.

To improve the realism and believability of the head, the neck and shoulders should be visible. The FAE currently displays a rudimentary neck that cannot be animated independently of the head. For example, the neck cannot swallow. Swallowing would improve the believability of the head because humans frequently swallow during conversation (Flemming and Dobbs, 1999). To further improve the appearance of a real person, the neck should be attached to shoulders. The FAE does not have provisions to display any part of the body other than the head. Therefore, this research is limited to the realism and believability of the head.

Animation of the head is restricted to the FAE's existing mechanisms. That is, neither the hair nor the environment will be animated. Obviously, animated hair and environment would help to increase the realism and believability of the talking head. However, that is out of the scope of this research.

The FAE is targeted at consumers. Therefore, the realistic talking head must be displayable on a consumer-level computer. To maintain believability, the head must not only look realistic, it must also act realistic. This can only be achieved by animating it in realtime. Therefore, the realistic head must render at interactive frame rates on consumer-level graphics cards. This research targets users with OpenGL accelerated cards because the FAE uses OpenGL to display the head. No provisions will be made for users that do not possess a graphics card capable of accelerating OpenGL at the rasterisation level.

  4 IMPLEMENTATION

    4.1 Overview

      To increase the realism of a talking head, four important areas must be targeted: the face, the hair, the eyes and the environment. The face, which includes the nose, mouth, chin, cheeks and ears, determines the identity of the talking head. The hair adds to the realism and helps to complete the head. Without the hair, the head appears like a robot rather than a real person. The eyes are very important, because they are the first features that a viewer will look at. Making the eyes appear more realistic has a surprising and dramatic effect on the overall realism of the head. The last area that must be addressed is the environment, the background of the talking head. The environment helps to establish the role or purpose of the head. For example, a newsreader appears far more realistic with a television studio in the background.

      The Face Styler is an application that allows the user to create a realistic MPEG-4 compliant head by focusing on the face, hair, eyes and environment. To create a talking head, the user assembles a set of images that depict the desired face from several angles. Then, the user places feature points onto each image. These feature points pinpoint important parts of the face, which allows the system to recreate the shape of the face. The feature points and the images are then passed to the Face Styler engine, which generates face, eye, hair and environment data. The face and eyes can be directly imported by the FAE; however, the hair and environment require an integration layer.

      The Face Styler's relationship to the FAE is shown in the block diagram below:

      Figure 4.1: Face Styler Block Diagram

      The subsequent section describes the process of generating a realistic face using the Face Styler. One of the objectives of this research was to create an application that encapsulates the entire workflow from acquiring the images to creating the final head model. Therefore, the following sections provide an application level overview of the program, showing how it fulfils this objective. Following that, the lower level processes that generate the face, hair, eye and environment data are explained.

    4.2 The Face Styler Application

      4.2.1 Acquire Images

        To create a talking head using the Face Styler, the user must assemble a set of images that depict the desired face from several angles (see Figure 4.2). These images may be directly imported from a variety of sources, including digital cameras, photographs and magazine scans. The Face Styler can import these images without the need for any pre-processing. It can currently handle JPEG, PNG and PPM images. However, TIFF and GIF support is planned for the future (see APPENDIX E).

        Figure 4.2: The Input Images

        The images must represent a face in the neutral pose (as described in section 2.1.2). Currently, the program can only accept orthogonal views of the subject (front, left and right). The Face Styler engine supports arbitrary camera locations; however, the application does not expose this functionality. It is hoped that this feature will be incorporated into future versions of the Face Styler application.

        To get realistic results, the images must be high-resolution (around 1024×1024). Digital cameras and scanners can easily acquire images of this resolution. However, images sourced from books, magazines and film yield much lower resolutions, and therefore, lower realism.

        Another important issue to consider is lighting. For maximum realism, the subject must be under the same lighting conditions in each image. To solve this problem, Pighin et al. (1998) captured six images simultaneously around the subject. However, this solution may not always be feasible when resources are limited. As will be seen in the results, acceptable realism can be achieved by using only a single digital camera to take the photographs.

      4.2.2 Prepare Images

        OpenGL places a limit on the dimensions of a texture map (see section 2.6.3). This limit varies by manufacturer, and can range anywhere from 256×256 to 2048×2048. Therefore, if the input image dimensions exceed this limit, the image must be scaled down. To preserve image quality, this down-sampling is done using a box filter that averages the merged pixels (see Figure 4.3).

        Figure 4.3: Down-Sampling using a Box Filter
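        The box filter can be illustrated with a sketch that halves an image by averaging each 2×2 block of pixels. This is not the Face Styler's own routine: the function name is hypothetical, and the sketch assumes an interleaved 8-bit image whose width and height are both even.

```c
#include <stddef.h>

/* Halve an interleaved 8-bit image by averaging each 2x2 block of pixels.
 * src holds src_w * src_h * channels bytes; dst must have room for
 * (src_w / 2) * (src_h / 2) * channels bytes.
 * Assumption: src_w and src_h are both even. */
void box_downsample_2x(const unsigned char *src, size_t src_w, size_t src_h,
                       size_t channels, unsigned char *dst)
{
    size_t dst_w = src_w / 2;
    size_t dst_h = src_h / 2;
    size_t row = src_w * channels;   /* bytes per source row */

    for (size_t y = 0; y < dst_h; ++y) {
        for (size_t x = 0; x < dst_w; ++x) {
            for (size_t c = 0; c < channels; ++c) {
                /* Top-left sample of the 2x2 block for this channel. */
                const unsigned char *p = src + (2 * y) * row + (2 * x) * channels + c;
                unsigned sum = p[0] + p[channels] + p[row] + p[row + channels];
                /* Adding 2 before dividing rounds to the nearest value. */
                dst[(y * dst_w + x) * channels + c] = (unsigned char)((sum + 2) / 4);
            }
        }
    }
}
```

        Applying such a step repeatedly would bring an oversized image under the texture size limit while averaging, rather than discarding, the merged pixels.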

        Instead of down-sampling, the image could have been split into multiple texture tiles. This approach was tried; however, it produced OpenGL filtering problems at the edges where the tiles joined. In some circumstances, the tiling was clearly visible. Therefore, the down-sampling approach was taken. This is a future-proof solution, because quality will automatically increase as newer generation graphics cards support higher texture dimensions.

        OpenGL stipulates that the texture map dimensions must be powers of two. To solve this problem, the input image can be enlarged to the next acceptable size. However, enlarging an image always degrades its quality because some pixels need to be interpolated. If a scheme such as cubic-interpolation is used, then the degradation is minimal and usually not noticeable.

        The Face Styler uses an innovative method to escape the need to resample the image. It creates a blank texture map at the next highest acceptable size. Then it copies the image into this texture map (see Figure 4.4). The problem now becomes one of addressing this texture. The texture can no longer be accessed in the range (0,0) to (1,1) because only a portion of the texture actually contains image data.

        Figure 4.4: Storing Image inside Texture Map

        OpenGL allows texture accesses to be reprogrammed using a transformation matrix. A scaling matrix can be used to map point (1,1) onto point (u,v), completely hiding the fact the image only occupies part of the texture map. That is precisely what the Face Styler engine does. However, it is important to note that this solution works only for non-repeating textures. If the image is tiled, it needs to be resized to fill the entire texture map, otherwise the gap will be visible between the tiles.
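        The padding scheme can be sketched as follows. The function names are hypothetical; the sketch assumes that the image is copied into the corner of the texture that corresponds to texture coordinate (0,0), and it lays the matrix out in the column-major order that OpenGL expects.

```c
#include <stddef.h>

/* Smallest power of two that is greater than or equal to n.
 * Assumption: n is at least 1 and far below the range of size_t. */
size_t next_power_of_two(size_t n)
{
    size_t p = 1;
    while (p < n)
        p *= 2;
    return p;
}

/* Fill a column-major 4x4 matrix that scales texture coordinates so that
 * (1,1) lands on (u,v), the corner of the image stored inside the padded
 * power-of-two texture. */
void texture_scale_matrix(size_t img_w, size_t img_h, float m[16])
{
    size_t tex_w = next_power_of_two(img_w);
    size_t tex_h = next_power_of_two(img_h);

    for (int i = 0; i < 16; ++i)
        m[i] = 0.0f;
    m[0]  = (float)img_w / (float)tex_w;   /* u: fraction of texture width used  */
    m[5]  = (float)img_h / (float)tex_h;   /* v: fraction of texture height used */
    m[10] = 1.0f;
    m[15] = 1.0f;
}
```

        For a 640×480 image, the texture is allocated at 1024×512 and (u,v) becomes (0.625, 0.9375). With the texture matrix stack selected via glMatrixMode(GL_TEXTURE), a matrix of this form could be loaded with glLoadMatrixf.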

        The preparation stage also allows the images to be interactively rotated, scaled and translated by arbitrary amounts. This allows the images to be aligned with each other (see Figure 4.5 for an example). This alignment is iterative, and is performed throughout model calibration.

        Figure 4.5: Example of Image Alignment

        Missing images can also be recovered in the preparation stage. For example, if the subject is only photographed from the front and the left, the right side can be obtained by flipping the left image. Obviously, some detail will be lost, as a human face is never symmetrical.

      4.2.3 Place FDP

Originally, we tried to fit the images to the model. However, it was soon realised that to achieve realistic results, the model must be fit to the images. To do this, the user must place 76 of the 84 MPEG-4 FDP points. The FDPs that describe the teeth and tongue do not need to be specified. The FAE automatically determines their position based on the other FDP points.

Figure 4.6: MPEG-4 FDP Points

Figure 4.6 is an excerpt of the image in APPENDIX B showing the FDP points on the face. Aligning these points with the input images is achieved interactively via the interface shown in Figure 4.7. An interactive system was chosen over an automated feature detection algorithm for two reasons. The first is that automated systems rely on image processing techniques to detect the facial features. These algorithms are prone to errors and frequently require user assistance (Guenter et al., 1998; Pighin et al., 1998). The second reason is that an interactive solution allows creative freedom in the face design. Therefore, some interesting and unusual heads can be created. The results section demonstrates an example of this.

Figure 4.7: The Face Styler User Interface

The Face Styler works on a Select-and-Modify system. That is, the user must first select the FDP points that they wish to manipulate. This selection can be done in three ways:

Figure 4.8: Select by Region

Figure 4.9: Select by Name

Once FDP points are selected, the user can move and scale them as a group. This is achieved by right-clicking and selecting the desired operation. After a bit of practice, moving the points around is very quick and easy.

Along the top of the Face Styler are buttons to switch between the different views (input images). As stated earlier, the system currently has three hard-coded views (front, left and right) but the engine supports any number of views from arbitrary camera angles. Once all the points are aligned with the first image, the user can start to align them with the other views.

At this stage, the user may be confronted with the situation that the image does not line up with the existing FDP points (see Figure 4.10). Moving the FDP points would corrupt their alignment with the initial image. Therefore, the system allows the user to move, scale and rotate the image until it roughly fits the points. This is the iterative alignment process. The images and points need to be adjusted until they all align. This is the most time consuming part of the process. Again, with a bit of practice this can be achieved quite quickly and easily.

Figure 4.10: Image does not fit the FDP points

In the future, the Face Styler could be extended to automatically perform the alignment between all the images. This could be done by letting the user enter a separate set of FDPs for each image (rather than a single set of FDPs that are shared between all images). Then, the system could perform some form of scattered data interpolation to align the points. Pighin et al. (1998) use a simplified form of this by utilising a small number of feature points to determine the camera position and rotation, and hence, image alignment.

Another approach would be to use a camera calibration algorithm, such as Tsai's method (1986), to work out the positions of the cameras. Tsai's algorithm requires at least seven (optimised calibration requires 11) three-dimensional data points from two views (left-eye camera and right-eye camera) to reconstruct the camera parameters. That is, it can determine the position, direction, scaling factor, focal length and radial lens distortion of both cameras. Obviously, this technique would require modification to work with the Face Styler's multiple views. It must also be remembered that the Face Styler engine supports arbitrary camera angles, rather than just the three orthogonal views used by the Face Styler application.

      4.2.4 Calibrate Model

        After all FDP points have been moved into place, the system generates a calibrated model. It is then possible to make fine-grained adjustments while the calibrated model geometry is visible. An example of this is shown in Figure 4.11.

        Figure 4.11: Face Styler showing Calibrated Model

        The model calibration is performed by the FAE. It uses Radial Basis Functions to deform the mesh so that it closely approximates the FDP points (see section 2.2 for a description of the FAE). The FAE only supports calibration from a disk file. Therefore, the system must save the FDP points to disk before it can calibrate the model. While this is done transparently to the user, it significantly slows the calibration process. The system cannot deform the mesh in real-time. Therefore, it is only updated whenever the user changes the view.

        The solution would be to modify the FAE to accept pre-loaded FDP points. However, since the FAE was developed by DIST at the University of Genoa in Italy, it cannot be modified. Any modifications would be invalidated as soon as DIST released a new version of the FAE. The objectives also stipulate that this research must conform to the existing FAE.

        The FAE is shipped with two models: Mike and Oscar. The Face Styler allows either model to be used for final FDP adjustment. It is usually best to switch between both models to ensure that there are no errors with FDP placement.

      4.2.5 Generate Head

        The generation phase creates an FDP file, along with a cylindrical texture map of the face. The hair is exported as a VRML model, also supplemented with a texture map. The eyes and environment are exported solely as a texture map. This phase is the heart of the Face Styler engine. As such, it is described in more detail in the subsequent sections.

      4.2.6 Load into FAE

The face and eye data created by the generation phase can be directly imported into the standard unmodified FAE. However, as MPEG-4 defines only a single FDP to describe the hair, the unmodified FAE will not display realistic hair. To import the hair and environment data, the Face Styler supplies an additional layer to the FAE. This is covered in more detail in section 4.7.

    4.3 The Face

      The face data comes in the form of two files: the FDP file that describes the shape of the face, and the texture map that describes the appearance of the face. The FAE can import these files directly, without requiring an interface layer to the Face Styler engine.

      The texture map is created in several phases. Firstly, the system generates a set of views, which establish a way of mapping the images onto the model. Then, the images are texture mapped onto the model, using texture weights to handle areas that are mapped by more than one view. Finally, the texture-mapped model is flattened into a single image using cylindrical projection.

      The FDP file is created directly from the feature points specified by the user. However, since each FDP also contains texture coordinates, this file must be created after the texture map is generated. The FDP texture coordinates are assigned using the same cylindrical projection method used to create the texture map.

      The following sections describe in detail how the texture map and FDP files are generated by the Face Styler engine.

      4.3.1 Views

The texture mapping process assigns a set of texture coordinates to every vertex of the face model (one for each image). This is achieved by orthogonally projecting each image onto the model (see Figure 4.12). Pighin et al. (1998) used perspective projection (rather than orthogonal) to map the images onto the model. However, that approach requires exact knowledge of the camera positions, in particular the distance of the camera from the model. Their algorithms were based on those by Debevec et al. (1998), who dealt with architectural visualisation. Architecture images exhibit a great range of depth; therefore, perspective projection is vital to achieve realistic mapping. However, in our situation, orthogonal projection has been found adequate, resulting in no perceptible loss of visual quality.

Figure 4.12: Mapping Orthogonal Textures

The Face Styler engine supports any number of images, mapped from arbitrary camera positions around the model. To facilitate this, the system stores each image in a structure called a View. A view encapsulates all the information required to project the image onto the face model. For example, Figure 4.12 depicts five views: Front, Left, Right, Back and Top.

A view stores the following:

The view's camera position, image translation and image scale can be stored in a transformation matrix that maps image-space coordinates to model-space coordinates. That is, the matrix maps the image onto the model. This view matrix (V) is calculated by combining the following matrices:

  • Orientation (O): The camera position is used to construct an orientation matrix. This matrix billboards the image towards the camera.

Note: This is an inverted matrix. Without the inversion, the matrix transforms from model space to image space. However, we need to transform from image space to model space.

  • Image Translation (T): The image translation coordinates (tu,tv) are used to create the translation matrix T.

  • Image Scale (S): The image scaling factors (su,sv) are used to create the scaling matrix S.

  • Image Rotation (R): The image rotation, specified by θ degrees (anti-clockwise), is stored in matrix R.

The final matrix (V) can be constructed by concatenating the above matrices in the following order:

V = OTSRC

      4.3.2 Texture Coordinates

        A vertex may be represented in multiple views. For example, the left side of the nose is visible in both the front and left images. Therefore, each vertex is assigned a set of texture coordinates (one for each view). These coordinates are obtained by transforming the model into view space (rather than the view into model space, as described in section 4.3.1). The required transformation matrix is simply V⁻¹ (the inverse of V).

        The texture coordinates (t) for vertex (v) can then be calculated as follows:
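        The equation itself is not reproduced in this text. From the preceding paragraph it is presumably of the form t = V⁻¹v, which can be sketched as follows. The sketch assumes a 4×4 matrix in OpenGL's column-major layout, a homogeneous vertex (x, y, z, 1) and an orthogonal projection, so the transformed depth is simply discarded. The function name is hypothetical, and the inverse matrix is taken as an input rather than computed.

```c
/* Transform a model-space vertex by the inverse view matrix (column-major
 * 4x4) and keep the first two components as the texture coordinate.
 * Assumptions: w = 1, and no perspective divide because the projection
 * is orthogonal. */
void texture_coordinate(const float inv_view[16], const float vertex[3],
                        float t[2])
{
    t[0] = inv_view[0] * vertex[0] + inv_view[4] * vertex[1]
         + inv_view[8] * vertex[2] + inv_view[12];
    t[1] = inv_view[1] * vertex[0] + inv_view[5] * vertex[1]
         + inv_view[9] * vertex[2] + inv_view[13];
}
```

        For instance, an inverse view matrix that scales by 0.5 and translates by 0.5 along both image axes maps the model-space range [-1, 1] onto the texture range [0, 1].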

      4.3.3 Texture Weights

Each vertex is assigned a set of weights. These weights determine how much influence each view has on the given vertex. For example, the left image can capture the left side of the nose more accurately than the front image. The weights simply quantify this property into a value.

For each view, a weight (w) is calculated based on the following factors:

Once the weights for all views have been calculated, they are normalised to sum to one. The result is a set of blending factors that indicate how much each view contributes to the final texture. This operation, which must be done for each vertex, is described by the equation below:
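The normalisation amounts to dividing each weight by the sum of the weights at that vertex. The sketch below is inferred from that description rather than taken from the Face Styler source; the function name is hypothetical, and the handling of a vertex that is not seen by any view (all weights zero) is an assumption.

```c
#include <stddef.h>

/* Normalise the per-view weights of one vertex in place so that they sum
 * to one. Returns 1 on success, or 0 when the weights sum to zero (the
 * vertex is not covered by any view), in which case they are left as-is. */
int normalise_weights(float *weights, size_t view_count)
{
    float sum = 0.0f;
    for (size_t i = 0; i < view_count; ++i)
        sum += weights[i];

    if (sum <= 0.0f)
        return 0;

    for (size_t i = 0; i < view_count; ++i)
        weights[i] /= sum;
    return 1;
}
```

For example, raw weights of 1 and 3 for the front and left views become blending factors of 0.25 and 0.75.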

      4.3.4 Cylindrical Texture Map

Once the texture mapping process is complete, each vertex of the model contains a set of texture coordinates and weights (one for each view). The FAE supports only a single face texture, therefore each view must be combined into a single texture. This is achieved by creating a cylindrical texture map (cylinder map).

Lee et al. (1997) created their cylinder map by projecting each orthogonal image directly onto a cylinder. It was assumed that the front view covered an area from -90° to 90°, the left view -180° to 0°, and the right view 0° to 180°. The overlapping regions were then blended together to create a texture map that covered an area ranging from -180° to 180°.

Their assumption that each view covers a certain region of the face is flawed. Lee et al. (1999) acknowledge this by stating that their approach creates a cylinder map that "has a strong boundary between front and right/left parts. No matter how carefully the photographic environment is controlled, boundaries are always visible." As was discussed in section 4.3.3, different parts of the face are captured by different views. The side of the nose, although in the region of -90° to 90°, cannot be accurately described by the front view. This coverage information is stored in the weights at each vertex of the face model. Therefore, the cylinder map needs to be constructed using the model as a guide.

Figure .13: Cylindrical Projection

To create the cylinder map, we first project the face model onto a cylinder. Then, the cylinder is flattened (as shown in Figure 4.13). The resulting mesh can now be rendered using texture mapping to obtain the correct cylinder map.

The Face Styler engine uses OpenGL to render the cylindrical mesh. Since OpenGL uses a z-buffer to determine the topmost fragment of overlapping polygons, the model must contain some depth information. Therefore, the cylindrical projection of the mesh is not completely flattened. The z component of each vertex is assigned the distance between the centre of the head and the vertex. This ensures that the tip of the nose (which is the furthest away from the centre) is not obscured by a face polygon. An orthogonal projection is used to capture the texture map.

Each view is rendered as a separate pass, consisting of the following operations:

Since OpenGL blending is used, the issue of overlapping polygons becomes very important. We must ensure that the polygons are drawn from front to back. This is relatively easy, since the model is split up into multiple groups, such as nose, eyes, mouth etc. The system must simply render all features in the correct order.

Texture fidelity is another issue that must be considered. Our aim is to generate a cylindrical texture map that contains the highest amount of detail. OpenGL's fastest filter, GL_NEAREST, favours rendering speed over quality: texture minification uses the nearest texel in the texture map rather than averaging neighbouring texels together, and texture magnification similarly chooses the nearest texel instead of interpolating to approximate a better colour. The solution is to select GL_LINEAR filtering for both minification and magnification.

The use of mipmapping is not encouraged in this application because we are dealing with large images. There is simply not enough texture memory to hold multiple 1024×1024 images along with their 512×512, 256×256, 128×128, etc, mipmap textures. In any case, the cylinder map is usually generated at approximately the same resolution as the input images. Therefore, mipmapping is not required.
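To put rough numbers on the memory cost, the following C sketch adds up the storage needed for a square texture with and without its mipmap levels. It assumes uncompressed texels and power-of-two edge lengths; the function name texture_bytes and the choice of 3 bytes per texel (24-bit RGB) in the example are assumptions, since the pixel format is not stated here.

```c
#include <stddef.h>

/* Bytes needed for a square texture of the given edge length, optionally
 * including every mipmap level down to 1x1. Assumes the edge is a power of
 * two and that texels are stored uncompressed. */
size_t texture_bytes(size_t edge, size_t bytes_per_texel, int with_mipmaps)
{
    size_t total = 0;

    for (;;) {
        total += edge * edge * bytes_per_texel;
        if (!with_mipmaps || edge == 1)
            break;
        edge /= 2;  /* next mipmap level has half the edge length */
    }
    return total;
}
```

Under these assumptions, texture_bytes(1024, 3, 0) gives 3,145,728 bytes for one input image, and texture_bytes(1024, 3, 1) gives 4,194,303 bytes once all mipmap levels are included, which is about a third more per image.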

Figure 4.14 shows the cylinder map after each rendering pass.

Figure 4.14: Cylinder Map Rendering Passes

      4.3.5 FDPs

Once the cylindrical texture map is created, the Face Styler engine generates an FDP file that is compatible with the FAE. Each point in that file contains both a location and a texture coordinate.

The location is already available to the system because the user positioned the feature points when aligning them with the input images. These points can be directly used as valid FDP locations because they lie in model space.

Since the location of each FDP is in model space, the system can simply project the points onto a cylinder to obtain the texture coordinates. Instead of mapping the vertices of the face model onto the cylinder, the system simply maps the FDP points onto the same cylinder. The resulting coordinates can then be directly used as the texture coordinates.

Figure 4.15 shows how the model geometry and FDPs map onto the cylindrical texture.

Figure 4.15: Cylinder Map showing FDPs and Geometry

    4.4 The Hair
      The MPEG-4 specification reserves only a single FDP to describe the shape of the hair. This is clearly insufficient, since hair can come in an unlimited number of different styles. This presents us with the serious problem that no matter how the hair is implemented, it will not be MPEG-4 compliant. Just as importantly, it will not work with the standard FAE.

      To avoid straying too far from the existing MPEG-4 paradigm, the hair is modelled using FDPs. To describe the complex and widely varying shape of hair, a large number of FDP points are required. The Face Styler uses 101 such FDP points to model the hair. Compare this with the 84 points required to specify an entire face.

      These new FDP points are initially arranged to form a hemisphere. The user then adjusts each point using the same methods used to move the standard face points. The FDPs are named such that the major number ranges from 20 to 26, and the minor number from 1 to 20. The major number represents a slice of the hemisphere (latitude). Each minor number identifies a strip of the hemisphere (longitude). This numbering scheme is clarified in Figure 4.16.

      Figure 4.16: Hair FDPs

      Figure 4.16 shows that each slice has a different number of points. For example, slice 20 contains 20 points, while slice 25 contains 6 points, and slice 26 only has one. This was done to reduce point congestion around the higher slices. It also reduces the number of points that need to be adjusted to calibrate the hair. Unfortunately, removing points from each slice upsets the strip numbering. To avoid confusion, the remaining points keep their original minor numbers, so the missing points leave gaps in the numbering. That means point 20.1 aligns with all the other X.1 points, 20.2 with all X.2, etc.

      Table 2 shows the number distribution of each slice:

      Slice   Present Points                           Missing Points
      20      1-20
      21      1-20
      22      1-20
      23      1-9, 11, 13-20                           10, 12
      24      1-3, 5-6, 8-9, 11, 13-14, 16-17, 19-20   4, 7, 10, 12, 15, 18
      25      1, 2, 5, 8, 11, 14, 17, 20               3-4, 6-7, 9-10, 12-13, 15-16, 18-19
      26      1                                        2-20

      Table 2: Hair FDPs by Slice
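One possible way to hold this numbering scheme in code is a small presence table indexed by major and minor number. The C sketch below encodes the rows of Table 2 as strings of '1' (present) and '0' (missing); the names hair_present, hair_fdp_exists and hair_fdp_count are hypothetical and are not taken from the Face Styler source.

```c
#define HAIR_FIRST_SLICE 20
#define HAIR_LAST_SLICE  26
#define HAIR_STRIPS      20

/* One row per slice (major number 20..26). Character n-1 of a row is '1'
 * when point <slice>.n is present. Data transcribed from Table 2. */
static const char *const hair_present[HAIR_LAST_SLICE - HAIR_FIRST_SLICE + 1] = {
    "11111111111111111111",  /* 20: 1-20 */
    "11111111111111111111",  /* 21: 1-20 */
    "11111111111111111111",  /* 22: 1-20 */
    "11111111101011111111",  /* 23: 10 and 12 missing */
    "11101101101011011011",  /* 24: 4, 7, 10, 12, 15, 18 missing */
    "11001001001001001001",  /* 25: 1, 2, 5, 8, 11, 14, 17, 20 present */
    "10000000000000000000"   /* 26: only point 1 */
};

/* Returns 1 if hair FDP major.minor exists, 0 otherwise. */
int hair_fdp_exists(int major, int minor)
{
    if (major < HAIR_FIRST_SLICE || major > HAIR_LAST_SLICE)
        return 0;
    if (minor < 1 || minor > HAIR_STRIPS)
        return 0;
    return hair_present[major - HAIR_FIRST_SLICE][minor - 1] == '1';
}

/* Counts every hair FDP that is present in the table. */
int hair_fdp_count(void)
{
    int major, minor, count = 0;

    for (major = HAIR_FIRST_SLICE; major <= HAIR_LAST_SLICE; ++major)
        for (minor = 1; minor <= HAIR_STRIPS; ++minor)
            count += hair_fdp_exists(major, minor);
    return count;
}
```

Because missing points leave gaps rather than shifting the numbering, the minor number can be used directly as the strip index. Counting the entries of Table 2 this way gives the 101 hair points mentioned earlier.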

      Table 3 shows the slices in each strip (same data as in previous table, just arranged by strip instead of slice):

      Strip   Present Slices      Strip   Present Slices
      1       20-26               11      20-26
      2       20-26               12      20-23
      3       20-25               13      20-25
      4       20-24               14      20-26
      5       20-26               15      20-23
      6       20-25               16      20-25
      7       20-23               17      20-26
      8       20-26               18      20-24
      9       20-25               19      20-25
      10      20-23               20      20-26

      Table 3: Hair FDPs by Strip

      Unlike the face FDPs, each hair FDP corresponds to a single vertex of the hair model. In future versions, the system may be extended to calibrate a pre-existing hair model. This would unify the way the face and hair are reconstructed. However, in the meantime, the system generates new geometry given the hair FDPs.

      To assign the texture to the hair model, the same procedure as the face is followed. That is, the system generates a cylindrical texture map by projecting the hair geometry onto a cylinder. Texture coordinates are then assigned directly to each model vertex (see Figure 4.17).

      Figure 4.17: Hair Texture

      The hair model is saved as a VRML '97 file (VRML, 1997). The FAE Integration layer loads this VRML file to display the hair. Currently, this loader can only handle VRML files that strictly follow the structure described in APPENDIX D. That is, it cannot load VRML files generated by another program (unless it follows the set file format).

    4.5 The Eyes

The FAE natively supports both an iris and an eyeball texture. The iris texture must be named iris.ppm, and the eyeball map must be named eyeball.ppm. Both these files must reside in the same directory as the application, otherwise the FAE will use system default texture maps. Both the left and right eyes are textured using the same maps.

To create the iris texture, the Face Styler extracts the iris from the left eye of the front image. This is performed using the following steps:

Figure 4.18: FDP Points of the Left Eye

The Face Styler engine only extracts the iris texture. It does not extract the eyeball texture. This is because the eyeball texture does not vary dramatically between people. The system default texture is acceptable for all people.

    4.6 The Environment
      The FAE fills the background of the talking head using a solid blue colour. This makes it look like the head is floating in space. To increase realism, this background should be adjustable to contain an image. That would allow a newsreader to have a television studio in the background, the president the White House, a prisoner a cell, etc. Such a background image provides an environment for the face. It establishes the context, and allows the viewer to understand who the face is.

      The Face Styler provides this mechanism using the FAE integration layer. Any JPEG, PNG or PPM image can be used as the environment. The system automatically scales the image to fill the entire window, while maintaining the correct aspect ratio.
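The scaling can be illustrated with a short C sketch. It assumes that "fill the entire window" means the image is scaled uniformly until it covers the window and is then centred, with any overflow cropped; this interpretation, the Placement type and the function fit_background are assumptions rather than the integration layer's actual code.

```c
typedef struct {
    float width, height;       /* size of the scaled image in window units */
    float offset_x, offset_y;  /* position of its corner; negative = cropped */
} Placement;

/* Hypothetical helper: uniform scale so the image covers the whole window
 * while keeping its aspect ratio, centred on the window. */
Placement fit_background(float img_w, float img_h, float win_w, float win_h)
{
    float sx = win_w / img_w;
    float sy = win_h / img_h;
    float s = (sx > sy) ? sx : sy;  /* the larger factor covers both axes */
    Placement p;

    p.width = img_w * s;
    p.height = img_h * s;
    p.offset_x = (win_w - p.width) * 0.5f;
    p.offset_y = (win_h - p.height) * 0.5f;
    return p;
}
```

For a 200×100 image in a 400×400 window, this gives a scaled size of 800×400 with a horizontal offset of -200, so the left and right quarters of the image fall outside the window.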

      To prevent the background from overpowering the talking head, the Face Styler allows it to be blurred. This slight blurring simulates the focus of the human eye. Having a blurry (out of focus) background means the viewer will concentrate on the face rather than being distracted by the environment. This blurring is achieved by providing the graphics card with a low-resolution texture map. The graphics card then stretches the image using bilinear filtering, which provides adequate blurring.
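How the low-resolution texture is produced is not described here. One simple option, shown below purely as an assumed sketch, is to average each block of pixels into a single texel before handing the image to the graphics card. The function downsample_box is hypothetical and works on an 8-bit greyscale image for brevity.

```c
#include <stddef.h>

/* Averages each factor x factor block of an 8-bit greyscale image into one
 * output pixel. src is w x h pixels; dst must hold (w / factor) * (h / factor)
 * bytes. w and h are assumed to be multiples of factor. */
void downsample_box(const unsigned char *src, size_t w, size_t h,
                    size_t factor, unsigned char *dst)
{
    size_t out_w = w / factor;
    size_t out_h = h / factor;
    size_t ox, oy, x, y;

    for (oy = 0; oy < out_h; ++oy) {
        for (ox = 0; ox < out_w; ++ox) {
            unsigned long sum = 0;

            for (y = 0; y < factor; ++y)
                for (x = 0; x < factor; ++x)
                    sum += src[(oy * factor + y) * w + (ox * factor + x)];

            dst[oy * out_w + ox] = (unsigned char)(sum / (factor * factor));
        }
    }
}
```

The smaller the resulting texture, the more the bilinear stretch smooths the background, so the reduction factor acts as a rough blur control.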

    4.7 FAE Integration
      The objectives state that the realistic head produced by the Face Styler must integrate with the FAE. However, the FAE does not have the native ability to display the realistic head. Since the FAE cannot be modified, a separate Face Styler FAE integration layer was written. This layer provides access to some of the Face Styler services. The loosely coupled API is very simple, and is explained in the subsequent sections.

      4.7.1 The Environment
        The FAE integration layer provides two routines that deal with the environment:

        int FS_FAE_LoadEnvironment (const char *fileName);
        void FS_FAE_DisplayEnvironment();

        FS_FAE_LoadEnvironment loads a JPEG, PNG or PPM image. These images can be any size, as the system makes sure that they fit within the graphics card's limits. This function returns zero if the loading failed, otherwise it returns non-zero.

        FS_FAE_DisplayEnvironment uses OpenGL to display a previously loaded environment. This function must be called before the head is rendered. It leaves the OpenGL state as it found it, and it does not touch the z-buffer. It is safe to call this function even if FS_FAE_LoadEnvironment failed. In that case, no environment will be rendered.

      4.7.2 The Hair
        The following API functions are provided by the FAE integration layer to deal with the hair:

        int FS_FAE_LoadHair (const char *fileName);
        void FS_FAE_DisplayHair();

        These functions work similarly to the environment API. That is, FS_FAE_LoadHair loads the VRML file that contains the hair geometry. FS_FAE_DisplayHair only renders the hair if it was loaded successfully.

      4.7.3 The Face
        The FAE integration layer provides three routines to deal with the face:

        int FS_FAE_LoadFace (GeometryData *geometry, const char *fileName);
        int FS_FAE_LoadFdp (GeometryData *geometry, const char *fileName);
        void FS_FAE_DisplayFace(GeometryData *geometry);

        These routines are optimised versions of those provided by the FAE. They operate on exactly the same data structures. As such, they can be used as drop-in replacements. However, their use is optional (especially the load functions, which exist only for convenience).

        The FAE can use its own routines to display the face. However, it is suggested that FS_FAE_DisplayFace be used instead, as performance will be significantly higher due to the texturing issues listed in section 4.7.4.
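The display routines described above might be combined in a per-frame function along the following lines. This is a sketch only: render_frame is a hypothetical caller, GeometryData is treated as an opaque type, and the stand-in function bodies merely record the call order so that the fragment compiles on its own. The environment is drawn first because it must be called before the head is rendered; drawing the hair after the face is an assumption.

```c
#include <stddef.h>

typedef struct GeometryData GeometryData;  /* opaque here; defined by the FAE */

/* Stand-in bodies: they only log the call order so this sketch is
 * self-contained. A real build would link the integration layer instead. */
static char call_log[8];
static size_t call_count = 0;

static void log_call(char c)
{
    if (call_count < sizeof call_log - 1)
        call_log[call_count++] = c;
}

void FS_FAE_DisplayEnvironment(void) { log_call('E'); }
void FS_FAE_DisplayFace(GeometryData *geometry) { (void)geometry; log_call('F'); }
void FS_FAE_DisplayHair(void) { log_call('H'); }

/* Hypothetical per-frame routine called from the host application. */
void render_frame(GeometryData *face)
{
    FS_FAE_DisplayEnvironment();  /* background first, before the head */
    FS_FAE_DisplayFace(face);     /* optimised replacement for the FAE routine */
    FS_FAE_DisplayHair();         /* order relative to the face is assumed */
}
```

Because the environment and hair routines render nothing when their load calls failed, a caller of this kind would not need to track load failures itself.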

      4.7.4 Texture Mapping

The standard FAE engine performs texture mapping very poorly. It sends the texture maps to the graphics card at the start of every frame. This severely lowers the frame rate, as the system bus is a performance bottleneck.

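The solution is to cache the texture in the graphics card. This can be done in two ways: using texture objects, which were introduced in OpenGL 1.1, or using display lists, which are available in OpenGL 1.0.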

The FAE integration layer automatically detects when OpenGL 1.1 is available. In that case, it uses texture objects. Otherwise, it reverts to using OpenGL 1.0 display lists.

The following two functions provide access to this optimised texture caching.

void FS_FAE_SetTextureImage(TextureImageData *TextureImage);
void FS_FAE_ResetTextures();

FS_FAE_SetTextureImage is a replacement for the FAE's own SetTextureImage function. It sets the current texture map using the caching techniques described above.

The FS_FAE_ResetTextures function flushes the texture cache. This is required whenever a new model is loaded. It effectively removes the existing textures from the graphics card's on-board memory. However, most graphics cards automatically unload unused textures from graphics memory. Therefore, the use of this function is not essential, simply recommended.
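The caching idea can be sketched independently of OpenGL. In the C fragment below, an image is uploaded only the first time it is seen, and later requests reuse the stored texture id. All names (TextureCache, cache_get, cache_reset, count_upload) are hypothetical; the upload callback stands in for whatever actually transfers the image to the graphics card, and a real reset would also release the cached textures.

```c
#include <stddef.h>

#define CACHE_SLOTS 8

/* Callback that uploads an image and returns a non-zero texture id. */
typedef unsigned int (*UploadFn)(const void *image);

typedef struct {
    const void *image[CACHE_SLOTS];  /* images already on the card */
    unsigned int id[CACHE_SLOTS];    /* their texture ids */
    size_t used;
} TextureCache;

/* Returns the texture id for image, uploading it only on first use.
 * Returns 0 if the cache is full. */
unsigned int cache_get(TextureCache *cache, const void *image, UploadFn upload)
{
    size_t i;

    for (i = 0; i < cache->used; ++i)
        if (cache->image[i] == image)
            return cache->id[i];  /* already resident: no bus transfer */

    if (cache->used == CACHE_SLOTS)
        return 0;

    cache->image[cache->used] = image;
    cache->id[cache->used] = upload(image);
    return cache->id[cache->used++];
}

/* Forgets every cached texture, e.g. when a new model is loaded. */
void cache_reset(TextureCache *cache)
{
    cache->used = 0;
}

/* Stand-in uploader for illustration: counts uploads instead of using OpenGL. */
static unsigned int uploads = 0;
static unsigned int count_upload(const void *image)
{
    (void)image;
    return ++uploads;
}
```

With this arrangement, setting the same face texture on every frame costs one lookup rather than one transfer across the system bus.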

    4.8 Summary

The Face Styler is an application that allows the user to create a realistic MPEG-4 compliant head by focusing on the face, the hair, the eyes and the environment. This is achieved by acquiring a set of images that depict a person's head at orthogonal angles (front, left and right). The Face Styler automatically converts JPEG, PNG or PPM images into a suitable format. The user then interactively aligns MPEG-4 FDP points with all the images. The system then generates an FAE compatible FDP file that contains these feature points.

To generate the cylindrical texture map of the face, the Face Styler engine first projects the input images onto the head model that was calibrated using the FDPs. Then, this head model is projected onto a cylinder and rolled out into a 2D plane. This flattened model is then rendered via OpenGL using multiple passes. The result is a cylinder map of the face that can be directly imported by the FAE.

The hair is modelled using custom FDP points. Since this is not part of the MPEG-4 standard, the hair is exported as a VRML file. This can then be loaded and displayed by the FAE integration layer.

The texture map of the iris is automatically extracted from the left eye of the front view. The iris is located using the FDP points around the eye. The iris texture map can then be loaded by the standard FAE.

The environment adds an image to the background of the talking head. This is achieved via the FAE integration layer. This layer sits on top of the FAE to provide any required Face Styler services. It has a loosely coupled API that required minimal modification to the FAE to implement.

  5 RESULTS
    5.1 Overview
      The results of the research presented in this paper are very subjective. Realism cannot be measured by numbers; it is a visual quality. In addition, everybody has a different definition of realism. In this research, realism was measured by comparing the heads obtained using the Face Styler with the actual people they are supposed to represent. Therefore, if the head looks just like the target person, then it is realistic. This is obviously a very strict definition of realism, since a head may look realistic even if it does not look like the desired person.

      The subsequent sections present many images that show the quality of the heads obtained using the Face Styler software. However, the images look different in print than they do on screen. Since the facial animation is targeted at the monitor, it may be beneficial to view the images on a computer screen. The supplemental CD contains all the images presented in this paper. See section 1.3 for more information on where these images can be found on the CD.

      The supplemental CD also contains the Face Styler application, along with the project files used to create the head models. To evaluate the ease-of-use objective, it may be valuable to try out the software. The Face Styler is compiled for Win32 and requires hardware OpenGL acceleration.

      The head produced using the Face Styler application must integrate with the existing FAE. The FAE was developed by DIST at the University of Genoa in Italy. Since the FAE is in constant development, it is important that modification to the source is minimal. Any major changes to the FAE are invalidated as soon as the next revision is released. To show that this objective was achieved, a section of the results shows the head models animated using the standard unmodified FAE.

      The final objective states that the head model must render at interactive frame rates on consumer graphics cards. This aspect of the research can be quantitatively measured in terms of frame rate. Therefore, a section evaluates the rendering performance of the textured head.

    5.2 Realism
      5.2.1 The Initial Head
        The FAE provides two models that can be used as the base mesh: Mike and Oscar. All heads presented in the forthcoming sections are based on Oscar. Oscar was chosen over Mike because it is composed of more polygons. Therefore, it appears much smoother than Mike. This effect is especially pronounced at the silhouette, where the straight edges of the polygons are clearly visible with Mike.

        Figure 5.1: The Initial Head Model - Oscar

        The Oscar model is shown in Figure 5.1. Currently, this is the appearance of Curtin's FAQbot. While the model has human qualities, it does not look like a person. It is clear that it is just a computer model. The aim of this project was to turn this head into a realistic head, a head that is indistinguishable from a real person.

        To achieve this ambitious goal, four key areas were targeted: the face, the hair, the eyes and the environment. Each of these areas improves upon the realism of the face. The results monitor the incremental progress of the face throughout these stages. The final model is then compared with the original input images. Success depends upon how closely the synthesised head matches the original pictures of the subject.

        Before delving into the development of the realistic head model, it is important to show the FAE's current capabilities. The FAE ships with two realistic head models: Loris and Chen. These two models are pictured in Figure 5.2 and Figure 5.3. These models serve as a benchmark for the work done in this research. The aim is to improve upon the realism of these heads, while maintaining the same technology core: the FAE.

         

        Figure 5.2: Head supplied with FAE - Loris

        Loris and Chen were created based on data gathered by a Cyberware scanner. As such, the resolution of the cylindrical texture maps is very low. The Loris map measures 512×512 pixels, while Chen measures a mere 256×256 pixels. It may not be visible in this document, but this lack of resolution is clearly apparent on screen. The face textures appear very blocky, and in the case of Chen, the individual pixels are clearly visible. This severely detracts from the realism of the face, because it reminds the user of the computer behind the face. That is, it becomes apparent that this is not a person on the screen, rather just a computer simulation. Therefore, this project aims to produce a realistic face where these artefacts are not visible.

        Figure 5.3: Head supplied with FAE - Chen

        Both Loris and Chen lack realistic hair. This detracts from the realism of the face, as it again becomes visible that this is just a computer model. However, even more important are the eyes. Eyes are immensely personal and vary greatly between people. Humans make eye contact when communicating with another person. Therefore, they will also make eye contact with the talking head. Since the viewer will focus on the eyes, realistic eyes are paramount to achieving a realistic face. Clearly, the default green eyes supplied with the FAE do not look realistic.

      5.2.2 The Face
        Figure 5.4: Bernie - Input Images

        Figure 5.4 shows the input images used to create the first head, Bernie. These images were acquired using an Olympus C-2500L SLR digital camera at the highest quality settings. The images measure 1612×1368 pixels and, stored as JPEG files, take up 1.5 MB each. The images come directly from the digital camera. They were not filtered through any clean-up programs. However, the image depicting the right side of the face came out over-exposed and was too bright to use. Therefore, it was replaced by a mirrored version of the left image. The Face Styler software allows the image to be flipped, therefore no additional image-editing software was needed.

        The next step is to align the feature points with each image. Initially, the feature points are aligned with the front image. Consulting the diagram that shows the location of each FDP on the face can help with the positioning of these points (see APPENDIX B). Once the points are in position, they are adjusted to fit the remaining two images.

        It was anticipated that this phase would be a very time-consuming and painstaking process. However, after some practice with the Face Styler software, positioning all 76 FDP points (84 minus the teeth and tongue) can be done in 15 minutes. Compare this to traditional face modelling techniques, such as patch modelling, which can take many hours to create a realistic model (Flemming and Dobbs, 1999). The trade-off is model detail. A head created using patches has far greater geometric detail (and realism) than a head calibrated using FDP points. However, this lack of detail is offset by using a realistic face texture map.

        Once the feature points are aligned with each image, the cylindrical texture map of the face can be generated. It was found that using the Mike model to generate the texture map produced the best results. The Mike model was specifically designed to be calibrated by MPEG-4 FDPs, and as such, many of its vertices coincide with FDP points. Therefore, the calibrated mesh accurately represents the intended face. This is important during texture map generation because the model must line up with the feature points.

        Oscar is an adaptation of a pre-existing model, and it does not fit the FDPs as well as Mike. Therefore, a texture map generated using the Oscar model may not line up with all features. When that happens, the generated texture map exhibits artefacts as shown in Figure 5.5.

        Figure 5.5: Artefacts caused by Poorly Fitting Model

        When displaying the talking head, the Oscar model is superior to Mike. Mike has a very low number of polygons (750), and therefore appears very faceted. This is noticeable especially around the edges. The Oscar model is composed of many more faces (2444). It appears much smoother than Mike. The fact that Oscar does not represent the intended face as accurately as Mike is not very important during the display. The textures simply stretch to fit the model, causing the face to look slightly different than intended. The difference is very minimal, and certainly no cause for concern.

        Figure 5.6: Bernie - Cylindrical Face Map

        Figure 5.6 shows the cylindrical map generated by projecting the input images shown in Figure 5.4 onto the Mike model. Using this texture map, the head shown in Figure 5.7 is obtained.

        Figure 5.7: Bernie - Textured Head with Face Mapping

        The quality of this head already surpasses those supplied with the FAE. This is due to the higher resolution texture map obtained by the digital camera. Texture maps acquired using a Cyberware scanner are usually limited to 512×256 samples, and this is clearly visible in the Loris and Chen maps supplied with the FAE.

        It is also important to note that the lighting conditions of the input images do not interfere. That is, the photographs of the front, left and right views were taken at different times. Therefore, the lighting conditions differ between each image. Pighin et al. (1998) went to great lengths to capture all photographs simultaneously so that lighting differences would not affect the final texture map. However, these results, especially the cylindrical texture map shown in Figure 5.6, show that this lighting factor is not particularly important. A similar conclusion was reached by Guenter et al. (1998). They found that incorrect lighting was very subtle and usually not noticeable.

        The next test head, John, was generated using only a single reference image (shown in Figure 5.8). Obviously, in this situation the system has no knowledge of the appearance of the side of the face. Nevertheless, it smoothly interpolated the missing parts to generate a surprisingly good cylindrical texture map (Figure 5.9).

        Figure 5.8: John – Input Image

        Figure 5.9: John – Cylindrical Face Map

        Figure 5.10: John – Textured Head with Face Mapping

        The images shown in Figure 5.10 prove that it is possible to recreate a face given only a single reference image. However, it is still necessary to place the FDP points in a three-dimensional space. Since there is only a single reference image in the xy-plane, positioning the points along the z-axis is very difficult. It was found that this is made easier by loading a side-view image, even if that image does not represent the same person. This allows the points to be roughly placed, so they at least retain correct positions relative to each other. As can be seen in Figure 5.10, this works very well.

        Figure 5.10 also shows that the glasses worn by the subject do not interfere with the texture map. That is, the glasses do not look out of place despite the fact that they are moulded directly onto the skin. As long as the head does not turn too much to the side, the glasses look very natural. The FAE and the FAQbot usually display the face directly head-on, therefore this is not a problem.

      5.2.3 The Hair
        The heads displayed in the previous section look much better than the ones shipped with the FAE. However, they still lack a certain realism. To solve this, something must be added to the top of the head.

        Figure 5.11: Bernie – Textured Head with Hair

        Adding the hair makes a large difference to the visual appeal of the face. Instantly, the head looks more balanced and realistic. However, the hair leaves a lot to be desired. For example, Bernie has very wispy hair that sticks out on the sides. This detail is not visible in Figure 5.11, as the hair looks flat and polygonal (especially around the silhouette). John has much smoother hair, and as such the head shown in Figure 5.12 looks more like the person depicted in the input image.

        Figure 5.12: John - Textured Head with Hair

        To create the hair, the user must place 101 feature points. Unlike placing the facial feature points, placing the hair points is a time-consuming process. The face is separated into different features, such as the mouth, nose, left eye, etc. This makes it easy to work on one feature at a time. However, the hair does not exhibit such features. Therefore, it is much more difficult to adjust a certain area of the hair. It is quite common to become lost amongst a sea of hair points. This problem is captured by Figure 5.13, which shows how cluttered the interface can become.

        Figure 5.13: Cluttered Hair Modelling Interface

        To reduce this cluttering, the hair should be broken into different parts. For example, each slice or strip should be able to be worked on independently of all other points. Reducing the number of FDPs would also reduce the clutter. However, this would require that the system calibrate a generic hair model using the same techniques used to calibrate the face. This was out of the scope of this research, and is discussed further in the Future Work section.

      5.2.4 The Eyes
        The eyes have a dramatic effect on the realism of the head. The standard green eyes supplied with the FAE conflict with the texture mapped face. They draw attention because they are too bright. Consequently, the realism of the face is significantly reduced.

        Figure 5.14 shows the final head model. It looks remarkably similar to the person depicted in the input photographs. The only differences are a lack of hair wisps, and a slightly longer face. The difference in face shape can be attributed to the Oscar model not mapping exactly to the specified FDP points. The face may not pass a Turing Test; however, the realism objective is fulfilled.

        Figure 5.14: Bernie – Textured Head with Eyes

        The eyes were a great success. They exhibit far greater detail than first anticipated. This can be seen in Figure 5.15. The iris clearly shows the highlights produced by the two lights in the room when the photographs were taken.

        Figure 5.15: Close Up showing Realistic Eyes

        The John model suffers from the same issues as Bernie. The face is slightly too elongated, and the hair too polygonal. Otherwise, the face looks remarkably realistic, despite being modelled from only a single input image.

        Figure 5.16: John – Textured Head with Eyes

      5.2.5 The Environment
        The environment, although not directly affecting the realism of the head, adds reason and meaning to the face. This is best demonstrated by Figure 5.17. Without the background, the viewer would never know that Bernie is a newsreader. The background provides a context, or setting, for the talking head. That is, the viewer can instantly infer that Bernie is reporting on a high-jump story that made it into the sport highlights of Friday, November 24.

        Figure 5.17: Bernie - The Newsreader

        The images in Figure 5.18 compare the effects of an out of focus environment. The city contains a lot of distracting detail. However, it is much harder to focus on the city when it is blurred. Therefore, in the right image, the viewer's attention remains on the face.

        Figure 5.18: Bernie – In the City

      5.2.6 Miscellaneous Heads

      To test the Face Styler, several more heads were created. Each head was modelled very quickly, without paying too much attention to the accuracy of the face. Nevertheless, each head demonstrates a significant issue of the Face Styler.

      Groucho Marx, shown in Figure 5.19, was created using a single input image scanned from a book (Anovile, 1971). This image is very interesting, as it is low resolution and not in colour. This causes the teeth, which are supplied by the FAE, to look out of place. In a situation like this, the Face Styler should also allow the teeth texture to be acquired. However, in the neutral position, which is a pre-requisite for the face in the input images, the teeth are not visible. Therefore, this head also shows the effects of calibrating the model using a non-neutral face. This is not a problem for still images, however the face cannot be properly animated. As all animation is performed relative to the neutral face, Groucho has a permanently open mouth. Speech is performed by opening the mouth even further. This effect, despite being quite comical, detracts from the realism of the face.

      Figure 5.19: Groucho Marx

      Michelle and Sally in Figure 5.20 represent female models. They were considerably more difficult to create than a male head because the FAE supplies only male models: Mike and Oscar. Since the cylindrical face map is created by first projecting the input images onto the model (Mike), the subtle differences between a female and a male head caused texture alignment problems.

      Figure 5.20: Michelle and Sally - Female Models

      The next head shows the versatility of the Face Styler software. Due to its interactive design, the user has a lot of creative freedom. Figure 5.21 shows the Mike model deformed into Lilly the family cat. Obviously, the model was not intended to depict a cat. As such, the head is not very accurate, nor is it suitable for animation. However, it does show that with a little creativity, anything is possible.

      Figure 5.21: Lilly the Family Cat

      The last model shows that police mug shots are an excellent source of face images…

      Figure 5.22: Bill Gates

    5.3 FAE Integration

      Figure 5.23: Bernie - Facial Expressions

      Figure 5.23 and Figure 5.24 show the results of loading the FDP file generated by the Face Styler into the unmodified FAE. This demonstrates that the heads are fully compliant with the FAE, and hence MPEG-4. As these images were taken using the unmodified FAE without the Face Styler integration layer, they do not show the realistic hair. However, they do fulfil the objective that the realistic heads must be easily integrated with the existing FAE software.

      Figure 5.24: John - Facial Expressions

    5.4 Performance

To satisfy the performance requirements, two conditions must be met. Firstly, the amount of geometry must not exceed 10,000 triangles. This is the limit imposed by consumer-level graphics cards to achieve a frame rate of at least 15fps. Secondly, all texture maps must fit into the available on-board graphics memory. If this condition is not met, the system needs to shuffle the textures between system RAM and graphics RAM at every frame, which reduces the frame rate to unacceptable values.

The Oscar model is composed of 2444 triangles. The hair geometry is composed of 180 triangles. Therefore, the combined total of 2624 triangles falls well below the 10,000 limit. Thus, it would even be possible to use a higher-detail model than Oscar.
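The geometry budget above can be checked with a few lines of arithmetic. The following is only a sketch: the constant and function names are illustrative and do not come from the FAE or the Face Styler, while the triangle counts and the 10,000 limit are the figures quoted in the text.

```python
# Figures quoted in the text; the names themselves are illustrative.
TRIANGLE_LIMIT = 10_000   # consumer-level limit for at least 15fps
OSCAR_TRIANGLES = 2444    # Oscar head model
HAIR_TRIANGLES = 180      # hair geometry


def triangle_headroom(*mesh_triangle_counts, limit=TRIANGLE_LIMIT):
    """Return (total, remaining) triangle counts for a set of meshes."""
    total = sum(mesh_triangle_counts)
    return total, limit - total


total, remaining = triangle_headroom(OSCAR_TRIANGLES, HAIR_TRIANGLES)
print(f"total={total}, remaining={remaining}")
```

The remaining headroom is the number of triangles a higher-detail model could add before reaching the limit.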

The realism of the face comes from the high-resolution texture maps. Therefore, keeping within the limits of consumer-level graphics cards may be difficult. The default face texture map is generated at 1024×1024, which is the texture size limit imposed by most OpenGL graphics accelerators, so the texture is loaded at that resolution. The hair texture map is another 1024×1024 image. The eyes and teeth are relatively small images, and require only several kilobytes of memory.

A 1024×1024×3 (RGB) texture map requires 3MB of texture memory. There are two such images, the face and the hair, so 6MB is needed in total. Table 4 below shows that at 1024×768×32, there is only 3MB of available texture memory on a 16MB card. Therefore, there is not enough memory for a full screen OpenGL window. However, the situation is not quite that bad. Windowed operation will work because the system only allocates back and depth buffers at the size of the OpenGL window.

Graphics cards with only 16MB of texture memory usually operate at 16-bit resolutions (the recommended bit depth). At 1024×768×16, there is 9MB of free texture memory. This is enough to hold the two texture maps. However, at 16-bit resolutions the situation is improved even further. The texture maps are no longer stored at 24 bits per pixel; they are downsampled to 16 bits per pixel to match the frame buffer. Therefore, each 1024×1024 texture map consumes only 2MB. At 16-bit resolutions, there is plenty of available texture memory even for full screen operation.
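The texture memory figures in the last two paragraphs can be reproduced with a short calculation. This is a minimal sketch: the helper names are illustrative, the free-memory values (3MB and 9MB) are simply those quoted in the text, and the small eye and teeth textures and any mipmaps are ignored.

```python
MIB = 1024 * 1024


def texture_bytes(width, height, bits_per_pixel):
    """Uncompressed size of one texture map in bytes (no mipmaps)."""
    return width * height * bits_per_pixel // 8


def fits(free_mib, *texture_sizes):
    """True if all textures fit in the given free texture memory (in MB)."""
    return sum(texture_sizes) <= free_mib * MIB


rgb24 = texture_bytes(1024, 1024, 24)  # stored at 24 bits per pixel
rgb16 = texture_bytes(1024, 1024, 16)  # downsampled to 16 bits per pixel

print(rgb24 // MIB, rgb16 // MIB)       # size of one map in MB at each depth
print(fits(3, rgb24, rgb24))            # face + hair, 3MB free (1024×768×32)
print(fits(9, rgb16, rgb16))            # face + hair, 9MB free (1024×768×16)
```

With the face and hair maps at 24 bits per pixel, the 3MB that is free at 1024×768×32 is not enough, whereas the 9MB free at 1024×768×16 holds both maps at either depth.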

Table 4: Available Texture Memory

The following tests indicate the performance of the FAE on an nVidia Riva-TNT graphics card with 16MB of memory. This card represents the lowest common denominator of consumer graphics accelerators, since it was one of the first cards to fully support OpenGL.

The following three performance tests were executed:

To test performance, a 952-frame FAP file with a target frame rate of 100fps was used. The system will never reach this target frame rate. However, this stops the FAE from introducing a synchronisation delay between each frame. Frame rate was measured using the fps counter built into the FAQbot.
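The effect of the 100fps target can be illustrated with a simple model of frame pacing. The sketch below assumes the player waits for whatever remains of the target frame period after each frame is produced; the FAE's actual synchronisation code is not described here, so the function is hypothetical.

```python
def sync_delay(target_fps, render_seconds):
    """Seconds to wait after a frame so playback matches target_fps.

    Assumed pacing model: wait for the rest of the frame period, or not
    at all if producing the frame already took longer than the period.
    """
    frame_period = 1.0 / target_fps
    return max(0.0, frame_period - render_seconds)


# A frame that takes 1/28 s (the highest windowed rate measured below).
render_time = 1.0 / 28

print(sync_delay(100, render_time))  # 100fps target: no delay is inserted
print(sync_delay(25, render_time))   # a lower, hypothetical target would add a wait
```

Because each frame takes longer than the 10ms period of a 100fps target, the delay is always zero under this model and the measured frame rate reflects the raw speed of the system.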

Resolution | Depth (bits) | Refresh (Hz) | Textured | Smooth
800×600    | 16           | 85           | 6        | 28
1024×768   | 16           | 85           | 6        | 28
1280×1024  | 16           | 60           | 6        | 28
800×600    | 32           | 85           | 6        | 28
1024×768   | 32           | 85           | 6        | 28
1280×1024  | 32           | 60           | 6        | 20

Table 5: Standard FAE Results (FPS)

Resolution | Depth (bits) | Refresh (Hz) | Textured | Smooth
800×600    | 16           | 85           | 28       | 28
1024×768   | 16           | 85           | 28       | 28
1280×1024  | 16           | 60           | 28       | 28
800×600    | 32           | 85           | 28       | 28
1024×768   | 32           | 85           | 26       | 26
1280×1024  | 32           | 60           | 20       | 20

Table 6: Modified FAE Results (FPS)

Resolution | Depth (bits) | Refresh (Hz) | Textured | Smooth
800×600    | 16           | 85           | 85       | 85
1024×768   | 16           | 85           | 85       | 85
1280×1024  | 16           | 60           | 28       | 57
800×600    | 32           | 85           | 43       | 75
1024×768   | 32           | 85           | 43       | 43
1280×1024  | 32           | 60           | < 1      | < 1

Table 7: Full Screen Results (FPS)

Comparing the results of the Standard FAE and the Modified FAE shows that the texturing bottleneck is the factor that eliminates realtime performance. This is substantiated by the fact that the frame rates of the two FAEs are identical when texturing is disabled. Table 6 shows that using texture objects fixes this problem.

The results also show that the system is limited by the CPU rather than by the graphics speed. That is, the FAE's performance is bounded by the speed at which it can decode the FAPs, not the speed of the graphics card. Evidence for this can be seen in both Table 5 and Table 6. The frame rate never rises above 28fps, even when the resolution is lowered. At 1024×768×32, the graphics card starts to become the limiting factor, as the frame rate starts to drop. Because the performance of the textured and non-textured heads is the same, it can be inferred that the system is not running out of texture memory. The most likely bottleneck is the fill rate.

The Full Screen tests show the performance of rendering the face alone. Therefore, they are a good indication of the graphics card's limits. Table 7 shows that the system can easily handle the realistic head model up to 1024×768×16, because the frame rate is equal to the refresh rate (the maximum possible value). At 1280×1024×16 it runs out of texture memory, as shown by the large performance difference between the textured and non-textured models. In 16-bit modes the system never becomes fill rate limited, because even at 1280×1024×16 the non-textured model comes close to the refresh rate.

Table 7 clearly indicates that the graphics card cannot handle 1280×1024×32 because it does not have enough memory. Both textured and non-textured performance are less than 1fps. This table also shows that texture memory is a big problem even at 800×600×32. Very interesting results can be seen at 1024×768×32. At this point, the system runs out of both memory and fill rate, because textured and non-textured performance share the same low value.
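The way these rows are read can be written down as a small heuristic. This is only a sketch of the reasoning used in this section, not a general profiling method: the function name and the 10% tolerance are assumptions, and the heuristic covers only the three first-order cases (frame rate at the refresh rate, textured much slower than smooth, and textured equal to smooth).

```python
def classify(refresh_hz, textured_fps, smooth_fps, tolerance=0.1):
    """Heuristic reading of one row of the frame rate tables.

    tolerance is an assumed threshold for what counts as a "large"
    difference between the textured and smooth frame rates.
    """
    if textured_fps >= refresh_hz and smooth_fps >= refresh_hz:
        # Both modes hit the display's maximum possible rate.
        return "refresh-limited"
    if textured_fps < smooth_fps * (1.0 - tolerance):
        # Texturing alone costs a lot: textures do not fit on the card.
        return "texture-memory-limited"
    # Texturing makes no difference, so something else is the bottleneck
    # (fill rate, or the FAP decoder in the windowed tests).
    return "not texture-limited"


print(classify(85, 85, 85))  # Table 7, 800×600×16
print(classify(60, 28, 57))  # Table 7, 1280×1024×16
print(classify(60, 20, 20))  # Table 6, 1280×1024×32
```

The third category deliberately does not name a single cause, since equal textured and smooth rates only rule out texture memory as the bottleneck.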

Given these results, it can be seen that the Riva-TNT is optimised for 16-bit graphics. It can perform at the maximum frame rate until it runs out of texture memory (because the resolution is too high). However, the card does not perform so well in 32-bit, because fill rate starts to become a limiting factor. It cannot reach the refresh rate even at the lowest resolution with texturing disabled. This is a graphics card issue, and is not related to the head model.

The objective that the realistic face executes at interactive frame rates on consumer graphics hardware is fulfilled. Even when the system is in 32-bit mode, it can redraw the model at 43 frames per second. The only remaining bottleneck is the FAP decoder, the FAE itself.

  6 FUTURE WORK

    6.1 Arbitrary Camera Locations

      Although the Face Styler engine supports any number of cameras at arbitrary positions around the model, the interface only supports three orthogonal views: Front, Left and Right. This is limiting, because it means the subject must be photographed from precisely those views. Allowing the user to interactively choose the location of the camera would allow images from many sources, such as magazines and film, to be used. It would bring us one step closer to recreating an arbitrary face from publicly available documents.

    6.2 Automatic Camera Calibration

      The Face Styler application allows the user to align a set of FDP points onto photographs of a person's head. These FDPs must simultaneously align with all the input images. That is, the FDP on the tip of the nose must point to the tip of the nose in all input images: front, left and right. It can become difficult to align all FDPs if the images depict the head at different scales and orientations.

      To simplify this alignment process, the system should let the user enter a separate set of FDPs for each image (rather than a single set of FDPs that are shared between all images). Then, the system could perform some form of scattered data interpolation to align the points. Pighin (1998) uses a simplified form of this by utilising a small number of feature points to determine the camera position and rotation, and hence, image alignment.

      Another approach would be to use a camera calibration algorithm, such as Tsai's method (1986), to work out the positions of the cameras. Tsai's algorithm requires at least seven (optimised calibration requires 11) three-dimensional data points from two views (left-eye camera and right-eye camera) to reconstruct the camera parameters. That is, it can determine the position, direction, scaling factor, focal length and radial lens distortion of both cameras. Obviously, this technique would require modification to work with the Face Styler's multiple arbitrary views.

    6.3 Hair

      The hair requires more work before it matches the realism of a photograph. The silhouette of the hair is currently very polygonal. This reduces the visual appeal of the talking head. To reduce this polygonal appearance, the hair geometry would require further tessellation. To implement this, the system would need to calibrate the hair model using the same techniques used to calibrate the face. This was outside the scope of this research.

      Instead of tessellating the hair geometry, another technique could be used to improve the realism of the hair silhouette. Teams who research geometry simplification algorithms have observed that the image quality of low polygon models can be retained as long as the silhouette appears detailed (Gu et al. 1999; Sander et al. 2000). They dynamically increase the detail of the object around the silhouette, while retaining a low number of polygons.

      Earlier research by Gardner (1984) supports the hypothesis that image quality is directly influenced by the object silhouettes. Gardner simulated natural scenes, such as clouds and trees, using textures. To overcome the problem of polygonal object silhouettes, Gardner used alpha blending to give the impression of soft edges. For the trees, a high frequency alpha map implied leaves at the edges. This significantly increased the realism of the trees, giving them the appearance of depth.

      This technique could be used to improve the realism of the hair. For example, alpha blended wisps could be added to the hair geometry. For further realism, these hair wisps could be animated.
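      The alpha blending referred to above can be sketched as follows. This is the standard "over" blend, in which each wisp texel is mixed with the colour already behind it in proportion to its alpha value; it is an illustration only, not code from the Face Styler, and the colours are made up. In OpenGL this blend is normally selected with glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA).

```python
def blend_over(src_rgb, alpha, dst_rgb):
    """Blend a source colour over a destination colour.

    alpha = 1.0 gives the source colour, alpha = 0.0 leaves the
    destination untouched, and values in between give a soft edge.
    """
    return tuple(alpha * s + (1.0 - alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))


hair = (0.35, 0.2, 0.1)        # illustrative brown hair colour
background = (0.6, 0.7, 0.9)   # illustrative background colour

# Alpha falls off towards the tip of a wisp, so the silhouette fades out
# rather than ending at a hard polygon edge.
for alpha in (1.0, 0.5, 0.0):
    print(alpha, blend_over(hair, alpha, background))
```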

    6.4 Ears

    The ears displayed by the FAE do not look very realistic because the texture map is not applied correctly. This is a limitation of the FAE's ear calibration algorithm. Currently, this algorithm causes polygons to overlap and intersect. In addition, some polygons are twisted around so the normals face the inside of the head, rather than towards the viewer. These polygons are then removed during rendering if back-face culling is enabled.
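    The effect of a twisted polygon can be sketched with a simple facing test. The code below is an illustration under stated assumptions: it uses a geometric test with a fixed viewing direction, whereas OpenGL decides facing from the winding order of the projected vertices, and none of the names come from the FAE.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def is_back_facing(v0, v1, v2, view_dir):
    """True if the triangle's normal points away from the viewer.

    view_dir is the direction the camera looks along; the normal follows
    the counter-clockwise winding of v0, v1, v2.
    """
    normal = cross(sub(v1, v0), sub(v2, v0))
    return dot(normal, view_dir) > 0


view_dir = (0.0, 0.0, -1.0)  # camera looking down the negative z axis
a, b, c = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)

print(is_back_facing(a, b, c, view_dir))  # normal faces the viewer: drawn
print(is_back_facing(a, c, b, view_dir))  # twisted winding: culled
```

    Swapping two vertices is enough to flip the normal, which is why a polygon twisted by the calibration disappears once back-face culling is enabled.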

    These ear calibration issues cause problems with the texture mapping. To improve the realism of the head, these issues need to be addressed. This could not be done because modification to the FAE was not possible.

  7 CONCLUSION

    The model used by the FAQbot, Oscar, lacks the realistic appearance of a human. While it may sound and behave like a person, it does not look like a person. This detracts from the believability of the head, and consequently, reduces the ability of the head to communicate. Texture mapping can be used to make the head look more like a real person.

    The FAE ships with two head models that utilise texture mapping to improve realism. However, the textures were acquired using a Cyberware scanner and are consequently very low resolution. This lack of resolution is clearly visible when the head is animated. This significantly detracts from the believability of the head.

    The Face Styler allows a high-resolution texture map to be generated from three input images of a person's head. Once these images are imported into the Face Styler, the user can model the head in 15 minutes. Traditional modelling techniques, such as patches, require many hours to create a realistic head.

    The high-resolution input images create a realistic cylindrical texture map. Even if only a single input image is used, the system can create a quality texture map.

    Adding hair to the model significantly improves the appearance of the head. However, the hair exhibits a polygonal silhouette that hinders the realism and believability of the head. This could be improved by adding alpha-blended wisps to the hair.

    The eyes, in particular the iris, make a large impact on the appearance of the head. The eyes make the head look like the person in the input photographs.

    The environment creates a setting for the talking head. This setting can convey a lot of detail about the role and purpose of the head. For example, a newsreader is typified by the background. However, a background image may distract the viewer's attention away from the head. Therefore, the background needs to be blurred.

    The Face Styler can create realistic heads using images acquired from a variety of sources. The results show heads created from images taken using digital cameras, scanned books and photographs, and from public archives on the Internet.

    The heads produced by the Face Styler are represented using FDPs, and are directly compatible with the existing FAE. The heads can even be animated using standard FAPs. However, since the FAE cannot display the hair and the environment, the resulting head lacks a degree of realism and believability.

    To solve this problem, an integration layer between the FAE and the Face Styler is required. This layer allows realistic hair and an environment to be displayed. It also improves the performance to the point that the limiting factor is the decoding of the FAPs, not the rendering of the head.

    In conclusion, by targeting the face, hair, eyes and environment, a realistic head can be created in 15 minutes. To achieve believability, this head needs to be rendered in realtime. The Face Styler integration layer provides the required rendering performance even on a low-end consumer graphics card. The head generated by the Face Styler achieves the balance between realism and performance to create a talking head that is far more believable than the original Gouraud shaded Oscar model.

    BIBLIOGRAPHY

    Ambrosini, L., Costa, M., Lavagetto, F. and Pockaj, R. (1998). 3D Head Model Calibration Based on MPEG-4 Parameters. The 6th SPACS - IEEE International Workshop on Intelligent Signal Processing and Communication Systems. Melbourne, Australia.

    Ando, M. and Morishima, S. (1995). Expression and Motion Control of Hair using Fast Collision Detection Methods. Image Analysis Applications and Computer Graphics: Third International Computer Science Conference (ICSC) 95, Hong Kong, Springer Verlag.

    Anjyo, K., Usami, Y. and Kurihara, T. (1992). A Simple Method for Extracting the Natural Beauty of Hair. Computer Graphics, 26(2), 111-120.

    Anovile, R. J. (Ed.). (1971). Why a Duck. Studio Vista.

    Beard, S., Crossman, B., Cechner, P., Marriott, A. (1999). FAQbot, Proceedings of Pan Sydney Area Workshop on Visual Information Processing, November. University of Sydney, Australia.

    Cyberware (1990). 4020/RGB 3D Scanner with Color Digitizer. Monterey, California.

    Daldegan, A. and Thalmann, N. M. (1993). An Integrated System for Modelling, Animating and Rendering Hair. Eurographics '93, 12(3), 211-221.

    Debevec, P. E. (1999). Image-Based Modelling and Lighting. SIGGRAPH Computer Graphics Newsletter: Applications of Computer Vision to Computer Graphics, 33(4), 46-50.

    Debevec, P. E., Taylor, C. J. and Malik, J. (1996). Modelling and Rendering Architecture from Photographs: A Hybrid Geometry- and Image-based Approach. SIGGRAPH 96 (August), 11-20.

    Debevec, P. E., Yu, Y. and Borshukov, G. D. (1998). Efficient View-Dependent Image-Based Rendering with Projective Texture-Mapping. 9th Eurographics Workshop on Rendering, 105-116.

    Faigin, G. (1990). The Artist's Complete Guide to Facial Animation. Watson-Guptill Publications, New York.

    Flemming, B. and Dobbs, D. (1999). Animating Facial Features and Expressions. Charles River Media Inc, USA.

    Gardner, G. Y. (1984). Simulation of Natural Scenes using Textured Quadric Surfaces. Computer Graphics, 18(3), 11-20.

    Gasper, E. (1988). Getting a Head With Hyperanimation. Dr Dobb's Journal of Software Tools, 13(7): 18.

    Gu, X., Gortler, S. J., Hoppe, H., McMillan, L., Brown, B. J. and Stone, A. D. (1999). Technical report TR-1-99, Department of Computer Science, Harvard University.

    Guenter, B., Grimm, C., Wood, D., Malvar, H. and Pighin, F. (1998). Making Faces. In SIGGRAPH 98 Proceedings, 55-66.

    Hall, V. (1992). Speech Driven Facial Animation. Honours Dissertation, School of Computing, Curtin University of Technology.

    ID Software. (1999). Optimizing OpenGL drivers for Quake3. http://www.quake3arena.com/news/glopt.html.

    Ip, H. H. S. and Yin, L. (1996). Constructing a 3D Individualized Head Model from Two Orthogonal Views. The Visual Computer, 12, 254-266.

    Kajiya, J. T. and Kay, T. L. (1989). Rendering Fur with Three Dimensional Textures. Computer Graphics, 23(3), 271-280.

    Koch, R. M., Gross, M. H., Carls, F. R., von Büren, D. F., Fankhauser, G. and Parish, Y. I. H. (1996). Simulating Facial Surgery Using Finite Element Methods. In SIGGRAPH 96 Conference Proceedings, 421-428.

    Kurihara, T. and Arai, K. (1991). A Transform Method for Modeling and Animation of the Human Face from Photographs. In Computer Animation 91, 45-58. Springer-Verlag, Tokyo.

    Lander, J. (1999). Read My Lips: Facial Animation Techniques. Game Developer Magazine, CMP Media Group, June.

    Lavagetto, F. and Pockaj, R. (1999). The Facial Animation Engine: Towards a High Level Interface for the Design of MPEG-4 Compliant Animated Faces. IEEE Transactions on Circuits and Systems for Video Technology, 9(2).

    Lavagetto, F., Pockaj, R. and Costa, M. (1999). MPEG-4 Compliant Calibration of 3D Head Models. The Picture Coding Symposium. Portland, Oregon.

    Lee, W. S., Escher, M., Sannier, G. and Thalmann, N. M. (1999). MPEG-4 Compatible Faces from Orthogonal Photos, In Proceedings of International Conference on Computer Animation 99, 186-194. May, Geneva, Switzerland.

    Lee, W. S., Kalra, P. and Thalmann, N. M. (1997). Model Based Face Reconstruction for Animation. Proceedings of MMM 97 (World Scientific Press), Singapore, 323-338.

    Lee, Y., Terzopoulos, D. and Waters, K. (1995). Realistic Modelling for Facial Animation. In SIGGRAPH 95 Conference Proceedings, 55-62.

    Mauch, J. E. and Birch, J. W. (1983). Guide to the Successful Thesis and Dissertation, chapter 4, 70-73. Marcel Dekker, New York.

    McMillan, L. and Gortler, S. J. (1999). Image-Based Rendering: A New Interface Between Computer Vision and Computer Graphics. SIGGRAPH Computer Graphics Newsletter: Applications of Computer Vision to Computer Graphics, 33(4).

    Moffitt, F. H. and Mikhail, E. M. (1980). Photogrammetry. Harper & Row, New York, Third Edition.

    Morishima, S. and Harashima, H. (1991). A natural human-machine interface with model-based image synthesis scheme. In Proceedings of Picture Coding Symposium 1991, 319-322. Tokyo, Japan.

    MPEG (1999). Overview of the MPEG-4 Standard. ISO/IEC JTC1/SC29/WG11 M2725, Seoul, South Korea.

    MPEG Systems, Study of Systems CD, ISO/IEC JTC1/SC29/WG11/N2403.

    MPEG Systems, Text for CD 14496-1 Systems, ISO/IEC JTC1/SC29/WG11/N1901.

    MPEG Video and SNHC, Study of CD 14496-2 (Visual), ISO/IEC JTC1/SC29/WG11/N1901.

    MPEG Video, Text for CD 14496-2 Video, ISO/IEC JTC1/SC29/WG11/N1902.

    nVidia. (1999). Riva TNT2: High-Performance 128-bit TwiN Texel 3D Processor. http://www.nvidia.com/Products/TNT2.nsf.

    nVidia. (2000). OpenGL Performance FAQ v2.0 for NVIDIA GPUs. http://www.nvidia.com/Developer.nsf.

    Parke, F. I. (1974). A Parametric Model for Human Faces. Ph.D. Thesis, University of Utah. UTEC-CSc-75-047.

    Parke, F. I. (1982). Parameterized Models for Facial Animation. In: IEEE Computer Graphics and Applications, 2(9), November, 61-68.

    Pighin, F., Auslander, J., Lischinski, D. and Salesin, D. (1997). Realistic Facial Animation Using Image-Based 3D Morphing. Technical Report UW-CSE-97-01-03.

    Pighin, F., Hecker, J., Lischinski, D., Szeliski, R. and Salesin, D. (1998). Synthesizing Realistic Facial Expressions from Photographs. Proceedings of SIGGRAPH 98, in Computer Graphics Proceedings, Annual Conference Series.

    Pighin, F., Szeliski, R. and Salesin, D. (1999). Resynthesizing Facial Animation through 3D Model-Based Tracking. In Proceedings of 7th IEEE International Conference on Computer Vision (ICCV) 99, Kerkyra, Greece.

    Rosenblum, L. E., Carlson, W. E. and Ill, E. T. (1991). Simulating the Structure of Human Hair: Modelling, Rendering and Animation. Journal of Visualization and Computer Animation, 2(3), 141-148.

    Sander, P. V., Gu, X., Gortler, S. J., Hoppe, H. and Snyder, J. (2000). Silhouette Clipping. Computer Graphics (SIGGRAPH 2000 Proceedings), to appear.

    Segal, M., Akeley, K., Frazier, C. and Leech, J. (1999). The OpenGL Graphics System: A Specification (Version 1.2.1). Silicon Graphics, Inc.

    Segal, M., Korobkin, C., van Widenfelt, R., Foran, J. and Haeberli, P. (1992). Fast Shadows and Lighting Effects using Texture Mapping. SIGGRAPH 92, 249-252.

    Sheridan, M. (1994). Integrating Synthetic Human Hair Modelling Techniques into the Facial Animation System FAX. Honours Dissertation, School of Computing, Curtin University of Technology.

    Shepherdson, R. (2000). The Personality of a Talking Head. Honours Dissertation, School of Computing, Curtin University of Technology.

    Tsai, R. Y. (1986). An Efficient and Accurate Camera Calibration Technique for 3D Machine Vision. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 364-374. Miami Beach, Florida.

    Turing, A. (1950). Computing Machinery and Intelligence. Mind, 59(236), 433-460.

    Usher, J. (1997). Dynamic Hair Generation. Honours Dissertation, School of Computing, Curtin University of Technology.

    Vannier, M. W., Marsh, J. F. and Warren, J. O. (1983). Three-Dimensional Computer Graphics for Craniofacial Surgical Planning and Evaluation. Computer Graphics, 263-273.

    VRML. (1997). Information Technology - Computer Graphics and Image Processing - The Virtual Reality Modeling Language (VRML) - Part 1: Functional Specification and UTF-8 Encoding. ISO/IEC 14772-1:1997.

    Watanabe, Y. and Suenaga, Y. (1989). Drawing Human Hair using Wisp Model. New Advances in Computer Graphics: Proceedings of CG International '89, Springer Verlag.

    Watanabe, Y. and Suenaga, Y. (1992). A Trigonal Prism-based Method for Hair Image Generation. IEEE Computer Graphics and Applications, 12(1), 47-53.

    Waters, K. (1987). A Muscle Model for Animating Three-Dimensional Facial Expressions. In Proceedings of SIGGRAPH 87, July, 17-24.

    APPENDIX A FDP TABLE

    The following table is taken from the MPEG-4 specifications (MPEG Systems, MPEG Video).

                    Feature points

                    Recommended location constraints

                    #

                    Text description

                    x

                    y

                    z

                    2.1

                    Bottom of the chin

                    7.1.x

                       

                    2.2

                    Middle point of inner upper lip contour

                    7.1.x

                       

                    2.3

                    Middle point of inner lower lip contour

                    7.1.x

                       

                    2.4

                    Left corner of inner lip contour

                         

                    2.5

                    Right corner of inner lip contour

                         

                    2.6

                    Midpoint between f.p. 2.2 and 2.4 in the inner upper lip contour

                    (2.2.x+2.4.x)/2

                       

                    2.7

                    Midpoint between f.p. 2.2 and 2.5 in the inner upper lip contour

                    (2.2.x+2.5.x)/2

                       

                    2.8

                    Midpoint between f.p. 2.3 and 2.4 in the inner lower lip contour

                    (2.1x+2.4.x)/2

                       

                    2.9

                    Midpoint between f.p. 2.3 and 2.5 in the inner lower lip contour

                    (2.3.x+2.5.x)/2

                       

                    2.10

                    Chin boss

                    7.1.x

                       

                    2.11

                    Chin left corner

                    > 8.7.x and
                    < 8.3.x

                       

                    2.12

                    Chin right corner

                    > 8.4.x and
                    < 8.8.x

                       

                    2.13

                    Left corner of jaw bone

                         

                    2.14

                    Right corner of jaw bone

                         

                    3.1

                    Center of upper inner left eyelid

                    (3.7.x+3.1 1.x)/2

                       

                    3.2

                    Center of upper inner right eyelid

                    (3.8.x+3.12.x)/2

                       

                    3.3

                    Center of lower inner left eyelid

                    (3.7.x+3.1 1.x)/2

                       

                    3.4

                    Center of lower inner right eyelid

                    (3.8.x+3.12.x)/2

                       

                    3.5

                    Center of the pupil of left eye

                         

                    3.6

                    Center of the pupil of right eye

                         

                    3.7

                    Left corner of left eye

                         

                    3.8

                    Left corner of right eye

                         

                    3.9

                    Center of lower outer left eyelid

                    (3.7.x+3.1 1.x)/2

                       

                    3.10

                    Center of lower outer right eyelid

                    (3.7.x+3.1 1.x)/2

                       

                    3.11

                    Right corner of left eye

                         

                    3.12

                    Right corner of right eye

                         

                    3.13

                    Center of upper outer left eyelid

                    (3.8.x+3.12.x)/2

                       

                    3.14

                    Center of upper outer right eyelid

                    (3.8.x+3.12.x)/2

                       

                    4.1

                    Right corner of left eyebrow

                         

                    4.2

                    Left corner of right eyebrow

                         

                    4.3

                    Uppermost point of the left eyebrow

                    (4.1.x+4.5.x)/2 or x coord of the uppermost point of the contour

                       

                    4.4

                    Uppermost point of the right eyebrow

                    (4.2.x+4.6.x)/2 or x coord of the uppermost point of the contour

                       

                    4.5

                    Left corner of left eyebrow

                         

                    4.6

                    Right corner of right eyebrow

                         

                    5.1

                    Center of the left cheek

                     

                    8.3.y

                     

                    5.2

                    Center of the right cheek

                     

                    8.4.y

                     

                    5.3

                    Left cheek bone

                    > 3.5.x and
                    < 3.7.x

                    > 9.15.y and
                    < 9.12.y

                     

                    5.4

                    Right cheek bone

                    > 3.6.x and
                    < 3.12.x

                    > 9.15.y and
                    < 9.12.y

                     

                    6.1

                    Tip of the tongue

                    7.1.x

                       

                    6.2

                    Center of the tongue body

                    7.1.x

                       

                    6.3

                    Left border of the tongue

                       

                    6.2.z

                    6.4

                    Right border of the tongue

                       

                    6.2.z

                    7.1

                    top of spine (center of head rotation)

                         

                    8.1

                    Middle point of outer upper lip contour

                    7.1.x

                       

                    8.2

                    Middle point of outer lower lip contour

                    7.1.x

                       

                    8.3

                    Left corner of outer lip contour

                         

                    8.4

                    Right corner of outer lip contour

                         

                    8.5

                    Midpoint between f.p. 8.3 and 8.1 in outer upper lip contour

                    (8.3.x+8.1.x)/2

                       

                    8.6

                    Midpoint between f.p. 8.4 and 8.1 in outer upper lip contour

                    (8.4.x+8.1.x)/2

                       

                    8.7

                    Midpoint between f.p. 8.3 and 8.2 in outer lower lip contour

                    (8.3.x+8.2.x)/2

                       

                    8.8

                    Midpoint between f.p. 8.4 and 8.2 in outer lower lip contour

                    (8.4.x+8.2.x)/2

                       

                    8.9

                    Right hiph point of Cupid's bow

                         

                    8.10

                    Left hiph point of Cupid's bow

                         

                    9.1

                    Left nostril border

                         

                    9.2

                    Right nostril border

                         

                    9.3

                    Nose tip

                    7.1.x

                       

                    9.4

                    Bottom right edge of nose

                         

                    9.5

                    Bottom left edge of nose

                         

                    9.6

                    Right upper edge of nose bone

                         

                    9.7

                    Left upper edge of nose bone

                         

                    9.8

                    Top of the upper teeth

                    7.1.x

                       

                    9.9

                    Bottom of the lower teeth

                    7.1.x

                       

                    9.10

                    Bottom of the upper teeth

                    7.1.x

                       

                    9.11

                    Top of the lower teeth

                    7.1.x

                       

                    9.12

                    Middle lower edge of nose bone (or nose bump)

                    7.1.x

                    (9.6.y + 9.3.y)/2 or nose bump

                     

                    9.13

                    Left lower edge of nose bone

                     

                    (9.6.y + 9.3.y)/2

                     

                    9.14

                    Right lower edge of nose bone

                     

                    (9.6.y + 9.1.y)/2

                     

                    9.15

                    Bottom middle edge of nose

                    7.1.x

                       

                    10.1

                    Top of left ear

                         

                    10.2

                    Top of right ear

                         

                    10.3

                    Back of left ear

                     

                    (10.1.y+10.5.y)/2

                     

                    10.4

                    Back of right ear

                     

                    (10.2.y+10.6.y)/2

                     

                    10.5

                    Bottom of left ear lobe

                         

                    10.6

                    Bottom of right ear lobe

                         

                    10.7

                    Lower contact point between left lobe and face

                         

                    10.8

                    Lower contact point between right lobe and face

                         

                    10.9

                    Upper contact point between left ear and face

                         

                    10.10

                    Upper contact point between right ear and face

                         

                    11.1

                    Middle border between hair and forehead

                    7.1.x

                       

                    11.2

                    Right border between hair and forehead

                    < 4.4.x

                       

                    11.3

                    Left border between hair and forehead

                    > 4.3.x

                       

                    11.4

                    Top of skull

                    7.1.x

                     

                    > 10.4.z and 10.2.z

                    11.5

                    Hair thickness over f.p. 11.4

                    11.4.x

                     

                    11.4.z

                    11.6

                    Back of skull

                    7.1.x

                    3.5.y

                     

                     
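The constraint entries in the table above can be read as simple rules for filling in coordinates that are not placed by hand. The following is a minimal sketch in C of two of those rules, the midpoint rule (as in row 8.5) and the copy rule (as in row 9.3). The `Fdp` struct and both function names are hypothetical and are not part of the FAE or the Face Styler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for one feature point; not an FAE type. */
typedef struct { double x, y, z; } Fdp;

/* Midpoint rule used by rows such as 8.5: x = (8.3.x + 8.1.x) / 2. */
double fdp_midpoint(double a, double b)
{
    return (a + b) / 2.0;
}

/* Copy rule used by rows such as 9.3: x = 7.1.x.
 * Only the x coordinate is overwritten; y and z are left untouched. */
void fdp_copy_x(Fdp *target, const Fdp *reference)
{
    assert(target != NULL && reference != NULL);
    target->x = reference->x;
}
```

For example, with feature points stored in variables `fp_8_1`, `fp_8_3` and `fp_8_5`, the x coordinate of 8.5 would be set with `fp_8_5.x = fdp_midpoint(fp_8_3.x, fp_8_1.x);`.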

                  APPENDIX B FDP DIAGRAM

                    The following diagram is taken from the MPEG-4 specifications (MPEG Systems, MPEG Video).

                  APPENDIX C FSP FILE FORMAT

                    # Comments can be anywhere in the file.
                    # Blank lines are ignored.

                    FaceStyler 1.0

                    View "name of view" =
                    {
                        cameraPos = { x y z }          # relative to origin (0, 0, 0)
                        pan = { x y z }                # translation
                        zoom = { x y z }               # scale
                        fdp = "name of fdp file"       # shared between all views
                        image = "name of image file"   # optional
                        imageTranslate = { u v }       # only present if image exists
                        imageScale = { u v }           # only present if image exists
                        imageRotate = n                # only present if image exists
                    }

                    ... more views follow ...

                    Currently, the file must contain exactly three views: front, left and right. This is a Face Styler application limit, rather than a Face Styler engine limit.
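As an illustration of how a reader of this format might handle the three-component entries (`cameraPos`, `pan`, `zoom`), here is a minimal sketch in C. The function `fsp_parse_vec3` is hypothetical and is not taken from the Face Styler source; it assumes the values are whitespace-separated numbers and that the key is followed by a space before the `=` sign, as in the template above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Face Styler. Parses one line of the form
 *     name = { x y z }   # optional comment
 * Returns 1 on success and 0 if the line does not match. */
int fsp_parse_vec3(const char *line, char name[32], double v[3])
{
    char text[256];
    char *hash;

    assert(line != NULL && name != NULL && v != NULL);

    /* Work on a copy so that a trailing comment can be cut off at '#'. */
    strncpy(text, line, sizeof(text) - 1);
    text[sizeof(text) - 1] = '\0';
    hash = strchr(text, '#');
    if (hash != NULL)
        *hash = '\0';

    /* %31s stops at whitespace, so the key must be followed by a space. */
    return sscanf(text, " %31s = { %lf %lf %lf }",
                  name, &v[0], &v[1], &v[2]) == 4;
}
```

Comment-only lines, string entries such as `fdp`, and two-component entries such as `imageTranslate` are rejected by this helper and would need their own handling in a full parser.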

                  APPENDIX D HAIR FILE FORMAT

                    #VRML V2.0 utf8
                    #
                    # Comments can be anywhere in the file.
                    # Blank lines are ignored.

                    Shape {
                        appearance Appearance {
                            material Material {
                            } #material
                            texture ImageTexture {
                                url "full path name to texture map (jpeg, png or ppm)"
                            } #texture
                        } #appearance
                        geometry IndexedFaceSet {
                            ccw FALSE
                            coord Coordinate { point [
                                x1 y1 z1,
                                x2 y2 z2,
                                ...
                                xN yN zN,
                            ] } #coord
                            normalPerVertex TRUE
                            normal Normal { vector [
                                x1 y1 z1,
                                x2 y2 z2,
                                ...
                                xN yN zN,
                            ] } #normal
                            texCoord TextureCoordinate { point [
                                u1 v1,
                                u2 v2,
                                ...
                                uN vN,
                            ] } #texCoord
                            coordIndex [
                                a1, b1, c1, -1,
                                a2, b2, c2, -1,
                                ...
                                aN, bN, cN, -1,
                            ] #coordIndex
                        } #geometry
                    }
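The `coordIndex` list stores each face as a run of vertex indices terminated by -1. The following minimal sketch in C shows one way a loader could validate such a list once it has been read into memory. The function name is hypothetical and is not part of the FAE. Treating any face that does not have exactly three indices as an error is an assumption based on the template above, which shows only triangles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the FAE. Counts the faces in a
 * VRML-style coordIndex list in which every face ends with -1.
 * Returns the face count, or -1 if a face does not have exactly three
 * indices or the list does not end with a terminator. */
long hair_count_triangles(const int *index, size_t n)
{
    long faces = 0;
    size_t in_face = 0;   /* indices seen in the current face */
    size_t i;

    assert(index != NULL || n == 0);

    for (i = 0; i < n; ++i) {
        if (index[i] == -1) {
            if (in_face != 3)
                return -1;        /* not a triangle */
            ++faces;
            in_face = 0;
        } else {
            ++in_face;
        }
    }
    /* Leftover indices mean the last face was never terminated. */
    return in_face == 0 ? faces : -1;
}
```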

                  APPENDIX E BERNIE'S IMAGE LIBRARY (BIL)

BIL is an image library created by the author of this paper. It loads and saves JPEG, PNG and PPM images, and its plug-in architecture makes it easy to add support for more formats. Drivers to load and save TIFF, GIF, TGA and RGB images are currently under development.

BIL provides a flexible interface for loading and saving images, and it does not assume that images are stored on disk. To encode (save) an image, the user requests the encoded data from BIL block by block. To decode (load) an image, the user presents the image data to BIL block by block. In essence, the library has a streaming interface.
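To illustrate why a block-by-block interface needs a context object, here is a minimal sketch in C of a push-style consumer. It is not BIL code: `DemoContext`, `demo_begin` and `demo_push_block` are hypothetical names, and the only "decoding" performed is a check that the stream starts with the two bytes "P6". The point is that the state lives in the context, so the caller may split the data into blocks of any size and take them from any source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical push-style context; these names are not BIL's. */
typedef struct {
    size_t seen;      /* total bytes received so far */
    int    magic_ok;  /* 1 once the stream is known to start with "P6" */
    int    failed;    /* 1 if the first two bytes were something else */
} DemoContext;

void demo_begin(DemoContext *ctx)
{
    assert(ctx != NULL);
    ctx->seen = 0;
    ctx->magic_ok = 0;
    ctx->failed = 0;
}

/* Accepts the next block of data. Returns the number of bytes consumed,
 * or 0 once the stream has been rejected. */
size_t demo_push_block(DemoContext *ctx, const void *block, size_t size)
{
    static const unsigned char magic[2] = { 'P', '6' };
    const unsigned char *bytes = (const unsigned char *)block;
    size_t i;

    assert(ctx != NULL && (block != NULL || size == 0));
    if (ctx->failed)
        return 0;

    for (i = 0; i < size; ++i) {
        /* Only the first two bytes of the whole stream are checked. */
        if (ctx->seen < 2 && bytes[i] != magic[ctx->seen]) {
            ctx->failed = 1;
            return 0;
        }
        ++ctx->seen;
        if (ctx->seen == 2)
            ctx->magic_ok = 1;
    }
    return size;
}
```

Because the byte count is kept in the context, feeding "P" and "6" in two separate calls gives the same result as feeding "P6" in one call.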

The following functions are used to create and destroy an image in memory:

BILimage bilNewImage(BILsizei width, BILsizei height);
BILboolean bilDeleteImage(BILimage image);

File formats are registered before encoding and decoding using the following functions:

BILboolean bil{format}RegisterEncoder();
BILboolean bil{format}RegisterDecoder();

e.g. bilJpegRegisterEncoder();

Encoding (saving) is performed using the following functions:

BILcontext bilBeginEncode(BILimage image, const BILchar *key);
BILsizei bilEncodeBlock(BILcontext context,
                        BILvoid *block, BILsizei size);
BILboolean bilEndEncode(BILcontext context);

Decoding (loading) is performed using the following functions:

BILcontext bilBeginDecode(BILimage image, const BILchar *key);
BILsizei bilDecodeBlock(BILcontext context,
                        const BILvoid *block, BILsizei size);
BILboolean bilEndDecode(BILcontext context);

For example, to load a JPEG image from a disk file, the following code is used:

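/* file is an open FILE *; buffer is a local char array */
bilJpegRegisterDecoder();                 /* register the format once, before decoding */
image = bilNewImage(0, 0);
context = bilBeginDecode(image, "jpeg");
while ((size = fread(buffer, 1, sizeof(buffer), file)) > 0)
    bilDecodeBlock(context, buffer, size); /* feed each block read from disk */
bilEndDecode(context);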

There are convenience functions to save and load images to and from disk:

BILboolean bilSaveImage(BILimage *image,
                        const BILchar *fileName,
                        const BILchar *driver);
BILboolean bilLoadImage(BILimage *image,
                        const BILchar *fileName,
                        const BILchar *driver);

Every BIL function returns BIL_FALSE (0 or NULL) on error. To check the error condition, the following function is used:

BILerror bilGetError();

The above is only a brief summary of the BIL API. You can find complete documentation at the BIL web site: http://www.geocities.com/SiliconValley/7259/bil/.