Glasgow’s Dimensional Imaging has pioneered 3D markerless performance capture and is now helping Oscar-winning actors translate virtual characters to screen, writes Adrian Pennington. But can it now help to crack the age-old problem of the ‘uncanny valley’ and bring soul to CGI humans?

A decade ago two Scottish students devised a novel approach to digitally reproducing human emotions that initially found a home in psychology and clinical research. Now their systems are in high demand, helping map an athlete’s or actor’s performance with unrivalled fidelity onto lifelike CG models and all manner of characters for TV series, feature films, games and commercials.

Virtual Clones spun out of Glasgow and Edinburgh Universities, where PhD graduates Doug Green and Colin Urquhart were investigating stereo photogrammetry: the creation of 3D models from stereo pairs of images.
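
The underlying geometry is straightforward, even if the engineering is not. As a rough sketch (with made-up focal length, baseline and pixel coordinates rather than any real calibration), a calibrated and rectified stereo pair lets you recover depth from the disparity between matched pixels:

```python
# Illustrative stereo-photogrammetry sketch: recover one 3D point from a
# rectified stereo pair via depth-from-disparity (Z = f * B / d).
# All numbers below are example values, not a real camera calibration.

def triangulate_point(x_left, x_right, y, focal_px, baseline_m):
    """Return (X, Y, Z) in metres for one matched pixel pair.

    x_left, x_right : horizontal pixel coordinates of the same feature in the
                      left and right images, measured from the principal point
    y               : shared vertical pixel coordinate (rectified pair)
    focal_px        : focal length in pixels
    baseline_m      : distance between the two camera centres in metres
    """
    disparity = x_left - x_right            # apparent shift between the views
    if disparity <= 0:
        raise ValueError("matched point must lie in front of both cameras")
    Z = focal_px * baseline_m / disparity   # depth from disparity
    X = x_left * Z / focal_px               # back-project into 3D
    Y = y * Z / focal_px
    return X, Y, Z

# Example: a skin feature seen 40 px apart by two cameras 10 cm apart.
print(triangulate_point(x_left=320, x_right=280, y=50,
                        focal_px=1200, baseline_m=0.10))
```

Repeat the matching for every pixel pair across the two images and the result is a dense 3D surface, which is the raw material a photogrammetry system builds its models from.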

“We had an idea to go to video games companies and make virtual versions of real-life people,” explains Urquhart, CEO at the Glasgow-based specialist. “In 2003, that was a bit ahead of its time. The graphics capabilities of the games consoles meant that they couldn’t handle sophisticated 3D scanning.”

Surgical simulations

With the company rebadged as Dimensional Imaging, the duo found fertile applications for their work in facial surgery and orthodontics research, helping surgeons assess treatment and illustrate how subjects would appear post-operation.

“The Holy Grail in that area is to plan surgery in three dimensions and show patients what they may look like,” explains Urquhart. “That still hasn’t developed into a clinical tool. There is a lot of work to be done on the simulation side.”

A couple of breakthrough sales brought them back into the entertainment sphere. In 2009 Electronic Arts were looking for a system that would allow them to capture highly accurate 3D facial likenesses of athletes for EA Sports’ FIFA 10. The process had traditionally been very time-consuming and reserved only for a few star players.

“EA was one of the first developers to recognise the need for improved capture workflows,” says Urquhart. Lionel Messi and Gareth Bale were two of the stars of the latest in the franchise, FIFA 14, whose facial likenesses were used for both the game and the related promo We Are FIFA 14.

As sports sims began to focus on casting actual personalities, so developers of fictional titles began to seek more realism in their characters. Also in 2009, Valve Software selected DI’s tools to capture the facial expressions of real-life talent for Left 4 Dead 2.

DI’s starting point was pairs of standard digital cameras designed to mimic the human vision system. Information such as 3D shape and appearance (texture) could be captured in a single flash, unlike rival scanning devices that required subjects to remain still for longer.

That basic technique has evolved to embrace multiple sets of synchronised video cameras shooting at high speed – 48 or 60 frames per second rather than 24fps – to capture minute facial nuances. For even greater fidelity, 500fps video can be recorded.

“We use machine vision cameras to capture monochrome images, which gives us a higher signal-to-noise ratio,” says Urquhart. “Using colour video cameras negatively impacts the quality of data.”

DI’s secret sauce is the ability to track a consistent mesh topology through captured sequences, which makes the data easy to use in subsequent animation pipelines. The result is a facial performance-capture system that doesn’t require any markers or special make-up but is nevertheless able to track thousands of points on the face, capturing the subtlest of expressions and performances.

“There are only so many markers you can put on a face,” he says. “This limits the fidelity of motion capture. Rather than track marker points, we can track through a sequence of frames to get a fixed topology mesh that deforms. That’s what makes our system unique and useful for entertainment.”

A typical set-up comprises three pairs of cameras: two monochrome sets for shape and tracking and one pair for texture and colour. “The fundamental difference between our system and others that capture 3D data at video rates is that our tracking works at the pixel level rather than tracking features on the face,” explains Urquhart. “We use the natural skin texture and the pores.”
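
DI has not published its tracker, so the following is only a generic illustration of what pixel-level tracking means: match a small patch of skin texture from one frame to the next by normalized cross-correlation. The patch size, search radius and greyscale-frame assumption are choices made for the sketch, not details of DI’s system.

```python
import numpy as np

def track_patch(prev_frame, next_frame, x, y, patch=7, search=10):
    """Generic pixel-level patch tracker (illustrative, not DI's algorithm).

    Takes a small patch of skin texture centred on (x, y) in prev_frame and
    finds its best match in next_frame by normalized cross-correlation,
    searching within +/- `search` pixels. Frames are 2D greyscale arrays.
    """
    r = patch // 2
    template = prev_frame[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    template -= template.mean()

    best_score, best_xy = -np.inf, (x, y)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cx, cy = x + dx, y + dy
            window = next_frame[cy - r:cy + r + 1,
                                cx - r:cx + r + 1].astype(float)
            if window.shape != template.shape:
                continue                      # candidate falls off the image
            window -= window.mean()
            denom = np.linalg.norm(template) * np.linalg.norm(window)
            if denom == 0:
                continue                      # flat patch, nothing to match
            score = float((template * window).sum() / denom)
            if score > best_score:
                best_score, best_xy = score, (cx, cy)
    return best_xy, best_score
```

In a production system this kind of matching would be run densely across the face and constrained so that neighbouring points move coherently, which is why natural skin texture and pores matter: they give each patch something distinctive to lock onto.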

The process begins with a template mesh that has the topology the customer wants. A subject sits in front of the cameras under standard video lighting and acts out the scene. A clean audio track can be captured at this point for use in a later ADR session. After reviewing the reference footage the director selects the take they want.

“We fit that to one frame in the sequence to get the initial mapping onto the individual actor, and then we track every single one of the 2,000 vertices,” he says. “The result is a neat topology which deforms directly with the performance. Other systems track a sparse set of data that drives a rig that then deforms a character.”
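
The practical difference shows up in what comes out of the system. A fixed-topology capture delivers new positions for the same vertex list on every frame, with a triangle list that never changes; a marker-based pipeline delivers a handful of points that must then drive a rig. A toy sketch of the two data shapes (the array sizes are illustrative, not DI’s formats):

```python
import numpy as np

# Fixed-topology capture: the triangle list is defined once on the template
# mesh, and every frame supplies fresh XYZ positions for the SAME vertices.
NUM_VERTICES, NUM_FRAMES = 2000, 300               # e.g. five seconds at 60fps

faces = np.zeros((3900, 3), dtype=np.int32)        # vertex indices, constant over time
dense_capture = np.zeros((NUM_FRAMES, NUM_VERTICES, 3))   # per-frame vertex positions

# Marker-based capture, by contrast: a sparse set of tracked points that a
# facial rig must then translate into the final deformation.
NUM_MARKERS = 60
sparse_capture = np.zeros((NUM_FRAMES, NUM_MARKERS, 3))
```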

Meshing data with reality

The data is then exported to animation packages such as Maya for animators to refine the performance. “When you have a 2,000-vertex mesh, you need a detailed, accurate, and well-defined rig,” Urquhart says. “We have our customers go through 100 or more expressions that we track using the detailed mesh to create a rig.”
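
Urquhart doesn’t spell out how that rig is built, but a standard way to turn a library of tracked expressions into an animation control is a blendshape model: each captured expression becomes a delta from the neutral mesh, and any new frame is approximated as a weighted sum of those deltas. The sketch below solves for the weights by linear least squares; the mesh and expression counts are synthetic, and the approach is a generic stand-in rather than DI’s actual pipeline.

```python
import numpy as np

def solve_blendshape_weights(neutral, expressions, frame):
    """Fit weights w so that neutral + sum_k w_k*(expressions[k] - neutral) ~ frame.

    neutral     : (V, 3) neutral-pose vertex positions
    expressions : (K, V, 3) tracked expression meshes sharing that topology
    frame       : (V, 3) one captured frame to approximate
    Returns the K blendshape weights found by linear least squares.
    """
    K, V, _ = expressions.shape
    basis = (expressions - neutral).reshape(K, V * 3).T    # (3V, K) delta basis
    target = (frame - neutral).reshape(V * 3)              # (3V,) target delta
    weights, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return weights

# Tiny synthetic test: 2,000 vertices, three expression shapes.
rng = np.random.default_rng(0)
neutral = rng.standard_normal((2000, 3))
expressions = neutral + 0.01 * rng.standard_normal((3, 2000, 3))
frame = neutral + 0.6 * (expressions[0] - neutral) + 0.3 * (expressions[2] - neutral)
print(solve_blendshape_weights(neutral, expressions, frame))   # ~ [0.6, 0.0, 0.3]
```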

The creation of believable virtual characters has grown in tandem with the processing power of graphics processors from ARM and Nvidia. A 2012 British Gas spot promoting smarter home energy, devised by CHI & Partners, featured a dozen facial-capture performances created using DI’s tools; BBC fantasy TV series Merlin and Atlantis deployed the system to create startling mythological characters alongside VFX outfits Vine and Ten24.

Since the system is portable, DI increasingly finds itself called to attend shoots and film sets on location all over the world, from London to Hollywood. Recently in Montreal, Urquhart captured the facial performance of actor Vincent Cassel for director Christophe Gans’ Beauty and the Beast, and reveals that he has since captured the performances of three Oscar-winning actors for two in-production projects.

The ultimate goal is life itself

One goal of performance capture is to produce a character so photorealistic it becomes indistinguishable from real life. It’s a process that has foundered many times in the past on the blankness registered behind the eyes of CGI humans (call it soul, if you will), a failure commonly known as the ‘uncanny valley’. With subtlety its forte, can DI solve the age-old conundrum?

“We are so sensitive to facial animation that you have to be incredibly careful how it is applied,” says Urquhart, implying that animation has to up its game. “The greater the detail of the facial performance that is captured, the greater the potential for realism, but if the animation itself is less convincing then you will see the uncanny valley.”

There is another school of thought suggesting that performance capture can be rendered more genuine if actors are permitted to act with freedom of movement, rather than from a somewhat straitjacketed seated position. That’s why DI is putting much of its R&D into a head-mounted rig.

“There’s a real desire, particularly in the games market, to do full-body performance motion and facial capture simultaneously,” says Urquhart. “Experienced actors such as Vincent Cassel can deliver brilliant facial performances independent of the scene, but it is much harder for less experienced actors to do so without body movement. As the fidelity of the capture process advances, the performance itself has to match, and a head-mounted camera is a way of delivering that. This is a really important step forward.”

Games developers are also looking for ways to create even greater volumes of realistic performance-captured artists as efficiently as possible. The Finnish creator of Max Payne, Remedy Entertainment, recently licensed DI’s software to create multiple lifelike digital doubles for its upcoming Xbox One title Quantum Break.

With director James Cameron lining up the sequels to Avatar, with its host of performance-captured Na’vi, and JJ Abrams likely to use similar techniques (while avoiding the fate of Jar Jar Binks) for the first of a new set of Star Wars films, it would not be outlandish to expect a small office in Glasgow to get a call.
