At the inaugural Develop Conference in Brighton I spoke about the differences between sound in the real world and sound in the virtual worlds that we create. Right on cue, the day before my presentation, Mark Rein, VP of Epic Games, claimed in his keynote session that:
“the future of gaming is not Donkey Kong, it is these big expansive worlds”
It seems pertinent to examine the fundamentals of our craft at a time when much of the industry is evangelising the idea of complex, immersive virtual worlds and promising to deliver photo-realism, HD graphics and advanced surround sound technologies in their games. Surely, when creating a virtual world, we should emulate the real world as closely as possible? It turns out that, as ever with interactive media, the task of recreating reality is not quite so straightforward…
At the School of Sound conference in 2001, Professor Paul Robertson, a musician who has spent much of the last 20 years working with neuroscientists to try and unravel the mysteries of music and the mind, opened his presentation with some fascinating facts about the brain. One point in particular has stuck with me ever since, albeit sketchily paraphrased from memory:
Music is not processed in just one specialised part of the brain – so far we have identified 12 different areas which can be utilised when listening to music, one of which only shows activity when listening to bird song! Even after a severe brain trauma the chances are that the brain will still be capable of processing music to some degree. In other words – if you’re not musical you’re brain dead.
The brain is amazing. And it is wired for sound.
Our brains enable us to create and appreciate art. Whilst the technologies which allow us to make games are astounding, the reason we even bother at all is that, by some twist of evolutionary fate, we have the ability to ignore that games are totally and utterly ridiculous. Think about it; you can sit in front of your monitor or TV and, with the lights down, sound turned up, interfacing via some manner of controller, save the world from those pesky aliens. Again.
You have to be a willing victim to become immersed in any virtual world, be it conveyed via a game, film, broadcast or book. Film theory has the concept of the film/audience contract whereby an audience fulfils their obligation by buying in to the film’s virtual world so long as the film fulfils its obligation of providing an entertaining experience. Similarly, the game/player contract ensures that no matter what we throw at the player, as long as it is explained competently and is entertaining, they will accept it.
In his book Audio-Vision, film sound theorist Michel Chion takes this one stage further by naming this the audio/visual contract to emphasise that sound and image are two separate entities which are only fused together in the minds of an audience. This is a phenomenon he calls synchresis (synchronism and synthesis).
The audio/visual contract and synchresis are what allow us to entertain the idea of a virtual world in the first place. They also permit wondrous things such as acousmatic sound: sound which has been disassociated from its source. For example, combining the sound of an axe chopping wood with the moving image of a bat hitting a ball will create an incredibly powerful strike which adds up to more than the sum of its parts.
Before we use sounds in our virtual worlds we must first capture them. Capturing anything involves a transformation; the caged animal is a different beast to its free-roaming brethren. A sound in the real world is free to interact with its environment in an infinite number of ways. A captured sound lacks this vivacity; it is flat, static, a shadow of its former self. Therefore, your choice of microphone and its location are, whether you are conscious of it or not, a manipulation which takes place before a sound is even recorded.
Microphones do not hear what our brains hear. The brain is capable of some rather impressive filtering whereas a microphone records whatever hits its membrane. However, these filtering skills are severely diminished when you are listening to a sound recording. For example, when listening to me give my presentation the audience were (hopefully!) tuned in solely to the sound of my voice, whereas if you were to listen to a recording of my presentation you would be distracted by the room’s acoustics, the aircon, the projector’s fan, general noise from the audience and that damn loudspeaker on the right which buzzed throughout the whole day. In our virtual worlds we need to perform this filtering on behalf of our brains, which is the process of mixing. Dynamic, real-time mixing is a relatively new frontier but it is a high priority in our next generation games.
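As a rough illustration of mixing as filtering on the listener's behalf, here is a minimal Python sketch. It is not how any particular game engine does it; the `Source` class, the `focus_mix` function, the priority numbers and the `duck_per_step` value are all hypothetical, invented for this example. The idea is simply that the most important active sound is left alone and everything less important is turned down in proportion to how far below it sits.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """A sound in the scene (hypothetical structure for illustration)."""
    name: str
    priority: int       # higher means more important to the listener
    gain: float         # authored linear gain, 0.0 to 1.0
    active: bool = True


def focus_mix(sources, duck_per_step=0.5):
    """Return {name: gain} with less important sounds attenuated.

    Each step of priority below the most important active source
    multiplies the gain by duck_per_step (an assumed tuning value).
    """
    active = [s for s in sources if s.active]
    if not active:
        return {}
    top = max(s.priority for s in active)
    return {
        s.name: s.gain * duck_per_step ** (top - s.priority)
        for s in active
    }


# The presentation room from the paragraph above, with made-up numbers.
room = [
    Source("speaker", priority=3, gain=1.0),
    Source("aircon", priority=1, gain=0.8),
    Source("projector fan", priority=1, gain=0.6),
]
mix = focus_mix(room)
```

With these made-up numbers the voice stays at full level while the aircon and the projector fan drop to a quarter of their authored gain. If the speaker stops talking (`active=False`), the room noise returns to its authored level, much as you suddenly notice the aircon during a pause.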
Our perception of the real world is so much more complex than that offered by even the best binaural recording. Even if our virtual worlds could precisely mimic sound in the real world and be photorealistic, we’d still just be sat there in front of our monitor or TV. Fortuitously, sound and music can be used to represent some of the information which is missing from the virtual world.
Music, in particular, is like monosodium glutamate for the ears (MSG is the chemical they add to snacks and some Chinese foodstuffs to make them taste better than they intrinsically do). The use of music in virtual worlds is especially interesting as it is such an incredibly abstract concept; our lives are not scored by music. In a virtual world, music can be used to intensify an experience and act as an emotional signifier without revealing itself as a manipulative device. Whilst there is no doubt that this is a very powerful tool when used well, it is in no way realistic.
It’s behind you
Everything I’ve covered so far has, for the most part, been applicable to the virtual worlds of both films and games. Surround sound is an area where the two are not so closely aligned.
In film, surround sound is largely concerned with the use of diffuse sound; hence the surround channels being represented by a barrage of speakers along the walls of the theatre. Film established fairly early on in its experiments with surround sound that directional sound was distracting to an audience as it diverted their attention away from the screen/virtual world. Game cinematics tend to stick to the conventions of film surround sound, but in-game sound uses the surrounds in a directional manner. This divergence is a result of the contrasting voyeuristic nature of film against the participatory nature of games. It’s also because we lazily let the game handle most of the panning in a rather simplistic fashion, though you can expect our use of surround sound to become more sophisticated as we take advantage of discrete surround panning. It should be noted that a game which has surround sound differs greatly from a game which really uses surround sound to its advantage.
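To give a flavour of what simplistic directional panning looks like, here is a minimal Python sketch of pairwise constant-power panning across four speakers. It is an assumption-laden illustration rather than a description of any real engine: the `pan_quad` function, the four-speaker layout and the angle convention (0 degrees straight ahead, positive angles clockwise to the listener's right) are all invented for this example.

```python
import math

# Assumed layout: speakers at the corners of a square around the listener,
# listed clockwise starting from front-left.
SPEAKERS = ["FL", "FR", "RR", "RL"]


def pan_quad(azimuth_deg):
    """Return {speaker: gain} for a source at the given azimuth.

    The source is shared between the two nearest speakers using a
    constant-power (sine/cosine) crossfade, so the squared gains sum to 1.
    """
    # Rotate so that 0 sits on the front-left speaker, then wrap.
    a = (azimuth_deg + 45.0) % 360.0
    sector_raw = int(a // 90.0)
    t = a / 90.0 - sector_raw          # 0.0 at first speaker, 1.0 at next
    sector = sector_raw % 4            # guards the wrap-around edge case

    gains = {name: 0.0 for name in SPEAKERS}
    gains[SPEAKERS[sector]] = math.cos(t * math.pi / 2)
    gains[SPEAKERS[(sector + 1) % 4]] = math.sin(t * math.pi / 2)
    return gains


ahead = pan_quad(0.0)      # shared equally between FL and FR
behind = pan_quad(180.0)   # shared equally between RR and RL
```

Note how little information this carries: the whole world is reduced to a single angle, with no sense of distance, height or the room itself.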
Whilst some directional cues can be beneficial to the player it is folly to think that directional surround sound offers any exactness. Despite this apparent failure to represent the virtual world accurately, the audio/visual contract ensures that as long as this doesn’t infringe upon the gaming experience it will be readily accepted by the player.
Another area of surround sound which is evolving is the way in which surround music is mixed. Even though there is nothing realistic about the use of music or surround sound in our virtual worlds, there is a debate about whether music should make overt use of the surround channels. Some argue that players will find it strange that music is coming from behind them, but proponents of this argument fail to take into account how strange it is that, irrespective of locus, there is any music at all. Anyone who dismisses surround music hasn’t heard it done well, if at all.
Dialogue in our virtual worlds is significantly different from every-day conversation in the real world. The fundamental difference is caused by dialogue being a vocalisation of the written word. Genuine conversation is a chaotic stream of consciousness littered with interruptions, mispronunciations and other utterances that we are astonishingly good at deciphering. This is in sharp contrast to dialogue which tends to:
- consist of complete sentences
- pause at the end of each sentence
- be well thought through and articulate
- be absolutely packed with information
- stay clear of naturalism unless reflecting the emotional state of a character
The only thing that dialogue has in common with spontaneous speech is that it is spoken aloud.
Similar to the process of mixing, whereby we recreate the brain’s ability to focus on that which is most important, dialogue distils everyday speech down to its essential function: conveying information. In games we tend to overuse dialogue and force it to carry all the weight of the narrative and exposition. This is in spite of dialogue being but one method of communicating information to the player. Here, we should try to emulate the real world more by taking some of the weight off dialogue and using other aspects of the virtual world to tell the story and communicate information to the player.
Sound in a virtual world works in a decidedly different way to sound in the real world. Thankfully, our brains are quite forgiving of the discrepancies between the infinite complexity of the real and the constructed, refined simplicity of the virtual. The best way to recreate reality is by emulating our brain’s response to it. Only then can we begin to create a plausible, entertaining experience for our audience and fulfil our half of the audio/visual contract.