Tethered is an immersive VR strategy game in which you take on the role of a god-like spirit tasked with looking after your flock of ‘Peeps’, the cute but simple-minded inhabitants of the beautiful, floating archipelagos that you gaze down upon from your lofty seat amongst the clouds. This disembodied third-person perspective (see note 1) sets the experience apart from the embodied, or implied-embodied, first-person perspective employed by most other VR games, and it had a significant impact on various aspects of its audio presentation.
My intention here is to highlight a different kind of VR audio experience, in the hope that this helps to further our collective understanding of what is common to all VR audio experiences, and perhaps also of which aspects are more readily applicable to the growing variety of sub-genres and experiences that VR supports.
The VR Illusion
A player’s reaction in VR is much more likely to be instinctual or subconscious – the layer of abstraction that is normally provided by the TV or computer screen, speakers and input device is dramatically reduced in VR. For audio, this means that the familiarity and comfortable habituation that players have accumulated from all the audio experiences they’ve ever had are no longer their only point of reference; they are also referencing their perception of how sound behaves in the real world.
It’s unsurprising, then, that the simplistic ways that we generally rely upon to represent sound in our virtual worlds often don’t translate all that well to VR and can fall short of players’ expectations.
However, not all VR experiences are the same or have the same challenges and requirements. This is easy to overlook when so much insistence is placed on the apparent necessity of the various audio technologies that have found their true calling in VR. Don’t get me wrong, I love tech, and attempting to replicate the complex ways that sound interacts with its environment and our ears poses super interesting problems; it’s important to understand them and consider whether they are relevant to your project. But audio people, particularly interactive audio people, tend to skew technical in their abilities and interests, which makes it all too easy to indulge that side of ourselves and get bogged down in the tar pit of technology. Technology exists to serve the player experience and all of its ludic, artistic, conceptual, dramatic, emotional and metaphysical pillars and pores. As always, you can’t really go wrong if you evaluate tech through the lens of what the player experience truly needs.
Once I got stuck into it, it was clear that Tethered didn’t need to invest in complex reverb models, obstruction, occlusion or propagation. It didn’t even really need a 3D audio solution. This didn’t particularly match my expectations of working on a VR game, but the project was nonetheless a great opportunity to try out and learn new things.
Everything is more or less in front of the player in Tethered, so 3D audio is a subtle, nice addition to this VR experience rather than a deal breaker. There’s a bit of non-critical height differentiation between events that happen above you (such as the materialisation of clouds and falling eggs) and events that happen below you (which is pretty much everything else), and that’s about it.
The only thing that is guaranteed to be behind the player at some point is a distant flock of circling birds – normally, I probably wouldn’t have bothered scoring this kind of peripheral event, but these small details seem much more significant in VR and I found myself just gazing at the birds and enjoying them. So, I felt compelled to give them some sound support, but this also created the problem of them being rather distracting, especially when positioned behind the player. Placing them on the cusp of consciousness in the mix wasn’t quite enough, so I ended up emulating the way these kinds of distant sounds can ebb and flow on the wind to make sure they weren’t an omnipresent menace.
In many respects I think this is representative of my first forays into creating a VR audio experience: try something; it invariably doesn’t quite work; iterate. Whilst this is a familiar way of working for any experienced games maker, having to apply it across the entire audio experience, coupled with an increased level of attention to detail, meant that there was significantly more to do before reaching a sense of satisfaction and completion with the work.
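The ebb-and-flow treatment for the distant birds could be approximated with slowly varying random gain. The sketch below is a minimal illustration, not the method used in Tethered: the class name `WindGust`, the -18 dB to 0 dB range and the 4-second segment length are all hypothetical. It picks a random level at each segment boundary and eases between neighbouring levels with a cosine curve, so the birds drift in and out without audible steps.

```python
import math
import random


class WindGust:
    """Hypothetical gain modulator that lets a distant source ebb and flow.

    A random target level is chosen at every segment boundary and the level
    eases between neighbouring targets with a cosine curve, so it never jumps.
    All default values are illustrative assumptions.
    """

    def __init__(self, min_db=-18.0, max_db=0.0, segment_s=4.0, seed=0):
        self.min_db = min_db
        self.max_db = max_db
        self.segment_s = segment_s
        self.seed = seed

    def _target(self, index):
        # Derive each boundary's level from (seed, index) so the curve is
        # repeatable and independent of the order in which it is queried.
        rng = random.Random(self.seed * 1_000_003 + index)
        return rng.uniform(self.min_db, self.max_db)

    def gain_db(self, t):
        """Level in dB at time t (seconds)."""
        pos = t / self.segment_s
        index = math.floor(pos)
        frac = pos - index
        a, b = self._target(index), self._target(index + 1)
        w = (1.0 - math.cos(math.pi * frac)) / 2.0  # 0 -> 1, flat at both ends
        return a + (b - a) * w


def db_to_linear(db):
    """Convert a level in dB to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)
```

In use, the modulator would be sampled on each audio update and the result applied to the birds' volume via `db_to_linear`; a longer `segment_s` gives lazier gusts.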
One of the primary knock-ons from using 3D audio tech is that it requires all of your spatialised sounds to be mono point sources (or composed from mono point sources). This is fine for the most part, but sometimes such overt directionality isn’t desirable. One event in Tethered that didn’t work so well when collapsed to mono is the lightning cloud strike, which you can use to zap marauding creatures or destroy obstacles on the islands. It’s a powerful event, and one of the loudest and most visually striking in the game, which is exactly why such strong directionality doesn’t quite work – without the enveloping environmental reflections that you strongly associate with thunder and lightning, it just isn’t credible and feels off. Similarly, using a 2D stereo sound (i.e. one that goes straight to the headphones) also feels strange, because it doesn’t track when you move your head around. So, for this event I decided to try triggering two sounds at the same time.
To start with, I had the blast of the initial lightning bolt as positional and the following rumble of the thunder as stereo, but that sounded more or less the same as an all-stereo version – the blast didn’t really register as positional, perhaps because it was too short to track, but maybe also because my ear immediately caught hold of the stereo tail. After some more experimentation, what I settled on was a stereo blast with an additional (and unique) positional element layered on top, which then gives way to thunder that starts out stereo but smoothly resolves to being positional. The stereo elements really sell the power and size of the event, especially when juxtaposed against all the other mono-positional sounds in the game, and, whilst it makes little sense on paper, having the thunder resolve to being positional really helps with localisation. There’s clearly plenty of room for experimentation here, and it’s going to be cool to see these kinds of techniques progress and mature.
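The thunder's move from stereo to positional can be pictured as a crossfade between two copies of the tail, one sent straight to the headphones and one spatialised. This is a minimal sketch assuming an equal-power crossfade; the `ThunderMix` name and the hold and resolve times are hypothetical, and the simultaneous stereo and positional blast layers are not modelled here.

```python
import math
from dataclasses import dataclass


@dataclass
class ThunderMix:
    hold_s: float = 0.5     # hypothetical: how long the tail stays fully stereo
    resolve_s: float = 2.0  # hypothetical: how long it takes to become positional

    def tail_gains(self, t):
        """Return (stereo_gain, positional_gain) for the thunder tail at t seconds."""
        if t <= self.hold_s:
            x = 0.0
        elif t >= self.hold_s + self.resolve_s:
            x = 1.0
        else:
            x = (t - self.hold_s) / self.resolve_s
        # Equal-power crossfade: the squared gains always sum to 1.
        return math.cos(x * math.pi / 2), math.sin(x * math.pi / 2)
```

Equal-power weighting is one reasonable choice here because a linear crossfade dips in loudness halfway through; the article does not say which curve the game used.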
Sense of Place
Reinforcing the sense of place is central to backing up the reality presented to players in VR. Because a strategy game essentially boils down to ‘lots of things happening in the world as a result of the player’s actions’, one of the primary ways of reinforcing the sense of place in Tethered was by paying close attention to the presentation of sounds that take place within it.
Background ambience is clearly an important consideration for all VR projects. However, the world surrounding the player in Tethered is of much less importance than the world they are observing in front of them. As a result, surround ambiences of any kind (e.g. first-order ambisonics) weren’t a good fit for this project. Instead, the backgrounds in Tethered combine plain vanilla stereo beds (without too much detail in them – primarily air or wind with a wash of birdsong or crickets, depending on the time of day), random ambient spots to help create variation (e.g. sporadic birdsong and other wildlife, gusts of wind), and positional spots placed in the world to accentuate the features of a given island (e.g. a waterfall, an ominous cave, the wind in the trees).
The broadband noise of the waterfalls on some of the islands in Tethered was a bit of a nightmare – they’re huge and impressive which meant they needed a sound to match, but they’re also omnipresent, sucking up room in the mix. I addressed this by having them be loudest at the start of a level, when the player is most likely to be observing them as they survey the landscape, and then reduced them gradually over 45 seconds to move them out of the way. Another solution I considered was to subtly and smoothly ride the volume based on the player’s gaze, but I was put off that idea when it became clear from some other gaze-related issues in the game that just because the player is facing directly towards something doesn’t mean they are necessarily looking at that thing – in fact, it’s entirely possible to be looking at nothing in the world whilst having an absent-minded think! But there is something in this idea even if I couldn’t find a use for it here, and I’m absolutely convinced that once eye tracking tech comes online there is a whole world of intent-related mixing and dynamic audio which is super exciting to think about.
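The waterfall fix amounts to a time-based volume ramp. A minimal sketch, assuming a straight-line fade measured in decibels: the 45-second duration comes from the text above, while the -9 dB end level and the function name are hypothetical.

```python
def waterfall_gain_db(t, ramp_s=45.0, start_db=0.0, end_db=-9.0):
    """Waterfall level t seconds after the level starts.

    Full level at t = 0, falling linearly (in dB) until ramp_s, then holding
    at end_db. The end_db default is an illustrative assumption.
    """
    if t <= 0.0:
        return start_db
    if t >= ramp_s:
        return end_db
    return start_db + (end_db - start_db) * (t / ramp_s)
```

Fading in dB rather than in linear amplitude keeps the change roughly even to the ear across the whole 45 seconds, which suits a fade that is meant to go unnoticed.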
Alan McDermott, Secret Sorcery’s Creative Director, had the nice idea of adding unique ambient sounds to each of the buildings that the player can construct – these are designed to be relatively unobtrusive additions to the overall soundscape, but if the player leans in and gets close enough they ramp up and the player can hear what’s going on inside. It’s subtle stuff, but all of these nice little touches add up to an engaging experience that enhances the sense of place, draws the player in and (hopefully) meets their expectations.
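The lean-in behaviour for the building ambiences can be expressed as a gain driven by the distance between the player's head and the building. This is only a sketch: the function name, the near and far radii, the -24 dB resting level and the smoothstep curve are all assumptions, as the article does not say how the ramp was shaped.

```python
def building_interior_gain_db(distance, near=1_500.0, far=4_000.0,
                              far_db=-24.0, near_db=0.0):
    """Level of a building's interior ambience for a listener `distance` away.

    Beyond `far` the sound rests at `far_db`; inside `near` it plays at
    `near_db`; in between it ramps along a smoothstep curve. All default
    values are hypothetical and use arbitrary distance units.
    """
    if distance >= far:
        return far_db
    if distance <= near:
        return near_db
    x = (far - distance) / (far - near)  # 0 at the far edge, 1 at the near edge
    x = x * x * (3.0 - 2.0 * x)          # smoothstep: gentle start and finish
    return far_db + (near_db - far_db) * x
```

The smoothstep keeps the change gradual at both edges, so the detail swells in as the player leans closer rather than switching on at a threshold.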
Some of the most important sounds in Tethered are those made by the Peeps. On one level, the various sounds they make going about their daily business – gathering resources, calling out to each other or humming to themselves as they move between tasks – are simply nice touches that fill the air and help make the world feel vibrant and inhabited. On another level, there’s some crucial gameplay information to be gleaned about what is going on in the world, and where. And on yet another level, these sounds complement the animation and AI code to bring these little creatures to life in a way that is greater than the sum of its parts, and really help to cement your emotional bond with them. It’s understandable, then, that these sounds received a lot of attention and iteration. One of my fundamental epiphanies was realising that all these sounds needed to sound like they were taking place in the world, and that this was a great way of backing up the sense of place. I paid more attention to the impact that size, space and distance had on the sounds in Tethered than I have in any other game I’ve worked on.
Tethered always presents an exterior location to the player, and from a similar mid-distance, so this allowed me to bake a nice exterior reverb and EQ into all of the samples that represent events emanating from the world. This is something I revised throughout development as I got a better sense of what worked, which allowed me to home in on a sound that I felt was credible and supportive of the world we were presenting. This approach also permitted content-specific tweaks (e.g. different reverbs and settings for different sounds and categories of sound), something that is currently a bit more of an ask when using real-time effects.
Distance attenuation also received quite a lot of experimentation and tweakage over the course of development, starting out with an overtly realistic approach on everything – because that’s what VR requires or the player dies from lies, just like in The Matrix – but ending up with something relatively subtle that merely alluded to distance. Having committed to baking reverb into samples, adjusting it dynamically in real time wasn’t an option, but I never got to a point where I felt that volume attenuation and low-pass filtering weren’t enough to sell the effect – again, the lack of any real close proximity to events certainly helped here, because it meant my baked reverb only had to contend with representing mid to far distances.
Similar to my content-specific approach to reverb, most sounds were split into two main attenuation categories: important gameplay events that must be heard by the player (with a max attenuation of -3dB and LPF at 12.5kHz over 20k distance units, see note 2), and sounds that are more just part of the soundscape, exist to bring the world to life and could therefore afford a more significant roll-off (max attenuation of -48dB and LPF at 16kHz over 60k distance units). It’s interesting that for key events the allusion to distance comes primarily through frequency roll-off (I wanted them to sound affected without being easy to miss in the mix), but for all other sounds the opposite is true: I didn’t mind them being a lot quieter, but I didn’t want them to disappear completely either, so I had to keep more high frequencies in there to catch your ear. Not letting the low-pass filters go crazy is also beneficial for the HRTFs used by 3D audio tech (or, at least, it gives your ears more information to work with once an HRTF has been applied to a sound), but to be honest that was an ancillary benefit, as I was primarily mixing for what felt and sounded right.
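The two attenuation categories can be written down as data plus a single evaluation function. The maximum attenuation, cutoff and falloff figures come from the text; everything else is an assumption for illustration: both curves start at distance zero, the level falls linearly in dB, and the cutoff starts fully open at 20 kHz and moves on a logarithmic frequency scale. This is not a model of Unreal Engine 4's built-in attenuation curves.

```python
import math
from dataclasses import dataclass

NO_FILTER_HZ = 20_000.0  # assumption: cutoff treated as 'open' at zero distance


@dataclass(frozen=True)
class AttenuationCategory:
    name: str
    max_atten_db: float   # level change reached at the falloff distance
    lpf_at_max_hz: float  # low-pass cutoff reached at the falloff distance
    falloff_units: float  # distance over which both curves run


GAMEPLAY = AttenuationCategory("gameplay", -3.0, 12_500.0, 20_000.0)
SOUNDSCAPE = AttenuationCategory("soundscape", -48.0, 16_000.0, 60_000.0)


def attenuate(cat, distance):
    """Return (gain_db, lpf_cutoff_hz) for a source `distance` units away."""
    x = min(max(distance / cat.falloff_units, 0.0), 1.0)  # 0..1 along the falloff
    gain_db = cat.max_atten_db * x  # assumed: linear in dB
    # Assumed: interpolate the cutoff on a log-frequency scale.
    log_hz = math.log(NO_FILTER_HZ) + (
        math.log(cat.lpf_at_max_hz) - math.log(NO_FILTER_HZ)) * x
    return gain_db, math.exp(log_hz)
```

Under these assumptions, at 20k units (roughly the middle of an island, per note 2) a gameplay event is only 3 dB down but filtered to 12.5 kHz, while a soundscape source at the same spot is about 16 dB down with its cutoff still near 18.6 kHz, which mirrors the trade-off described above.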
Significant visual occlusion exists in some of Tethered’s levels (e.g. an island’s topography might peak in the middle, hiding what’s on the other side and requiring the player to change position to get a better view), but backing this up in an overtly realistic manner with an audio obstruction effect would have had the same detrimental and unfair impact on the player experience as enforcing a realistic distance attenuation effect. I’m sure a subtle nod towards this might have been nice, and the facility was there if I wanted to try using it, but its absence never felt problematic and I didn’t want to eat more CPU just to tick this box. Perhaps the fact that obstructed sounds are generally more distant, and therefore already sound more obscured once the attenuation effects kick in, helped to cover it…
The music in Tethered has much in common with many other VR experiences in that it is quite chilled and offers players some calming reassurance in an effort to counterbalance the fact that being transported to another place is, in and of itself, quite an intense experience. So, the musical palette is generally one of lush string quartet intermingled with analogue synths, warm wind band-like brass and sparse solo instrumentation.
Where the music experience differs is that Alan had tasked me with looking into using music, or musical sound, to communicate key information to the player in order to reduce the amount of visual UI clutter displayed in the world – Secret Sorcery had identified that this clutter worked against maintaining the sense of presence in VR. So, on top of the ambient bed tracks sits a suite of easy-to-remember, symbolic musical motifs that the player comes to learn and associate strongly with some of Tethered’s most important gameplay events. When the player hears one of these iconographic music stingers, they are informed as to what is happening, the positional audio and head tracking help them to tune in to where it is happening, and this in turn helps them to locate it visually and respond as they see fit.
So, on paper, this sounds like a great idea and, indeed, the end result works well for Tethered, but an awful lot of effort was required to pull it off. The fundamental issue is that naively playing musical motifs at the same time as an underlying music bed track is a recipe for disaster – you need temporal and harmonic alignment for this to be a pleasant experience. So, I spent a couple of weeks prototyping an interactive music system in Unreal Engine 4 that proved this approach was at least technically viable, and then handed it over to Scott Kirkland, Secret Sorcery’s Managing Director and one of their l33t coders, who spent a couple of days making it more robust and scalable for use in the game proper. Once it was up and running, there was then the issue of authoring all the stinger content, which was no small feat given that the stingers were memory-resident samples that had to be recorded from scratch (you can’t use virtual instruments or samples as samples – it’s against their licensing terms) and were segmented at beat boundaries so that they could update at any point in their duration to match changes in the underlying harmonic progression of the bed track.
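The temporal and harmonic alignment problem can be illustrated with a small scheduler. This sketch is hypothetical throughout (the class names, the one-chord-per-bar bed track and the per-beat lookup table are assumptions, not the system built in Unreal Engine 4): a triggered stinger waits for the next beat boundary, and each of its beat-length segments is chosen from pre-recorded variants to match the chord the bed track is playing on that beat.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BedTrack:
    bpm: float
    beats_per_bar: int
    chords: List[str]  # hypothetical: one chord name per bar, looping

    def beat_length_s(self) -> float:
        return 60.0 / self.bpm

    def chord_at_beat(self, beat: int) -> str:
        bar = beat // self.beats_per_bar
        return self.chords[bar % len(self.chords)]


@dataclass
class Stinger:
    # segments[i] maps a chord name to the sample id for beat i of the motif
    segments: List[Dict[str, str]]


def schedule_stinger(bed: BedTrack, stinger: Stinger, trigger_time_s: float):
    """Return [(start_time_s, sample_id), ...] aligned to the bed track."""
    beat_len = bed.beat_length_s()
    first_beat = math.ceil(trigger_time_s / beat_len)  # wait for the next beat
    plan = []
    for i, variants in enumerate(stinger.segments):
        beat = first_beat + i
        # Pick the harmonic variant recorded for the chord under this beat.
        plan.append((beat * beat_len, variants[bed.chord_at_beat(beat)]))
    return plan
```

Here the chord lookup happens once, at trigger time; a system that follows live changes in the bed's harmony would instead re-evaluate each segment just before it plays.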
Doing this topic justice would be a whole other article in and of itself, so I won’t go into more detail here, because this piece is already a bit sprawly and I’m at risk of cannibalising the audience for a GDC talk on this subject. But here’s a video that helps to illustrate the technique:
The main takeaway here is that it’s pragmatic to spend your time and resources in the places that will return the biggest bang for buck on your project, and this might not relate to any of the technologies or approaches that are flavour of the month or that you are keen to utilise or learn more about. Prioritising amongst all the different technologies you could throw at a VR project is crucial, particularly for an independently developed and published title like Tethered.
I think it’s important to mention that the audio experience in Tethered was implemented using stock Unreal Engine 4 without any middleware and I found this to be a good experience! This isn’t an anti-middleware point – I love using middleware on projects where it enables a whole other level of finesse that otherwise would not be possible (Unity, I’m looking at you) – but Unreal’s own audio tools were a great match for this project’s needs despite any apprehensions I had due to the prevailing received wisdom. The most refreshing thing for me was having a totally seamless development environment with minimised abstraction between the audio implementation tools, the game engine and previewing the game. The fact that I was able to prototype and experiment with new functionality myself using the Blueprints system, despite never having used the Unreal Engine before, was of massive benefit to this project.
Working remotely as a freelancer and being an integrated part of the team was made possible by the brilliant support I received from Secret Sorcery. Alan McDermott isn’t just their creative director; he was previously the audio director at Sony’s Evolution Studios, and it’s this level of AAA audio insight from someone in a senior creative position at the company that enabled the audio experience in Tethered to be a central part of the player experience – Alan was sowing audio seeds long before I was brought onto the project to help realise them. Similarly, having the managing director of a company as your audio coder is a real perk. This wasn’t the tragically familiar case of Scott “doing audio” because it fell into his lap – he’s a massive fan and champion of audio, appreciates its contribution towards the player experience, and was brilliantly responsive to all of the hare-brained things that Alan and I cooked up.
There is no substitute for teamwork and close collaborations – this is where the good stuff comes from and it deserves as much of your attention as any other aspect of a project, be that your own chops in your given craft or the technology you choose to use in it or develop for it. If you are working with reasonable people, there is a strong relationship between being seen to do the right thing for a project and being given the trust, freedom, encouragement and support to take that to the next level. Sound is viewed primarily as a utilitarian and technical discipline by most non-audio professionals – merely a box that needs to be ticked. If you feel alienated, uninvolved or unsupported, consider whether this is because your words and actions are confirming this false suspicion.
Creating the audio experience for Tethered wasn’t so much about slavishly recreating reality as about alluding to it, with a focus on backing up the world presented to players, being hyper-aware of the importance of this and, as a result, paying attention to the subtle details that helped to fulfil this function. Additionally, when there were problems that didn’t have easy solutions but were clearly central to making the experience work, spending time on them paid dividends and helped to create a unique experience. I think all of that is equally applicable to any game and, in that regard, VR audio is nothing new: the overall approach hasn’t changed; the player experience has, and this is where the knock-ons stem from.
I think it’s quite easy to dismiss a lot of the solutions I’ve discussed here because they quite clearly could have been used in a non-VR game. This is both true and totally misses the main point I’ve been trying (and possibly failing) to get across – none of the technologies or approaches in VR are new or unique; they have been, and are being, used in non-VR games too. What’s new is that VR puts a pressure on your approach which pushes you in directions that you are unlikely to have considered, unlikely to have been able to justify, and unlikely to have had the determination to persevere with were it not for the amazing sense of presence and the requirement to back that up. If VR game audio is anything, it is this.
Another way of looking at it is that it’s important to differentiate between the requirement to present the virtual world in a way that meets the player’s expectations and how you go about achieving that. The former is VR 101 and common to all VR experiences; the latter is project-specific (possibly sub-genre-specific), and no amount of reading up on “VR audio” is going to tell you precisely what the right thing to do is on a given project – it’s simply going to make you aware of what other people have done on their projects and possibly inform aspects of the approach you choose to take.
There are no rules.
Go nuts :)
1 – I don’t think that the terminology of “first person” and “third person” makes a great deal of sense in VR – arguably, all VR experiences are intrinsically first person irrespective of ‘camera placement’ because the player is always directly experiencing the world through their own eyes and ears and very much in control of where they are looking. So, perhaps it makes more sense to consider whether the player is embodied (i.e. looks down and sees their body) or disembodied (looks down and sees no body). That’s not a fool-proof distinction either, because there are VR experiences where you have no visible body but you do have disembodied floating hands which make it feel like you have a body present in the virtual world – “implied embodiment”. And then there are first person experiences where the player isn’t playing ‘themselves’, but is role-playing as someone else. This can work brilliantly if the player is invited to participate, but if handled clumsily it can easily bump the player out of the experience (e.g. you are given hands that you can’t move, or your character speaks on your behalf, or someone speaks to your character in such a way that it doesn’t feel like they are talking to you) – “implied disembodiment”. What I find interesting is that these are all issues in non-VR games too, but the lack of abstraction in VR puts a magnifying glass on them – the player’s sense of personal identity is stronger when they are in the virtual world rather than merely peering into it. Indeed, I think this puts a welcome pressure on the ability for players to communicate directly and naturally with NPCs. I’ll wrap up this rambling brain-fart of a footnote by asking you to consider the possibility of the player being an acousmêtre that can observe and haunt other inhabitants in the world as a disembodied voice. I think that’s a pretty rad concept :)
2 – I know these distance units are essentially meaningless, but consider them relative to each other. FWIW, 20k units equates to something like the distance from your viewing position to the middle of an island (“just in front of you”), the result being that important events are perceived to be either “near you” or “not near you”, which is a useful cue for the player. 60k units is a little less than the length of the largest islands in the game, so a sound which is waaaaaaaaaaaay over there is going to be on the threshold of hearing – audible if you choose to tune in to it in a quiet moment, but otherwise out of the way.
This article originally appeared on DesigningSound.org