Video games must overcome audio challenges that are not found in other forms of media. Interactive audio researcher Karen Collins weighs in on this phenomenon, explaining that media formats like film and TV use a passive form of immersion: the participant cannot interact with the experience, which is why these formats are commonly referred to as linear media. Nonlinear media generally refers to video games, which require an active participant to take part in the experience. A participant can do anything within the game rules at any moment in time, and this level of interactivity raises concerns within game audio, as it too needs to react and respond in sync with the active player.
This leads to further implications, as game developers must find a balance in difficulty: a game that is too easy or too hard can both break a player's immersion. Jonathan Lanier, the audio lead on The Last of Us, describes immersion as a state where a person becomes fully absorbed in an artificial experience. Based on research from Brown, Cairns and Phillips, for immersion to occur, gamers need to pass through various stages, starting from no immersion, to engagement, then engrossment, and finally total immersion. See figure 2 for details.
Audio plays a crucial role in reaching the state of total immersion, something linear media like film does not need to contend with. This paper will investigate the unique challenges that developer Naughty Dog faced when producing interactive audio for the video game The Last of Us. It will touch upon:
Ludus - Game rules, boundaries, and what constitutes a win or loss.
Ludonarrative Dissonance - The disconnect between gameplay and narrative.
Ludic Functionality - Functions that subtly guide players through the game.
Audio Perspective - Where the player should hear sound from.
Procedural Sound Design - How to prevent sound repetition.
Dialogue - The implications of dialogue existing within the diegesis (game world).
Music - How music can adapt to active changes within the game world.
The Last of Us
A mutated Cordyceps fungus spreads and infects humans throughout the United States, slowly turning them into cannibalistic monsters in a post-apocalyptic world. The story follows Joel and Ellie as the main protagonists: Joel is tasked with escorting Ellie, a young teenage girl, to the Fireflies, a rebel organisation seeking a cure to the infection. Naughty Dog faced many challenges in their sound department; these dilemmas ranged across gameplay, dialogue and music implementation.
Ludus
In 1958 Roger Caillois used the term Ludus to describe the mechanisms of games (e.g. sporting activities, board games, casino games) and the rules and structures that govern winning, losing and rewarding players for their mastery or luck at the game. The term has since been refined for video games by Gonzalo Frasca, who describes it as a set of rules that define victory or defeat, as well as gains and losses. In video games, these rules set the goals, player freedom, and the interaction boundaries for exploring and achieving said goals. Passive linear media does not have to worry about this concept, yet without these rules and boundaries, a video game cannot function.
Ludonarrative / Ludonarrative Dissonance
Once the game rules are set, a developer then needs to think of the ludonarrative: how the gameplay relates to the narrative. Former Ubisoft employee Clint Hocking discusses Ludonarrative Dissonance in the video game BioShock; despite the game's success, Hocking points out how its ludic and narrative functions conflict consistently throughout, breaking the immersion for anyone unable to accept the dissonance. Uncharted: Golden Abyss could also be considered an example of Ludonarrative Dissonance: after the main character Drake has killed hundreds of enemies throughout the game, he states that he "cannot leave a man to die", a dissonance between the narrative and the actions performed during gameplay. Dialogue can play a huge role in video games, and at times this contrast between gameplay and dialogue causes Ludonarrative Dissonance. This is an issue specific to video games; however, game developer Jonathan Blow discussed in an interview his dislike for the term. While he agreed with its premise, he tried to extend the terminology by applying it to films: the main protagonist in a movie could be family orientated, yet kill many people and return home to playing the family man again, giving no thought to his previous actions. Nevertheless, Ludonarrative Dissonance is unique to video games; movies are a passive experience, whereas in a game the player is active, with the intended goal of immersing the player. Ludonarrative Dissonance means the gameplay is no longer connected to the narrative, which can disconnect the player, as was the case with Hocking and BioShock.
Ludic / Audio Functions
Both film and games use functional audio; in film the cocktail party effect can be used to draw the attention of the audience to a particular point or object. The effect is named after the natural phenomenon whereby one filters sound and listens to what one wants to hear. Films do this artificially by way of a linear mix that filters and controls the gain of various sounds. Games will often try to implement the cocktail party effect, as it can be an effective method of guiding players through the game's narrative, removing the need for additional dialogue or text prompts. For example, the original Assassin's Creed and its later iterations incorporate a mechanic known as eagle vision (which also affects sound); this mechanic allows gamers to overhear specific conversations, essentially notifying players of their next objective. The issue developers face when implementing such a mechanic is the active real-time adaptation during gameplay, while also ensuring players are informed with the correct information.
Naughty Dog developed and implemented a similar mechanic called 'listen mode'. When listen mode is active, Joel or Ellie can filter all audio to focus on the sound of nearby enemies. As the player listens, they can identify the type of enemy, and through a good headset or surround sound system the player can detect the general direction and proximity of the enemy. Sound designer Jonathan Lanier gave insight by describing the use of mix snapshots; these snapshots represent different audio mixing levels for different events. One of these events is listen mode, which adjusts the mix to highlight the sound of enemies, with a low-pass filter added to further distinguish enemy sounds from the environmental audio and music. See figure 3 for details.
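The snapshot idea Lanier describes can be sketched as data plus a crossfade. The bus names, gain values and cutoff frequencies below are illustrative assumptions, not Naughty Dog's actual settings; the point is only that a snapshot is a named set of per-bus levels that the engine blends towards when an event such as listen mode fires.

```python
import dataclasses

@dataclasses.dataclass
class MixSnapshot:
    """A named set of per-bus gain levels (0.0-1.0) and optional low-pass cutoffs (Hz)."""
    name: str
    bus_gains: dict   # bus name -> linear gain
    lowpass_hz: dict  # bus name -> cutoff frequency, or None for no filter

# Hypothetical snapshots: a default mix, and a listen-mode mix that ducks
# ambience/music and low-passes them so enemy sounds stand out.
DEFAULT = MixSnapshot(
    name="default",
    bus_gains={"enemies": 1.0, "ambience": 1.0, "music": 1.0},
    lowpass_hz={"enemies": None, "ambience": None, "music": None},
)
LISTEN_MODE = MixSnapshot(
    name="listen_mode",
    bus_gains={"enemies": 1.0, "ambience": 0.2, "music": 0.1},
    lowpass_hz={"enemies": None, "ambience": 800, "music": 800},
)

def blend(a, b, t):
    """Crossfade the bus gains of two snapshots (t = 0 -> a, t = 1 -> b)."""
    return {bus: a.bus_gains[bus] * (1 - t) + b.bus_gains[bus] * t
            for bus in a.bus_gains}

# Halfway into the transition, ambience is already noticeably ducked
# while the enemy bus stays at full level.
mid = blend(DEFAULT, LISTEN_MODE, 0.5)
```

Blending over time, rather than switching instantly, is what keeps the mix change from sounding like an abrupt cut.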
This aural information can guide players through the game rules and boundaries while aiding the player's mastery of the game. Stevens and Raybould use the I.N.F.O.R.M model, which breaks audio functions into six distinct categories: Instruction, Notification, Feedback, Orientation, Rhythm-action and Mechanic. The Last of Us has distinct sounds for each monster type; referring to I.N.F.O.R.M, this would be classified as notification, as it informs the player of a specific monster. While listen mode is active, a player can detect the enemy type, direction and proximity; this would be classified as a mechanic, as it has been developed as a player ability that enhances the player's listening (the cocktail party effect). This contributes to the ludonarrative of the game by providing players with information that directly impacts how they play. These audio tools help players achieve mastery, while also helping them proceed through the game's narrative. This challenge is unique to nonlinear media: players need to prepare for the enemy type and know where the enemy is, and without these audio functions the player could stumble into enemies and repeatedly lose during gameplay. Referring to the earlier research on immersion, repeatedly losing will cause players to lose interest in the game, an issue linear media does not have to consider, as it is a passive form of entertainment and participants cannot lose by consuming it.
In film the camera is a spectator that enables the audience to view the narrative, and from here the audio editor can change how an audience perceives the audio. Perhaps the audio is from a character's perspective; maybe the audience is experiencing sound from the camera's point of audition (camera perspective); conceivably the audience could even be hearing sounds that do not relate to the picture at all, a form of juxtaposition. In video games some of these techniques can be applied to cinematics; however, a combination of these techniques during gameplay would only cause disorientation for the gamer. Games like The Last of Us face an inevitable obstacle in audio perspective: should the active participant hear from the camera location (camera on the player), or should the perspective be from the character being controlled? Lanier discussed this issue during his GDC talk and explained how it did not sound right from the camera's perspective, as the sound became too reactive, easily breaking the immersion from the character; from the character's perspective, meanwhile, the sound was not responsive enough. This problem is specific not only to games but to the third-person genre in particular; in first-person games the solution is much simpler, as it is only natural that sound is heard directly from the character/camera perspective. Lanier details how the team went for a middle-ground approach, placing the audio volume between the character and camera, while audio panning is taken from the camera's perspective. See figure 4 below for details.
Lanier admitted that although the team were satisfied with this system, it could be broken by rapidly and repeatedly spinning the camera around the character; but in the end, they asked why a player would do that, the only reason being a deliberate attempt to break the audio system.
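The middle-ground listener can be sketched in a few lines. This is a minimal 2D illustration of the idea, not Naughty Dog's implementation: the distance used for volume is measured from a point midway between character and camera, while left/right panning is computed purely from the camera's view. The rolloff curve, distances and the assumption that the camera faces the +y axis are all illustrative.

```python
import math

def attenuation(distance, max_distance=30.0):
    """Simple linear distance rolloff (hypothetical curve)."""
    return max(0.0, 1.0 - distance / max_distance)

def listener_mix(source, character, camera):
    """Volume from the character/camera midpoint; panning from the camera."""
    # Volume listener: the point midway between character and camera.
    midpoint = tuple((a + b) / 2 for a, b in zip(character, camera))
    gain = attenuation(math.dist(source, midpoint))
    # Pan listener: signed left/right position relative to the camera,
    # assuming the camera looks along +y (-1 = hard left, +1 = hard right).
    dx, dy = source[0] - camera[0], source[1] - camera[1]
    pan = math.sin(math.atan2(dx, dy))
    return gain, pan

# A sound directly ahead of the camera is centred, and its loudness
# reflects a listener sitting between the character and the camera.
gain, pan = listener_mix(source=(0, 6), character=(0, 0), camera=(0, -4))
```

Splitting the two listeners like this is what lets the sound stay responsive to player movement without becoming jittery when the camera alone moves.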
Procedural Sound Design
Sound design can have a significant impact on film through the creation of hyper-real sound: a combination of layering and amplifying sounds, making them more detailed than they sound in reality, e.g. ambience, soundscapes, explosions, action and Foley. The sound designer bakes the sound into the picture; once it is finished, it can never be changed. Games generally take longer to finish than the average ninety-minute film, and in some games players can rack up hundreds of hours of gameplay. In a video game, if the same sound for footsteps, gunshots, explosions etc. were used repeatedly, it would likely break the immersion. Collins describes the use of random pitch and gain to break up the monotonous repetition of sounds. This posed a unique challenge for the sound design team on The Last of Us to overcome.
Procedural sound design treats audio as a system rather than a one-and-done wav file. It adds variation and helps prevent repetition, and it can be anything from a simple to a complex system. It is usually implemented with multiple wav files, processing these files in various ways, e.g. pitch, delay, randomness, gain, concatenation, envelopes etc. Stevens demonstrates the possibilities in a sound design post, which takes a few wav files of an explosion sound and turns them into multiple outcomes, solving the issue of hearing the same sound repeatedly. Collins also discusses this technique and how it can be applied to footsteps, randomly selecting and combining heel-toe movements from a container/pool of audio files. Collins goes on to say that in many cases this technique is a necessity, with games working to an allocated memory budget. This is one issue linear media like film does not need to worry about, as the sound designer is in control of the production, and once it is set, it will never change. In nonlinear media like games, the active players dictate the terms, which could become repetitive if there is no variation to the sound. In this instance, The Last of Us was manually analysed, listening to the environments and triggering multiple explosions and gunshots to test whether a procedural or linear approach had been used.
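The heel-toe container approach Collins describes can be sketched as follows. The sample file names, pitch range and gain range are illustrative assumptions; the point is that a small pool of recordings, randomly paired and given small random pitch/gain offsets, yields far more perceived variety than the memory cost of the source files suggests.

```python
import random

def footstep_variation(pool_heel, pool_toe, rng):
    """Randomly pair a heel sample with a toe sample and apply small
    random pitch and gain offsets, so no two footsteps trigger identically."""
    heel = rng.choice(pool_heel)
    toe = rng.choice(pool_toe)
    pitch = rng.uniform(0.95, 1.05)  # +/- 5% playback-rate variation
    gain = rng.uniform(0.8, 1.0)     # slight level variation per step
    return {"samples": (heel, toe), "pitch": pitch, "gain": gain}

# With only 4 heel and 4 toe recordings there are already 16 sample
# pairings, before any pitch or gain variation is even applied.
rng = random.Random(42)  # seeded here only so the sketch is repeatable
heels = [f"heel_{i}.wav" for i in range(4)]
toes = [f"toe_{i}.wav" for i in range(4)]
steps = [footstep_variation(heels, toes, rng) for _ in range(3)]
```

In an engine, the same trigger would also select the pool by surface type (concrete, wood, grass) before randomising within it.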
After triggering multiple explosions and gunshots, the sound always came out slightly different, leading to the conclusion that the weapons use a procedural sound design system.
After rigorous testing, it was concluded that the environments never sounded like a loop; they always appeared to be evolving, with random animals, creaks and wind sounding natural and unpredictable. The same could be said for the footsteps when walking through the environments, always adapting the sound to the correct surface type, which in itself is a unique challenge to video games not found in film, where Foley artists synchronise movement to picture and the final result is baked.
Dialogue
Ashley Johnson (Ellie), Troy Baker (Joel) and other cast members from The Last of Us received numerous awards for their performances during the production, making dialogue one of the strengths of The Last of Us. Dialogue shares many similarities with film in driving the narrative; however, in games it creates many unique challenges that film does not face. For instance, in Heavy Rain, a video game which tries to blur the line between film and games, the player takes the role of Ethan, who has taken his family to a shopping centre where his son Jason goes missing. As the player tries to locate Jason through the crowd, the game prompts the player to call his name.
The game does not call for this action once, but repeatedly. Eventually the player will notice that the same voice samples are being triggered again and again, which, once noticed, breaks the immersion in this otherwise emotional scene. This issue is not unique to Heavy Rain; it can be found in RPGs like Skyrim or stealth games like Dishonored. Although interacting with AI characters is a different experience, it is common for the same voice actor to play multiple people and repeat the same phrases, and when attacking or sneaking past enemies it is not uncommon for them all to share the same voice. Again, these issues are unique to video games and do not exist within linear media like film.
Tackling Dialogue In The Last of Us
In The Last of Us, players receive instructions, hints, lore and stories through character companions. One issue Lanier described is immersion breaking when dialogue becomes unintelligible due to the acoustics (reverb settings), or when the player moves away from a companion into another room and the voice is filtered and cut by the game's sound propagation setup (the simulation of sound waves travelling through the game world). This meant players could miss out on crucial information. The team decided that in certain parts of the game the dialogue would be kept separate from the standard acoustic and propagation settings, meaning custom dialogue mixes for certain sections of the game. Lanier discusses a reverb sweetener, which is essentially a reverb specifically for dialogue; the team noticed that even when this reverb did not match the acoustic setting, it still sounded natural during gameplay. This prevented the dialogue from becoming unintelligible due to the game's acoustics. However, it did not solve the problem of a character being in another room; to resolve this, the audio team allowed the dialogue to pass through obstructions that would normally prevent or filter the sound.
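The exception Lanier describes amounts to a per-category rule in the occlusion stage. The sketch below is illustrative (the cutoff values and function names are assumptions, not Naughty Dog's code): occluded sources normally get a low-pass filter to sound muffled through walls, but sources flagged as dialogue bypass it so instructions stay intelligible from another room.

```python
def occlusion_cutoff(base_cutoff_hz, occluded, is_dialogue):
    """Return the low-pass cutoff to apply to a sound source.
    Occluded non-dialogue sources are muffled; dialogue is exempt."""
    if occluded and not is_dialogue:
        return min(base_cutoff_hz, 1200)  # illustrative "through a wall" cutoff
    return base_cutoff_hz                 # dialogue passes through unfiltered

# An occluded gunshot is muffled; occluded companion dialogue is not.
gunshot = occlusion_cutoff(20000, occluded=True, is_dialogue=False)
speech = occlusion_cutoff(20000, occluded=True, is_dialogue=True)
```

The trade-off is a small physical inconsistency (speech carrying through walls) accepted in exchange for never losing narrative-critical lines.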
It is also worth briefly mentioning the challenges faced by the audio team when implementing music. In linear media like film and TV, the music is baked and has no reason to adapt; video games also contain some linear moments, like cutscenes, which share this baked-music mentality. Nonetheless, this is where the similarities end. During gameplay, video game music needs to adapt and adjust to the player: a player could be listening to relaxed exploration music when the gameplay changes to an intense combat state, in which case the relaxed exploration music would in all likelihood break the immersion. The same applies when switching from combat to exploration; if combat music keeps playing while the player is exploring, the gamer will likely suspect more enemies, breaking the immersion once they realise the music is not adapting.
Music In The Last of Us
The emotional music in The Last of Us is mostly reserved for cutscenes, whereas during gameplay stingers are implemented: short ornamental musical arrangements, usually 5 to 30 seconds in length. These stingers are triggered as Joel or Ellie enter new locations, or go about exploring and discovering the landscape. By implementing a stinger approach, the audio team can ensure the music triggers with player interaction, which could be merely entering a new location. This type of application works well for this style of game; if music were constantly playing, it could distract the player from the sound design work, which showcases the size, atmosphere and enemies in each location. Bajakian explains how musical stingers can be subtle or complex, but when triggered they can set the mood of an area or alert the player to something mysterious. Stingers are one and done, meaning the musical phrase will generally end in a musically satisfying way; as Bajakian discusses, music that ends abruptly is unsatisfying for gamers.
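The location-triggered, one-and-done behaviour can be sketched as a simple trigger map. Region and cue names below are invented for illustration: each region fires its short cue at most once, and the phrase is then left to finish musically rather than being retriggered or cut off on re-entry.

```python
class StingerMap:
    """One-shot musical stingers keyed by game region."""

    def __init__(self, cues):
        self.cues = dict(cues)  # region name -> cue name
        self.fired = set()      # regions whose stinger has already played

    def enter(self, region):
        """Return the cue to start on entering a region, or None.
        A region's stinger fires at most once, so revisiting an area
        never restarts a phrase that already ended musically."""
        if region in self.cues and region not in self.fired:
            self.fired.add(region)
            return self.cues[region]
        return None

# Hypothetical regions and cues for illustration.
stingers = StingerMap({"ruined_hotel": "mystery_cue", "sewers": "dread_cue"})
first = stingers.enter("ruined_hotel")   # cue fires on first entry
second = stingers.enter("ruined_hotel")  # re-entering does not re-trigger
```

Because each cue is a complete short phrase, the system never has to interrupt music mid-bar, sidestepping the abrupt-ending problem Bajakian warns about.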
Jonathan Mayer, music manager on The Last of Us, explains how the final sequence was structured around the music, with enemy attacks slowed down to suit the mood. In this sequence Joel lifts an unconscious Ellie and carries her to an elevator; dramatic music plays as the elevator descends, and it is timed so the doors open on a musically appropriate note that ends the track. Here the music is informing the game when it is appropriate to progress.
Video games face many unique challenges that are not applicable to film. Nevertheless, games can benefit from the knowledge that exists within the film world, including techniques in sound design, point of audition, musical motifs, the cocktail party effect and narrative. Developers need to think of the rules that will govern their game, and how players win, lose and get rewarded for their actions; these are essential for player immersion. Developers must consider Ludonarrative Dissonance and work towards resolving issues between the gameplay and narrative; in The Last of Us, the characters are aware of their actions and live with the things they have done and seen, which is showcased through dialogue throughout the game. Introducing ludic functionality has the potential to significantly enhance immersion: in The Last of Us, a gamer can activate listen mode, which acts like a cocktail party effect, notifying the player of the enemy type, with the added benefit of enemy orientation. These ludic functions subtly guide gamers through the game's narrative, aiding players towards mastery of the game. Procedural sound design is a must for game developers, as they cannot afford to have immersion broken through repetitive triggering of the same footsteps, explosions, gunshots and location soundscapes. The audio also needs to stay within the allocated memory budget, and procedural sound design can offer multiple outcomes from only a few sounds. In The Last of Us, the location soundscapes sounded alive, the footsteps were always dynamic and adjusted to surface types, and the weapons always sounded slightly different, with no tell-tale signs of a repeated sample. Developers must also consider how best to tackle dialogue, as having multiple characters with the same voice can break the immersion, as can the re-triggering of the same phrase, as is the case in Heavy Rain.
The Last of Us allows dialogue to happen naturally throughout gameplay as the player makes their way through the game. Furthermore, developers must consider how the music will adapt to the player, and decide which method of music implementation will be used. In The Last of Us, the developers applied stingers, which trigger as the player enters a new location or when something of interest is close by; these ornamental musical forms help set the mood of the game's narrative, contributing to and complying with the ludonarrative.
“Video games require the player to be actively involved to make decisions based on the action occurring on screen. This active interaction is the most important element that distinguishes the medium. Players are actively involved in determining the outcome of a game, whereas in linear media like film there is no interaction; instead, viewers watch passively” (Sweet, 2015).
References
Collins, K. (2008) Game Sound: An Introduction to the History, Theory, and Practice of Video Game Music and Sound Design [Online]. Cambridge, Mass.; London: MIT Press.
Naughty Dog (2013) The Last of Us [PlayStation 3]. Sony Computer Entertainment. United States.
Chion, M. & Steintrager, J. A. (2015) Sound: An Acoulogical Treatise. Duke University Press.
Ubisoft Montreal (2007) Assassin's Creed [PlayStation 3, Xbox 360]. Ubisoft. Canada.
Lanier, J. (2014) Aural Immersion: Audio Technology in The Last of Us [Presentation]. Presented at: GDC, 2014. Available from: <https://www.gdcvault.com/play/1020444/Aural-Immersion-Audio-Technology-in> [Accessed 8 May 2018].
Caillois, R. & Barash, M. (2001) Man, Play, and Games. Urbana: University of Illinois Press.
Frasca, G. (1999) Ludology Meets Narratology: Similitude and Differences between (Video) Games and Narrative. Parnasso [Online]. Available from: <http://www.ludology.org/articles/ludology.htm> [Accessed 14 May 2018].
Hocking, C. (2007) Ludonarrative Dissonance in Bioshock. Click Nothing [Online blog]. Available from: <http://www.clicknothing.com/click_nothing/2007/10/ludonarrative-d.html> [Accessed 14 May 2018].
Stevens, R. & Raybould, D. (2015) Game Audio Implementation: A Practical Guide Using the Unreal Engine [Online]. Focal Press, United Kingdom.
2K Boston & 2K Australia (2007) BioShock [PlayStation 3, Xbox 360]. 2K Games. United States & Australia.
Blow, J. (2016) Interview With ‘The Witness’ Creator Jonathan Blow [Online]. Available from: <http://time.com/4355763/the-witness-jonathan-blow-interview/> [Accessed 14 May 2018].
SIE Bend Studio & Naughty Dog (2011) Uncharted: Golden Abyss [PlayStation Vita]. Sony Computer Entertainment. United States.
Stevens, R. (2016) Why Procedural Game Sound Design Is so Useful - Demonstrated in the Unreal Engine. A Sound Effect, 18 January [Online blog]. Available from: <https://www.asoundeffect.com/procedural-game-sound-design/> [Accessed 14 May 2018].
IMDb. The Last of Us Awards and Nominations [Online]. Available from: <http://www.imdb.com/title/tt2140553/awards> [Accessed 17 May 2018].
Phillips, W. (2014) A Composer's Guide to Game Music. Cambridge, Massachusetts: The MIT Press.
Mayer, J. (2016) Music Design: Lessons from The Last of Us and More [Presentation]. Presented at: GDC, 2016. Available from: <https://www.youtube.com/watch?v=UFLmVsyIQDA> [Accessed 8 May 2018].
Sweet, M. (2015) Writing Interactive Music for Video Games. Upper Saddle River, NJ: Addison-Wesley.
Brown, E. & Cairns, P. (2004) A Grounded Investigation of Game Immersion. In: Proceedings of CHI 2004. ACM Press, pp. 1297–1300.
Quantic Dream (2010) Heavy Rain [PlayStation 3]. Sony Computer Entertainment. France.
Bajakian, C. (2013) Producing Music for AAA Video Games [Presentation]. Presented at: GDC, 2013. Available from: <https://www.gdcvault.com/play/1017670/Audio-Bootcamp-Producing-Music-for> [Accessed 24 April 2018].
Bethesda Game Studios (2011) The Elder Scrolls V: Skyrim [PlayStation 3, Xbox 360]. Bethesda Softworks. United States.
Arkane Studios (2012) Dishonored [PlayStation 3, Xbox 360, PlayStation 4, Xbox One]. Bethesda Softworks. France.