Sound
Film sound generally comprises three elements: dialogue, music, and sound effects. Our typical experience of film sound is that it builds out the diegetic world of the film, creating a realistic effect even if the diegesis itself is not realistic. Footsteps sound like footsteps. A scene in a restaurant includes the sounds of silverware clinking and people chattering. Even if the sliding doors on a spaceship don’t exist in reality, we accept that the whoosh sound they make is consistent for that diegetic world.
But as much as sound contributes to a film's "realism," or as much as it seems to arise "naturally" from the screen, it is essential to remember that the soundtrack is entirely independent from the image track. What you hear in a movie is often not recorded when the image was filmed, but added later. The soundtrack is the composite result of sound editing (the collection and recording of different sounds) and sound mixing (the assemblage of those sounds into a unified whole, controlling for different levels of volume and pitch). The soundtrack does not have to bear any connection to the image track, even though it often does. This is why we should avoid characterizing film sound as an audio illustration of the images, as if sound were subordinate to the visuals. Sound in fact has a powerful effect on how viewers perceive moving images.
As an example, consider an experimental film titled The Girl Chewing Gum (1976), directed by British filmmaker John Smith. The 12-minute short has a remarkably simple set-up, but it comes with a twist. The majority of the film comprises a single shot of a London street scene as ordinary citizens walk by, cars drive down the road, and birds fly in the air. It is a documentary record of a day in London. On the soundtrack, however, we hear an unseen narrator barking out stage directions. He says, for example, "Now I want the old man with white hair and glasses to cross the road. C'mon, quickly. Look this way. Now walk off to the left." The man in the film appears to respond directly to these offscreen directions.
What is actually documentary footage of everyday life suddenly seems like a scene in a fiction film. By adding his own voiceover, Smith makes everything in the image appear to follow his commands, as if he were directing reality. Even time itself seems to respond to him. When the camera frames a clock tower, we hear the voiceover say, "I want the long hand to move at the rate of one revolution every hour, and the short hand to move at the rate of one revolution every 12 hours." Time only moves forward because the director says so! The clever conceit of Smith's film shows the interpretive power of sound. Sound directly affects what we perceive in an image. Far from being secondary to the image – only an add-on for the effect of realism – sound is of primary importance and can radically alter how we understand what is represented.
This chapter outlines film studies terminology related to sound.
SOUND AND STORY
Diegetic or non-diegetic. Like many other elements of film form, sound can be described in terms of its relationship to the diegesis of any narrative film. Diegetic sound is any sonic element that originates from or can be said to be located in the story world of the film. Is the sound audible (or potentially audible) to a character within the fiction? Character dialogue, the ambient sounds of a location where a scene takes place, a song playing on the radio of the protagonist's car: all of these are examples of diegetic sound. Non-diegetic sound, by contrast, is any sonic element that originates outside the diegesis. It is a sound that is audible to the spectator but not to a character. The clearest example is the film's musical score. When Jack Dawson (Leonardo DiCaprio) steps to the bow of the Titanic to proclaim that he is "king of the world," he obviously cannot hear the swelling strings that accompany that moment.
Filmmakers often cross the boundary between diegetic and non-diegetic sound in order to play with the expectations of the viewer. At the opening of The Father (Florian Zeller, 2020), for example, an instrumental score accompanies images of Anne (Olivia Colman) walking to her aging father's apartment. The score is clearly non-diegetic at this point, but when she enters her father's study, he is seen wearing headphones. When he removes them, the score drops significantly in volume, as it is now heard coming from the headphones. What had been understood to be non-diegetic sound is revealed to have a diegetic source. What might explain this boundary-crossing between diegetic and non-diegetic registers? The Father is about an elderly man – Anne's father Anthony (Anthony Hopkins) – suffering from dementia, and the film uses subjective narration throughout to visualize the disorienting and confusing experience of extreme memory loss. Multiple actors play the same character, for instance, and elements of the setting change without notice, making it difficult to follow a clear timeline. Given this formal experimentation, the play with sound at the beginning of the film takes on an added dimension. We can recognize it as the first instance of something we take to be objective reality (the standard use of an instrumental score) being revealed as part of Anthony's subjective experience (music playing in his headphones). The film will execute the same maneuver repeatedly, taking what we think to be objectively true within the narrative and subsequently revealing it to be something else entirely. In this way, it formally captures the subjective experience of the dementia patient.
Internal or external. Diegetic sound can be further subdivided into internal and external, depending on where it originates within the diegesis. Just as a point-of-view shot provides the subjective perspective of a character, showing the viewer what they see, sound can be used to provide access to the thoughts and feelings of a character. We call this internal sound, meaning that its source is located within the mind of a character. A voiceover that narrates what a character is thinking about, or an auditory hallucination, would be included in this category. More common is external sound, which can be sourced to the world of the diegesis, potentially audible to any character. Dialogue is an obvious example, as is the ambient noise of the setting.
SOUND-IMAGE RELATIONS
In film, the image track and the soundtrack are independent of each other. This is a fact that is easy to forget, since it feels entirely natural to us that we should hear speech when we see a person’s lips moving or a sound when an object carries out its function (what’s a kettle without its whistle or a door without its slam?). But it is essential to remember that what we hear in a film is the consequence of intentional choices, a matter of purposeful design, and that the sounds that accompany the image were often not recorded when a scene was filmed but added later.
Because they are independent of each other, sound and image can be assembled in different arrangements of space and time. This section will outline some of the possible arrangements of sound-image relations.
Offscreen or onscreen. Offscreen sound refers to any sonic element whose source is located outside of the frame. By contrast, onscreen sound refers to any sonic element whose source is visible within the frame. Sound is, in fact, a common and efficient way for a film to call attention to offscreen space. Horror films routinely do this, using the sound of a scream or a creaking floorboard to suggest the presence of something monstrous just out of view. A specific variation of offscreen sound is the voice-off, which is when a character speaks dialogue offscreen. Alfred Hitchcock's Psycho (US, 1960) provides a well-known example. Throughout the film, the mother of Norman Bates (Anthony Perkins) is only ever heard offscreen. Those who have seen the film will know the reason: the voice we hear is not Norman's mother, but Norman himself, pretending to be his mother, who died years ago. The use of the voice-off allows Hitchcock to withhold this revelation until late in the film.
Synchronous or asynchronous. Sound that is in temporal alignment with the image, where there is a coincidence between what is seen and what is heard, is synchronous sound. Asynchronous sound therefore refers to when sound and image fall out of temporal alignment. Synchronous sound is the normative standard for film sound, but it is an achieved effect, lending credibility to the perceived realism of a film's diegesis. Any user of TikTok knows well the diminished effect that results when a creator fails to seamlessly lip-synch to a sound clip. Though infrequent, asynchronous sound can be used for expressive ends by filmmakers, whether for dramatic or comic effect. For example, in All About My Mother (Pedro Almodóvar, Spain, 1999), a mother witnesses the death of her son when he is struck by a car. Almodóvar presents the moment through a subjective camera movement, as if from the son's point of view, and the mother's words of anguish are misaligned with the movements of her mouth when she cradles his body. It is as if the pain of her loss rends apart the soundtrack and the image track, tearing the fabric of the film itself. A comic use of asynchronous sound is found in Singin' in the Rain (Stanley Donen, US, 1952), a film about cinema's transition to sound in the late 1920s, when some silent film actors experienced difficulty adapting to the new demands of recorded dialogue. The film's story centers on the production of a film titled The Dueling Cavalier that has to be converted from a silent film to a "talking picture." A series of technical glitches arise at the preview screening, one of which involves the film falling out of synchronization. The dialogue of one character is heard coming out of the mouth of another, creating a series of comically unintended meanings.
Simultaneous or nonsimultaneous. Simultaneous sound and synchronized sound may appear to be the same thing, but whereas synchronization is a technical achievement that makes image and sound coincide, the simultaneity or nonsimultaneity of sound turns on the distinction between story and plot. Simultaneous sound is sound that is present at the same time as its source. The source can be onscreen or offscreen, but it is understood as occurring at the same moment as the sound it emits. This is the normative standard for film sound. However, just as the plot can shuffle the order of story events, sound and source can be separated from each other. An example of nonsimultaneous sound is a device called the sonic flashback, where some sonic element (whether dialogue, music, or sound effect) from an earlier moment in the film is repeated on the soundtrack. An illustrative example of this device occurs at the conclusion of The Usual Suspects (Bryan Singer, US, 1995). The events of the film are narrated from the perspective of Roger "Verbal" Kint (Kevin Spacey) as he recounts what happened to a police detective. Without revealing too much of the film's twist ending, the detective eventually realizes that Kint has been lying throughout the interrogation. A series of sonic flashbacks accompanies this realization, as the soundtrack repeats lines of Kint's dialogue from earlier in the interrogation, allowing viewers to reevaluate the meaning of what they have heard.
DEVICES RELATED TO SOUND
Sound perspective. Sound perspective refers to the correspondence of a sound's acoustic qualities (such as pitch and volume) to the relative position of its source to the camera. The camera provides the viewer's point of entry into a narrative; we experience the diegesis from its perspective. Since the position of the camera can vary significantly relative to the story action, sound often reflects this shifting perspective. In other words, just as in real life, if the source of a sound is far away, the sound it makes will be quieter; if its source is close, its sound will be louder. This alteration of sonic elements depending on the position of the camera is described as maintaining sound perspective. Careful attention to sound perspective in a film's sound design contributes to the creation of a diegetic world that feels like a fully realized three-dimensional space. It is often easy to notice how sound perspective is preserved in scenes where characters are driving in a car. The interior of a car sounds different from the environment outside of it. Inside, dialogue is clearly audible and perhaps so too is music from the radio, while street noise and the sounds of traffic are comparatively dampened. If the film cuts to a shot where the camera is positioned outside of the vehicle, perhaps mounted to the hood and pointed at the front windshield, the street noise will be noticeably louder while the dialogue or car radio may no longer be audible. French director Jacques Tati makes comic use of sound perspective in his film Playtime (France, 1967). A large section of the film takes place in a modern office building with glass walls separating interior and exterior spaces. The building has a distinctive buzzing as its room tone, so when Tati cuts freely between camera positions inside and outside a room or the building, the room tone appears and disappears on the soundtrack in response. The fact that we notice this underlines Tati's intention to show how modern life has altered our acoustic environment – its artificial sounds, for example, crowding out the sounds of nature.
Sound fidelity. Like sound perspective, sound fidelity is an aspect of film sound that concerns its faithfulness to reality. To preserve sound fidelity means that there is a realistic correspondence between a sound and its source. In other words, the source emits the sound we expect it to: a car horn sounds like a car horn, a plate breaking sounds like a plate breaking, and a flute sounds like a flute. Because much of a film's soundtrack is added in post-production, a filmmaker is under no obligation to maintain this correspondence, and may deviate from it for expressive purposes. One example of breaking sound fidelity can be heard in the early French sound comedy Le Million (René Clair, 1931). The film concerns a winning lottery ticket lost in a coat pocket. At one point, several characters jostle for possession of the coat, and when they bump into each other, the director adds the sound of a bass drum. This sound effect is not natural to the action, but it underlines the comedic aspect of the collision. In the same sequence, Clair adds the sounds of a sports arena as the characters run with and toss the jacket like a football. This unrealistic addition to the soundtrack presents the scene in a new way, prompting the viewer to interpret the depicted action differently.
Sound bridge. A sound bridge is a transitional device typically used to link two scenes together. In a sound bridge, the audio from the following scene is briefly included at the conclusion of the preceding one, or vice versa. This is another use of nonsimultaneous sound, since the image track and soundtrack do not correspond: we hear something before or after we see it. In The Lost World: Jurassic Park (US, 1997), director Steven Spielberg makes creative use of a sound bridge. The opening scene of the film depicts a young girl being attacked by small dinosaurs when her family stops on an island shore to eat lunch. The girl's mother races toward her daughter and, seeing what has happened, screams in horror. Spielberg carries the sound of the scream over an edit to a shot of Ian Malcolm (Jeff Goldblum) yawning in front of a subway advertisement featuring a tropical scene. The sound bridge overlays the scream with the image of Ian yawning, in what plays as a morbid joke about Ian's comic indifference to the horrific attack.
A BRIEF HISTORY OF FILM MUSIC
An important aspect of film sound is the use of music to accompany the images. Music serves various functions in a narrative film. It can help to establish the historical setting through the use of contemporaneous music (e.g. disco in the 1970s, jazz in the 1920s). It can comment on the story action – for example, where the lyrics of a song are relevant to the circumstances of a character. Finally, it creates mood or atmosphere: music acts as a resonator of the emotional subtext of a scene, amplifying the underlying feeling of the narrative. Like film sound in general, music greatly affects how we perceive story action. Directors, in collaboration with composers or with their music supervisors (who oversee the selection of songs for a film), can choose musical accompaniment that "matches" the story action, or they can pick something contrary to that action, as a form of counterpoint. For example, when Quentin Tarantino presents the grisly action of a gangster cutting off another man's ear in Reservoir Dogs (1992), he sets the scene to the early 1970s song "Stuck in the Middle with You" by Stealers Wheel. The jaunty pop hit contrasts sharply with the violence of the scene, and it underlines the gangster's casual indifference to brutality – and even his enjoyment of it.
Musical accompaniment encompasses both a film’s score, original music composed for the film involving specific choices of orchestration and instrumentation, and its soundtrack, which refers to its use of pre-existing music or songs. The remainder of this section roughly outlines the major historical tendencies in the musical accompaniment of film. It examines these tendencies within three periods of film history: the silent era (1895-1927), the classical era (1927-1960), and the post-classical era (1960-present).[1]
Contrary to its name, silent cinema was never silent. Synchronized sound – and therefore spoken dialogue – did not reach adoption at an industrial scale until the late 1920s, but technological systems for film sound had been in place, and deployed at a limited scale, since the medium's beginnings. Still, we have inherited the false assumption that silent film had little means of incorporating sonic elements, including music and sound effects. This was not the case. While movie theaters were not wired for sound, film exhibition did provide ways to create a rudimentary soundtrack. Live musical accompaniment was a common feature of film exhibition, with musicians typically positioned near the screen so they could see what was being projected. At smaller theaters, like the nickelodeons of the 1900s, this might mean only a piano or drums, but the larger "picture palaces" that arose in the 1910s might feature an orchestra. With no written score, live accompaniment would often rely on the improvisational skills of the musicians. The goal of the performers was to capture the tempo of the story action in musical form, so the accompaniment would reinforce the tone of each scene and provide its overall rhythmic structure. In addition to live music, early cinema borrowed traditions from theater to create additional sound effects. Film exhibitors could reliably reproduce a catalog of familiar sounds: trains, gunshots, horse hooves, chimes, and various types of weather (rain, wind, thunder, etc.).
The arrival of industrial-scale sound film in 1927 fundamentally changed the relation between image and sound. Now there was recorded dialogue to consider, and indeed, in early sound filmmaking, non-diegetic music was concentrated in non-dialogue sequences to avoid interfering with the spoken words. The studio era of classical Hollywood was distinguished by its ability to mass-produce films quickly and efficiently without sacrificing artistic quality. This was enabled in part by a division of labor among studio personnel, where each craftsperson had specific tasks in which they specialized. This applied to early music composition as well. Initially, composers were grouped by specialty (say, those writing for romantic scenes or comedic scenes), and therefore several composers might work on the same film, depending on the demands of the story. As sound film developed, however, this group method gave way to single-author composition, where one composer was responsible for a film's score. With this came the idea that a score should be a coherent whole, with identifiable themes or leitmotifs tied to specific characters or sequences. Musical composition in classical Hollywood was defined by underscoring, where a musical score runs beneath a dialogue scene without interfering with it. Generally speaking, the use of pre-existing music or popular songs was avoided so that the unity of the underscore was not disrupted.
Another shift in film music is observable following the break-up of the studio system, which accelerated in the 1960s. One noticeable change involved the commercialization of the film score. During this period, film studios were purchased by conglomerates with no prior relation to the film industry, such as Gulf + Western's purchase of Paramount Pictures in 1966. This signaled a transition from a studio era characterized by vertical integration – where a studio owned the means of production, distribution, and exhibition – to a post-studio era characterized by horizontal integration – where one corporation owns companies across different industrial sectors. Why is this important? Because in the post-classical period, studios moved the recording of film scores in-house and exploited film music as a promotional strategy and revenue source. Film composers of the time, such as Henry Mancini, reimagined the score along the lines of the music album, where individual themes were treated as "tracks" on a record. A film's soundtrack frequently became organized around an identifiable theme – think Star Wars or Indiana Jones – or a title song – think "Raindrops Keep Fallin' on My Head" from Butch Cassidy and the Sundance Kid (1969) or "Moon River" from Breakfast at Tiffany's (1961). Furthermore, the instrumentation of the film score incorporated the trends and stylings of popular music. A film's soundtrack could therefore function as an integral part of its marketing.
In addition to the commercialization of the film score, filmmaking of the post-studio era regularly incorporated pre-existing music, typically pop songs, into the soundtrack. In what is sometimes referred to as the "Easy Rider approach," after the counterculture biker movie starring Peter Fonda, films sought to target the emerging youth market by including rock 'n' roll, folk, funk, and R&B music as part of their overall sound design. This was a common strategy of the New Hollywood films of the late 1960s and early 1970s, such as The Graduate (Mike Nichols, 1967), The Last Picture Show (Peter Bogdanovich, 1971), and American Graffiti (George Lucas, 1973). A soundtrack made up of pre-existing songs is known as a compilation score. For example, The Graduate's soundtrack is composed entirely of songs by Simon & Garfunkel, whereas American Graffiti populated its soundtrack with 1950s-era hits to match the setting of the film. Whereas in classical Hollywood the inclusion of a popular song was routinely tied to a diegetic performance, in the New Hollywood era, pre-existing songs accompanied scenes without the need for any diegetic anchor.
The inclusion of a pre-existing song presents formal difficulties for a film director in terms of its synchronization with the images. With an instrumental score, the composer can account for the rhythms of a scene, adjusting the timing of the score to the timing of the sequence. By contrast, a pre-existing song enters a film with a predetermined unity, often familiar to audiences, so a filmmaker has to consider how much of a song, or which parts of it, to include. In the 1980s, American cinema took its cue from a new cable channel called MTV by incorporating music video aesthetics into film form. In these instances, film borrows the fast-paced style and disjunctive editing of music videos. A popular song tends to interrupt the narrative with a montage sequence edited specifically to the music. Scholar Justin Wyatt calls these montage sequences modules. Popular examples from the period include the final audition scene of Flashdance (Adrian Lyne, 1983), set to "What a Feeling" (composed by Giorgio Moroder, performed by Irene Cara), and the warehouse dance scene in Footloose (Herbert Ross, 1984), set to "Never" (performed by Moving Pictures).
Given the history outlined above, it should be clear that the relation between film images and music is more complicated than "setting a mood." From underscoring to modules, the ways that music has been incorporated into film have varied historically and reflect how the film and music industries have co-evolved in relation to each other.
[1] Much of the history provided below is based on Julie Hubbert, Celluloid Symphonies: Texts and Contexts in Film Music History (Berkeley: University of California Press, 2011). The author also consulted Jeff Smith, The Sounds of Commerce: Marketing Popular Film Music (New York: Columbia University Press, 1998).