The Elements of Musical Perception.

(HeadWize Technical Series Paper)


Although an understanding of acoustics and psychoacoustics is not mandatory for enjoying headphones, it is useful knowledge, especially when evaluating headphone sound quality or headphone acoustic simulators. Headphones do not sound the same as loudspeakers, and judging headphones against a loudspeaker reference will more often than not result in disappointment. Also, the headphone accessories market now offers several spatial processor products, which operate on different acoustic principles, and listeners are likely to have different preferences when choosing among them. An understanding of acoustics and psychoacoustics can make the selection process less frustrating by grounding the listener’s expectations in scientific fact.

This article is meant as an introduction – just a few simple mathematical formulas – and includes real-world examples of the operation of acoustic principles that are relevant to audiophiles. The section on spatial hearing is deliberately compact to avoid duplicating the discussions in other articles on the HeadWize site. For more information about 3D hearing, see A 3D Audio Primer and The Psychoacoustics of Headphone Listening. For more information about headphone technologies and headphone accessories, see A Quick Guide To Headphones and A Quick Guide To Headphone Accessories. For more information about evaluating headphones, see Judging Headphones For Accuracy.


Simple Harmonic Motion


Sound waves are longitudinal waves (as opposed to transverse waves such as light waves) in that they oscillate in the same direction as their propagation. They move through the air as a series of compressions and expansions (also called rarefactions) of air molecules. The air molecules only vibrate back and forth about their resting positions, along the direction of travel; they do not move along with the wave. A pulse traveling along a stretched-out Slinky toy is an example of a longitudinal wave. Longitudinal waves can be represented using the familiar notation for transverse waves. Both longitudinal and transverse waves follow the same basic wave principles.


A simple wave is characterized by its amplitude (its maximum displacement from the center) and its frequency (the number of oscillations per second, measured in hertz). A simple sine wave completes one full oscillation, passing through 360 degrees of phase, in a period T (seconds) = 1/frequency (Hz). A simple sine wave is also called a pure tone.
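The relationship between period, frequency and phase can be sketched in a few lines of Python. This is only an illustration: the function names and the 440 Hz example tone are assumptions chosen here, not part of the original article.

```python
import math

def period_seconds(frequency_hz):
    """Time for one full oscillation: T = 1 / f."""
    return 1.0 / frequency_hz

def phase_degrees(frequency_hz, t_seconds):
    """Phase reached after t seconds, wrapped to the range 0 to 360 degrees."""
    return (360.0 * frequency_hz * t_seconds) % 360.0

def sine_displacement(amplitude, frequency_hz, t_seconds):
    """Displacement of a pure tone at time t, starting from phase 0."""
    return amplitude * math.sin(2.0 * math.pi * frequency_hz * t_seconds)

# Hypothetical example: a 440 Hz pure tone with amplitude 1.
T = period_seconds(440.0)
print(T)                                   # roughly 0.00227 seconds
print(phase_degrees(440.0, T / 4.0))       # a quarter period is about 90 degrees
print(sine_displacement(1.0, 440.0, T / 4.0))  # close to the full amplitude
```

A quarter of a period corresponds to 90 degrees of phase, which is where a sine wave that started at zero reaches its maximum displacement.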

Huygens’ Principle


Diffraction effects play an important role in acoustics, and are best understood when sound waves are viewed as Huygens wavelets. A freestanding sound source transmits sound in all directions so that the wavefront is actually a sphere. The physicist Christiaan Huygens saw all waves as being made up of an infinite number of tiny circular (in 2D) or spherical (in 3D) waves. A complete wave pattern is then merely the sum of these wavelets.



A consequence of Huygens’ Principle is that waves can bend around edges. If a wave hits a wall with an opening, a Huygens wavelet comes out the other side. Sound waves bend more than light waves (so one can hear but not see around corners), and low frequencies bend more than high frequencies (so diffracted sounds have a “muffled” quality). For example, loudspeaker drivers are mounted in a closed box (or one with a specially tuned vent) to prevent the out-of-phase rear waves from diffracting to the front and canceling out the main output of the speaker. Tweeters may have an “anti-diffraction” ring so that the dispersion of high frequencies is not affected by edges and seams on the speaker box.



When a sound wave hits a surface at an angle (the incidence angle), it bounces off that surface at the same angle (the reflection angle). Reverberation is a result of waves reflecting off walls and objects in an acoustic space and is one of the spatial cues used by the human brain for 3D hearing. When the surface is uneven, reflection analysis deconstructs it into a series of smaller flat surfaces, whose reflections are then summed. The diagram above is a simplification of reflection analysis using Huygens wavelets.

Inverse Square Law and Absorption

Sound waves radiate outward in a spherical shape. The farther a listener is from the sound source, the quieter the sound. The area of a sphere is A = 4πr², where r is the radius of the sphere. Because the sound energy is spread over this growing area, the intensity of a sound wave decreases in proportion to the inverse square of the distance from the source.
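As a rough numerical illustration of the inverse square law, the sketch below spreads a source's power over the area of a sphere and expresses the change in level in decibels. It assumes an idealized point source radiating equally in all directions with no absorption or reflections; the function names are chosen here for illustration.

```python
import math

def intensity_w_per_m2(power_watts, distance_m):
    """Intensity of an ideal point source: power spread over A = 4 * pi * r^2."""
    return power_watts / (4.0 * math.pi * distance_m ** 2)

def level_change_db(near_m, far_m):
    """Change in intensity level (dB) when moving from near_m to far_m."""
    return 10.0 * math.log10((near_m / far_m) ** 2)

# Doubling the distance quarters the intensity, a drop of about 6 dB.
print(intensity_w_per_m2(1.0, 1.0) / intensity_w_per_m2(1.0, 2.0))  # about 4
print(level_change_db(1.0, 2.0))                                     # about -6.02
```

Each doubling of distance therefore costs about 6 dB under these idealized conditions; real rooms, with their reflections, usually lose less.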

The intensity of sound is also affected by the absorption characteristics of the air and of reflective materials. The degree of absorption is called the absorption coefficient. Some materials have similar absorption across all audio frequencies, but others are better at absorbing a particular band of frequencies. For example, in air, frequencies below 1 kHz will travel much farther than those above 1 kHz. Thus, when the sound source is at a distance, the sound is weaker because of the inverse square law and takes on a muffled quality because of the absorption of high frequencies. Since headphones are worn so close to the eardrums, headphone sound will have more high frequency components than sound from loudspeakers.

Absorption and the inverse square law also affect the reverberation time of an acoustic space. Reverberation time measures how long it takes sound to decay by a factor of one million in intensity (60 dB) and is an important characteristic of concert halls. The best acoustic spaces for listening to music have a smooth rate of decay (as opposed to a rough decay rate where the sound keeps changing volume). The better concert halls have reverberation times of around 2 seconds.
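The "factor of one million" in the definition of reverberation time corresponds to a 60 dB drop in intensity level. The short sketch below shows the conversion and, assuming a perfectly smooth decay at a constant rate in dB per second (an idealization), the resulting reverberation time. The 30 dB/s figure is a hypothetical value chosen only to reproduce the 2 second example.

```python
import math

def decay_in_db(intensity_ratio):
    """Express an intensity ratio as a level difference in decibels."""
    return 10.0 * math.log10(intensity_ratio)

def reverberation_time_s(decay_rate_db_per_s):
    """Time to fall by 60 dB, assuming a constant decay rate (idealized)."""
    return 60.0 / decay_rate_db_per_s

print(decay_in_db(1_000_000))      # a factor of one million is about 60 dB
print(reverberation_time_s(30.0))  # hypothetical hall decaying at 30 dB/s: 2 s
```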

Doppler Effect


If a sound source is in motion, a stationary listener will hear a change in pitch as the source approaches and then recedes. (Light also exhibits a Doppler effect, which astronomers use to calculate the speed of stars traveling through space.) If the sound source is moving towards the listener, the sound waves bunch together and the perceived pitch is higher than the pitch actually emitted. If the source is approaching at the speed of sound, the listener hears a sonic boom because all of the sound waves arrive at the same time. If the source is moving away from the listener, the wavelengths are stretched out, so the perceived pitch is lower than the pitch actually emitted.
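The size of the pitch shift can be estimated with the standard Doppler formula for a moving source and a stationary listener, f' = f × c / (c − v), where v is the source's speed towards the listener. The sketch below is a minimal illustration; the speed of sound (343 m/s, roughly dry air at 20 °C), the 440 Hz tone and the 30 m/s source speed are assumptions chosen here.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for dry air at about 20 degrees C

def doppler_frequency(source_hz, source_speed, speed_of_sound=SPEED_OF_SOUND):
    """Frequency heard by a stationary listener.

    source_speed is positive when the source approaches the listener and
    negative when it moves away. The formula breaks down once an
    approaching source reaches the speed of sound.
    """
    if source_speed >= speed_of_sound:
        raise ValueError("approaching source must be slower than sound")
    return source_hz * speed_of_sound / (speed_of_sound - source_speed)

print(doppler_frequency(440.0, 30.0))   # approaching: higher than 440 Hz
print(doppler_frequency(440.0, -30.0))  # receding: lower than 440 Hz
```

The error raised at the speed of sound corresponds to the sonic boom case, where the wavefronts pile up on top of one another and the simple formula no longer applies.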



Superposition Principle

When two or more waves travel in the same direction or cross each other’s paths, they remain distinct. Thus, the instruments in an orchestra or band or the voices in concurrent conversations are distinguishable even though they are playing or speaking at the same time. However, at a molecular level, waves that move in the same medium add together. They displace a total amount of air that is equal to the sum of their individual displacements. Music is heard as complex waves, so any analysis of musical perception begins with the Superposition Principle.



Interference

Although waves exist independently of one another, in the special case of similar waves combining, the result can be either constructive or destructive interference, depending on whether the waves are in phase or out of phase. Phase difference describes how closely two waves are in sync and is measured in degrees. Constructive interference will enhance sound (for example, make it louder). Destructive interference will weaken sound. If two identical waves are 180 degrees out of phase, they will cancel out. Whether the interference is constructive or destructive, the superposition principle requires that the individual waves continue to exist separately. The interference itself is merely the effect of the waves together at one point in space.
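For two sine waves of the same frequency and amplitude A, the combined wave has amplitude 2A·|cos(φ/2)|, where φ is the phase difference. The sketch below computes this and cross-checks it by adding the two waves sample by sample, which is the superposition principle in action. The function names and the sample count are illustrative assumptions.

```python
import math

def combined_amplitude(amplitude, phase_difference_deg):
    """Amplitude of the sum of two equal sine waves: 2 * A * |cos(phi / 2)|."""
    half_phase = math.radians(phase_difference_deg) / 2.0
    return 2.0 * amplitude * abs(math.cos(half_phase))

def sampled_peak(amplitude, phase_difference_deg, steps=3600):
    """Add the two waves point by point over one cycle and find the peak."""
    phase = math.radians(phase_difference_deg)
    peak = 0.0
    for i in range(steps):
        angle = 2.0 * math.pi * i / steps
        total = amplitude * math.sin(angle) + amplitude * math.sin(angle + phase)
        peak = max(peak, abs(total))
    return peak

for degrees in (0, 90, 120, 180):
    print(degrees, combined_amplitude(1.0, degrees), sampled_peak(1.0, degrees))
```

In phase (0 degrees) the amplitude doubles, at 120 degrees the sum is no larger than either wave alone, and at 180 degrees the two waves cancel.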



Beats

When two waves of slightly different frequencies combine, they produce a wobbling sound called beats. Beats have two characteristics: the beat frequency (how many times per second the sound swells in volume) and the tone frequency (the pitch that the listener hears). The beat frequency is fb = f2 - f1, where f2 > f1. The tone frequency is ft = (f1 + f2)/2.
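Both formulas follow from the identity sin a + sin b = 2·cos((a − b)/2)·sin((a + b)/2): the sum of two tones equals a tone at the average frequency multiplied by a slowly varying envelope. The sketch below evaluates the formulas and checks the identity numerically. The 440 Hz and 444 Hz tones are a hypothetical example.

```python
import math

def beat_frequency(f1_hz, f2_hz):
    """Number of loudness swells per second: fb = f2 - f1."""
    return abs(f2_hz - f1_hz)

def tone_frequency(f1_hz, f2_hz):
    """Pitch of the combined tone: ft = (f1 + f2) / 2."""
    return (f1_hz + f2_hz) / 2.0

def two_tone_sum(f1_hz, f2_hz, t):
    """The two pure tones simply added together (superposition)."""
    return math.sin(2.0 * math.pi * f1_hz * t) + math.sin(2.0 * math.pi * f2_hz * t)

def envelope_times_tone(f1_hz, f2_hz, t):
    """The same signal written as a slow envelope times the average tone."""
    envelope = 2.0 * math.cos(math.pi * (f2_hz - f1_hz) * t)
    tone = math.sin(2.0 * math.pi * tone_frequency(f1_hz, f2_hz) * t)
    return envelope * tone

print(beat_frequency(440.0, 444.0))  # 4 beats per second
print(tone_frequency(440.0, 444.0))  # heard as a 442 Hz tone
```

The envelope itself oscillates at half the beat frequency, but the ear registers a loudness peak at both its positive and negative extremes, so the sound swells f2 - f1 times per second.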

Standing Waves


When two waves that are identical in frequency and amplitude travel in opposite directions and meet, they can create a standing wave. Unlike traveling waves, standing waves appear to vibrate in place. That is, the wave peaks alternate from positive to negative in place but do not move forwards or backwards, and each peak terminates with a point of zero displacement on both sides. The peaks are called antinodes and the points of zero displacement are called nodes.
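A small numerical sketch can make the nodes and antinodes concrete. It adds a right-moving and a left-moving sine wave of equal amplitude; their sum equals 2A·sin(kx)·cos(ωt), which is zero at every multiple of half a wavelength regardless of time. The wavelength, frequency and function names are assumptions chosen for illustration.

```python
import math

def standing_wave(x, t, amplitude, wavelength, frequency_hz):
    """Sum of two identical waves traveling in opposite directions."""
    k = 2.0 * math.pi / wavelength    # spatial angular frequency
    w = 2.0 * math.pi * frequency_hz  # temporal angular frequency
    right_moving = amplitude * math.sin(k * x - w * t)
    left_moving = amplitude * math.sin(k * x + w * t)
    return right_moving + left_moving

def node_positions(wavelength, length):
    """Nodes sit every half wavelength, starting at x = 0."""
    half = wavelength / 2.0
    return [n * half for n in range(int(length / half) + 1)]

# Hypothetical example: wavelength 2 m, frequency 100 Hz, amplitude 1.
print(node_positions(2.0, 3.0))                  # nodes at 0, 1, 2 and 3 m
print(standing_wave(1.0, 0.0013, 1.0, 2.0, 100.0))  # at a node: essentially 0
print(standing_wave(0.5, 0.0, 1.0, 2.0, 100.0))     # at an antinode: about 2
```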


One form of standing wave is resonance. Normally, if an object is excited to vibration, the vibration will fade away due to damping. However, all objects have a preferred vibration frequency called the resonance frequency, at which vibrations are reinforced as standing waves within the object. If not excited continuously, an object vibrating at resonance will eventually calm down, but over a longer period of time than it would take at any other frequency. Resonance is a component of the sound of musical instruments, but is the bane of listening environments, which should not emphasize any one frequency or set of frequencies over others. Loudspeakers and headphones are damped to reduce or eliminate the effects of system resonances on sound reproduction.


Harmonics and Overtones

A harmonic or overtone series consists of a fundamental frequency and successive frequencies that are integer multiples of the fundamental. If f is a fundamental, then its harmonic series would be f, 2f, 3f, 4f, 5f…. More than a mathematical curiosity, harmonics are central to musical perception. Jean Baptiste Fourier discovered that any periodic waveform can be represented by summing a series of sine waves of different amplitudes and phases. For example, a square wave can be constructed from the sum of a fundamental frequency and its odd harmonic series.
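The square wave example can be tried out directly. In the standard Fourier series for a square wave, the n-th odd harmonic has amplitude 1/n relative to the fundamental, and the whole sum is scaled by 4/π. The sketch below adds up a chosen number of these odd harmonics; the function name and the 100 Hz fundamental are illustrative assumptions.

```python
import math

def square_wave_partial_sum(t, frequency_hz, n_harmonics):
    """Sum of the first n_harmonics odd harmonics of a unit square wave."""
    total = 0.0
    for k in range(n_harmonics):
        n = 2 * k + 1  # harmonic numbers 1, 3, 5, ...
        total += math.sin(2.0 * math.pi * n * frequency_hz * t) / n
    return 4.0 / math.pi * total

# Sample the middle of the positive half cycle of a 100 Hz wave.
quarter_period = 1.0 / 100.0 / 4.0
for count in (1, 3, 10, 200):
    print(count, square_wave_partial_sum(quarter_period, 100.0, count))
```

With only the fundamental the value overshoots the square wave's level of 1; as more odd harmonics are added, the sum settles towards 1.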


The sounds of musical instruments and voices are filled with harmonic content (see section on Timbre). When audio amplifiers overload or clip, they generate harmonics. Thus, even if clipping takes place at a low frequency, an amplifier can output enough high frequency harmonics to damage tweeters. When bipolar transistor amplifiers overload, they produce more odd-order harmonics. When tube and MOSFET amplifiers overload, they produce more even-order harmonics. In the great Tube vs. Transistor debate, practitioners of the art of glass audio often cite this difference as one of the main reasons for the superiority of tube sound (though by no means is everyone convinced that tubes sound better).
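How clipping creates harmonics can be illustrated with an idealized hard clipper, which is not a model of any real amplifier. The sketch below clips one cycle of a sine wave and measures its harmonics with a plain discrete Fourier sum. Clipping both halves of the wave equally yields only odd harmonics, while clipping one half more than the other also yields even harmonics. The clip levels and function names are assumptions chosen here.

```python
import math

def harmonic_magnitudes(samples, max_harmonic):
    """Magnitudes of harmonics 1..max_harmonic for one period of samples."""
    n = len(samples)
    magnitudes = []
    for h in range(1, max_harmonic + 1):
        re = sum(s * math.cos(2.0 * math.pi * h * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2.0 * math.pi * h * i / n) for i, s in enumerate(samples))
        magnitudes.append(2.0 * math.hypot(re, im) / n)
    return magnitudes

def clip(value, low, high):
    """Idealized hard clipper: limit value to the range low..high."""
    return max(low, min(high, value))

N = 512
sine = [math.sin(2.0 * math.pi * i / N) for i in range(N)]
symmetric = harmonic_magnitudes([clip(s, -0.5, 0.5) for s in sine], 5)
asymmetric = harmonic_magnitudes([clip(s, -1.0, 0.5) for s in sine], 5)

for h in range(5):
    print(h + 1, round(symmetric[h], 4), round(asymmetric[h], 4))
```

The unclipped sine contains only its fundamental, so every higher harmonic in the printout is a product of the clipping itself.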

Complex Waves (Timbre)

A complex wave is the sum of two or more harmonics. The human ear hears the timbre of a sound from a musical instrument based on the fundamental note (pitch) and the amplitudes and phase characteristics of the harmonics present in the sound. In addition, the tone quality of an instrument is affected by attack and decay transients. Attack transients occur when an instrument starts playing a note (for example, the striking of a piano note). Decay transients are the sounds of a note fading away. If these transients are removed (say, edited out in a recording), the sound of the remaining steady note loses its distinctiveness.

Loudness Perception


The Fletcher-Munson curves (above) chart loudness perception in human hearing at various sound pressure levels (SPLs, measured in decibels or dB). With 1 kHz as the reference point, hearing tends to be “flat” in the middle frequencies, but requires higher SPLs at the low and high frequencies for a tone to sound as loud as the reference. Thus, each curve marks SPLs of equal perceived loudness over frequency. At low listening levels, bass perception suffers dramatically, and the perception of the timbres of vocals and musical instruments changes. Quality tone controls or equalizers can help restore a satisfying tonal balance to music when listening at safe volume levels. For more information about hearing conservation, see Preventing Hearing Damage When Listening With Headphones.

The Missing Fundamental and Fundamental Tracking

If two or more notes played together are successive harmonics in a harmonic series, then the human ear will hear an additional note: the fundamental frequency of the series. This effect is called the Missing Fundamental. If pairs of notes played in sequence have a frequency ratio of 3 to 2 and have different fundamental frequencies, the human ear will construct a fundamental frequency for each pair. This phenomenon is called Fundamental Tracking. The headphones on most portable stereos exploit both of these principles to simulate an extended low frequency response. For more information on evaluating headphone sound quality, see Judging Headphones For Accuracy.
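For notes that are exact integer frequencies, the fundamental of the series they belong to is their greatest common divisor. The sketch below uses that shortcut purely as arithmetic; it is not a model of how the ear and brain actually derive pitch, and the example frequencies are hypothetical.

```python
from functools import reduce
from math import gcd

def implied_fundamental(frequencies_hz):
    """Largest frequency of which every given integer frequency is a multiple."""
    return reduce(gcd, frequencies_hz)

# Successive harmonics 2f, 3f and 4f of a 200 Hz fundamental that is not played.
print(implied_fundamental([400, 600, 800]))

# A sequence of pairs in a 3:2 ratio, each implying its own fundamental.
for pair in ([300, 200], [450, 300], [600, 400]):
    print(pair, implied_fundamental(pair))
```

The second loop mirrors Fundamental Tracking: as the pairs move up in pitch, the implied fundamental moves with them.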


Binaural Beats

If two soft notes that are close in frequency are played separately in each ear (with no physical mixing either by aural bleed or bone conduction), the listener will hear Binaural Beats, which result when the brain mixes the sounds. Binaural Beats are unlike regular beats that derive from the mixing of sounds in the air and are one illustration of why headphones sound different from loudspeakers.

Spatial Cues for 3D Hearing (ILDs, ITDs and HRTFs)


There are three types of spatial hearing cues: interaural time differences (ITDs), interaural level differences (ILDs), and head-related transfer functions (HRTFs). ITDs refer to the difference in the time a sound takes to reach each ear. ILDs describe the amplitude differences in the frequency spectrum of sound as heard in both ears. HRTFs are a collection of spatial cues for a particular listener, including ITDs and ILDs, that also take into account the effect of the listener’s head, outer ears and torso on the perceived sound.

Low frequency spatial cues are different from those for high frequencies. A listener’s head, body and ears acoustically contour sounds depending on the location of the source. With high frequencies, differences in the amplitude spectra between the ears (ILDs) aid in placing the source. However, low frequencies tend to diffract around the head. Instead, the human brain factors the delay time or phase difference (ITDs) between ears to determine the location of low frequency sound sources. For example, if both ears hear a low frequency sound simultaneously, then the source is either directly in front or in back of the listener. If there is a delay, then the source will appear closer to the ear that hears it first. Time delays are also significant in high frequency localization.
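To get a feel for the size of these delays, the sketch below uses Woodworth's classic approximation for a rigid spherical head, ITD = (a/c)·(θ + sin θ), where a is the head radius, c the speed of sound and θ the source angle measured from straight ahead. It is a simplification rather than a measurement of any real listener, and the head radius (8.75 cm) and speed of sound (343 m/s) are assumed typical values.

```python
import math

HEAD_RADIUS_M = 0.0875  # assumed average head radius
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C

def itd_seconds(azimuth_deg, head_radius_m=HEAD_RADIUS_M,
                speed_of_sound=SPEED_OF_SOUND):
    """Interaural time difference for a distant source, spherical-head model.

    azimuth_deg runs from 0 (straight ahead) to 90 (directly to one side).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return head_radius_m / speed_of_sound * (theta + math.sin(theta))

for angle in (0, 30, 60, 90):
    print(angle, round(itd_seconds(angle) * 1e6), "microseconds")
```

A source straight ahead gives no delay at all, which matches the front and back ambiguity described above, while a source directly to one side gives a delay of well under a millisecond.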

ILDs and ITDs alone generally are not adequate for the human brain to resolve 3D sound. Head-related transfer functions (HRTFs) include ITDs and ILDs, but capture them at a more personalized level. ITD and ILD measurements have generally assumed a spherical, disembodied head model. HRTFs factor in the effects of a listener’s outer ears (pinnae), head and torso on perceived sound. In addition to the frequency-contouring of HRTFs, head movement can help the brain to localize sound. HRTFs are different for every listener. Headphones do not image realistically because they isolate the sound reproduction to the ears without the benefit of HRTFs to create spatial cues.

The Precedence Effect

The Precedence Effect localizes sound based on the first wave that reaches the ear, regardless of the loudness of any later arriving waves. Therefore, if several speakers are playing the same music, it will appear to come from the speaker closest to the listener, even if more distant speakers sound louder. If the sound source is a pure tone in a reverberant room, a listener who enters the room and does not hear the start of the sound will find it very difficult to localize (it seems to be coming from all directions).

For more information on 3D hearing, see A 3D Audio Primer and The Psychoacoustics of Headphone Listening.


Benade, Arthur H., Fundamentals of Musical Acoustics (1990).
Berg, Richard and Stork, David, The Physics of Sound (1982).
Campbell, Murray, The Musician’s Guide To Acoustics (1987).
Hall, Donald, Musical Acoustics (1991).
Hartmann, William M., “How We Localize Sound,” Physics Today, November 1999.
MacPherson, Ewan, “A Computer Model of Binaural Localization for Stereo Imaging Measurement,” JAES, September 1991.
Roederer, Juan, Introduction to the Physics and Psychophysics of Music (1975).
Sokol, Mike, The Great Amplifier Debate: Tube vs. Transistor, Free Spirit (1993).

© 1998, 2000 Chu Moy.
