Technologies for Presentation of Surround-Sound in Headphones.

(HeadWize Technical Series Paper)


The distorted spatial reproduction in headphones is particularly objectionable with playback of surround-sound recordings. Standard headphones can at least convey a sense of the left-right staging in stereo recordings, but the surround channels have nowhere to go in stereo phones. While the debate continues as to whether music composers may legitimately use space as a compositional tool, film soundtrack editors are busy plotting the sonic trajectories of special effects explosions across a theater, dopplerizing the blare of police sirens and giving the voices of off-screen actors and other sounds a spatial presence.

One option for improved headphone surround is to use 4-channel headphones. The latest research suggests small modifications to the traditional design of 4-channel phones can bring dramatic improvements in spatial resolution. Another headphone surround option is surround-sound decoders (also called virtualizers) that feature auralization processing to simulate virtual surround loudspeakers. These are becoming very popular headphone accessories.

All of these advances have been a long time in coming. This paper traces the development of 4-channel headphones and acoustic simulators through patent disclosures and technical papers. Only a few of the important pioneers in this field are included due to space limitations, and digital signal processing (DSP) technologies are discussed only briefly. The following are excellent introductions to the topic of psychoacoustics: The Elements of Musical Perception, A 3-D Audio Primer, and The Psychoacoustics of Headphone Listening.


Interest in generating realistic sound fields in headphones goes back many decades. In the 1950s, the Journal of the Audio Engineering Society published several papers on binaural reproduction. In the 1960s, audio research was moving in the direction of electronic processing for stereo recordings played back over headphones. Benjamin Bauer, a pioneer in the field, authored several papers on the topic of crossfeed processing for improved headphone listening (today’s commercial crossfeed processors rely heavily on Bauer’s findings). By the 1990s, enough data had been gathered about psychoacoustics (along with advances in digital signal processing) that true auralization for headphones had become feasible.

Quadraphonic Decoder

In the meantime, recording formats were also undergoing their own spatial revolution. Walt Disney’s Fantasia (1940) was the first multi-channel motion picture release. It wasn’t until the 1970s that multi-channel sound made it into consumer audio with the arrival of quadraphonics (“quad”). Quad recordings encoded 4 channels (2 front and 2 rear) of sound into 2 channels, which were compatible with regular stereo systems. Consumers with the correct matrix decoders (there were 3 competing quad formats) could hear all 4 channels, played back through 4 loudspeakers to recreate a 3-D sound field.

Four-channel headphones hit the market shortly thereafter. These phones tried to mimic superficially a quad loudspeaker setup with 4 transducers, 2 per earcup. In general, they did a poor job of spatial imaging, because their design did not take into account the psychoacoustics of hearing. They sounded little better than listening to quad recordings with regular stereo headphones (the latest research suggests that 4-channel phones require additional signal processing to generate a realistic sound field).

For any number of reasons, the quad formats never caught on (although quad diehards continue to hope for a resurrection). By the 1980s, record companies were no longer producing any quad recordings. Multi-channel sound for the consumer was at a standstill for many years. A number of “surround” recording formats, such as ambisonics and holophonics, tried unsuccessfully (as of this writing, anyway) to gain marketplace acceptance. While quadraphonics languished, Dolby Surround stormed the film industry and crept into consumer systems via home video theater.


In 1982, Dolby Laboratories introduced Dolby Surround, a technology for encoding standard theater loudspeaker configurations (left, right, center and surround) into two channels using quad technologies and its own “steering” algorithms for directed imaging. This format evolved into the current “5.1” configuration (left, right, center, left surround, right surround – and a subwoofer) under a new name, Dolby Digital (AC3), which debuted in 1992. Dolby Surround and Dolby Digital have been fantastically successful, crossing over into other commercial venues such as broadcasting and music-only CDs and making “surround-sound” a household word.

As with quad recordings, recordings with multi-channel Dolby must be decoded to obtain the surround information. The industry standardization on the Dolby formats spurred development of decoders for home video centers. The availability of digitally-based auralization technology perfectly coincided with the surround-sound revolution. Some models of Dolby decoders (no larger than a remote control) began incorporating auralization for realistic playback of surround sound in headphones. Before these decoders were a reality, auralization processors were expensive and bulky computer workstations. The commercial success of the first generation of these headphone-based decoders has brought auralization hardware down to the level of consumer electronics.


The first quadraphonic headphone designs followed the “2-2” system and housed a total of four transducers. Each earcup had two transducers, with the rear channel information going to the one closest to the back of the head. Possibly the transducers were angled slightly inward to aim at the ear drum. These phones could be adapted for regular stereo listening by driving both transducers with the same audio signal – usually accomplished with an adapter plug.

Although this basic design (which is still in use today) could sound more spacious than regular stereophones, it failed to image outside the head of the listener, even when fed with true quadraphonic material. Regular stereo headphones suffered from lack of interaural crossfeed and HRTF filtering, which are required for true 3-D hearing. Four-channel phones further compounded the problem with inadequate spacing between the front and rear channels.

Figure 1

In 1971, the United States Patent and Trademark Office (USPTO) granted patent number 3,796,840 to Kazuho Ohta for a 4-channel headphone design with a spatial expander circuit (figure 1). The construction of the headphone itself was nothing new, but Ohta claimed that the spatial expander circuit solved the problem of in-head localization. He discovered that if the rear channels were phase-reversed compared to the front channels, the listener would hear “a truly effective panoramic surrounding sound effect.”

Why this was the case, Ohta could not explain, but theorized that “the effect is due to the construction of the headphone itself according to which the sounds of the first and second channels and those of the third and fourth channels respectively reach the left ear and the right ear separately….” He seemed to be saying that the invention works because the phase shift effectively prevents rear channel information from blending into mono with the main signal, thus simulating physical separation between the front and rear channels. In fact, the circuit is an early example of sum-difference processing, which operates by recovering ambience information from the audio signals. (See the section on acoustic simulators for more information on sum-difference processing.)

The patent describes two versions of the spatial expander. In the first (figure 1a), the headphone has two DPDT switches, one at the center of each earcup, which connect the audio signal to the rear channel transducers. A listener can reverse the phase of the rear channel by 180 degrees simply by flipping the switch. However, the 180-degree shift sometimes produces an “unnatural” surround effect. Further, if the front and rear channels happen to be identical in amplitude and phase, they cancel out – particularly troublesome with low frequencies.

Figure 2

The second version of Ohta’s spatial expander (figure 1b) addresses these issues by providing a means to adjust the phase of rear channel information. The circuit consists of a transformer, capacitor and potentiometer connected to form a low pass filter. High frequencies are shifted 180-degrees, but low frequencies are not affected. At the crossover point, the signal shifts 90-degrees. The crossover frequency can be varied via the potentiometer – as predicted by the equation: f(c) = 1/(2*pi*RC). Ohta suggested that the best effect was obtained when the crossover point fell between 200Hz and 1kHz.
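The crossover relation quoted above can be checked numerically. The following is a minimal sketch, not Ohta's actual transformer circuit: it assumes an idealized first-order all-pass phase shifter, chosen only because it reproduces the behavior described (almost no shift at low frequencies, 90 degrees at the crossover point, approaching 180 degrees at high frequencies). The resistor and capacitor values are hypothetical.

```python
import math

def crossover_frequency(r_ohms, c_farads):
    """Crossover frequency f_c = 1 / (2*pi*R*C), as given in the text."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

def allpass_phase_lag_deg(freq_hz, r_ohms, c_farads):
    """Phase lag in degrees of an ideal first-order all-pass: 2*atan(f/f_c).

    This model is an assumption made for illustration only.
    """
    fc = crossover_frequency(r_ohms, c_farads)
    return math.degrees(2.0 * math.atan(freq_hz / fc))

# Hypothetical component values: a 10 kOhm potentiometer setting, 33 nF capacitor.
R, C = 10_000.0, 33e-9
fc = crossover_frequency(R, C)

for f in (20.0, fc, 20_000.0):
    print(f"{f:9.1f} Hz -> {allpass_phase_lag_deg(f, R, C):6.1f} degrees of lag")
```

With these assumed values the crossover lands inside the 200Hz to 1kHz range that Ohta recommended; turning the potentiometer (changing R) slides it up or down.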

Figure 3

The Ohta surround effect is at best a vague approximation of a true 3-D sound field, because the implementation lacked sophistication and an understanding of how ambience recovery systems should be applied in headphone systems. Variations on Ohta’s idea surfaced for years, but then the quad format flickered out and quad phones became antiques. It was not until the 1990s that 4-channel headphone design saw a spark of life again.

In 1991, Florian König developed 2-channel and 4-channel headphones with improved spatial localization. In previous stereo and quadraphonic designs, transducers were usually placed directly facing the entrance of the ear. For stereo headphones, König discovered that if the transducers were positioned in the lower half of the earcup, approximately 30-degrees off from the center line, the soundwaves would take on some of the HRTF characteristics of normal hearing and could actually simulate in-front localization of headphone sound fields. König’s circumaural 4-channel design (shown in figure 3) places the rear channel transducer in the upper half of the earcup, towards the back of the head.

Figure 4

König noted that the imaging of his decentralized configuration sometimes degraded due to standing waves between the earcup, buffer board (on which the transducers are mounted), the ear pinna and the temple. Acoustic felt was not sufficiently effective in reducing these standing waves. At the same time, König began investigating the possibility of 4-channel sound in supra-aural phones – and particularly the lightweight “Walkman-style” headphones found on most portable stereos. As a result, he devised a new headphone system theory:

To image a nearly ideal point sound source, the headphone should have

  • an open speaker sound radiation in the lower part of the earcup, below the ear canal
  • acoustic felt (middle damping) beside the speaker and
  • a highly damped or maximally-closed upper part of the speaker above the ear canal.

The supra-aural version of König’s 4-channel phone (figure 4) implements these principles in the cramped space typical of supra-aural earcups. Both transducers are positioned close to the vertical axis of the earcup (less than 30-degrees off-center for the front transducer). The areas of high acoustic impedance in the diagram are implemented with acoustic felt, cut with a curve along the cutoff line to help minimize standing waves. König cautioned that the choice of transducer is important in this design. He recommended selecting transducers with a cumulative spectral delay time (CSDT) of about 1ms or less.

To simulate 3-D concert hall acoustics, König suggested the application of a small amount of reverberation and crossfeed processing with these time delays:

  • left-right delay: 5 to 10 ms.
  • front-rear delay: 30 to 50 ms.

Figure 5

König began research on a 6-channel configuration (figure 5), in which a third transducer is added to the earcup to reproduce “above” or vertical sounds. The positioning of the third transducer is again based on hearing directions, and all three transducers have modified radiation patterns from the application of specifically shaped acoustic felt. Auditory tests have shown very promising results when compared with THX cinema standards.


Acoustic simulation circuits (virtualizers) for headphones are based on either audio enhancement processing or head-related transfer functions (HRTFs). At their most basic, both types use some form of electronic crossfeed, but otherwise diverge considerably in the other details. Simulation based on HRTFs tries to create the actual sound field that a listener would hear from loudspeakers. Audio enhancement processing exploits psychoacoustic effects such as ambience recovery to simulate a realistic sound field in headphones. The sound image in audio enhancement simulation has been described as less well-defined than with HRTF simulation. On the other hand, HRTF simulators are more difficult to design and implement, and may require extensive personalization for different listeners.

Most audio enhancement processing relies on some form of ambience recovery, which restores realism by emphasizing the ambience information in audio signals. Ambience recovery is accomplished with sum-difference processing, which is fairly easy to implement in analog-based circuits. HRTF virtualizers can also be analog, but usually achieve the best performance with digital processing. Consequently, sum-difference systems are generally more cost effective and may be slightly more pervasive. With the cost of computing power declining, both types of virtualizers are successful and well-represented in the marketplace. This section follows the development of these two tracks of acoustic simulation technology, which have brought a revolution to headphone sound.

HRTF-Based Virtualizers

Figure 6

The beginnings of HRTF acoustic simulation circuits go back to the 1960s when Benjamin Bauer described a crossfeed network to improve headphone listening in his article “Stereophonic Earphones and Binaural Loudspeakers.” Based on the limited HRTF studies of the time, Bauer simulated the crosstalk that occurs in normal hearing by mixing a bandwidth-limited and time-delayed (low pass to 5kHz with 0.4 ms delay) portion of one audio channel into the opposite channel (figure 6). The circuit did not fully externalize the sound field, but did offer a sense of depth and mitigated the exaggerated perspective of headphone stereo. Although this article was written at the dawn of transistor audio, the principles described are still in use today in commercial crossfeed processors – though the current circuits are perhaps a bit more sophisticated.
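The crossfeed idea described above can be sketched digitally. This is an illustrative approximation, not Bauer's passive network: the one-pole low-pass, the whole-sample delay and the `gain` mixing level are all assumptions, and only the 5kHz cutoff and 0.4 ms delay come from the text.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Simple one-pole low-pass, standing in for the band-limiting network."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out

def delay(samples, n):
    """Delay by n whole samples, keeping the original length."""
    return ([0.0] * n + list(samples))[:len(samples)]

def crossfeed(left, right, sample_rate=48_000, cutoff_hz=5_000.0,
              delay_s=0.0004, gain=0.5):
    """Mix a low-passed, delayed copy of each channel into the opposite one.

    gain is a hypothetical mixing level, not a value from Bauer's article.
    """
    n = round(delay_s * sample_rate)
    feed_from_left = delay(one_pole_lowpass(left, cutoff_hz, sample_rate), n)
    feed_from_right = delay(one_pole_lowpass(right, cutoff_hz, sample_rate), n)
    out_left = [l + gain * f for l, f in zip(left, feed_from_right)]
    out_right = [r + gain * f for r, f in zip(right, feed_from_left)]
    return out_left, out_right
```

Feeding an impulse into the left channel only shows the effect: the left output is unchanged, while the right output stays silent for the delay period and then receives a smoothed, attenuated copy.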

Bauer’s circuit was a first step on the road to true electronic auralization processing. This simple crossfeed alone was not enough to fully externalize a headphone sound field (in fact, many listeners heard a sound field that almost resembled monaural). What Bauer’s circuit did not account for was the complex HRTF filtering that takes place when sound interacts with a listener’s head and outer ears (pinnae). Bauer bandwidth-limited the crossfeed, but did not compensate for ear canal resonance and pinna effects. The interaural delay is the natural phase-shift from the filter network. Both the frequency shaping and time delay are difficult to control in passive circuits. Various adaptations of Bauer’s work followed. For example, in 1977, Nobumitsu Asahi received a patent (no. 4,136,260) for an “Out-of-Head Sound Reproduction System” that consisted of a crossfeed filter with a notch filter that approximated the 10kHz resonance dip of the ear canal, but not the rising characteristic of the curve before the dip.

Figure 7

In 1996 (these things do take time), the USPTO granted patent no. 5,751,817 to Douglas Brungart for a stereo acoustic simulator that paid homage to Bauer’s analog design once more in the Digital Age (figure 7). By the 1990s, psychoacoustic science had advanced to the point where acoustic laboratories could measure a person’s HRTFs and then play back digitally-filtered sound in headphones to pinpoint localization. The “convolvers” that performed this HRTF shaping were basically personal computers fortified with such exotica as digital filter banks and headphones with tracking sensors. Based on the research of J.M. Loomis (“Active Localization of Virtual Sounds,” JAES, Oct. 1990), Brungart’s invention does its magic with far less. Like Asahi’s design, Brungart’s invention is a crossfeed processor with HRTF filtering, but this time including the rising characteristic of the ear canal resonance at 5kHz. Coming three decades after the introduction of crossfeed processing, it is a testament to the enduring quality of Bauer’s work.

Figure 8

In concept, the primary differences between Brungart’s and Bauer’s circuits are the Pinna-Related Filter at the input and the refined time delay, both constructed with opamps. The Pinna-Related Filter (figure 9) mimics the ear canal resonance curves measured off a KEMAR dummy head located 7 feet from, and facing 30-degrees left and right of, stereo loudspeakers (figure 8). The first stage is an infinite gain, multiple feedback path bandpass with a center frequency of 5kHz, a Q of 5 and an inverting maximum gain H of -1. The second stage is an inverting summer with gain. The filter’s response is a better approximation of an HRTF than the Asahi dip filter, but is still not optimal (figure 9a). The time delay circuit (figure 9b) is a low-pass, 4th-order Bessel filter with a constant group delay of 250 microseconds. These design goals were based on the findings of F.L. Wightman and D.J. Kistler that time delay below 2.5kHz dominates all other lateralization cues.
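The first stage's stated parameters (center frequency 5kHz, Q of 5, inverting gain of -1) can be explored with the textbook second-order bandpass transfer function. This is a sketch of that generic response only; it assumes the standard form H(s) = H0·(w0/Q)·s / (s² + (w0/Q)·s + w0²) and does not model the second (summing) stage or the component values in the patent.

```python
import math

def bandpass_response(freq_hz, f0_hz=5_000.0, q=5.0, gain=-1.0):
    """Complex response of a generic second-order bandpass at freq_hz.

    Defaults follow the figures quoted in the text; the transfer-function
    form itself is the textbook one, assumed here for illustration.
    """
    s = 1j * 2.0 * math.pi * freq_hz
    w0 = 2.0 * math.pi * f0_hz
    return gain * (w0 / q) * s / (s * s + (w0 / q) * s + w0 * w0)

for f in (1_000.0, 4_500.0, 5_000.0, 5_500.0, 10_000.0):
    h = bandpass_response(f)
    print(f"{f:7.0f} Hz  |H| = {abs(h):.3f}")
```

At the center frequency the response is exactly the inverting gain, and with a Q of 5 the passband is roughly 1kHz wide, so the stage contributes a fairly narrow peak around 5kHz.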

Figure 9

Figure 9a

Figure 9b

The accuracy of the HRTF filter is an important factor in how successfully a crossfeed processor will image in 3D. If the HRTF responses are measured with a dummy head, then listeners who have the same shape head as the dummy will hear the best sonic presentation. However, measuring HRTFs is usually done in acoustics laboratories with equipment that is not available to the average consumer, who, by the same token, would likely find the procedure to be too tedious and exacting to be bothered. In the past, companies that sold digital acoustic simulators would have customers make appointments for an HRTF “fitting.” The measurements were programmed into the simulator, which, like a custom suit, then took on the aural idiosyncrasies of the original owner – unless, of course, refitted.

Another issue arises when designing a simulator for surround sound: Brungart’s invention assumes two fixed sound sources (stereo loudspeakers), but surround sound images from 4 or 5 loudspeakers. Therefore, a good simulator may need to be capable of simulating multiple virtual loudspeakers as well as provide for individual customization of its HRTF filters. Digital convolution workstations with their complex programmable filter banks would have no trouble banging out any audio waveform on demand, but are commercially impractical as consumer products.

Figure 10

In 1993, the USPTO granted patent no. 5,371,799 to Danny Lowe for a “Stereo Headphone Sound Source Localization System.” Lowe’s invention detailed a means of generating locational cues in headphones without the need for the computing power of convolution workstations. He discovered that “by utilizing a transfer function corresponding to a location directly in front of a listener, that is, at 12 o’clock and then adjusting the amplitude and delay corresponding to the indirect sides of the head-related transfer function, it is possible to achieve all azimuths over a 180-degree span using a single head-related transfer function filter” (emphasis added).

Lowe’s invention divided a sound signal into 3 sections: direct wave, early reflections and tail reverberations. Given a direct wave, Lowe used digital signal processing technology to add the other two sections. A direct wave is processed with HRTF filter coefficients based on the desired azimuth of the virtual sound source to produce the early reflection and tail reverberation waveforms, which are then summed with the direct wave before being sent to headphones.

Figure 11

Lowe’s work paved the way for the first consumer surround-sound decoder for headphones (made by Virtual Listening Systems, Inc). On January 4, 1996, the USPTO granted patent no. 5,742,689 to T.J. Tucker and D.M. Green for a multi-channel signal processor for headphones. The processor places phantom loudspeakers anywhere in a virtual room and thus can recreate true surround sound in stereo headphones. The patent contains several innovations, among them a means for customizing a virtualizer’s HRTF filters for different users without the need for an elaborate acoustics laboratory.

Tucker and Green proposed an auralization processor containing a database of HRTFs stored in ROM (figure 11). A listener would configure the virtualizer by ranking a series of HRTF-filtered test tones for best performance (the criteria being elevation and front/back localization). There are an infinite number of HRTFs, so Tucker and Green applied cluster analysis to group HRTFs into an HRTF binary tree structure. The customization process would then be a search for the “best match cluster” down the HRTF tree. Once the HRTF match was made, the virtualizer would virtualize each surround channel with the appropriate HRTF coefficients (following Lowe and other patented technologies).
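The search down a binary tree of HRTF clusters can be illustrated with a short sketch. Everything here is hypothetical: the `ClusterNode` structure, the cluster names and the `prefers_left` callback (standing in for the listener's ranking of test tones, simplified to a pairwise choice) are illustrative assumptions, not details from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ClusterNode:
    """One cluster of similar HRTFs; leaves are the most specific clusters."""
    name: str
    left: Optional["ClusterNode"] = None
    right: Optional["ClusterNode"] = None

def find_best_cluster(root: ClusterNode,
                      prefers_left: Callable[[ClusterNode, ClusterNode], bool]) -> ClusterNode:
    """Walk down the tree, following the child the listener prefers."""
    node = root
    while node.left is not None and node.right is not None:
        node = node.left if prefers_left(node.left, node.right) else node.right
    return node

# Hypothetical three-leaf tree and a scripted "listener".
tree = ClusterNode("all",
                   ClusterNode("A", ClusterNode("A1"), ClusterNode("A2")),
                   ClusterNode("B"))
choice = find_best_cluster(tree, lambda a, b: a.name == "A")
print(choice.name)
```

The appeal of the tree is that the number of listening comparisons grows with the depth of the tree rather than with the total number of stored HRTFs.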

Even though the Tucker and Green invention simplified HRTF configuration, consumers found the process to be somewhat involved. Further, many consumers were not able to find a suitable HRTF match, so that the decoded sound was not much better than non-processed stereo in headphones. Another approach to the HRTF problem argued that a single “universal” HRTF curve could be effective for all listeners, if the HRTF filter were sufficiently accurate in real-time operation. A more accurate HRTF filter could also improve the performance of selectable curve decoders. In the past, the digital filters in auralization processors (called finite impulse response filters or FIRs) were of two types: time domain filters and fast convolution filters. Time domain filters had short time delays, but also had short filter lengths. Convolution filters had longer filter lengths, but longer time delays. Neither type was optimal for real-time operation.

Figure 12

On March 26, 1996, the USPTO granted patent no. 5,502,747 to David S. McGrath for a new digital filter design with improved accuracy and short time delay. Instead of a single filter, the McGrath design had a series of component filters working in parallel, but functioning as a single filter (figure 12). The component filters included both time domain and convolution types (with different time delays) whose outputs were combined to create a single filter with a long impulse response. The resulting time delay of this ensemble was the shortest delay of the component filters. The McGrath filter was incorporated into, among other things, the surround headphone technology that became known as Dolby Headphone. Dolby Headphone can work well from a single universal HRTF setting.
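The idea of several component filters acting as one long filter can be shown with a small sketch. It illustrates only the partitioning principle: a long impulse response is cut into segments, each segment filters the input separately, and each result is added back at the segment's time offset. As a simplifying assumption, every segment here uses direct time-domain convolution; in the design described above, the later segments would presumably be the fast-convolution type. The function names and the `block` parameter are hypothetical.

```python
def convolve(signal, taps):
    """Direct (time-domain) FIR convolution, full output length."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(taps):
            out[i + j] += x * h
    return out

def partitioned_convolve(signal, taps, block):
    """Sum of component filters, each covering one slice of the impulse response.

    Each slice's output is added at the slice's offset, so the ensemble
    behaves like the single long filter `taps`.
    """
    out = [0.0] * (len(signal) + len(taps) - 1)
    for offset in range(0, len(taps), block):
        part = convolve(signal, taps[offset:offset + block])
        for k, y in enumerate(part):
            out[offset + k] += y
    return out
```

Because the first segment starts at offset zero, it can produce output immediately, while the later segments are not needed until their offsets have elapsed. That slack is what lets slower block-based filters handle the tail without adding to the overall delay.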

Ambience Recovery and Audio Enhancement Virtualizers

While Dolby Headphone and other virtualization solutions based on digital HRTF processing could produce spectacular sound fields, they required digital processing for best performance and consequently were expensive until the cost of computing power fell. In the meantime, another analog-based virtualization technology, called ambience recovery, was also gaining popularity. Ambience recovery was a type of audio enhancement simulation that could be implemented effectively with analog sum-difference processors. It was based on an idea that went back to the time of Kazuho Ohta’s spatial expander circuit, which was for playback of quadraphonic recordings in 4-channel headphones. As the fortunes of 4-channel sound waned, much of the research was directed at improving stereo sound with spatial expanders.

Despite its name, sum-difference processing is not entirely distinct from HRTF processing. Both systems have many concepts in common such as crossfeed and acoustic transfer functions. However, the primary use of sum-difference circuits in acoustic simulation is in ambience recovery. They first convert audio channels into pairs of sum (e.g., left + right = L+R) and difference signals (e.g., left – right = L-R), and then process these composite signals according to various acoustic transforms. The processed L+R and L-R signals can then be recombined into left and right stereo channels [for example, (L+R) + (L-R) = 2L and (L+R) – (L-R) = 2R, so halving each result restores the original left and right signals].

The principle behind ambience recovery was that the stereo sound field could be widened and made more spacious by emphasizing the directional information (ambience, reverberation) that was unique to each channel. The directional information was found in the L-R signal, and so the spaciousness of the sound field could be increased by amplifying the L-R signal during processing. The difference signal also functioned as a form of crossfeed, except that the crossfeed was inverted in comparison to HRTF systems. In loudspeakers, the inverted crossfeed widened the stereo image by helping to cancel interchannel crosstalk. In headphones, such cancellation did not take place. Instead the sound field appeared more spacious because of the increased ambience content in each channel.
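A minimal sketch of this principle follows. It assumes the simplest possible processing, a single gain applied to the difference signal; the `width` parameter and the 0.5 scaling (which makes a width of 1 return the input unchanged) are illustrative choices, not taken from any of the patents discussed.

```python
def widen(left, right, width=1.5):
    """Scale the L-R difference signal and recombine into left/right.

    width = 1.0 leaves the signal unchanged; width > 1.0 emphasizes the
    difference (ambience) content. Both values are hypothetical settings.
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)            # sum signal, scaled
        side = 0.5 * (l - r) * width   # difference signal, scaled and boosted
        out_left.append(mid + side)
        out_right.append(mid - side)
    return out_left, out_right

# A sound present only in the left channel, with the difference doubled.
print(widen([1.0], [0.0], width=2.0))
```

Expanding the arithmetic shows the inverted crossfeed: each output equals (1 + width)/2 times its own channel plus (1 - width)/2 times the opposite channel, and the second coefficient is negative whenever width exceeds 1. Material that is identical in both channels has no difference signal and passes through unchanged.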

Figure 13

Figure 13 shows one of the earliest commercial examples of a spatial expander for headphones, designed by Jacob Turner (“Headphone with Cross Feeding Ambience Control,” US Patent No. 3,924,072 – July 10, 1974). Like the pioneering crossfeed circuit designed by Benjamin Bauer, Turner’s invention electronically mixed the interchannel crosstalk that occurs in loudspeaker listening. However, unlike Bauer’s circuit, the crossfeed here was full bandwidth, was phase-shifted 180 degrees and did not have a time delay. The resulting sound field had a reduced separation, but actually sounded more expansive due to the increased levels of ambience, which was controlled through the Ambience potentiometers. This circuit is featured in the Koss “Phase” brand headphones.

Many ambience-recovery simulators extol the lack of time delays as an advantage over HRTF-based systems, because including time delays can have undesired side-effects, such as comb filter distortion in center stage of the stereo image, where much of the important musical content resides. Nevertheless, time delays are locational cues that aid in accurate spatial perception, and there are sum-difference processors that use time delays. Interestingly, the technique of sum-difference processing allows for the introduction of time delays without some of the problems associated with HRTF systems.

Figure 14

The circuit in figure 14, invented by Joel M. Cohen, disclosed one means of incorporating time delays in sum-difference processing (“Stereo Image Separation and Perimeter Enhancement,” US Patent No. 4,308,423 – December 29, 1981). It worked by creating an L-R difference signal, delaying the difference signal with a serial analog delay (such as a bucket brigade), and then combining it with the left channel and its inverse with the right channel. The delayed difference signal could be frequency-contoured to mimic head-related shading before summing. The benefit of this technique was that the monaural information was not delayed, so that the center of the sound field did not suffer comb effect distortion.
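The signal flow just described can be sketched in a few lines. This is a digital stand-in for the analog circuit, with the frequency contouring omitted; the function name, the delay length and the `gain` value are hypothetical.

```python
def cohen_enhance(left, right, delay_samples=24, gain=0.5):
    """Add a delayed L-R signal to the left channel and its inverse to the right.

    delay_samples and gain are illustrative values, not figures from the patent.
    """
    n = len(left)
    diff = [l - r for l, r in zip(left, right)]
    delayed = ([0.0] * delay_samples + diff)[:n]   # whole-sample delay line
    out_left = [l + gain * d for l, d in zip(left, delayed)]
    out_right = [r - gain * d for r, d in zip(right, delayed)]
    return out_left, out_right

# A click in the left channel only, with a short delay for readability.
click_l = [1.0] + [0.0] * 7
click_r = [0.0] * 8
print(cohen_enhance(click_l, click_r, delay_samples=4))
```

When both inputs are identical the difference signal is zero, so centered (monaural) material passes through with no delayed copy added. That is the property the text credits with avoiding comb effect distortion in the center of the image.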

In headphones, the crossfeed of the Cohen circuit could function to even out the sound field. In contrast to non-delayed sum-difference systems, when the delayed difference signal in Cohen’s circuit was mixed with the summed signal, there might be a stronger ambience effect due to the delay affecting both left and right channel directional components. That is, when summing takes place, each channel received not only the delayed, inverted crossfeed, but also delayed directional information from that channel. The sound field gained spaciousness possibly at the expense of focus. In both headphones and loudspeakers, the width or spaciousness of the sound field is dependent on how much of the L-R signal is summed into the left and right channels.

None of the sum-difference circuits described so far are concerned with virtualization or perception correction in headphone listening. An HRTF-based virtualizer makes use of time delays and head-related frequency shading of the crossfeed to generate a realistic sound field in headphones. Since the ambience recovery in sum-difference processing does not require time delays, there has been considerable interest in sum-difference virtualizers that avoid time delays. Makoto Iwahara first demonstrated that it was possible to generate a sound field with the same transfer function as that of an HRTF simulator, but without the delay elements (“Audio Signal Translation with No Delay Elements,” US Patent No. 4,349,698 – September 14, 1982).

Figure 15

In the simplified diagram of an HRTF virtualizer above (figure 15), the crossfeed undergoes a transfer function B/A, where A is the near acoustic path from a speaker to the listener’s corresponding ear, and B is the far (crosstalk) acoustic path from a speaker to the listener’s opposite ear. The crossfeed transfer function B/A consists of both a low-pass filter to mimic the diffraction effect of the listener’s head and a time delay to account for the longer far acoustic path B. HP is the headphone transfer function, which describes how sound is modified going from the headphone transducer to the eardrum.

Figure 16

Figure 17

The output of the HRTF virtualizer in figure 15 is represented by the matrix equation shown in figure 16a. Iwahara rearranged that equation into equations representing the same transfer functions but using sum and difference inputs and outputs: Li + Ri, Li – Ri and Lo + Ro, Lo – Ro. The resulting equations in figure 16b suggested that the transfer function B/A could be duplicated by multiplying the input sum and difference signals by factors 1+B/A and 1-B/A, so that the time delay component is effectively eliminated. Figure 17 shows graphs of these multiplication factors which Iwahara labeled Fsa and Fma. These curves were simple enough for Iwahara to implement with minimum-phase opamp filters.
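The rearrangement can be sketched algebraically. The equations of figure 16 are not reproduced in this text, so the following is a reconstruction under stated assumptions rather than Iwahara's exact notation: any factor common to both channels (such as the near path A or the headphone correction HP) is normalized out, so that each output is its own input plus the opposite input filtered by B/A.

```latex
\begin{align}
L_o &= L_i + \tfrac{B}{A}\,R_i, &
R_o &= R_i + \tfrac{B}{A}\,L_i \\
L_o + R_o &= \left(1 + \tfrac{B}{A}\right)(L_i + R_i), &
L_o - R_o &= \left(1 - \tfrac{B}{A}\right)(L_i - R_i)
\end{align}
```

Adding and subtracting the first pair of equations gives the second pair. In this form the sum and difference signals each pass through a single filter, with no separate crossfeed path between the channels.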

Figure 18

The diagram in figure 18 shows one way that Iwahara designed a sum-difference virtualizer using the 1+B/A and 1-B/A filters. The circuit has the same output characteristics as the HRTF simulator in figure 15, but with a sum-difference topology and employing the filter responses delineated in the Fsa and Fma curves. There was a small amount of phase shift in the output of the circuit in the low frequencies, so Iwahara included a phase shifter in the difference signal stage to correct for this effect.

One objection to sum-difference processors has been the sometimes “tinny,” “cave-like” and “harsh” sound. The reason for this quality is that the difference signal contains mostly midrange frequencies, and human hearing has greater sensitivity in the 1-4kHz range. In standard sum-difference processing, the greater the amount of ambience recovery, the more the midrange is emphasized to the point of sounding harsh. Further, any phase variations in the 1-2kHz range can blur the sound image, because these frequencies have wavelengths that are about the same as the distance between a listener’s ears.
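The wavelengths involved are easy to compute for comparison with head dimensions. This small sketch assumes a speed of sound of 343 m/s (dry air at about 20 degrees C), a value not given in the text.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C

def wavelength_m(freq_hz):
    """Wavelength in meters of a sound wave at freq_hz."""
    return SPEED_OF_SOUND_M_S / freq_hz

for f in (1_000.0, 2_000.0, 4_000.0):
    print(f"{f:.0f} Hz -> {wavelength_m(f) * 100:.1f} cm")
```

The results can be set against the spacing between a listener's ears, which varies from person to person.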

Figure 19

In response to these criticisms, Arnold Klayman discovered that selective boosting of difference signal frequencies could mitigate some of these negative qualities. He devised both stereo enhancement and perspective correction systems based on this principle (“Stereo Enhancement System,” US Patent No. 4,748,669, May 31, 1988). The stereo enhancement circuit (figure 19) worked by producing sum and difference signals, selectively altering the relative amplitudes of the difference signal frequencies and the relative amplitudes of the sum signal frequencies, and combining the processed sum and difference signals with the original left and right signals.

The selective boosting of difference signal frequencies provided a wider sound field and improved the perception of ambient reflections and reverberant fields, which were no longer masked by direct sounds. The image was more stable, because the frequency components of increased phase sensitivity were not inappropriately boosted. A spectrum analyzer controlled the equalizers that boosted the quieter difference signal frequency components and adjusted the relative amplitudes of sum signal frequency components in proportion to the corresponding difference signal frequency components. A fixed equalizer de-emphasized frequencies with wavelengths comparable to the distance between the listener’s ears. To avoid overemphasizing artificial reverberation, the difference signal boost was reduced if the system detected indications of artificial reverb in the sum and/or difference signals.
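The basic signal flow (form sum and difference signals, process them, mix them back with the original channels) can be sketched in a few lines. This is a simplified illustration, not Klayman's circuit: the scalar gains `sum_gain` and `diff_gain` are hypothetical stand-ins for the frequency-dependent, analyzer-controlled equalizers described above, and the function name is invented for the example.

```python
def enhance_stereo(left, right, sum_gain=0.0, diff_gain=0.5):
    """Mix scaled sum and difference signals back into the original channels.

    The scalar gains are hypothetical stand-ins for the frequency-dependent
    equalizers described in the text.
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        s = l + r  # sum signal: content common to both channels
        d = l - r  # difference signal: content that differs between channels
        out_left.append(l + sum_gain * s + diff_gain * d)
        out_right.append(r + sum_gain * s - diff_gain * d)
    return out_left, out_right

# Identical channels have no difference signal, so they pass through unchanged.
mono_l, mono_r = enhance_stereo([0.5, -0.2], [0.5, -0.2])
print(mono_l, mono_r)

# Hard-panned material: the left-right difference is widened.
wide_l, wide_r = enhance_stereo([1.0, 0.0], [0.0, 1.0])
print(wide_l, wide_r)
```

With `sum_gain` at zero, centered material is untouched while the left-minus-right difference of the output is (1 + 2 × `diff_gain`) times that of the input. A real implementation would replace the scalars with per-band filters.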

Figure 20

Of particular interest to headphone listeners is the perspective correction system (figure 20). It modified the sum and difference signals to compensate for localization distortion in loudspeaker or headphone listening. In headphones, it created a front-localized sound image from the side-mounted transducers. When used with loudspeakers, it restored the side-localization of sounds for what is essentially 2-speaker surround sound.

The Perspective Correction circuit applied fixed equalization to the sum and difference signals to conform to the directional frequency response of the human ear. Different equalization curves were used for headphones and for speakers. For headphones, the sum signal was equalized to restore front-localized sounds to the levels at which they would have been perceived from front-mounted loudspeakers. For 2-speaker surround, the circuit switched in the difference signal equalizer, which was bypassed for headphone listening. The fixed equalizers divided the audio spectrum into a series of bands about 1/3 octave wide.

The equalization parameters for headphones are as follows:

Center Frequency (Equalization)

500Hz (-5.0dB)
1kHz (-7.5dB)
8kHz (-15dB)
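To give a sense of how large these cuts are, the dB values above can be converted to linear amplitude ratios. The sketch below assumes the usual amplitude convention, gain = 10^(dB/20); the dictionary and function names are illustrative.

```python
# Headphone equalization points from the list above: center frequency (Hz) -> dB.
HEADPHONE_EQ_DB = {500: -5.0, 1000: -7.5, 8000: -15.0}

def db_to_gain(db):
    """Convert a level in dB to a linear amplitude ratio (assumes 20*log10)."""
    return 10 ** (db / 20)

for hz, db in HEADPHONE_EQ_DB.items():
    print(f"{hz} Hz: {db} dB -> x{db_to_gain(db):.3f}")
```

Under that convention the three bands are scaled to roughly 0.56, 0.42 and 0.18 of their original amplitude.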

Figure 21

The headphone perspective equalization curve is shown in figure 21c and is derived by subtracting the side frequency response of the human ear from the front frequency response (curve 21a – curve 21b). (The curve for 2-speaker surround is simply the side response – the front response: curve 21b – curve 21a). Note that the EQ for headphones is not the same as for side-mounted speakers, because headphone coupling to the ear influences the sound reaching the eardrum. Also, the equalization may be modified depending on the type of headphone and specific headphone characteristics.

Klayman’s perspective correction system is easily adapted to include modern surround sound formats. To create an equalization curve for rear channels, one need only plot a frequency response for rear sound and subtract from that the curve for side sounds. What remains is to apply sum-difference principles to enhance the reproduction of the surround sound itself.

Figure 22

In 1999, the USPTO granted patent no. 5,970,152 to Arnold Klayman for an Audio Enhancement System for use in a Surround Sound Environment (figure 22). Consistent with the sum-difference philosophy of broadening the spatial image by diffusing localization cues, this simulator blended the surround channels with a series of spatial expanders to eliminate the perception of the surround speakers as point sources. With a standard 5.1 surround format, the Klayman circuit output a set of 4 ambience-enhanced audio signals for the front left, front right, rear left and rear right speakers. Each audio output was a function of at least three of the original audio source signals. The resulting surround image “envelopes or immerses” the listener as though the sound field were projected from an array of loudspeakers.


12/10/98: Added discussion re patent no. 5,502,747.

8/2/99: Added figure 12.

12/7/99: Added section on sum-difference processing.

Asahi, Nobumitsu, “Out-Of-Head Localized Sound Reproduction System For Headphone,” US Patent No. 4,136,260, May 17, 1977.
Bauer, Benjamin, “Stereophonic Earphones and Binaural Loudspeakers,” JAES, April 1961.
Brungart, Douglas, “Simplified Analog Virtual Externalization For Stereophonic Audio,” US Patent No. 5,751,817, Dec. 30, 1996.
Cohen, Joel M., “Stereo Image Separation and Perimeter Enhancement,” US Patent No. 4,308,423, December 29, 1981.
Hull, Joseph, “Surround Sound: Past, Present and Future,” Dolby Laboratories, 1998.
Iwahara, Makoto, “Audio Signal Translation with No Delay Elements,” US Patent No. 4,349,698, September 14, 1982.
Klayman, Arnold, “Stereo Enhancement System,” US Patent No. 4,748,669, May 31, 1988.
Klayman, Arnold, “Audio Enhancement System for Use in a Surround Sound Environment,” US Patent No. 5,970,152, October 19, 1999.
König, Florian, “A New Supra-Aural Dynamic Headphone System for In-Front Localization and Surround Reproduction of Sound,” AES Preprint 4495(M8), 1997.
Lowe, Danny, et al., “Stereo Headphone Sound Source Localization System,” US Patent No. 5,371,799, June 1, 1993.
McGrath, David S., “Method and Apparatus for Filtering an Electronic Environment with Improved Accuracy and Efficiency and Short Flow-Through Delay,” US Patent No. 5,502,747, March 26, 1996.
Ohta, Kazuho, “Four-Channel Headphone,” US Patent No. 3,796,840, Dec. 2, 1971.
Tucker, Timothy and Green, David, “Method and Device For Processing A Multichannel Signal For Use With A Headphone,” US Patent No. 5,742,689, Jan. 4, 1996.
Turner, John, “Headphone with Cross Feeding Ambience Control,” US Patent No. 3,924,072, July 10, 1974.

c. 1998, 1999, Chu Moy.


