Spatial Sound – An Overview.

by Kimmo Vennonen

It is easily confirmed that we hear sound in three dimensions and the perception of the spatial aspects of sound has been essential to our survival. For instance, ascertaining the location of a charging elephant, or crossing a busy city street both rely on this ability, called localisation. The ear-body-brain combination correctly decodes a handful of simultaneous and possibly conflicting spatial cues, often to within a few degrees of precision and in a fraction of a second.

The perception of music has always been an experience rich in spatial detail, but very much taken for granted due to its everyday nature. It is now acknowledged that the acoustic spaces in which music was performed had a profound influence on its style. For example, it is hard to imagine Gregorian Chant in an outdoor setting, or Balinese gamelan music in a cathedral.

The Invention of Stereo

The invention of sound recording in the nineteenth century was a cultural milestone, bringing certain music to places, times and people it had never touched. The most primitive apparatus was an instant curiosity piece, yet the superiority of the live musical performance was rarely questioned. What was lacking in recorded sound was a sensation of ambience, a separation of the instruments and a perception of the context of the performance. By the early twentieth century a simple theory of human sound localisation had been developed, which enabled others to propose ways to convey this missing spatial detail.

Thus stereo was born, but it has not always been applied in conformity with any stereo theory. For example, many audio practitioners are unaware that loudspeaker stereo relies on the conversion of intensity differences between the loudspeakers into phase differences between the ears. Many sound engineers have still not adapted to stereo, preferring “multiple-mono” recording strategies. To some extent this is because the equipment makers have not produced appropriate hardware.
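The conversion of intensity differences into phase differences can be illustrated with a small numerical sketch. It rests on a deliberately crude model that is not taken from this article: two distant loudspeakers at plus and minus 30 degrees, the ears treated as two points 0.17 m apart, and no head shadowing, which is only a reasonable picture at low frequencies. The function names and default values are illustrative assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def ear_signals(gain_left, gain_right, freq_hz,
                speaker_angle_deg=30.0, ear_spacing_m=0.17):
    """Return complex amplitudes (left_ear, right_ear) for one sine tone.

    Assumed toy model: far-field speakers at +/- speaker_angle_deg,
    ears as two points ear_spacing_m apart, no head shadowing.
    """
    # Time by which a speaker's sound reaches the same-side ear
    # earlier than it reaches the centre of the head.
    tau = ((ear_spacing_m / 2.0)
           * math.sin(math.radians(speaker_angle_deg)) / SPEED_OF_SOUND)
    shift = 2.0 * math.pi * freq_hz * tau
    lead = cmath.exp(1j * shift)   # same-side speaker arrives early
    lag = cmath.exp(-1j * shift)   # opposite-side speaker arrives late
    left_ear = gain_left * lead + gain_right * lag
    right_ear = gain_left * lag + gain_right * lead
    return left_ear, right_ear


def interaural_phase(gain_left, gain_right, freq_hz):
    """Phase of the left-ear signal relative to the right ear, in radians."""
    left_ear, right_ear = ear_signals(gain_left, gain_right, freq_hz)
    return cmath.phase(left_ear * right_ear.conjugate())


if __name__ == "__main__":
    for g_l, g_r in [(1.0, 1.0), (1.0, 0.5), (1.0, 0.0)]:
        l_ear, r_ear = ear_signals(g_l, g_r, 200.0)
        print(g_l, g_r,
              "level L/R:", abs(l_ear), abs(r_ear),
              "phase difference (rad):", interaural_phase(g_l, g_r, 200.0))
```

In this model both ears always receive the same level, whatever the speaker gains, while the phase difference between the ears grows as one speaker is made louder than the other. That is the sense in which a pure intensity difference between the speakers becomes a phase difference around the head.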

The Field of Psychoacoustics

Psychoacoustics is a field broader in scope than spatial sound, but there are many applications of it to this topic. Most practical spatial systems (including stereo) rely on psychoacoustics, endeavouring to create a natural impression, or an illusion of enveloping sound. Over the preceding few decades to the present there has been a considerable refinement of the theoretical and empirical aspects of the psychoacoustics of spatial sound. However, we are still in the situation where there is no agreement on the finer details of what is perceptible, measurable or relevant. A consequence has been the lack of a unified approach to implementing spatial sound beyond stereo, and indeed a lack of agreement on the desirable characteristics of a spatial reproduction system. This lack of consensus is not confined to the hardware, but is deeply related to the question of how best to encode the spatial experience into an inevitably limited information bandwidth.

This problem could be broadly termed spatial coding, with discrete, matrix and kernel solutions being proposed. The discrete and matrix approaches both use speaker feed signals in the transmission medium, with desired sound locations between the speakers being reproduced with a lessened spatial precision or “stereoism” (literally, “solidity”). Kernel methods are more sophisticated, encoding an infinite number of source directions into a given number of channels, but abandoning the concept of a discrete “point source” being reproduced at the limited number of source locations corresponding to speaker locations. Kernel methods do not specify speaker locations implicitly, leaving that choice to the listener who must use the appropriate decoding for his/her speaker configuration.

Quadraphonics was a discrete and/or matrix solution, but there was no real agreement on what was intended to be achieved or what theoretical basis it had beyond “double stereo”. Furthermore, its integration into broadcasting was found not to be straightforward. As a consequence, it failed. Ambisonics is an alternative solution from the same era that offers a very viable compromise using kernel coding, but missed the opportunity to be market tested.
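The kernel idea can be made concrete with first-order Ambisonics. The sketch below encodes one sample from any direction into the four B-format channels (W, X, Y, Z), using the traditional convention in which W is scaled by 1/sqrt(2). The speaker layout is chosen only at decode time. The decoder shown, which points a virtual cardioid microphone at each speaker, is one simple choice among several and is an assumption for illustration, as are the function names.

```python
import math


def encode_b_format(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Azimuth is measured anticlockwise from straight ahead,
    elevation upward from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(az) * math.cos(el)   # front-back component
    y = sample * math.sin(az) * math.cos(el)   # left-right component
    z = sample * math.sin(el)                  # up-down component
    return w, x, y, z


def decode_virtual_cardioids(b_format, speaker_azimuths_deg):
    """Derive horizontal speaker feeds from B-format (illustrative decoder).

    Each feed is a virtual cardioid microphone aimed at that speaker.
    Z is ignored because the speakers are assumed to lie in one plane.
    """
    w, x, y, _z = b_format
    feeds = []
    for azimuth_deg in speaker_azimuths_deg:
        az = math.radians(azimuth_deg)
        feed = 0.5 * (math.sqrt(2.0) * w + x * math.cos(az) + y * math.sin(az))
        feeds.append(feed)
    return feeds


if __name__ == "__main__":
    encoded = encode_b_format(1.0, azimuth_deg=45.0)
    square = [45.0, 135.0, 225.0, 315.0]            # four-speaker layout
    hexagon = [0.0, 60.0, 120.0, 180.0, 240.0, 300.0]  # six-speaker layout
    print("square feeds: ", decode_virtual_cardioids(encoded, square))
    print("hexagon feeds:", decode_virtual_cardioids(encoded, hexagon))
```

The same four encoded channels serve both the square and the hexagonal layout. The transmitted signals carry direction rather than speaker feeds, which is the distinction drawn above between kernel coding and the discrete and matrix approaches.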

The argument over desirable characteristics has also carried over into the areas of microphones and loudspeakers. In some ways one can consider this a related debate to that which often occurs in the art of stereo recording. Is the intention to record an acoustic event with great precision, or to generate a perhaps even more pleasant spread of sonic images with not much relation to reality? Should one use highly directional loudspeakers in the aid of better imaging, or use less directional radiators that (in stereo) use room reflections to create a more natural listening experience? These issues are far from solved although it appears that in the cinema there is a greater consensus, because the listening environment and coding methods are more highly standardised.

The question of what is really practical or appropriate has not been settled either, although some will say that the buying public knows best. Most homes cannot accommodate sixteen speaker sound systems, nor is it acceptable yet to hand out three dimensional virtual reality headphones to rock concert patrons. Different technologies evolve to suit different contexts.

In the last decade of the twentieth century there has been a growing interest in spatial sound arising from the development of new communications and media technologies. Virtual Reality and High Definition Television (HDTV) both require spatial sound reproduction in excess of what ordinary stereo can offer. Many rental videotapes and television shows are now Dolby encoded for cinema style surround sound. New equipment and software is beginning to proliferate, claiming to deliver spatial sound.

Modern Approaches To Simulating Spatial Sound

There is no commercially available system that successfully conveys the natural spatial hearing experience. Apart from Ambisonics, a close approach is made by binaural stereo and transaural stereo methods, both capable of delivering a quite natural sounding three dimensional effect when used with well recorded source material. However, the limitation is that one must wear headphones with binaural stereo, or in the case of transaural stereo, there is only one good listening position for hearing the full spatial effect through the loudspeakers.

In the laboratory much more is possible, at a much greater cost. If digital audio technology continues to become more affordable then some of these systems will become commercially feasible, but that is no guarantee of their universality or intrinsic merits. Indeed, the rise and fall of quadraphonics could easily be emulated in the next decade by some new candidate. As before, the two key concerns will probably be compatibility with existing stereo and standardisation of software format, including perceptual coding to reduce the digital data rate. Multispeaker stereo, using a small number of frontal speakers, is proposed as a worthwhile improvement over stereo, while retaining backwards compatibility.

It remains to be seen whether the recording industry is willing to set aside the stereo paradigm and update all the equipment to a new format, when most consumers would still be using stereo (and even mono) for many more years. Once HDTV with multichannel sound becomes commonplace, there may be an irresistible pressure for the audio-only industry to settle on the same or an even better surround format. In the next few years, these transmission and distribution decisions must also be made for digital audio broadcasting (DAB). The use of simple matrixing and speaker-feed signals seems particularly short-sighted to this author, yet that is exactly what is being proposed for HDTV by some European and Japanese researchers.

Apart from what occurs in the mass marketplace, there will always be distinct applications for other multichannel systems intended for psychoacoustic and localisation research, for auralization (used to simulate building acoustics etc) and even for contemporary music concerts. Current systems are all very specific to their context with not many common features, apart from the use of many amplifiers and loudspeakers. Likewise there appears not to be any accepted spatial coding standard, apart from ad hoc discrete methods. Often the spatial effect is achieved by a “brute force” approach at high expense. However, these multichannel systems can achieve striking effects when combined with the appropriate spatial software or spatial processors.

Equipment based on digital signal processing (DSP) has become commonplace in both the domestic and professional arenas. Processors for recording studios or on-stage applications employ complex but easily useable algorithms, creating a variety of effects with many consequences in stereo space.

Much effort has gone into the digital synthesis of reverberation. Nowadays it is possible to buy for a very reasonable price a device that produces stereo reverberation indistinguishable from a stereo recording of the real event. This technology is commonplace in recording and production studios. Lately a variant has been seen in the domestic sphere in the form of ambience simulators, often requiring the addition of an extra pair of loudspeakers. The devices enable one to play recorded music via a choice of synthesised acoustics, for instance a jazz club or a concert hall. This technology cannot fool the experienced listener, as individual instruments cannot be separated from the stereo mix and given individual spatial characteristics, as would occur in reality. It is not often recognised that reverberation is just one aspect of a perceived depth/distance effect, an area in need of more psychoacoustic research.

Head-Related Transfer Functions

The biggest current research effort in spatial sound is being directed toward head-related transfer functions, based on the shadowing effects of the head and on pinna cues. Using these functions it is now possible to generate binaural signals that convey three dimensional sound over stereo headphones, as a synthetic parallel to conventional binaural recordings. The main problem is that, to achieve acceptable accuracy, the equipment must be calibrated to the individual’s spectral response as a function of source direction, which is shaped by the pinna. If a generic head-related transfer function could be discovered that suited all individual variations in ear shape and size, it could be the equivalent of the Rosetta Stone for virtual binaural audio.
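In the time domain, generating a binaural signal from a head-related transfer function amounts to convolving the mono source with a pair of impulse responses, one per ear, for the desired direction. The sketch below shows only that step. The two impulse responses are made-up placeholders (a delay and an attenuation at the far ear), not measured data; real responses are measured per listener or on a dummy head and also carry the pinna's spectral colouring. All names are illustrative.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Return (left, right) headphone signals for one source direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Placeholder impulse responses for a source on the listener's left:
# the left ear hears it at once, the right ear three samples later
# and quieter. These values are assumptions, not measurements.
TOY_HRIR_LEFT = [1.0, 0.0, 0.0, 0.0]
TOY_HRIR_RIGHT = [0.0, 0.0, 0.0, 0.6]

if __name__ == "__main__":
    mono = [1.0, 0.5, 0.25]
    left, right = render_binaural(mono, TOY_HRIR_LEFT, TOY_HRIR_RIGHT)
    print("left: ", left)
    print("right:", right)
```

Moving the source means switching to the impulse-response pair for the new direction. The calibration problem described above arises because those pairs differ from one listener to the next.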

These techniques are being proposed for applications where a large amount of aural information must be quickly processed, for example in an aircraft cockpit where the pilot is making an approach to a busy airport. At the moment the head related techniques are far from reliable and cause many “front/back confusions” at the best of times. In spite of this, an immediately practical application may be in teleconferencing, where it is difficult to resolve the voices of several simultaneous participants by conventional means.

The application of this constellation of technological approaches to music composition and production, as opposed to reproduction, is quite varied. In many respects the popular music industry leads the way in terms of using the latest processing technology, albeit only in the realm of stereo. Equipment like Roland Sound Space (RSS) using a hybrid of methods has achieved a limited acceptance, commensurate with its limitations in terms of listening positions.

For computer and contemporary music no one spatial approach has dominated over stereo, with the possible (and lingering) exception of quadraphonics. Ambisonics is used by some and offers considerable advantages when a true three dimensional effect is required, combined with computational elegance. Others have built unique multichannel systems specific to a given venue. In any case, most people accept the need to use loudspeakers for delivering the music to a live audience, ruling out binaural or head related methods.

In computer and contemporary music composition the use of space is very much an individual matter. To a large degree it depends on the technological and software resources available if one is working in a format beyond stereo. Among others, individuals like Chowning, Stockhausen and Wishart have been influential, providing a glimpse of what is possible. The greatest consensus is found in the world of cinema, where the spatial possibilities are limited by the coding standard and informal conventions delineate what is acceptable.


My personal conclusion is that spatial sound is now at the point of having sufficient technological tools to solve most of the problems, but lacking a consensus on how to employ these tools for optimum results. This has resulted in a highly fragmented field, not aided at all by a general ignorance about spatial psychoacoustics and the competing marketing departments of key corporate players. What is required is the cultivation of an informed and general outlook on spatial sound, combined with an appreciation of past mistakes and achievements.

c. 1996, Kimmo Vennonen. (Republished with permission.)

