A Bicyclist’s Sense Of Hearing: How Important?

by John S. Allen

allen.gif

Other than warning about loose parts on the bicycle, what can the sense of hearing do for a bicyclist, and what can it not do?

There’s a lot of confusion on this subject. It’s often said that hearing is the bicyclist’s second most important sense, after sight. Well, not exactly. This statement neglects the sense of balance, the sense of touch and the kinesthetic, proprioceptive sense (sense of body positioning), which actually make it possible to ride a bicycle — even with your eyes closed. (See note 1 below). After these senses comes sight, which makes it practical to ride where there are things you might run into. But how far behind sight does hearing come?

In order to answer these questions, I’m temporarily going to trade my bicycle helmet for an engineer’s propeller beanie. (See my curriculum vitae if you wish to review my qualifications.)

Hearing: Sometimes Helpful, But Unreliable

In quiet (typically, rural) surroundings, the sense of hearing can sometimes alert a bicyclist to a motor vehicle, a charging dog or another potential hazard before the bicyclist can see it. Usually, the unseen hazard is either behind the bicyclist, or obscured by vegetation or another obstacle. A bicyclist may sometimes hear a car a mile away under ideal, quiet conditions, upwind and on level terrain or across a valley. But especially when riding into the wind, bicyclists are often surprised by motor vehicles overtaking them, and even more often by other bicyclists overtaking them. The refraction of sound waves by moving air works against the bicyclist in this situation, and so does wind noise.

The sense of hearing has a resolution of about ±3 degrees for sound sources directly to the front or rear. At other angles, the resolution is poorer, since the timing difference between the two ears changes less rapidly with the angle of the sound source. At 50 feet (15 m), less than 2 seconds before the car reaches the bicyclist at a speed difference of 20 mph (32 km/h), ±3 degrees amounts to a 6 foot (2 m) range of possible positions. This is in addition to the uncertainty as to whether the major noise source, the exhaust pipe, for example, is on the right or the left side of a vehicle.
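As a rough check of the arithmetic above, here is a minimal sketch in Python. The distances and speeds are the ones quoted in the paragraph; the wedge-width trigonometry is mine, not anything from the original text.

```python
# Rough check: position uncertainty from a +/- 3 degree bearing resolution
# at 50 feet, and the time before an overtaking car arrives.
import math

distance_ft = 50.0       # distance from car to bicyclist
bearing_err_deg = 3.0    # hearing resolution to front or rear
speed_mph = 20.0         # speed difference between car and bicycle

# Width of the +/- 3 degree wedge at 50 feet:
uncertainty_ft = 2 * distance_ft * math.tan(math.radians(bearing_err_deg))
print(f"range of possible positions: {uncertainty_ft:.1f} ft")  # ~5.2 ft, roughly 6 ft

# Time remaining before the car reaches the bicyclist:
speed_fps = speed_mph * 5280 / 3600   # mph to ft/s
print(f"time to arrival: {distance_ft / speed_fps:.1f} s")      # ~1.7 s, under 2 s
```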

Even under quiet conditions, then, the best that the sense of hearing can do is to provide an unreliable warning of a vehicle’s presence, and an inaccurate idea of its position. And while the sense of hearing can indicate that something is there, it cannot indicate that nothing is there. Bicyclists learn very quickly not to trust their sense of hearing to warn them before turning or changing lane position.

Under noisy urban conditions, the sense of hearing often cannot provide an early warning, though it does often provide information about nearby vehicles. On a crowded street, only especially loud sounds such as car horns can provide an early warning.

It is not surprising, then, that the right-of-way rules in the traffic law are based on the sense of sight rather than hearing. A vehicle operator’s only hearing-related duty under the traffic law is to respond to special warning devices: horn, siren or bell. Despite this duty, no laws prohibit deaf persons from operating either a motor vehicle or a bicycle. Not only this, the only laws restricting sound systems on or in a vehicle are intended to reduce disturbance to people outside the vehicle. That is, except for laws which prohibit headphones. More about headphones later.

Contrast the facts I have just recited with the distorted, popular view of the role of the sense of hearing for bicyclists. This view is based on several assumptions, namely:

1) The incorrect assumption that bicycling is inherently very dangerous, and the related assumption that safety always outweighs all other considerations, for example bicyclists’ enjoyment of their sport or their need to communicate.

2) The assumption that a bicyclist can and should be held responsible for actively avoiding accidents for which the sense of hearing provides a warning;

3) The assumption that the sense of hearing is useful and reliable enough that it is essential to safe bicycle operation.

These assumptions most commonly are expressed as condemnations of headphone use while bicycling. Let’s turn to the headphone issue now.

Types of Headphones

There are three major types of headphones. They differ greatly in their effect on hearing of sound from outside:

1) Circumaural or “sealed” headphones. These form an airtight seal against the sides of the head, and greatly attenuate sound from outside. They are preferred in noisy environments such as an airplane cockpit. They once were popular for high-fidelity sound reproduction, but they are heavy, bulky, uncomfortable and sweaty, and with better options available, they are much less widely used now. They are very rarely used with portable sound equipment.

2) Supraaural or “open-air” headphones. These rest on the ear but form no seal. The conventional telephone earpiece is a common example, and so are most headphones used with portable sound equipment.

When supraaural headphones are used for high-fidelity sound reproduction, precise spacing of the headphone transducer from the ear is essential for predictable low-frequency response. The spacing is controlled by an open-cell foam pad which is transparent to sound from the headphone transducer and also to sound from outside. Sound from outside the headphone is attenuated slightly by the bulk of the transducer assembly and much less by the foam which surrounds it.

Small supraaural headphones 2 or 3 cm across, the most common type used with portable stereos, have very little effect on sound from outside — about as much as if you hold up two fingers next to each ear but not touching it. (Try this.) Such headphones produce essentially no hearing impairment, if silent, and increasing impairment the louder they are played, just as with a loudspeaker.

3) Intraaural or “in-the-ear” headphones. Hearing aids commonly use these. The effect of “in-the-ear” headphones on sound from outside depends on their construction. Some intraaural headphones plug the ear canal, while others leave it partially open to the outside. Even a small opening will let most sound from outside pass.

A few headphone models electronically cancel out some of the outside noise, mostly in the low-frequency range. These headphones are expensive and uncommon, and they all have a switch to turn off the noise cancellation.

Headphone Laws

Several states have laws prohibiting headphone use by motor vehicle operators and/or bicyclists.

Some of these laws permit headphones which cover one ear. The idea behind these laws is that the other ear will then be able to receive sounds from outside. To be consistent, these laws should in principle also restrict earmuffs to one ear in cold weather, though somehow, nobody has thought of banning earmuffs. One-ear laws don’t make scientific sense, since a single headphone can actually have worse effects on hearing than binaural (two-ear) headphones. The desensitization of one ear by a single headphone played loud enough to cut through background noise changes the apparent location of sound sources. This problem is much less likely with binaural (two-ear) headphones.

Except when very unusual recording techniques are used, all sound sources reproduced through headphones appear inside the head or at the ear(s), where they are difficult to confuse with other sounds. This effect is even more pronounced with binaural (two-ear) headphones, and allows the programming they convey to be intelligible at a lower volume.

The wording “covers the ear(s)” usually found in headphone laws is supposed to distinguish between headphones and loudspeakers, but it does so poorly. To “cover the ear(s)” is a visual concept, but the ears do not see, they hear. Open-air headphones no more cover the ears and impair hearing than goggles cover the eyes and impair sight, or a scarf covers the nose and impairs the sense of smell.

It is also fair to point out that headphones have practical advantages and legitimate uses for bicyclists, more so than for other vehicle operators. This is, after all, precisely why headphones are popular with bicyclists. Headphones are lightweight and require very little electrical power to operate, important advantages on a human-powered vehicle. Headphones deliver sound to the bicyclist without disturbing other people. Headphones may be used for entertainment or to gather information unrelated to bicycling — listening to a news broadcast, holding a conversation via ham radio, auditing a correspondence course — but they may also be used for bicyclist-to-bicyclist communication. In this context, headphones make it possible to teach safe riding, give route directions or relay vital safety messages over a far greater distance and more reliably than by mouth.

Did you ever wonder why television news correspondents always appear on camera with a little headphone plugged into one ear? It’s because this eliminates the problem of feedback from loudspeaker to microphone. For the same reason, headphones make it possible for a bicyclist using a two-way radio to conduct a normal conversation, rather than having to shut off the microphone when receiving.

Headphone laws are very rarely enforced. Many bicyclists ignore them. But enforcement is not the only way that the law affects people. One important reason not to wear headphones — even if they are not playing — is that they make it harder to collect on an insurance claim after a crash.

The first question a bicyclist’s attorney should raise when faced with this problem is whether the bicyclist had any duty to act differently if alerted by sound. Only if this is true is it important under the law whether the bicyclist actually heard the sound. For example, if an overtaking vehicle strikes a bicyclist riding in the normal position on the road, the overtaking driver had the duty under the law to avoid striking the bicyclist. The bicyclist had no duty to swerve out of the motorist’s way, and it is unlikely that hearing the car would have made it possible to determine whether swerving was necessary to avoid a collision. Therefore, the wearing of headphones should not be an issue in such a case. A judge ought to prohibit it from being discussed in the jury’s presence — but a judge will not necessarily do so. I have seen cases lost over this false issue.

Conclusions

As I hope that I have shown, laws banning headphone use by bicyclists are based on inaccurate ideas about headphone design. These laws outlaw the special advantages of headphones for bicyclists, particularly for two-way communication. Furthermore, the bicyclist is unusual among vehicle operators in having good use of the sense of hearing. If we held all drivers to the standard of being able to hear well, the only street-legal motor vehicles would be quiet, unenclosed ones such as golf carts. A bicyclist’s decision whether to wear headphones — particularly open-air headphones — and how loud to play them ought to be of as little concern in the law as the question of how loudly a motorist may play a radio inside a car.

I think that it is important for a bicyclist to think carefully about when to use or not to use headphones, and I certainly don’t encourage playing them loudly. Not only does loud playing of headphones shut out the outside world, it can damage hearing. I agree that headphones (or any other extraneous sound source) can sometimes affect the safety of bicycle operation. But the role of headphones in causing bicycle accidents is, in my opinion, deeply confused by faulty assumptions about the sense of hearing, and by ill-conceived laws which place headphones in a special category separate from other factors affecting hearing.

***

1) (Don’t try this at home unless you are Bill Gates. If you do try it, you will probably find that you can ride just as steadily with your eyes closed as with them open.)

c. 1997, John S. Allen
From John Allen’s Home Office Home Page. (Republished with permission.)

Advertisement

Binaural In-Depth.

by John Sunier

So What Is Binaural?

The binaural experience places the listener sonically where the sounds on the recording or broadcast originated, and requires no special equipment of any sort other than the binaural source and a pair of stereo headphones. The listener experiences sounds quite accurately localized in a complete 360-degree sphere, a true virtual audio environment. It does this via two tiny omnidirectional mikes placed at the entrance of the ear canals on a replica of a human head (“dummy head”). The two signals are kept entirely separate all the way from this artificial head mike system to the corresponding left and right drivers of the headphones worn by listeners. Though all modern binaural recordings are perfectly compatible for loudspeaker playback, in a normal stereo speaker setup you will lose the “you are there” binaural effect due to leakage of the sound cues intended for one ear into the other ear and vice versa.

Even sophisticated audiophiles are often confused about binaural due to the mistaken use of the term back in the 1950s by many who used Binaural and Stereo as synonyms. Recording pioneer Emory Cook (if you were around then you’ll remember his twin-tracked early stereo LPs) was one of these. Yet the notes provided with all RCA Victor two-track stereo open-reel tapes starting around 1956 included the following:

Stereophonic recording differs from Binaural (a term sometimes incorrectly applied to stereophonic records) in that the microphone placements are selected for loudspeaker reproduction. Binaural properly applies to a two-channel system designed for headphone reproduction. It thus requires the use of two channels fed by microphones spaced about seven inches apart (normal ear separation).

That definition just about tells the tale. All of us have noticed the tremendous difference between hearing a stereo recording on speakers and hearing it on headphones. Headphones seem to put a giant sonic magnifying glass on all aspects of the recording, including stereo separation. Many recordings sound like half the band or orchestra is in one studio with its signal feeding your left ear, and the other half in another studio with its signal feeding your right ear. The sounds seem to be localized at your two ears and totally inside your skull rather than happening outside your head. Some persons also image a central area of sounds in their skull, so that it feels like three little separated groups of musicians inside your head. The HeadRoom circuit was developed to minimize this effect when listening to standard stereo recordings.

The truth is that over 200 million stereo headphones have been sold in the past decade (way over 600 million if you include all the throw-away headphones bought by those airlines no longer giving passengers primitive plastic tubing). But the source material that nearly everyone is listening to on their headphones was never designed for listening on headphones, but for playing via loudspeakers! With speaker playback, the left channel sounds are meant to reach the right ear and vice versa. Producers of commercial recordings almost always monitor with speakers rather than headphones. Binaural keeps the left and right channels absolutely separated from the original dummy head (or your actual head) all the way to the listener’s headphones without mixing. This applies whether the medium is a recording, a live feed, or a radio broadcast.

Professional Mike Systems For Binaural

Commercial binaural recordings generally use one of two different expensive professional “dummy heads” (“Kunstkopf” in German). In fact, both come from Germany. The Neumann KU-81 or KU-100 head was probably used — often in conjunction with other mikes — on a CD or two in your collection. (Cost: about $6500.) The Aachen Head Acoustics system is more complex, with special equalization to achieve the most natural reproduction on both speakers and headphones. (Their current model is also used for precise acoustic measurement and runs about $29,000.) Some recording engineers feel either of these mikes is capable of making more natural and well-balanced ordinary stereo recordings for speaker playback than the best purist mike techniques. Of course, the full binaural effect is not present in speaker playback except with expensive specialized cross-cancellation electronics, which also force you to sit in a narrow “sweet spot” without the freedom of movement that headphones allow. However, any matrix surround processor using “ambience recovery” rather than “ambience synthesis” will give a better surround sound effect with binaural recordings than with most specially-encoded Dolby Surround CDs. Most Dolby Pro Logic decoders will suffice, though processes such as Circle Surround, Six Axes and EARS are even better. Just stay away from what colleague Dan Kumin calls “boingerizers” – those Hall/Stadium/Jazz Club processors that artificially generate reverberation (echo) to add to the original ambient signal on the recording.

A visible dropping of the jaw is the most frequent indication that someone who has put on headphones is hearing effective binaural for the very first time, followed by exclamations of surprise, wonder and disbelief. Binaural, rather than trying to bring the sounds into your listening room, takes you where the sounds originally occurred. You are aware of sounds 360 degrees around you – not just right & left but forward & back and up & down! Someone whispering in one ear can make you jump, and a good rainstorm in binaural will have you opening your eyes (if they’re shut – which helps the impression) to make certain you’re not actually getting soaked!

In Binaural, the pinna or outer ears of the dummy head or head of the original recordist set up subtle interference patterns that locate the sounds around the head quite specifically in space. These are known technically as HRTFs – Head Related Transfer Functions – and have become central to current audio research directed toward achieving virtual audio effects with two or more loudspeakers that approach the realism of binaural with headphones. Computer gaming and virtual reality software are fertile fields for this sort of enveloping sound. Sounds coming from directly in front of us bounce off the rear part of the outer ear; sounds from below bounce off the top part of the ear. When a sound is directly in line with the left or right ear there is a straight shot into the ear canal, and this provides different directional information from the other approaches. The ear/brain combination works together closely in binaural hearing. Take for example “the cocktail party effect” – in which we “steer” our binaural hearing around a noisy room and focus it on the one person we want to hear, while minimizing the distraction of other voices.

Early Binaural

The first experiment with binaural, way back in 1881, compared the effect to the popular stereoscopic views of the period. The inventor said of his binaural patent, “This double listening to sound produces the same effects on the ear that the stereoscope produces on the eye.” He set up a series of carbon telephone mikes in pairs (about 7 inches apart) along the edge of the stage of the Paris Opera. As the singers performed on stage, their voices were carried on twin pairs of telephone lines to the homes of a few subscribers who had two lines installed. They put the earpiece from one line to their left ear and the earpiece from the other to their right ear. Fortunately, a wide frequency response is not a requirement to convey the binaural effect, because the phone system of the time was surely quite primitive.

More Recent Binaural Activity

There has been sporadic interest and activity in binaural since those early days late in the 19th century. In the mid-1920s some radio stations in Connecticut and elsewhere broadcast experimentally on two different frequencies — feeding each transmitter separately from a left-ear and right-ear mike in a dummy head in the studio. Listeners were already listening on headsets for the most part, since primitive speakers were just coming into fashion. So this worked out well — they merely put one mono headset, tuned to the left-ear station, to one ear, and the other mono headset, tuned to the second station, to the right ear. Some West German radio stations have devoted time to special binaural transmissions — often of radio dramas, which they call “Hörspiel.” There has also been interest in Japan. “The Cabinet of Dr. Fritz” series of binaural radio dramas from ZBS Productions was carried for some years on public radio stations here in the U.S. Many of those same stations also carried my own weekly program, AUDIOPHILE AUDITION, on which I presented All Binaural Special broadcasts once per quarter for over 13 years.

In 1970 Stereo Review offered a binaural demonstration LP of music and sound effects which used a homemade dummy head known as the Blue Max. There have been many binaural recordings available in Germany, mainly of classical material and on LP. The disadvantage of employing either analog LP or cassette for binaural material is the noise problem. The surface noise or hiss that we have become accustomed to when listening via loudspeakers can become intolerable with headphones. The greater clarity via headphones makes extraneous noises in the source stand out and detracts from the total sonic experience of binaural. Add to that a peaky high end in some headphones that further points up surface noise and hiss compared to speaker reproduction.

As a result of this, the compact disc and other digital media such as MiniDisc and DAT have proven the perfect medium for binaural. The excellent signal-to-noise ratio lets the listener concentrate on the sounds and begin to forget that he or she is actually listening to a recording – one just starts to take part in the original music or sound-making!

You Can Do It Yourself

Their introduction to binaural makes a great impact on some listeners. Then, when they learn how basically simple the recording process can be, they are energized to make their very own binaural recordings. Some years ago consumer-level binaural mike systems were offered by Sennheiser, Sony and JVC, but they have long since been discontinued. Today several suppliers provide a variety of in-ear mike systems in the $70–$300 price range. They are usually paired with a DAT or MiniDisc portable recorder, though a good quality cassette recorder may also be used. [Editor: See the Commercial Links page for binaural resources.]

For such recording efforts, sounds in motion are especially effective in binaural, as are sounds that are spatially separated. I have some binaural tapes of a symphony orchestra rehearsal, and for demo purposes, it must be admitted, feeling like you are sitting right on stage with the orchestra, with music stands clanking, chairs squeaking, and the conductor walking around to help some of the players with small problems, can sometimes be more exciting than hearing the final performance of the music. Sound effects such as a motorcycle or train passing by take on a quantum step in “you are there” realism with binaural vs. the old-fashioned stereo demos of trains passing between your loudspeakers. Keep some of these tricks in mind when doing your own recording with binaural mike systems. For example, if you have a quartet of instruments or singers, have them perform in a circle around you instead of in a line in front of you! (I’m a nut on sax quartets, and do they ever sound great recorded in this way!) Instead of sitting out in the front row of the audience to tape an early music ensemble, one recordist set up his dummy head with mikes in a chair right in the middle of the group onstage, creating an effect as though the listener is one of the musicians performing. It is the most exciting early music recording I’ve ever heard. The surrounding spatiality adds great interest to the music. Another recordist taped himself taking an elevator, walking into the concert hall and settling in his seat at the beginning of a concert, and then the reverse at the end, to make it a more complete binaural experience for listeners. (Unfortunately, the elevator was totally silent, so he edited out that part.)

Headphones For Binaural

While binaural can be heard with any stereo headphones down to the simplest $5 “ear-buds,” the better the phones, the more amazing the experience. I have found some of the Sony phones around the $100 price point to be good. (The Grado SR-80 at the same price is excellent.) I can’t vouch for current Sony models, but do stay away from the MDR-V6 (once recommended by Consumer Reports) because it destroys much of the binaural effect. Among the best under-$600 phones I have heard for binaural are the Sennheiser HD 600, Sony MDR-CD3000, AKG K-501, Beyer 990 Pro, Etymotic ER-4S, and Grado RS-1. (No special order intended in that list.) The K-501 has many of the qualities of AKG’s flagship K-1000 ($895), which I find the best all-around binaural phone due especially to its ability to help image the sounds outside one’s head. The Jecklin and Ergo headphones from Switzerland, at about the same price point, also offer this advantage. The Etymotic phones are basically test probes inserted deeply into the ear canals – just the opposite of the off-ear-driver phones. However, their fans rave about them for binaural, and with the tight seal to the eardrum, bass reproduction equals the most monster subwoofer you could fit in a room! Extra-cost custom ear molds make the Etymotic more comfortable for extended wear.

The Grado RS-1 Reference phones and the Sennheiser HD 600 are also excellent, and of interest to those who find the AKGs too bizarre with their little earspeakers suspended on either side of your head. Both the Grado and Sennheiser provide more deep bass than any other on- or off-ear headphones I have heard. The Stax electrostatic earspeakers have been the standard for binaural for years. Their top-of-line Omega has a dedicated tube amp and goes for over $4000, but it is probably the best-sounding headphone ever. Don’t worry about the suitability to binaural of feature differences such as circumaural vs. on-ear, free field vs. diffuse field or electrostatic vs. dynamic. Even extended frequency response is not a prerequisite for successfully transmitting the full binaural effect. Phase accuracy and flat response within the frequency spectrum are the most important parameters. A trend showing the increased interest in headphones and binaural is dedicated high-end headphone amps — HeadRoom, Melos, Grado, Music Hall, Musical Fidelity and others have them. AKG will introduce a new model soon. Some of the high-end phones practically demand a good dedicated amp, and even a modest amp can upgrade the sonics of a more modest headphone.

c. 1999, John Sunier.
From The Binaural Source. (Republished with permission.)

Virtual Audio For Headphones.

by Alastair Sibbald

sibbald1.gif

When sound waves arrive at the head, they are modified acoustically in a directionally dependent manner by diffractive effects around the head and by interaction with the outer ears, characterised by the Head-Related Transfer Function (HRTF). The HRTF can be synthesised electronically, using digital signal processing, and used to process various sound sources which can be delivered to a listener’s ears via headphones (or loudspeakers). This creates a virtual sound image in three dimensions and can provide a stunning, immersive listening experience.

Sensaura headphone processing employs HRTFs that have been created specifically for headphone listening. In order to provide the best possible audio experience, a method of creating pure, ‘uncoloured’ headphone-specific HRTFs has been devised. The resultant HRTFs are proprietary to Sensaura systems and, in their neutral state, have been optimised for universal headphone usage. Individual manufacturers can further customise these HRTFs, working with Sensaura engineers, so as to provide optimal performance for their individual headphone models.

1 Hearing in three dimensions

When we listen to the sounds around us in the real world, we can determine with great accuracy where each individual sound source is. Our head and ears operate as a very sophisticated ‘directional acoustic antenna system’ [1], such that we are aware not only of the location and distance of the sound sources themselves, but also of the type of acoustic environment which surrounds us. When sound waves arrive at the head, they are modified acoustically in a directionally dependent manner by diffractive effects around the head and by interaction with the outer ears. Also, there is a ‘time-of-arrival’ difference between the ears, which the brain computes accurately. These acoustic modifications (‘sound cues’) are more fully described in another technical white paper in the present series: An introduction to sound and hearing [2].

sibbald2.gif
Figure 1: Free-field acoustic transmission pathway into the ear

As the sound waves encounter the outer ear flap (the pinna), they interact with the complex convoluted folds and cavities of the ear, as shown in Figure 1. These support different resonant modes depending on the direction of arrival of the wave, and these resonances can amplify or suppress the incoming acoustic signals at certain associated frequencies. For example, the main central cavity in the pinna, known as the concha, makes a major contribution at around 5.1 kHz to the total effective resonance of the outer ear, boosting the incoming signals by around 10 to 12 dB at this frequency. Consequently, the head and pinna (and auditory canal) can be considered as spatial encoders of the arriving sound waves (Figure 1), which are then spatially decoded by the two auditory cortices in the brain.

sibbald3.gif
Figure 2: Typical HRTF characteristics (-50° azimuth)

The concerted actions of the various acoustic effects create a measurable characteristic known as the Head-Related Transfer Function (HRTF), which comprises three elements: (a) a near-ear response; (b) a far-ear response; and (c) an inter-aural time delay. In order to illustrate these effects, the characteristics of a typical horizontal-plane HRTF are shown in Figure 2 for an azimuth angle of -50°.

The upper (blue) plot represents the near- (left) ear spectral response, the lower (red) plot shows the far-ear spectral response and there is an inter-aural time delay (time-of-arrival difference) of about 0.43 ms. (These particular curves are typical of a measurement which includes an auditory canal element, contributing an additional mid-range signal boost.) Each different position in space around the listener has a different, associated HRTF.

The three characteristic elements of an HRTF can be synthesised electronically, using digital signal processing, and then delivered to a listener’s ears via headphones or loudspeakers. If this is done correctly, the synthesised 3D spatial sound cues are used by the listener’s brain to create a virtual sound image in three dimensions and can provide a stunning, immersive listening experience.

2 HRTFs: measurement and use

It is common to measure HRTFs using impulse response methods, but there are several different configurations of ‘head’ that can provide the physical source for the measurements. It will be appreciated, of course, that the more accurate the measurement, the more effective will be the 3D synthesis. The main sources of HRTFs are as follows.

(a) Artificial head system without auditory canal simulation, where the microphones are flush with the floor of the concha cavity.

(b) Artificial head system featuring an auditory canal simulator, where the microphones are mounted so as to emulate the eardrum.

(c) Real head measurements on volunteers, using either probe microphones sited near the tympanic membrane, or ‘blocked meatus’ microphones (where the auditory canal is plugged and the microphone is sited flush with the concha floor, as in (a) above).

Option (b) is the best choice, and provides the most accurate results [3]. Option (a) is less accurate, providing poor elevation cues and front-back discrimination, whilst option (c) is subject to many artefacts and inaccuracies.

sibbald4.gif
Figure 3: Synthesis of 3D audio cues using HRTF

Once an accurate set of HRTF measurements has been made, it can be used in digital signal processing algorithms to synthesise 3D audio signals, as shown in Figure 3. At present, a typical processing scheme would use a pair of 25-tap FIR filters for the near- and far-ear responses, coupled with a time delay corresponding to the inter-aural time-of-arrival difference, to create a near- and far-ear signal pair from a monophonic sound source.
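For illustration only, the following Python sketch shows the shape of such a scheme. The FIR coefficients below are placeholders (a real system would load a measured HRTF pair for the desired source position); only the structure, two short FIR filters plus an inter-aural delay, follows the description above.

```python
# Sketch of HRTF-based 3D synthesis: filter a mono source through near- and
# far-ear FIR filters, then delay the far-ear signal by the inter-aural time
# difference. Coefficients here are stand-ins, not measured HRTF data.
import numpy as np

fs = 44100                              # sample rate, Hz (assumed)
itd_samples = int(round(0.00043 * fs))  # ~0.43 ms ITD, as in Figure 2

near_ear_fir = np.zeros(25); near_ear_fir[0] = 1.0   # placeholder 25-tap filter
far_ear_fir  = np.zeros(25); far_ear_fir[0]  = 0.5   # placeholder, attenuated

def spatialise(mono: np.ndarray) -> np.ndarray:
    """Return a (near, far) binaural signal pair from a mono source."""
    near = np.convolve(mono, near_ear_fir)
    far  = np.concatenate([np.zeros(itd_samples), np.convolve(mono, far_ear_fir)])
    n = max(len(near), len(far))
    near = np.pad(near, (0, n - len(near)))
    far  = np.pad(far,  (0, n - len(far)))
    return np.stack([near, far], axis=1)

stereo_pair = spatialise(np.random.randn(1024))  # test with a noise burst
```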

Next, the resultant binaural signal pair must be introduced directly into the appropriate ears of the listener, either by headphones or loudspeakers. If this is carried out correctly, then he or she perceives the original sound to be at a position in space in accordance with the spatial location of the HRTF pair which was used for the signal-processing.

Surprisingly, this is not so straightforward as it might seem. Irrespective of whether headphones or speakers are chosen as the preferred audio delivery route, it is essential to ensure that the listener’s own HRTFs do not interfere with the synthesised HRTFs, as is described in the following section.

3 Synthesising 3D audio via loudspeakers

There are two main aspects to loudspeaker delivery of 3D audio. Firstly, it is essential to ensure that the listener perceives the sounds via only a single HRTF and, secondly, that the transaural crosstalk is neutralised effectively.

3.1 HRTF ‘standardisation’

When sounds are emitted from a loudspeaker, they are perceived via the listener’s own HRTFs (typically ±30° for a stereo configuration). This factor must be removed from the 3D audio synthesis chain and so it is common to ‘normalise’ or ‘standardise’ the synthesiser HRTFs by dividing them all by the 30° HRTF characteristics. (It is also possible to use other standardisation protocols, such as the 0° HRTF or a diffuse-field HRTF.) When this has been carried out, the standardised HRTFs occupy a smaller dynamic range than the full versions and hence the filter design process is made easier.
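In sketch form, this standardisation amounts to a frequency-domain division. The arrays below are placeholders for measured data; only the division by the reference (30°) HRTF follows the text.

```python
# Sketch of HRTF 'standardisation': divide every HRTF magnitude spectrum by
# the reference 30-degree HRTF so the loudspeaker listening path drops out.
import numpy as np

n_bins = 256
# Placeholder magnitude spectra; measured HRTF data would be loaded instead.
hrtf_set = {azimuth: np.ones(n_bins) for azimuth in range(0, 360, 10)}
reference = hrtf_set[30]             # HRTF at the loudspeaker position

eps = 1e-9                           # guard against division by zero
standardised = {az: h / (reference + eps) for az, h in hrtf_set.items()}
```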

3.2 Transaural crosstalk cancellation

During loudspeaker playback, transaural crosstalk occurs. This means that the left ear hears a little of what the right ear is hearing (after a small, additional time delay of around 0.25 ms) and vice versa. In order to prevent this happening, appropriate ‘crosstalk cancellation’ signals must be created from the opposite loudspeaker. These signals are equal in magnitude and inverted (opposite in phase) with respect to the crosstalk signals and are designed to cancel them out. More advanced schemes exist which anticipate and prevent the effects of the cancellation signals themselves contributing to secondary crosstalk. This topic is described in some detail in another Sensaura technical white paper Transaural acoustic crosstalk cancellation [4] .
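A first-order version of the idea can be sketched as follows, assuming, purely for illustration, that the crosstalk path is a simple attenuation plus the ~0.25 ms delay mentioned above. The paper's own scheme also models head-shadow filtering and the secondary crosstalk.

```python
# First-order transaural crosstalk cancellation sketch: each output channel
# subtracts a delayed, attenuated (and hence inverted) copy of the other
# channel, predicting and cancelling the crosstalk at the opposite ear.
import numpy as np

fs = 44100
delay = int(round(0.00025 * fs))   # ~0.25 ms crosstalk delay, in samples
g = 0.7                            # crosstalk attenuation (assumed value)

def delayed(x: np.ndarray, n: int) -> np.ndarray:
    return np.concatenate([np.zeros(n), x])[: len(x)]

def cancel(left: np.ndarray, right: np.ndarray):
    out_l = left  - g * delayed(right, delay)
    out_r = right - g * delayed(left, delay)
    return out_l, out_r
```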

4 Synthesising 3D audio via headphones

Listening to 3D audio via headphones is not a recent phenomenon; it dates back to the 1930s when engineers at Bell Laboratories demonstrated an early form of artificial head microphone system. Clearly, little or no transaural crosstalk occurs during headphone listening and so there is no need for crosstalk cancellation. Nevertheless, it has long been recognised that the synthesis of a convincing ‘out-of-the-head’ sound image via headphones is difficult to achieve [5]. There have been many papers published on this topic in the scientific literature over the years, but it is very difficult to assess the relative merits of the various experiments because of (a) their subjective nature, (b) the variations in headphone types used and (c) the unknown quality and accuracy of the HRTFs which were used at the time.

A key factor in providing an effective external sound image for 3D headphone audio is the quality and format of the HRTF set. The previous remarks about delivering the sound so as to contain only a single HRTF are vital. If this is not achieved correctly, then the spatial properties of the image are degraded, preventing the ‘out-of-the-head’ sound image, and inhibiting the frontal image.

sibbald1 (1).gif
Figure 4: Acoustic transmission pathway into the ear from headphone driver

Figure 4 shows the acoustic configuration for headphone delivery. As in the loudspeaker case, the HRTFs used during synthesis must be standardised so as to take account of the headphone-ear characteristics. Setting aside the differing headphone types for the moment (namely circumaural, supra-aural with airtight seal, supra-aural with soft cushion and, finally, insert), the general situation is more complex than it is for loudspeaker listening because of the intimate interaction between the ear and the headphone unit. How might it be possible to standardise an HRTF set for headphone listening?

One elegant and thorough approach to this problem is that of Wightman and Kistler [6,7], who measured the headphone-to-eardrum transmission factor using probe microphones deep inside the ear canal. These measurements were then used to standardise similar measurements made under free-field conditions. Prior to using this data, however, one must be careful to compensate for the spectral ‘colouration’ effects of the transducers used during the measurements. It is quite simple to compensate for the loudspeaker colouration by measuring its response and building the inverse response into the HRTF set. However, it is not obvious how to compensate for the headphone colouration introduced during the headphone measurements. For example, if the headphone-to-ear response were to be measured on an artificial head and then used to standardise the HRTF set, then the headphone colouration would become built into all of the data.

Sensaura technology includes HRTFs that have been devised specifically for headphone listening. In order to create the best possible audio experience, a method of creating pure, ‘uncoloured’ headphone-specific HRTFs has been devised. Individual manufacturers can further customise these HRTFs, working with Sensaura engineers, so as to provide optimal performance for their individual headphone models.

5 Room effects: reflections and reverberation

HRTF processing, in the first instance, creates an anechoic simulation. As is common knowledge, the sound reflections which occur in our everyday surroundings create additional spatial information for the brain to use [2], from which further information is derived about the distances of sound sources and the acoustic environment. By adding simulated reflections and reverberation to an anechoic simulation, the headphone sound image can be enhanced considerably. This is especially useful in helping to ‘move’ the virtual sound sources away from the head. Much pre-recorded movie material on DVD already incorporates many reverberant effects, so it is often preferable to listen to movie material without much added reverberation. However, the amount of reflection and reverberation information which is preferred is very much a matter of personal taste and so, in addition to the Anechoic option, there are several optional listener settings built in to Sensaura systems, including Living Room and Concert Hall.
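As a toy illustration (the delay times and gains below are invented; the actual Living Room and Concert Hall presets are not specified in this paper), simulated early reflections can be added to an anechoic signal like this:

```python
# Add a few delayed, attenuated copies of the dry signal to stand in for
# early room reflections; second and later reflections would build reverb.
import numpy as np

fs = 44100
reflections = [(11.0, 0.35), (17.0, 0.28), (23.0, 0.22)]  # (delay ms, gain), assumed

def add_reflections(dry: np.ndarray) -> np.ndarray:
    wet = dry.astype(float).copy()
    for ms, gain in reflections:
        n = int(round(ms * fs / 1000))
        if n < len(dry):
            wet[n:] += gain * dry[:-n]   # delayed, attenuated copy
    return wet
```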

6 Dolby Digital virtualisation and other applications

One of the prime applications of Sensaura headphone technology is the recreation of Dolby Digital cinema-type surround sound via headphones. A typical Dolby Digital listening configuration employs three frontal loudspeakers (left, centre and right) and two rearward loudspeakers (left and right surround). There is also the low-frequency effects channel, but this is non-directional and therefore does not require virtualisation.

sibbald5.gif
Figure 5: Dolby Digital emulation for headphones: positions of virtual loudspeakers

By virtualising the five surround channels, the surround-sound listening experience can be recreated for the headphone user (Figure 5).

Sensaura technology is particularly well suited to this application because of the accurate Digital Ear™ headphone-specific HRTFs that are used. Future developments will include the use of Sensaura ZoomFX to provide diffuse virtual sound-fields for the surround channels and Sensaura Virtual Ear™ technology [8] for accommodating a range of ear and head sizes and types.

Other applications include the virtualisation of conventional stereo material and the creation of a personal virtual environment for use with ‘silent’ musical instruments, so that practice sessions become a ‘real’ experience. This could be programmed with any acoustic environment that the musician prefers, from a recording studio to a concert hall containing, perhaps, a very appreciative virtual audience!

7 Benefits of Sensaura headphone technology

 

  • Sensaura Digital Ear HRTF filters are used, providing accurate 3D placement and smooth spatial movement.
  • Sensaura Virtual Ear technology provides a library of head and ear sizes and types.
  • Colouration-free headphone drive processing.
  • Headphone-specific HRTF deployment (not just crosstalk cancellation inhibition).
  • Supports Sensaura MacroFX™ and ZoomFX in gaming applications.
  • Can be licensee-customised for optimal performance on specific headphone models.
  • Incorporates a variety of room reflection/reverberation options based on Sensaura EnvironmentFX™.
  • Very efficient algorithms and filters (requires only modest signal-processing power).
  • Sensaura ZoomFX available for creating diffuse rear sound-field.

 

8 References

1. Acoustical characteristics of the outer ear. E A G Shaw, in Encyclopedia of Acoustics, M J Crocker (Ed.), John Wiley and Sons (1997), pp. 1325-1335.
2. An introduction to sound and hearing. A Sibbald, Sensaura White Paper (devpc005.pdf).
3. An introduction to Digital Ear™ technology. A Sibbald, Sensaura White Paper (devpc003.pdf).
4. Transaural acoustic crosstalk cancellation. A Sibbald, Sensaura White Paper (devpc009.pdf).
5. Binaural auralization: simulating free field conditions by headphones. D Hammershoi and J Sandvad, Proc. AES 96th Convention, Feb 26 – Mar 1, 1994, Amsterdam.
6. Headphone simulation of free-field listening. 1: Stimulus synthesis. F L Wightman and D J Kistler, J. Acoust. Soc. Am., February 1989, 85, (2), pp. 858-867.
7. Headphone simulation of free-field listening. 2: Psychophysical validation. F L Wightman and D J Kistler, J. Acoust. Soc. Am., February 1989, 85, (2), pp. 868-878.
8. Virtual Ear™ technology. A Sibbald, Sensaura White Paper (devpc011.pdf).

c. 1999, Sensaura Ltd.
Reproduced with permission.

3-D Audio Primer.

by Aureal Corporation

This document presents an introduction to the general concepts and performance of three-dimensional audio technology. Several audio technology categories are defined with the purpose of creating a common understanding of “better-than-stereo” audio playback methods.

Contents:

1. Introduction to 3-D Audio
2. What is and What isn’t 3-D Audio
3. The Basics of Acoustics
4. The Basics of Human Hearing
5. How A3D Works
6. Advantages of A3D As Illustrated by Research
7. Summary

1. INTRODUCTION TO 3-D AUDIO

Since the late 1970’s, several audio technologies have been developed to advance the state of the art in audio reproduction beyond stereo. Most of them are focused on increasing the dimensionality of sound playback beyond the one-dimensional stereo sound field created by conventional playback on a left/right speaker pair. Furthermore, the advent of digital audio signal processing has enabled interactive audio experiences: similar to live music, sounds are created on-the-fly based on user input (for example in video games), rather than being based on playback of a pre-recorded soundtrack (as in movies).

A3D from Aureal is a digital audio technology that has been developed to provide maximum performance in both areas of dimensionality and interactivity. A3D technology is based on the principles of binaural human hearing. Binaural means that we hear using two ears. From the two signals that our ears perceive, we can extract enough information to tell where a sound is located in the three dimensional space around us. The functioning of the human hearing system has been researched successfully over the last two decades by psycho-acoustic researchers around the world. They have provided us with the necessary findings and understanding that today’s A3D audio systems are based on.

To put it in simpler terms: since we can hear three-dimensionally in the real world using just two ears, it must be possible to achieve the same effect from just two speakers or a set of headphones. On this basic assumption, 3D audio products have been successfully built.

This document starts by explaining how different forms of audio processing compare against each other (“What is and What isn’t 3D Audio”). It then focuses on the concepts of acoustics and human hearing that A3D is based on, and details the digital audio building blocks that make up an A3D system.

2. WHAT IS AND WHAT ISN’T 3-D AUDIO

As mentioned in the introduction, there are two key pieces to a 3D audio system: 3D positioning and interactivity.

A full-featured 3D audio system provides the ability:

  • To define a three-dimensional space
  • To position multiple sound sources and a listener in that 3D space
  • To do all processing in real time, or interactively, for example based on the user’s inputs in a video game (the opposite of interactive audio playback is a pre-recorded soundtrack).

Certain technologies, namely stereo extension and surround sound, offer some aspects of 3D positioning or interactivity. They are discussed here to explain what applications they are geared towards, and why they are not considered to be part of a new category of technologies, called Positional 3D Audio. This new category combines full 3D positioning and interactivity to offer a new kind of audio listening experience. A3D is the industry-leading positional 3D audio technology. A comparison chart of different audio playback methods is included to help differentiate the features of each technology.

2.1 Extended Stereo

Extended stereo technologies and products process an existing stereo (two channel) soundtrack to add spaciousness and to make it appear to originate from outside the left/right speaker locations.

These products are particularly useful to restore stereo performance to low-end PC multimedia sound systems that typically contain low-quality speakers placed very closely together. Extended stereo effects can be achieved via various, fairly straightforward methods. Additionally, their performance is often evaluated based on subjective criteria such as listening tests. For those reasons it is somewhat difficult to compare products in this area. Some of the differentiators include:

  • Size of the listening area (area in which the listener has to be placed with respect to speakers to hear the effect, also called the sweet spot)
  • Amount of spreading of stereo images (more spreading, or user-variable spreading, is better)
  • Amount of coloring (tonal changes) of audio content introduced by processing (no coloring is best)
  • Amount of stereo left/right panning information that is lost during processing (no panning loss is best)
  • Ability to achieve the effect on headphones as well as speakers

Although sometimes marketed under the name “3D Sound” or “3D stereo,” extended stereo technologies are not considered to be 3D audio technologies, because they offer only passive spreading of an existing soundtrack, and not interactive 3D positioning of individual sounds.

2.2 Surround Sound

Surround sound technologies and products create a larger-than-stereo sound stage by playing back multi-channel Dolby® or MPEG surround-sound soundtracks on multi-speaker setups. Surround sound is based on using audio compression technology (for example Dolby ProLogic® or Dolby Digital AC-3®) to encode and deliver a multi-channel soundtrack, and audio decompression technology to decode the soundtrack for delivery on a surround-sound five-speaker setup. Additionally, virtual surround sound systems use 3D audio technology to create the illusion of five speakers emanating from a regular set of stereo speakers, enabling a surround-sound listening experience without the need for a five-speaker setup. Aureal’s A3D Surround is a virtual surround technology.

Because they are pre-recorded, surround sound soundtracks are most suitable for movies. They are non-interactive, and therefore not particularly useful in interactive software such as video games and Web sites. Because of their limitations when it comes to interactivity, surround sound systems are not considered part of the interactive 3D audio category.

Ways to evaluate the performance of a surround sound system:

Physical Speakers
  • Presentation accuracy of individual channels, clarity of spatial imaging (size of sound stage)
Virtual Speakers
  • Listening comparison to a physical 5-speaker setup (accuracy of virtual to physical speaker mapping, as well as accuracy of reproduction of original soundtrack mix-down)
  • Amount of audio coloring (tonal changes) introduced by processing (no coloring is best)
Both Physical and Virtual Setups
  • Size of the listening area (area in which the listener has to be placed with respect to speakers to hear the effect, also called sweet spot)

2.3 Positional 3D Audio (A3D Interactive)

Positional 3D audio (a.k.a. interactive 3D audio) allows for interactive, on-the-fly positioning of sounds anywhere in the three-dimensional space surrounding a listener. Support for such technologies can be incorporated into software titles such as video games to create a natural, immersive, and interactive audio environment that closely approximates a real-life listening experience. This category can be described as the audio equivalent of 3D graphics. Aureal’s A3D Interactive is a positional 3D audio technology.

3D audio technologies create a more life-like listening experience by replicating the 3D audio cues that the ears hear in the real world. The following two sections, “The Basics of Acoustics” and “The Basics of Human Hearing”, explain what those listening cues are and how they can be reproduced. For maximum flexibility and usability, a 3D audio algorithm should support all possible audio playback environments: headphones, stereo speakers and multi-speaker (surround or quad) arrays. In the case of stereo speakers or headphones, more demands are placed on the algorithm and fewer demands on the end-user, because stereo setups are most common and easy to set up. Multi-speaker arrays require less complex 3D audio rendering algorithms, but put more demands on the end-user’s playback setup (cost and setup complexity of extra amplifiers and speakers). In both cases, the desired 3D effects are controlled by software applications which position 3D sound sources and listeners via an API (Application Programming Interface) such as Microsoft’s DirectSound3D API for the Windows® platform, or the VRML 2.0 standard.

Ways to evaluate the performance of a 3D interactive sound system:

  • Listening tests to evaluate how well sounds are projected in all three dimensions (left/right, up/down, front/back), and how much realism they provide
  • Number and quality of software titles that take advantage of 3D technology
  • Number of concurrent 3D sound sources the system provides at a given quality or sample rate
  • Ability to achieve the effect on headphones as well as speakers
  • Size of the listening area (area in which the listener has to be placed with respect to speakers to hear the effect, also called the sweet spot)
  • Amount of coloring (tonal changes) of audio content introduced by processing (no coloring is best)

Table1.jpeg

2.4 Headphone Versus Stereo Speaker Playback Devices

In terms of 3D sound processing, these two playback media offer different challenges and advantages. Headphones have the advantage of always being in a known position with respect to the listener’s ears. This means that two separate audio signals (left and right) are guaranteed to go directly into the two ears of a listener. With speakers, this is only the case if the listener is sitting in the ideal listening position, the sweet spot, and processing methods are employed to ensure that the left ear does not receive any audio content from the right speaker, and vice versa (cross-talk cancellation).

3. THE BASICS OF ACOUSTICS

Human beings extract a lot of information about their environment using their ears. In order to understand what information can be retrieved from sound, and how exactly it is done, we need to look at how sounds are perceived in the real world. To do so, it is useful to break the acoustics of a real world environment into three components: the sound source, the acoustic environment, and the listener:

primer1.gif

Figure 1 – Typical soundfield with a source, environment and listener.

  • The sound source: this is an object in the world that emits sound waves. Examples are anything that makes sound – cars, humans, birds, closing doors, and so on. Sound waves get created through a variety of mechanical processes. Once created, the waves usually get radiated in a certain direction. For example, a mouth radiates more sound energy in the direction that the face is pointing than to the side of the face.
  • The acoustic environment: once a sound wave has been emitted, it travels through an environment where several things can happen to it. It gets absorbed by the air (high-frequency waves more so than low ones; the amount of absorption depends on factors like wind and air humidity). It can travel directly to a listener (direct path), bounce off an object once before it reaches the listener (first-order reflected path), bounce twice (second-order reflected path), and so on. Each time a sound reflects off an object, the material that the object is made of affects how much of each frequency component of the sound wave gets absorbed, and how much gets reflected back into the environment. Sounds can also pass through objects such as water or walls. Finally, environment geometry like corners, edges, and small openings has complex effects on the physics of sound waves (refraction, scattering).
  • The listener: this is a sound receiving object, typically a “pair of ears”. The listener uses acoustic cues to interpret the sound waves that arrive at the ears, and to extract information about the sound sources and the environment.

4. THE BASICS OF HUMAN HEARING

As explained above, people can be considered sound receiving objects in an environment. We have an auditory sensing system consisting of two ears and a brain. Additionally, very low frequency sounds can be sensed through the human body. The brain uses a number of cues that are embedded in the two sound signals it receives from the two ears to learn about the sounds and their environment. Most people are unaware that the effects described in the following sections greatly impact our continuous perception of reality, every day of our lives. On the other hand, there are certain people, for example non-sighted people, who are very much aware of these effects, because they rely heavily on their ears for querying and navigating their surroundings.

4.1 Primary Localization Cues – IID and ITD

The two primary localization cues are called interaural intensity difference (IID) and interaural time difference (ITD). IID refers to the fact that a sound is louder at the ear that it is closer to, because the sound’s intensity at that ear will be higher than the intensity at the other ear, which is not only further away, but usually receives a signal that has been shadowed by the listener’s head (see fig. 2). ITD means that a sound will arrive earlier at one ear than the other (unless it is located at exactly the same distance from each ear – for example directly in front). If it arrives at the left ear first, the brain knows that the sound is somewhere to the left (see fig. 3).

primer2.gif

Figure 2 – Illustration of IID.

primer3.gif

Figure 3 – Illustration of ITD.

The combination of these two cues allows the brain to narrow the position of an individual sound source to somewhere on a cone centered on the line drawn between the listener’s ears (see fig. 4).

primer4.gif

Figure 4 – ITD Cone.
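
To put rough numbers on the ITD cue, the sketch below uses the classic Woodworth spherical-head approximation, ITD = (a/c)·(θ + sin θ), where a is the head radius and c the speed of sound; both constants are assumed typical values, and real heads deviate from a perfect sphere.

    import math

    HEAD_RADIUS = 0.0875    # m, typical adult head (assumed)
    SPEED_OF_SOUND = 343.0  # m/s (assumed)

    def itd_woodworth(azimuth_deg):
        """Interaural time difference for a distant source (Woodworth model).

        ITD = (a / c) * (theta + sin(theta)); azimuth 0 = straight ahead.
        """
        theta = math.radians(azimuth_deg)
        return HEAD_RADIUS / SPEED_OF_SOUND * (theta + math.sin(theta))

    for az in (0, 15, 45, 90):
        print(f"azimuth {az:3d} deg: ITD = {itd_woodworth(az) * 1e6:5.0f} microseconds")

A source straight ahead gives an ITD of zero; at 90 degrees to the side the model yields roughly 650 microseconds, the commonly cited maximum for human listeners.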

4.2 The Outer Ear Structure – Pinna

Before a sound wave gets to the ear drum, it passes through the outer ear structure, called the pinna. The pinna accentuates or suppresses the mid- and high-frequency energy of a sound wave (see fig. 5) to various degrees, depending on the angle at which the sound wave hits it (see fig. 6). This means that the two pinnae act as variable filters that affect every sound passing through them. The brain can work out the exact location of a sound in space because the signal it receives has been filtered in a way that is unique to the sound source’s position relative to the listener.

primer5.gif

Figure 5 – Spectral differences between the original and the pinna-filtered sound.

primer6.gif

Figure 6 – Pinna filtering of a sound source received at varying elevations.

The pinnae are the key to accurately localizing sounds in space. However, since the outer ear and its folds are on the scale of a few centimeters, only sound waves with wavelengths in the centimeter range or smaller are affected by the pinna. In addition, the two ears are about 15 centimeters apart, so even the IID and ITD cues are greatly reduced for wavelengths larger than that. For example, a 3.3 kHz signal oscillates 3300 times per second, while sound travels at about 330 meters per second. Its wavelength is therefore about 330/3300 = 0.1 meters, or 10 centimeters. This means that a sound at 3300 Hz lies in the region where the primary cues are still noticeable but the pinna cues start to be diminished. In general, the higher the frequency of a sound, the shorter its wavelength, and the better it can be localized. This phenomenon can be verified by placing two speakers, a subwoofer and a high-frequency tweeter, in a room and playing music through them. With closed eyes you will immediately be able to tell where the tweeter is located; the subwoofer, however, will sound as if it is “coming from everywhere”. The short computation below makes the wavelength arithmetic concrete.
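
The same arithmetic applies at any frequency; the sketch below tabulates a few representative cases, using the same round 330 m/s figure for the speed of sound.

    SPEED_OF_SOUND = 330.0  # m/s, the round figure used above

    for freq_hz in (100, 1000, 3300, 10000):
        wavelength_m = SPEED_OF_SOUND / freq_hz
        print(f"{freq_hz:6d} Hz -> wavelength {wavelength_m * 100:6.1f} cm")

At 100 Hz the wavelength is over three meters, far larger than the head or pinna, which is why the subwoofer cannot be localized.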

4.3 Propagation Effects, Range Cues, and Reflections

Many things happen to a sound as it travels through an environment before it is received by a listener. All of these effects allow us to learn more about what we are hearing and what kind of environment we are in:

  • A somewhat muffled, quiet sound is likely off in the distance (see fig. 7).
  • If it is heavily muffled, we might be in an enclosed space, listening through glass or other wall materials.
  • The effect of sound reflections in an environment is very important, because we are able to hear the differences in time of arrival and location between the direct-path signal and the first-order and n-th-order reflections (see fig. 8; a small numeric sketch follows the figure). The reflections give us a way to further pinpoint a sound source’s location, as well as the size, shape and type of room or environment that we are in (people with very “good ears” are able to locate a wall exactly, or tell the difference between an open and a closed door, simply by listening to reflections). While humans are capable of individually perceiving first-order reflections, second- and higher-order reflections usually combine to form what are called late-field reflections, or reverb.

primer7.gif

Figure 7 – Source attenuation and absorption due to range (listener–source distance).

primer8.gif

Figure 8 – Direct path and first- and second-order reflections in a typical room.
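
A common way to model these reflected paths is the image-source method: mirroring the source across a wall yields a virtual source whose straight-line distance to the listener gives the length and delay of the corresponding first-order reflection. The sketch below applies this to a 2D rectangular room; the room size and positions are arbitrary example values.

    import math

    SPEED_OF_SOUND = 343.0  # m/s (assumed)

    def path(listener, source):
        """Length and travel time of the straight-line path between two points."""
        d = math.dist(listener, source)
        return d, d / SPEED_OF_SOUND

    def first_order_reflections(listener, source, width, depth):
        """Image sources for the four walls of a rectangular room.

        Walls lie at x = 0, x = width, y = 0 and y = depth; mirroring the
        source across a wall gives the geometry of one reflected path.
        """
        sx, sy = source
        images = [(-sx, sy), (2 * width - sx, sy), (sx, -sy), (sx, 2 * depth - sy)]
        return [path(listener, img) for img in images]

    listener, source = (4.0, 2.0), (1.0, 3.0)
    d, t = path(listener, source)
    print(f"direct path: {d:5.2f} m, {t * 1000:5.2f} ms")
    for d, t in first_order_reflections(listener, source, 6.0, 5.0):
        print(f"reflection:  {d:5.2f} m, {t * 1000:5.2f} ms")

The listener hears the direct sound first, followed by the four first-order reflections; the pattern of their arrival times is part of what conveys the size and shape of the room.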

5. HOW A3D WORKS

A 3D audio system aims to digitally reproduce a realistic sound field. To achieve the desired effect, a system needs to be able to re-create some or all of the listening cues discussed in the previous chapter: IID, ITD, outer-ear effects, and so on. A typical first step in building such a system is to capture the listening cues by analyzing what happens to a single sound as it arrives at a listener from different angles. Once captured, the cues are synthesized in a computer simulation for verification.

5.1 What is an HRTF?

The majority of 3D audio technologies are at some level based on the concept of HRTFs, or Head-Related Transfer Functions. An HRTF can be thought of as a set of two audio filters (one for each ear) that contains all the listening cues applied to a sound as it travels from its origin (its source, or position in space), through the environment, to the listener’s ear drums. The filters change depending on the direction from which the sound arrives at the listener. The level of HRTF complexity necessary to create the illusion of realistic 3D hearing is subject to considerable discussion and varies greatly across technologies.

HRTF Analysis

The most common method of measuring the HRTF of an individual is to place tiny probe microphones inside a listener’s left and right ear canals, place a speaker at a known location relative to the listener, play a known signal through that speaker, and record the microphone signals. By comparing the recorded signals with the original signal, a single filter pair in the HRTF set is found (see fig. 9; a sketch of this deconvolution step follows the figure). After moving the speaker to a new location, the process is repeated until an entire spherical map of filter sets has been built.

primer9.gif

Figure 9 – Combining speaker output and microphone input to compute impulse response.
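
One standard way to carry out the comparison step is frequency-domain deconvolution: dividing the spectrum of the recorded microphone signal by the spectrum of the known excitation recovers the impulse response. The sketch below demonstrates the idea with a synthetic three-tap filter standing in for a real ear measurement; the small regularization constant is an assumption that keeps the division stable where the excitation has little energy.

    import numpy as np

    def measure_impulse_response(excitation, recording, eps=1e-8):
        """Estimate an impulse response by regularized frequency-domain deconvolution."""
        n = len(excitation) + len(recording)
        X = np.fft.rfft(excitation, n)
        Y = np.fft.rfft(recording, n)
        H = Y * np.conj(X) / (np.abs(X) ** 2 + eps)
        return np.fft.irfft(H, n)

    # Synthetic check: a known 3-tap "ear" filter and a noise excitation.
    rng = np.random.default_rng(0)
    true_h = np.array([1.0, 0.5, -0.25])
    x = rng.standard_normal(4096)
    y = np.convolve(x, true_h)  # what the probe microphone would record
    h = measure_impulse_response(x, y)
    print(np.round(h[:3], 3))   # recovers approximately [1.0, 0.5, -0.25]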

Every individual has a unique set of HRTFs, also called an ear print. However, HRTFs are interchangeable, and the HRTF of a person who localizes well in the real world will let most people localize well in a simulated world. While generic, interchangeable HRTFs are suitable for general applications such as video conferencing or games, individualized HRTFs are useful for performance-critical applications of binaural audio, such as jet-fighter cockpit threat-warning systems or air traffic control systems.

HRTF synthesis

Once an HRTF has been measured, real-time DSP (digital signal processing) software and algorithms are designed. This software has to be able to pick out the critical (psycho-acoustically relevant) features of each filter and apply them in real time to an incoming audio signal to spatialize it. The system works correctly if a listener cannot tell the difference between a sound played over the speaker setup from the analysis process above (with the speaker in a specific position) and the same sound played back by a computer and filtered by the HRTF impulse response corresponding to that speaker location (see fig. 10; a minimal rendering sketch follows the figure).

primer10.gif

Figure 10 – Applying a measured impulse response synthetically to create the illusion of a virtual speaker.
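
At its simplest, the synthesis step is a pair of convolutions: the mono source signal is filtered with the left- and right-ear impulse responses measured for the desired direction. The sketch below uses tiny hand-made placeholder HRIRs rather than measured data, purely to show the signal flow.

    import numpy as np

    def spatialize(mono, hrir_left, hrir_right):
        """Render a mono signal at the position the HRIR pair was measured for."""
        left = np.convolve(mono, hrir_left)
        right = np.convolve(mono, hrir_right)
        return np.stack([left, right])  # 2 x N stereo buffer

    # Placeholder HRIRs: the right ear hears the sound later and quieter,
    # as it would for a source off to the listener's left.
    hrir_l = np.array([1.0, 0.0, 0.0, 0.0])
    hrir_r = np.array([0.0, 0.0, 0.4, 0.2])
    mono = np.sin(2 * np.pi * 440 * np.arange(480) / 48000)  # 10 ms of 440 Hz
    stereo = spatialize(mono, hrir_l, hrir_r)
    print(stereo.shape)  # (2, 483)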

Playback Considerations

HRTFs can be used with great effectiveness in all audio playback configurations: headphones, stereo speakers, or multi-speaker arrays. On headphones, the HRTF output is sent directly to the user’s ears. On stereo or multi-speaker setups, an additional audio processing step called cross-talk cancellation is employed to ensure proper signal separation between the left and right ears; a minimal sketch of the idea follows.
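
One common formulation of cross-talk cancellation treats each frequency bin as a 2×2 matrix problem: the speaker-to-ear transfer matrix is inverted (with a little regularization) so that each ear receives only its intended binaural channel. The sketch below performs the inversion for a single toy frequency bin; the transfer-matrix values and the regularization constant are illustrative assumptions, not measured data.

    import numpy as np

    def crosstalk_canceller(H, beta=1e-4):
        """Regularized inverse of the per-bin 2x2 speaker-to-ear transfer matrix.

        H has shape (bins, 2, 2) with H[k][ear][speaker]; the returned
        filters C satisfy H @ C ~ I, so cross-talk is cancelled.
        """
        Hh = np.conj(np.swapaxes(H, 1, 2))
        return np.linalg.solve(Hh @ H + beta * np.eye(2), Hh)

    # Toy transfer matrix for one bin: direct-path gain 1, cross-talk 0.3.
    H = np.array([[[1.0, 0.3],
                   [0.3, 1.0]]], dtype=complex)
    C = crosstalk_canceller(H)
    print(np.round((H @ C)[0].real, 3))  # approximately the identity matrix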

5.2 Aureal Wavetracing (A3D)

Once HRTFs have been captured and can be rendered, a sound can be made to appear to come from any 3D location. To compute and render the additional effects that a 3D environment has on a sound, A3D employs proprietary Wavetracing algorithms. Among other features, the addition of Wavetracing technology distinguishes A3D 2.0 systems from earlier A3D systems. Developed over many years in conjunction with clients such as NASA, Matsushita and Disney, Aureal’s Wavetracing technology parses the geometry description of a 3D space to trace sound waves in real time as they are reflected and occluded by passive acoustic objects in the environment. With Wavetracing, sounds can not only be heard emanating from a position in 3D space, but also as they reflect off walls, leak through doors from the next room, become occluded as their source disappears around a corner, or suddenly appear overhead as you step into the open from a room. Reflections are rendered as individually imaged early reflections and as late-field reverb. Acoustic space geometries and wall surface materials are specified via the A3D 2.0 API (Application Programming Interface). The result is the final step towards true audio rendering realism: the combination of 3D positioning, room and environment acoustics, and proper signal presentation to the user’s ears.

5.3 The A3D API

The A3D API (Application Programming Interface) delivers A3D into the hands of the software content developer. It allows games, 3D Internet browsers, and other 3D software applications to harness the full power of A3D. The API allows the application developer to do the following (a hypothetical usage sketch follows the list):

  • position sound sources and listeners in 3D space
  • define the 3D environment and its acoustic properties such as wall materials
  • synchronize 3D graphics and A3D audio representations of objects (see section on Audio-Visual Synergy below)
  • synchronize user inputs with A3D rendering (see section on Head Movement below)
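
The actual A3D 2.0 interface is a C-language API whose function names are not reproduced here; the following Python-flavored pseudo-API is purely hypothetical, invented to illustrate the four capabilities listed above.

    # Hypothetical pseudo-API for illustration only; the real A3D 2.0
    # interface is a C API with different names and structure.

    class Scene:
        def __init__(self):
            self.sources, self.walls = [], []

        def add_source(self, name, position):
            self.sources.append((name, position))   # 3D position of a sound

        def add_wall(self, corners, material):
            self.walls.append((corners, material))  # geometry plus acoustic material

        def render(self, listener_pos, listener_facing):
            # A real engine would apply HRTFs and trace reflections here.
            for name, pos in self.sources:
                print(f"rendering '{name}' at {pos} for listener at "
                      f"{listener_pos}, facing {listener_facing}")

    scene = Scene()
    scene.add_source("door_slam", (2.0, 0.0, 5.0))
    scene.add_wall(((0, 0, 0), (0, 3, 0), (6, 3, 0), (6, 0, 0)), "concrete")
    scene.render((0.0, 1.7, 0.0), (0.0, 0.0, 1.0))  # called per frame, synced to user input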

Audio-Visual Synergy

The eyes and ears often perceive an event at the same time. Seeing a door close and hearing a shutting sound are interpreted as one event if they happen synchronously. If we see a door shut without a sound, or see a door shut in front of us while hearing a shutting sound to the left, we become alarmed and confused. In another scenario, we might hear a voice in front of us while seeing a hallway with a corner; the combination of audio and visual cues allows us to figure out that a person might be standing around the corner. Together, synchronized 3D audio and 3D visual cues provide a very strong sense of immersion. Both 3D audio and 3D graphics systems can be greatly enhanced by such synchronization.

Head Movement and Audio

Audio cues change dramatically when a listener tilts or rotates his or her head. For example, quickly turning the head 90 degrees to look to the side is equivalent to a sound traveling from the listener’s side to the front in a split second. We often use head motion to track sounds or to search for them. The ears alert the brain to an event outside the area where the eyes are currently focused, and we automatically turn to redirect our attention. Additionally, we use head motion to resolve ambiguities: a faint, low sound could be either in front of or behind us, so we quickly and subconsciously turn our head a small amount to the left; if the sound now appears to be off to the right, it is in front, otherwise it is behind us. One of the reasons interactive audio is more realistic than pre-recorded audio (soundtracks) is that the listener’s head motion can be properly simulated in an interactive system (using inputs from a joystick, mouse, or head-tracking system). A small sketch of this front/back disambiguation follows.
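
A toy calculation makes the front/back disambiguation concrete: turning the head shifts every source’s head-relative azimuth, and sources in front and behind shift in opposite directions. The angle convention (0 degrees = straight ahead, positive = to the listener’s right) is an assumption of the sketch.

    def head_relative_azimuth(source_azimuth_deg, head_yaw_deg):
        """Azimuth of a source relative to the head after a yaw rotation."""
        return (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

    # An ambiguous source could be at 0 deg (front) or 180 deg (back).
    # Turn the head slightly left (negative yaw) and see which way it moves:
    for true_az in (0.0, 180.0):
        rel = head_relative_azimuth(true_az, -10.0)
        print(f"true azimuth {true_az:5.1f} deg -> now heard at {rel:6.1f} deg")

The front source moves to the right (+10 degrees) while the back source moves the other way, which is exactly the cue the brain uses.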

5.4 The Vortex A3D Silicon Engines

Aureal has developed a line of PCI-bus-based digital audio chips called Vortex. These chips, among many other features, contain silicon implementations of A3D algorithms, including HRTF and Wavetracing rendering engines. Vortex is a no-compromise PCI audio chip architecture. It takes full advantage of the PCI bus by streaming dozens of audio sources to on-board audio processing engines: A3D, DirectSound, wavetable synthesis, legacy audio, multi-channel mixers, sample-rate converters, and so on. Vortex delivers the highest-quality A3D capabilities for sound cards and PC motherboards at attractive price/performance points.

6. ADVANTAGES OF A3D AS ILLUSTRATED BY RESEARCH FINDINGS

Results from decades of psycho-acoustic research on binaural audio offer scientific explanations of why real-time binaural audio technologies such as A3D are highly effective in a range of applications.

6.1 Binaural Gain

Probably the single most important fact about binaural audio is that a signal played on top of white noise appears 6 to 8 dB louder when presented binaurally than when presented non-binaurally (6 dB corresponds to a doubling of signal amplitude). This means that the exact same audio content is more audible and intelligible in the binaural case, because the brain can localize, and therefore “single out”, the binaural signal, while the non-binaural signal gets washed into the noise.

6.2 The “cocktail party effect”

At a cocktail party, a listener is capable of focusing on one conversation while hundreds of other conversations are going on all around. If that party were recorded and then played back using a regular mono or stereo procedure, all the conversations would be collapsed into one (mono) or two (stereo) locations. The result would in most cases be unintelligible. With a binaural recording, or a binaural re-creation of that party, a listener would still be able to tune into and understand individual conversations, because they remain spatially separated and are “amplified” by binaural gain.

6.3 Faster reaction time

In an environment such as a jet cockpit, where a great deal of critical information is presented to the user, reaction time is crucial. Research shows that audio information can be processed and reacted to more quickly when presented in binaural form, because such a signal mirrors the ones received in the real world. In addition, binaural signals can convey positional information: a binaural radar warning sound can alert the user to a specific approaching object (with a sound unique to that object), and naturally indicate the direction it is coming from.

6.4 Less listening fatigue

Phone operators who listen to a mono headphone signal all day long experience listening fatigue. If those same signals are presented as binaural signals, listening fatigue can be reduced substantially. Humans are used to hearing sounds that originate outside their heads, as is the case with binaural signals. Mono or stereo signals appear to come from inside a listener’s head when heard over headphones, and produce more strain than a natural-sounding binaural signal.

6.5 Increased perception and immersion

Some of the most interesting research into binaural audio shows that subjects consistently report a more immersive, higher-quality environment (“nicer” colors, or “better” graphics) when visuals are shown in sync with binaural sound rather than with stereo sound or no sound at all.

7. SUMMARY

For well over ten years, real-time binaural, or “3D”, audio technology has been the subject of intense research and development in the psycho-acoustic research community. The findings of a large number of research studies indicate that interactive 3D audio is an important technology enabling an entirely new level of audio experience: a three-dimensional sound field is created in real time to continuously envelop the listener. The listener is no longer aware of the audio system rendering the sounds; the application communicates directly with the user, creating levels of awareness, realism and immersion, and improvements in reaction time and in the communication of audio information, previously possible only in real-life situations.

Besides understanding real world sounds and the hearing process, the biggest challenges associated with building an effective positional 3D audio solution are:

1. The measurement and application of accurate HRTF filters.
2. The development of efficient, high quality algorithms that allow for real-time rendering of a 3D soundfield using minimal computational resources.
3. The deployment and support of a technology enabling API into the application development communities to ensure proper software content support.
4. The development of feature and cost competitive silicon engines to enable products based on the technology.
5. The definition and launch of a successful consumer brand and consumer products that will get the technology into the hands of the end-users.

A3D has mastered all of the above challenges. A3D is based on the world’s most advanced algorithms and HRTF measurement and compression techniques, developed in high-performance, mission-critical application areas such as NASA simulators, jet-fighter cockpits, and virtual-reality systems. Aureal has created free software tools, SDKs and APIs, and evangelized them successfully to over 100 top-tier PC software development houses. Aureal’s breakthrough Vortex PCI audio chips render A3D on dozens of new sound cards and PCs. Finally, Aureal has created the A3D brand, promoted with a simple message: take a software application with the A3D logo on it and a sound card or PC with the same logo, and they will combine to deliver the most amazing, immersive and realistic interactive audio experience.

© 1998, Aureal Corporation.
From Aureal Corporation Site. (Republished with permission.)