by Mike Sokol
As regular readers of this column know, I’m a big advocate of producing more realistic sound for video and film. I’ve experimented with stereo miking techniques for the last 20 years in order to improve the quality of my recordings, and also designed and installed satellite surround systems for live performance. I’ve even experimented with binaural and discrete 4-channel mixes for my own sadistic aural pleasures. So when the chance was offered to play with some processing gear claiming to bring most of the benefits of surround sound from only two normally positioned loudspeakers, as they say: I’m there, Dude….
We are creatures of stereo. Two ears give us the ability to localize the position of a sound source. This is an important survival tool for helping to avoid predators, either in the jungle or the business world. Here’s how it works:
Timing Cues: Since our ears are spaced about 8 inches apart, there’s a natural delay between the sound that strikes each side of our head. For instance, a sound source located directly on your right side will be delayed by roughly 0.6 millisecond on its trip to the left ear. (Sound travels about 1132 feet per second in air, and 8 inches divided by 1132 feet per second works out to about 0.6 millisecond.) Our brain then does some pretty fancy calculations to say: “hey guy…. that growl came directly from the right…. we better look to see what’s up before we get eaten…..”
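The arithmetic behind that delay is simple enough to sketch. Here’s a minimal Python illustration (my modern illustration, not anything from the gear under review; it uses a straight-line path-difference model, and real heads add diffraction, so actual delays run slightly longer):

```python
import math

SPEED_OF_SOUND_FT_S = 1132.0   # approximate speed of sound in air, per the text
EAR_SPACING_FT = 8.0 / 12.0    # ears roughly 8 inches apart

def interaural_delay_ms(angle_deg):
    """Approximate interaural time delay for a source at angle_deg
    (0 = straight ahead, 90 = directly to one side), using a simple
    straight-line path-difference model."""
    path_difference_ft = EAR_SPACING_FT * math.sin(math.radians(angle_deg))
    return 1000.0 * path_difference_ft / SPEED_OF_SOUND_FT_S

print(round(interaural_delay_ms(90), 2))   # 0.59 -- a source directly to one side
print(round(interaural_delay_ms(0), 2))    # 0.0 -- frontal sounds arrive together
```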
Level Cues: The second set of stereo cues is the relative volume of a sound in each ear. If it’s very loud in the left ear, and soft in the right ear, even though the sound arrived at both ears simultaneously, our brain will interpret the sound as coming from the left side. You can move the apparent source of a sound from the left side to the right side of your head, just by changing its relative level in each ear.
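This level trick is exactly what the pan pot on a mixing console exploits. A rough Python sketch of the common constant-power pan law (an illustration of the principle, not any particular console’s circuit):

```python
import math

def pan_gains(pan):
    """Constant-power pan law: pan runs from -1.0 (hard left) to +1.0
    (hard right). Returns (left_gain, right_gain). Only the relative
    level in each channel changes, yet the ear hears the source move."""
    angle = (pan + 1.0) * math.pi / 4.0    # map -1..+1 onto 0..pi/2
    return math.cos(angle), math.sin(angle)

left, right = pan_gains(0.0)    # centered source
print(round(left, 3), round(right, 3))     # 0.707 0.707 -- equal in both ears
left, right = pan_gains(1.0)    # hard right
print(round(left, 3), round(right, 3))     # 0.0 1.0 -- only the right ear hears it
```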
Equalization Cues: The third set of stereo cues is the frequency balance of the sound as it arrives in each ear. You may have noticed that our ears point forward. Sound sources directly to the front or rear of our heads arrive at the same time in each ear, but front-positioned sounds are directed into our eardrums by the cup formation of our external ears, while sounds from the rear receive a natural high-frequency roll-off. In fact, some acoustic models of the head suggest that the ridges in our external ears act as comb filters of sorts, causing all kinds of ripples in the frequency response as a function of the angle of sonic impact. This also helps with vertical localization, so you can tell if the sound source is up a tree or not.
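That rear-arrival treble loss can be mimicked, very crudely, with a one-pole low-pass filter. A toy Python sketch (a stand-in for illustration only; the real pinna response is the far more complex comb-filter business described above):

```python
def lowpass(samples, alpha):
    """One-pole low-pass filter: a crude stand-in for the high-frequency
    roll-off imposed on sounds arriving from behind. alpha near 1.0
    passes everything; smaller alpha rolls off more treble."""
    out, state = [], 0.0
    for s in samples:
        state = state + alpha * (s - state)
        out.append(state)
    return out

# A fast-alternating (high-frequency) signal is attenuated far more
# than a steady (low-frequency) one:
hf = [1.0, -1.0] * 8
print(max(abs(s) for s in lowpass(hf, 0.2)))   # 0.2 -- well below the original 1.0 swing
```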
Room Acoustics: To top it off, we also subconsciously build a model of the room we’re in. Sounds not only hit our ears directly, they bounce off of various surfaces in the room around us and are either dispersed or absorbed in varying degrees before reaching our ears multiple times. These multiple reflections are what give a sense of “air” or “presence” to the world around us.
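The first step of modeling those reflections is nothing more than mixing delayed, attenuated copies of the dry sound back in. A toy Python sketch (a real room model needs filtering and thousands of paths, but the principle is this):

```python
def add_reflections(dry, reflections):
    """Mix delayed, attenuated copies of a dry signal back in.
    reflections is a list of (delay_in_samples, gain) pairs, one per
    bounce path from a wall or ceiling."""
    out = list(dry) + [0.0] * max(d for d, _ in reflections)
    for delay, gain in reflections:
        for i, s in enumerate(dry):
            out[i + delay] += gain * s
    return out

# One impulse plus two wall bounces arriving later and quieter:
wet = add_reflections([1.0, 0.0, 0.0], [(2, 0.5), (4, 0.25)])
print(wet)   # [1.0, 0.0, 0.5, 0.0, 0.25, 0.0, 0.0]
```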
All of the above:
It’s the interplay of these effects that produce the sonic field we are always surrounded by. Normally we can’t isolate any single effect, except under very special circumstances like headphones and anechoic chambers.
Notice in Figure 1, there are at least four different paths the sound can take to our two ears. And that’s just for one instrument in the two dimensions illustrated. In reality, there are hundreds of reflections from a multitude of surfaces in all 3 dimensions. As complicated as this is in reality, we don’t have to model the whole process to make good stereo. We can cheat!
How Does Two-Channel Stereo Work?
Most of the above effects aren’t necessary to fool the brain into thinking that a stereo sound field exists. We can use a few tricks……
So how do you produce a regular stereo music track? Well, it’s hard to believe, but most stereo music relies only on the second cue above: level cues. Here’s the procedure: You have 4 tracks on your tape deck, so you put one mike in front of each musician, let’s say a guitar, a harmonica, a tuba, and a vocalist. You record all four tracks separately with as much separation as possible. Each track now contains the sound of an individual instrument, usually monophonic, even if the final mixdown will be stereo. At mixdown, each track’s pan pot sets its relative level in the left and right channels, placing it somewhere across the stereo stage.
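The mixdown step itself can be sketched in a few lines of Python. Each mono track carries a left gain and a right gain (the numbers a pan pot would dial in; the gains below are hypothetical, purely for illustration):

```python
def mix_to_stereo(tracks):
    """Pan a set of mono tracks into a two-channel mix using level cues
    only. tracks is a list of (samples, left_gain, right_gain) tuples."""
    n = max(len(samples) for samples, _, _ in tracks)
    left, right = [0.0] * n, [0.0] * n
    for samples, lg, rg in tracks:
        for i, s in enumerate(samples):
            left[i] += lg * s
            right[i] += rg * s
    return left, right

# Guitar panned hard left, vocal dead center, tuba hard right:
left, right = mix_to_stereo([
    ([0.8, 0.8], 1.0, 0.0),   # guitar
    ([0.5, 0.5], 0.7, 0.7),   # vocal
    ([0.9, 0.9], 0.0, 1.0),   # tuba
])
print([round(x, 2) for x in left], [round(x, 2) for x in right])
```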
Not everyone likes the quasi-stereo sound of a bunch of single tracks panned to produce a 2 channel mix. Some of the most realistic recordings you can make use just two microphones spaced a few inches apart that capture the multi-reflection soundfield in total. For acoustic instruments this is still the method of choice, but it doesn’t allow for the fixes we’ve all grown to depend on: overdubbing, single track punch-in, and final mixdown with individual track processing. Plus it’s heavily dependent on perfect room acoustics. Instead of a bunch of microphones closely positioned next to the instruments, those two mikes may be up to 10 or 20 feet away from the stage. ANY room problems will be heard and impossible to “fix in the mix”. Still, for those with the resources, it’s a rewarding medium to work in.
Enter 3-D Processing
3-D processors such as the SPATIALIZER do a little mind game by inserting reflective cues into the sound stream. They work, in part, by using a DSP chip to produce a cancellation signal based on the physical model of a typical human head (to help overcome left to right ear leakage), and then calculate timing cues for a variety of sound source locations. The effect can vary from a simple widening of the apparent sound field beyond the boundaries of the stereo speakers, all the way to nearly binaural effects, where front to rear localization is apparent. With the proper program material you can literally fly things around the room. Processors such as the SPATIALIZER were originally designed to be used in a studio environment, where you would encode an audio piece for playback on standard radios and televisions.
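Spatializer’s exact algorithm is proprietary, but the crosstalk-cancellation idea it builds on can be sketched: feed each speaker an inverted, delayed, attenuated copy of the opposite channel, timed to cancel the sound that wraps around the head to the far ear. A toy Python illustration (emphatically not the actual Spatializer DSP; a real canceller also has to cancel the cancellation signal’s own leakage, recursively):

```python
def crosstalk_cancel(left, right, delay, gain):
    """Toy crosstalk canceller: each output channel gets an inverted,
    delayed, attenuated copy of the opposite channel, sized to cancel
    the left-speaker sound that leaks to the right ear and vice versa.
    delay is in samples; gain models the head-shadow attenuation (0..1)."""
    out_l = list(left) + [0.0] * delay
    out_r = list(right) + [0.0] * delay
    for i in range(len(left)):
        out_l[i + delay] -= gain * right[i]   # fight right-speaker leakage at the left ear
        out_r[i + delay] -= gain * left[i]    # fight left-speaker leakage at the right ear
    return out_l, out_r

l, r = crosstalk_cancel([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], delay=1, gain=0.7)
print(r)   # [0.0, -0.7, 0.0, 0.0] -- the cancellation signal rides in the other channel
```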
There are multi-channel SPATIALIZERS that allow you to position multiple objects during mixdown with full MIDI control of the position, and consumer-oriented stereo units that simply provide a “widening” of existing material. These latter units are being produced as an OEM chip, to allow Spatializer to be designed into consumer products such as VCRs and boom boxes. Other devices, such as the Convolvotron from Crystal River Engineering, rely on a computer card with its own DSP chip onboard that allows a computer program to determine the 3-D localization of wavetable or synthesized sounds in real time. These effects are best heard binaurally over headphones, which is great for “virtual reality” type programs. For instance, a video game can have an explosion take place next to your left ear or a rocket fly around your head, if desired by the programmers. Properly implemented binaural stereo can be so realistic that many people will tear off their headphones in fright the first time a missile passes through their head. Pretty cool stuff……
How Do They Sound?
A Spatializer demo unit was sent to me a few weeks ago for testing, and it yielded some interesting observations. First, the program material’s method of mixdown seems critical to the overall believability of the 3-D processing. On traditional multi-track recordings that have been panned to stereo, the effect is very impressive. Stereo reverb (with multiple reflection paths) is widened considerably, while dry vocals stay locked to center. This made some of my lead-vocal / harmony mixes change from their original balance, but it was a good thing: it sounded like the lead vocalist was more “up front” and the backup singers in a separate area behind the speakers. There was an apparent widening of the sound stage, not consciously noticeable while listening to the effect, but when the Spatializer was switched out of the sound chain, the stereo image seemed to collapse to between the speakers. This is how good effects should be: you really miss them when they’re gone!
Two-microphone acoustic recordings didn’t do as well. Last year I was contracted to make a two-track to DAT recording of the 2nd Maryland Fife and Drums Corps utilizing a pair of Neumann mics spaced about a foot apart. This particular recording has received much comment for its spatial realism. When I turned up the Spatializer, the original sound field smeared and lost focus. There was also an apparent mid-range boost affecting the timbre of the instruments, which I didn’t notice with any of the multi-track / panned recordings. Viewing an oscilloscope set to Lissajous mode (a practice I normally use to look for out-of-phase program material) showed a large amount of phase-shifted content, an effect that’s perfectly normal for sounds that have been delayed and recombined with the original source. In this case, the processor seemed to exaggerate the room-ambiance to direct-sound balance, a balance I’d achieved after several hours of trying different positions for the microphones and the musicians.

What do I think about the whole effect of spatially-enhanced material over stereo speakers vs. multiple-channel surround speakers? Much of this is subjective, so a psychoacoustic test was performed on a number of subjects. Read the sidebar for the results of this test along with some opinions from my musical and audiophile associates.
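The Lissajous scope check has a numeric cousin: the normalized correlation between the two channels. A quick Python sketch (+1 means essentially mono, 0 means wide and uncorrelated, and negative values flag the out-of-phase content the scope shows):

```python
import math

def phase_correlation(left, right):
    """Normalized correlation between two channels: +1 for identical
    (in-phase) signals, 0 for uncorrelated, -1 for fully out-of-phase."""
    dot = sum(l * r for l, r in zip(left, right))
    norm = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return dot / norm if norm else 0.0

in_phase = phase_correlation([1.0, -1.0, 0.5], [1.0, -1.0, 0.5])
out_of_phase = phase_correlation([1.0, -1.0, 0.5], [-1.0, 1.0, -0.5])
print(in_phase, out_of_phase)   # 1.0 -1.0
```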
c. 1995, Mike Sokol
From Sound Advice. (Republished with permission.)