CAS Quarterly

Winter 2018

Issue link: https://digital.copcomm.com/i/933783


Figure 5: A section of spectrogram in Audionamix's TRAX Pro 3. The heavy blue and red lines are speech fundamentals. Separated noise and dialogue envelopes are across the top.

Hands-on. This is not a full review, but rather a first glance at the differences between the products. The biggest difference is the sound: they do their jobs differently, so we've run test clips and posted the results. But the products also have different interfaces and different computing requirements, which we discuss.

They also have one important thing in common: neither works like the multiband expanding noise reducers you're already used to. They don't use thresholds set by a noise-only sample, and they don't rely on psychoacoustic masking to bury noise while something is going on in the dialogue. In fact, either one can create two separate stems, one for dialogue and one for non-dialogue. A noisy recording, or one with audience reactions or music, might have things going on in both stems at the same time.

iZotope's Dialog Isolate and De-rustle modules (Figure 4) are part of its RX 6 Advanced package, and use a spectrogram and multiple previews similar to previous RX versions. Dialog Isolate has three controls: wide-range volume sliders for Dialog and Noise, and a Separation Strength control that sets how strictly the process defines "dialogue." Higher values will trap more noise, but are more likely to add artifacts to speech. De-rustle, which has been specifically trained on wardrobe sounds picked up by a lav, has only two controls: Strength (similar to the one in Dialog Isolate) and Ambience Preservation, which keeps the lav from sounding too sterile.

RX 6 runs entirely on your local computer, and processing time for these modules is similar to that of other processor-intensive RX modules: a 45-second sample of exterior dialogue with street noise took about 21 seconds.⁸ If you change the settings, it has to run the process again.
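The two-stem idea can be illustrated with a generic soft-mask sketch. This is not iZotope's or Audionamix's actual processing, which isn't detailed here; it assumes the common mask-based approach, in which each time-frequency bin of a spectrogram gets a dialogue weight between 0 and 1. The dialogue stem takes that share of the bin and the non-dialogue stem takes the rest. The function names, the hand-written mask values, and the -12 dB default are all hypothetical.

```python
from typing import List, Tuple

# A spectrogram here is a list of frames, each a list of complex STFT bins.
Spectrogram = List[List[complex]]
# A mask has the same shape; each value is the share of a bin judged to be dialogue.
Mask = List[List[float]]


def split_stems(spec: Spectrogram, dialogue_mask: Mask) -> Tuple[Spectrogram, Spectrogram]:
    """Split one spectrogram into dialogue and non-dialogue stems.

    Each bin is divided between the stems in proportion to the mask, so the
    two stems always add back up to the original bin.
    """
    if len(spec) != len(dialogue_mask):
        raise ValueError("mask and spectrogram must have the same number of frames")
    dialogue: Spectrogram = []
    other: Spectrogram = []
    for frame, weights in zip(spec, dialogue_mask):
        if len(frame) != len(weights):
            raise ValueError("mask and spectrogram frames must match in length")
        d_frame: List[complex] = []
        o_frame: List[complex] = []
        for bin_value, w in zip(frame, weights):
            if not 0.0 <= w <= 1.0:
                raise ValueError("mask values must lie between 0 and 1")
            d_frame.append(bin_value * w)          # dialogue share of this bin
            o_frame.append(bin_value * (1.0 - w))  # everything else
        dialogue.append(d_frame)
        other.append(o_frame)
    return dialogue, other


def db_to_gain(db: float) -> float:
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def remix(dialogue: Spectrogram, other: Spectrogram,
          dialogue_db: float = 0.0, other_db: float = -12.0) -> Spectrogram:
    """Recombine the stems with independent gains, loosely like two volume sliders."""
    g_d = db_to_gain(dialogue_db)
    g_o = db_to_gain(other_db)
    return [[g_d * d + g_o * o for d, o in zip(d_frame, o_frame)]
            for d_frame, o_frame in zip(dialogue, other)]


if __name__ == "__main__":
    # Two tiny frames of three bins each; the mask values are made up,
    # standing in for what a trained network might output.
    spec = [[1 + 1j, 2 + 0j, 0.5j], [3 + 0j, -1 + 2j, 0j]]
    mask = [[0.9, 0.1, 0.5], [1.0, 0.0, 0.25]]
    dialogue, other = split_stems(spec, mask)
    print(remix(dialogue, other, 0.0, -20.0))
```

Because nothing is thrown away, setting both gains to 0 dB returns the original spectrogram, and pulling the non-dialogue gain down attenuates, rather than deletes, whatever the mask assigned to it. A practical version would also need a real STFT, a mask from some model, and an inverse STFT to turn the stems back into audio; all of that is omitted here.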
Audionamix's TRAX Pro 3 is a single-purpose program with a lot more control than the iZotope module. When you open an audio clip, it sends compressed spectral data over the internet to Audionamix's powerful servers. The servers perform the separation and send the resulting spectral data back to your computer. The program attempts to identify all the speech fundamentals automatically (heavy blue line in Figure 5). If some speech gets missed because it's atypical or there's too much competing noise, you can add your own identification (red line), helped by an automatic tracing tool. Consonants are a special case because of their high-frequency energy; any that get missed can be marked by the user.

Audionamix also offers simpler versions of TRAX Pro 3, with interfaces somewhat similar to iZotope's Dialog Isolate; unlike Dialog Isolate, though, their engines still rely on processing at Audionamix's servers. Our 45-second sample took about 34 seconds to separate. If you change the fundamental line, it has to run the process again.

⁸ Times were tested on a 3.2 GHz quad-core Mac Pro, with a 95 Mbps symmetrical internet connection where needed. They'll be different on your computer, and things will probably speed up in future software versions.

Figure 4: Dialog Isolate window in iZotope's RX 6 Advanced, shown against a section of spectrogram and the Compare window. These latter two functions work like their counterparts in previous RX versions.

We've posted short samples of exterior city dialogue, exterior dialogue in a busy woods, and concert-hall performer dialogue with mixed audience reactions and simultaneous acoustic guitar strumming. Each is available as the original production version and as processed through the Audionamix and iZotope software, at http://cinemaaudiosociety.org/over-the-net/.

Acknowledgements.
Thanks to Audionamix's lead researcher François Rigaud and iZotope principal DSP engineer Alexey Lukin, who each spent more than an hour with me explaining their techniques, walking through their processing, and reviewing the manuscript. To DSP guru Dr. Barry Blesser, who gave me important insights into the network training process. And to re-recording mixer Stephen Fitzmaurice CAS, who walked through this neural network wonderland with me. Despite all their help, I take full blame for any inaccuracies or sloppy writing.

The audio samples are original production recordings from Tom Rush: No Regrets, courtesy of BlueStar Media and Ezzie Films, © Ezzie Films LLC and used by permission. Article text and illustrations © 2018 Jay Rose, except the drawing of a human neuron (derived from Juoj8 on Wikimedia Commons, and available under a Creative Commons license).

To learn more about neural networks, and to try training one of your own with voice samples, you might download the University of Amsterdam's free Praat software (Mac/Windows/Linux, at www.praat.org). It's a comprehensive tool for speech-processing research, and comes with a very informative set of online and web-based help files.
