CineMontage

Q1 2022

Issue link: https://digital.copcomm.com/i/1456671


[Photo captions: "Roadrunner: A Film About Anthony Bourdain" used AI-crafted dialogue. Al Nelson.]

subtitled, director Morgan Neville wanted it to be read because it was important to the opening. We had gotten a few words that were AI-reproduced and we plugged them in to our edit. It was about trying to make it a little bit more believable, to engage the viewer and not have it feel disjunct. Morgan's intention was to immerse the audience, not to mislead them. This can be a useful tool when it's not used in an oppositional way. And in this case, it was useful."

As author J.K. Rowling via Albus Dumbledore has said, "Understanding is the first step to acceptance." So let's dive into two leading AI voice generators being used in post-production, Respeecher and Sonantic, to understand how they work, how they're being used, and how they might play a role in the future.

Respeecher

In addition to "In Event of Moon Disaster," Respeecher was used to create the voice of Young Luke Skywalker for the Disney+ series "The Mandalorian," and for the voice of Vince Lombardi in last year's pre-Super Bowl commercial "As One." It was most recently used to clone the voice of famous Puerto Rican sportscaster Manuel Rivera Morales for a Medalla commercial. Respeecher's API is used as part of Veritone's MARVEL.ai platform. And their project list keeps growing.

Respeecher's main approach to AI-voice generation is through speech-to-speech. This means a line of dialogue is recorded in one voice and then converted to a target voice. The benefit of speech-to-speech is that emotional nuance, inflection, pacing, and projection level (whisper to shout) are directly imparted to the target voice.

Alex Serdiuk, Respeecher CEO, explained that their deep learning models were built to understand the differences between voices in the acoustic domain.
Their models compare timbres between two voices, so that one voice is literally driving another. "Respeecher is not tied to any specific vocabulary. That's why our models are emotional to the extent that humans are emotional. Your performance isn't being touched or changed during the conversion. Only the vocal timbre is being changed. So you can perform the emotion you need but your voice will sound very different, like you're using different vocal cords," he said.

To create a voice clone, Respeecher needs roughly 40 minutes of dialogue from the target voice. The emotional range in that dataset should be similar to the emotional range needed for the conversion. "For most of our projects, we've had to deal with existing data because we've done quite a bit of voice de-aging and voice resurrection. So we were limited to what data has already been recorded," said Serdiuk.

When creating a voice clone from archival material, Respeecher prefers to handle the cleaning and de-noising, "because we know how our models react to this type of processing," Serdiuk explained. They use a variety of audio restoration tools, from third-party noise reduction solutions like iZotope RX to proprietary tools for dialogue enhancement. It's a process that's continuously improving. "Many of the projects we worked on in 2020 had a very slight tape hiss on the output, but on a project we delivered this December, we were able to remove it all. The target material sounded like it was recorded yesterday and not 40 years ago."

For a simpler use of Respeecher (for those not looking to clone a voice), they offer a Voice Marketplace and a web browser application called TakeBaker that can be licensed for a reasonable monthly (or yearly) fee. The Marketplace offers over 40 target voice options across a range of ages and genders, and even includes non-human options such as dogs.
Since TakeBaker is a web browser application, you'll need to connect a microphone to your computer and perform the line(s) yourself, or upload short dialogue clips (as .ogg, .wav, .flac, or .mp3) that you want to convert. Again, performance is key. There are currently no editing options in TakeBaker, so if you don't perform the line perfectly, then you'll have to re-record it.

Audio quality is important, too. A good-sounding mic, proper input levels, and a quiet recording environment will result in a more accurate conversion. Pitch adjustments can be made to the
