CineMontage

Q1 2022

Issue link: https://digital.copcomm.com/i/1456671

target voices. For instance, if a deep male voice option needs to be a bit deeper, that can be adjusted prior to the conversion. The conversion takes a few minutes as the model analyzes the input voice and the target voice, and then renders the output. The take(s) can then be downloaded as 48k/16-bit .wav files.

For supervising sound editor/sound designer/actor Abigail Savage at Red Hook Post in Brooklyn, NY, Respeecher's Voice Marketplace and TakeBaker tool have been helpful for creating background vocal textures for film projects. For instance, if a scene has a group of people chatting together in the background, Savage would typically record a loop group, or record a few people from around the studio, depending on time and budget. In the latter case, though, she's limited to the range of voices available in the studio at that time.

"Instead, I've been using Respeecher to perform the lines. I can record both sides of any conversation and then go through their AI voices to find ones that fit. I'm creating my own loop group that way. It's incredibly fun," said Savage. "You're completely controlling the performance, too. You're not trying to tease a performance out of somebody when you know exactly what you want. You can just record yourself saying it and then transpose that onto a different voice."

Savage noted that while the technology has improved since she started working with Respeecher, it's still not perfect. "Initially, there was a performative flatness. The target voice I was leaning towards wouldn't necessarily sync with the performance I was giving it. There is still a disconnect there, but it's something that's already gotten better with updates," said Savage. "Another issue is the amount of time it takes for the conversion: how long it takes to generate a voice. Making that workflow faster is something that's constantly improving, too."
According to Serdiuk, Respeecher is currently working on a possible standalone app or a plugin for Pro Tools or Audacity. For Savage, a Respeecher plugin that could be inserted on a track, with real-time voice transformation, would be ideal. "The next best thing would be a solution similar to iZotope's RX Connect, which allows you to send clips from Pro Tools to the standalone application for processing," noted Savage.

Respeecher does offer a text-to-speech option, available in TakeBaker, but unlike speech-to-speech, it doesn't have an emotional range. Another issue with text-to-speech is that it's limited to language models and vocabulary. For text-to-speech, Respeecher offers four accents: US English, GB English, CA French, and FR French.

Sonantic

Sonantic's main approach to AI voice generation is through text-to-speech. If you're imagining a robotic readout, then you're miles off the mark. Sonantic offers "fully expressive voice models," meaning you can choose from a range of voices, change the emotion of the read (such as "happy," "sad," "fear," "shouting," and "anger"), and set the emotional intensity of the read (low, medium, or high).

There are two options for using Sonantic. The most basic option is to license the software and use Sonantic's desktop application to choose from a variety of pre-made voice models on the platform. Option two is to work with the Sonantic team to create a custom voice model. According to John Flynn, Co-Founder and CTO of Sonantic, "We're actually in the process of building out plug-ins for various audio softwares and game engines and look forward to sharing these with our customers."

Inside the application, text is converted into AI-generated voice files that appear on a timeline.
In the upper area of the application window, you can choose a voice model, type what you want it to say, choose the emotion and intensity of the read, and adjust the pace of the read to control timing, rhythm, and emphasis on particular words or phrases.

The audio editing area in the lower portion of the window looks like a traditional digital audio workstation (DAW). You can zoom, trim, and ripple edit the AI-generated voices as sound files. Sound files can be exported as .wav files in 44.1k/16-bit or 48k/24-bit, with higher sample-rate options coming later this year.

Flynn said: "Text-to-speech is great for two things: speed and experimentation. Sonantic can batch-generate thousands of lines of dialogue and export them into different file names and folder structures very quickly. Text-to-speech also allows our customers to trial different voice models to see which fits a character best."

It's an approach that works well for game developers and animation studios during