CAS Quarterly

Winter 2018

Algorithms can be powerful. But they need precise definitions of what you're doing, with a step-by-step signal flow. Some processing chores don't lend themselves to that approach. For example, you might want a processor that identifies and reduces the ers and ums of natural ad-libbed speech without affecting similar sounds that are part of a word. Even if you could write rules covering all the possible variations of this problem, there'd be so many rules that the algorithm would be too unwieldy for most studio computers.

Compare that with how human dialogue editors approach the task. They identify those vocal fillers intuitively, replace them with room tone as necessary, and never even think about an algorithm. They don't use a flow chart; they use their brains and experience. Neural networks let the machine have a "brain" of its own.

How to build a brain. First, a reassurance: neural networks aren't the "Artificial Intelligence" of sci-fi. They need rigorous training sequences. Your nifty new processor won't decide on its own to Kill All Producers.³ But their basic building block is very similar to a human neuron.⁴ Both can have multiple inputs, and both generate an output based on the combined states of those inputs. In a computer neuron, all the inputs are summed and the result is compared to a threshold (usually "is the sum greater than zero?"). If the sum reaches the threshold, the neuron outputs a binary 1. If it doesn't, the output is 0.

[Figure: A human neuron, with dendrites as inputs and axon terminals as outputs, alongside a computer neuron with its inputs and outputs.]
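That threshold test takes only a few lines of code. Here's a minimal Python sketch of a computer neuron; the function name and the sample inputs are our own illustration, not anything from iZotope's actual software:

    def computer_neuron(inputs):
        """Sum the inputs and compare the total to a threshold of zero."""
        total = sum(inputs)
        return 1 if total > 0 else 0   # binary output: fire (1) or stay quiet (0)

    print(computer_neuron([0.4, 0.3]))    # 0.7 is above zero, so it fires: 1
    print(computer_neuron([-0.7, 0.2]))   # -0.5 is below zero, so it doesn't: 0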
While neural networks are computer code, you can visualize them as a matrix of neurons organized into columns, called layers, and rows, as in figure 2. The input layer receives information about the signal being considered. The processors in this article have multiple neurons in the input and output layers because they look at a signal that's broken into frequency bands, and each band goes to a different neuron. (iZotope's processors use 1024 bands.)

[Figure 2: Signal flow in a simple neural network, from the input layer through hidden layers to the output. Real networks have more rows and more layers.]

The numbers then pass through internal columns of neurons before reaching the output. These are called hidden layers because the programmer doesn't determine how their neurons respond to any input. The software itself defines those reactions through a training process, like the one described below. In these audio networks, each output neuron represents the resulting energy in a single band. For example, if the network has been trained to discriminate dialogue from music, the output could be thought of as a spectrogram representing just the dialogue. Or we can subtract the dialogue values from the input and get just the music.

While we don't know what the hidden layers are doing, how they're connected is critical. There are a lot of connections: every neuron in one layer passes data to every neuron in the following layer. As the network learns a task, each connection may handle that data slightly differently.

Making connections. Human neurons are connected by synapses. When humans learn something, the synaptic connections between some neurons get strengthened. The next time we encounter the same stimulus, those neurons are more likely to communicate. In a neural network, synaptic strengthening is replaced by a stored numeric weight for each connection. These are multipliers, usually a fraction between -1 and +1. Each neuron in a layer can have a different weight assigned to each connection from the previous layer.

It's easy to understand this process by showing typical weightings for a portion of our diagram. We've assigned colors to make the paths easier to follow.

[Figure 3: Weightings in the network. Neurons A, B, and C connect to neurons D, E, and F, and onward to a layer labeled 7, 8, and 9; the color-coded connections carry weights including .4, -.7, .8, .3, .2, -.4, -.2, and .9.]

In figure 3, neuron A is firing and outputting a one. It passes its 1 to neurons D, E, and F, but each branch is multiplied by a different weight: neuron D gets .4 (that is, the product of 1 x .4), neuron E gets -.7, and so on. Different weights, shown in different colors, affect the signals from neurons B and C. Neuron C isn't firing at all for this particular input (that's why it's drawn as a darkened circle) and sends a zero. So even though it has its own weightings, they get multiplied by zero, and no signal is passed from C to the next layer. The resulting math is grade-school simple:

• Neuron D sees the .4 weighting from neuron A and the .3 from neuron B. (It ignores the zero coming from neuron C.) The sum is .7, so it fires and sends a one to the weightings for the next layer.

• Neuron E sees -.7 from neuron A and .2 from neuron B. These add to -.5. That's less than zero, so E doesn't fire. It has zero output.

If the input changes over time, as audio does, you need to add temporal factors. One way is to process the input in overlapping windows, each containing a few milliseconds' worth of audio. Or you can let the weighted neural outputs ramp relatively slowly between values, adding a "reaction time" when the signal changes. Or you can give the neurons memory, storing values from one time slice to be considered during the next.
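The figure 3 walkthrough is easy to verify in code. This Python sketch hard-codes the weights the bullets name (.4 and .3 into D, -.7 and .2 into E); the weights on C's connections are made-up placeholders, since C's zero output cancels them anyway:

    # Outputs of the previous layer: A and B fire, C is dark.
    previous_layer = {"A": 1, "B": 1, "C": 0}

    # Weight of each connection into D and E. The values for A and B
    # come from the walkthrough; C's are arbitrary stand-ins.
    weights_in = {
        "D": {"A": 0.4, "B": 0.3, "C": 0.9},
        "E": {"A": -0.7, "B": 0.2, "C": -0.4},
    }

    for neuron, incoming in weights_in.items():
        total = sum(previous_layer[src] * w for src, w in incoming.items())
        print(neuron, "sum:", round(total, 1), "fires:", 1 if total > 0 else 0)
    # D sum: 0.7   fires: 1
    # E sum: -0.5  fires: 0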

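The "reaction time" option from the last paragraph can be sketched the same way. Here each time slice nudges the neuron's reported value a fraction of the way toward its new raw output instead of jumping there instantly; the smoothing factor is an arbitrary illustration, not a value from any shipping processor:

    def smoothed(previous, target, factor=0.2):
        """Move a fraction of the way from the previous value toward the target."""
        return previous + factor * (target - previous)

    raw_outputs = [0, 0, 0, 1, 1, 1, 1]   # the neuron switches on at the fourth slice
    value = 0.0
    for raw in raw_outputs:
        value = smoothed(value, raw)
        print(round(value, 3))   # ramps up: 0.0, 0.0, 0.0, 0.2, 0.36, 0.488, 0.59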
³ And it's going to be a very long time before they can attempt even simple dialogue editing with the judgment a human provides.

⁴ To make the distinction, some engineers refer to the computer ones as "artificial neural networks."