Issue link: https://digital.copcomm.com/i/997232
Edition 2, 2018 | CGW 33

shop. Nothing was really hitting the mark for exactly what I was looking for."

FuseFX already had a Qumulo File Fabric (QF2) cluster on premises. QF2 is a modern, highly scalable file storage system. It can scale to billions of files, handles small and large files with equal efficiency, and gives administrators real-time insight and control.

Fotter had spoken with Qumulo about his need for a cloud-based solution. When he learned that the company was working on extending QF2 to AWS, Fotter jumped at the chance to try it out. The team experimented with a single instance early on and liked what they saw. When the four-node cluster became available, he was ready to integrate it into his production workflow.

The Test of The Tick

The QF2 cluster was put to the test when the company was working on an episode of The Tick. Fotter describes the situation: "Our process is that people work during the day, submit their jobs, then we render overnight. When they come in the next day, they look at the frames, evaluate where they're at, and either send it off to the next task or they might decide they need to re-render something.

"And again, we only have two to three weeks for a single episode. We often start a project close to the delivery of the first episodes. We don't have a lot of time to waste. If we have a problem, it's always a critical problem. We came in one morning and discovered there had been problems overnight. There must have been 50 jobs queued up that hadn't rendered a single frame. The stress level of the production team was pretty high at that moment. We had been targeting 1,000 machines as a maximum target for capacity. I knew that a moment would come where we would want to burst that high, and it was apparent that now was that time. Each EC2 Spot instance was 32 cores, so that's 32,000 cores at one time!"

Fotter told his render wranglers that if they had a frame to render, they should turn on a node for it.
"Just get it done," he recalls saying. "We knew that with QF2, we would be able to support that kind of throughput. And we did it. We got the frames rendered in the cloud and got them back down on premise." He says they were actually rendering so fast that the bottleneck was getting the frames back from the cloud cluster. "We saved ourselves. That's actual proof that the solution works. There's no possible way I could install 1,000 machines in our network here. I don't have the power or cooling to support them," says Fotter. "We were able to make the decision, and in less than one hour be rendering on 1,000 machines. Aer the jobs finished, we simply terminated the instances. When I think about how easy it was, it still doesn't sound real." Chris Leslie is the supervising systems engineer at FuseFX. To quantify QF2 performance, he offers the following: "At the peak we saw 40,000 IOPS. The highest throughput was 3.87GB/sec." The Pipeline Besides QF2, the FuseFX pipeline uses EC2 Spot Instances for scal- able, low-cost computing, Deadline for queue management and man- aging bids for the spot instances, Thinkbox Marketplace usage-based licensing (UBL) for flexible licensing, and V-Ray for rendering. Fotter explains how the UBL store works. "If you exhaust your local licenses, you can purchase per-minute or per-hour licenses of Dead- line and V-Ray. Once your local license limit is reached, the soware sends those requests to the store, monitors the usage, and deducts from that time. It's like a calling card. You buy a calling card with an hour of calling time on it and every call you make deducts from that." Everything is coordinated by the on-premise server, which is connected to the cloud instances with a VPN. Once it's synchronized to the QF2 cluster in AWS, rendering can occur both locally and in the cloud at the same time. A local ma- chine can, for example, pick up the first frame, and a cloud node can pick up the second frame. 
Deadline manages the distribution so that the cloud is simply an extension of the on-premise renderfarm.

FuseFX is still working on automation. Leslie explains, "We use a custom AMI that has some internal automation. For that, we use CloudFormation. It gets itself on the network, mounts the Qumulo storage, sets up the Deadline slaves, and a few other things. Right now, we start and terminate the QF2 instances manually."

Fotter adds, "If we have a long-term timeframe where we know we're not going to use QF2, we terminate it and we tell the Qumulo support team. We've learned that we should tell them when we're turning it off because they monitor it so nicely that, otherwise, when we do terminate it, people start calling me to tell me my cloud cluster is down."

Lessons Learned

Fotter has learned quite a bit since FuseFX first began using the cloud. He explains, "Getting the workflow right is the biggest challenge. Rendering is complicated, and visual effects is an inherently inefficient process. The more that you can create efficiencies in the workflow, the better off you're going to be."

Solving the data synchronization issue is the hardest part, contends Fotter, because render jobs require a lot of assets, textures, geometry, simulation caches, and whatever else you need to create the final image. When you're rendering in the cloud, if you're missing one little texture and that job renders incorrectly, you've wasted all that money, he adds.

"We've gone through those pains," Fotter notes. "We've learned the hard way, but being committed to the process and knowing that you can create a solution has always been my focus. So, to boil it down, my advice is to test it. Come up with a plan, test it, be committed to it, and really understand your workflow from start to finish."

Fotter also affirmed the importance of file-based data to his workflow. "It would be nice to be able to use object storage, but we don't have a single product in our environment that uses it.
It doesn't make sense. We're a file-based workflow. That's the way the visual effects process works. We have a large amount of files on a file system. We read them. We pull them into our applications," he says. "We work on them. We do our creative work, and we create more files."

Files are the medium of exchange between applications that were not necessarily written by the same company. How do you get something from the animation package into the rendering package? Those are two different disciplines, two different areas of focus, so you must create workflows that integrate across applications, and a file is the way to do that, according to Fotter.

"It follows then, that without a high-performance file system in the cloud, our workflow would be impossible," says Fotter. "QF2 is at the foundation of our AWS storage solution. Without it, we wouldn't be able to expand to the capacity that we have."
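
Fotter's warning about a single missing texture wasting a cloud render suggests checking the synchronized files before a job is submitted. The following is a minimal sketch of such a pre-flight check, not a description of FuseFX's actual tooling: the function name find_missing_assets, the idea of a per-job list of required relative paths, and the rule that a zero-byte file counts as missing are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable, List


def find_missing_assets(required: Iterable[str], synced_root: Path) -> List[str]:
    """Return the required relative paths that are absent or empty under synced_root.

    Hypothetical helper: `required` stands in for whatever asset list a
    render job declares (textures, geometry, simulation caches).
    """
    missing = []
    for rel_path in required:
        candidate = synced_root / rel_path
        # Assumption: a zero-byte file is treated as missing, on the guess
        # that it signals an interrupted copy rather than a valid asset.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(rel_path)
    return missing
```

A job would be held back whenever the returned list is non-empty, so that render time is spent only on jobs whose inputs have all reached the cloud file system.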