Computer Graphics World

November / December 2017


SPECIAL SECTION: WORKSTATIONS

Turbo frequency capable of 4 TFLOPS, and 112 threads. Four times faster than the fastest supercomputer of just 20 years ago, for less than one ten-thousandth the cost and less than one-thousandth the power. Not only that, almost anyone can use it; by contrast, very few people could use that magnificent ASCI Red, which, by the way, is still working, giving US taxpayers a pretty good ROI.

What's Possible?

We compared professional workstation workloads. Against a four-year-old workstation based on the 3.9 GHz E5-1680 v2, the new Skylake-based Xeon W delivers an average of 87 percent more performance.

That kind of improvement would be fine if all you wanted to do was render faster, or maybe load files faster, but the real payoff comes in being able to do what you couldn't do before. Famous computer graphics scientist Jim Blinn has an adage: "As technology advances, rendering time remains constant." The point is that artists and directors, for example, aren't trying to reduce the time it takes to produce a movie (although their bosses tell them that should be their goal); rather, they want to make the most beautiful movie they can.

The same is true for engineers who are running simulations on ever more complex parts. Each time Intel raises the performance and the number of threads that can be processed simultaneously, simulation users celebrate. Why? Because they can make their models more fine-grained. Finer granularity in an FEA simulation makes for a more reliable, more efficient, and more fine-tuned final product. Consider what that means when designing an airplane wing strut: a lighter and stronger airplane that is not only more durable, but also more fuel-efficient.

The Age of Threads

It's taken a long time, but the benefit of parallel processing is undeniable: performing processes simultaneously provides huge gains in productivity and accuracy.
The problem has been the legacy apps that simply couldn't be threaded and recompiled. Slowly, the industry has built new apps with threading as an integral part. And ironically, the ISVs doing that haven't made any fanfare about it; it has just been a kind of given that any new app would naturally be multi-threaded.

The good news/bad news is that if a user is stuck with old-fashioned, single-threaded legacy software, the gains from a new processor and GPU are going to be slight. What gains there are come primarily from clock speeds (CPU, GPU, and memory). But few users rely on only one application, and most apps (except some developed in-house) have been upgraded and/or replaced completely with new versions.

Consider the new Intel Xeon SP (Scalable Performance) Platinum series Skylake processors: 28 physical cores running at up to 3.8 GHz Turbo frequency, capable of 2 TFLOPS across 56 threads. Double that in a dual-socket system, and you have 112 threads at 3.8 GHz approaching 4 TFLOPS. It's almost unbelievable. Drop a modern add-in board into the system, such as a GPU designed for compute, and you have a theoretical 16 TFLOPS in a system that can fit under your desk, runs on conventional wall-socket power, and doesn't require any extra air-conditioning. Oh, and the whole thing would cost under $15,000.

Performance at Hand

In terms of performance, benchmarks tell part of the story. Intel will tell you one can get a 300 percent performance improvement over a machine that is four years old (based on best-published two-socket SPECfp_rate_base2006 result submitted to/published at results/ as of July 11, 2017), or an 80 percent improvement from the last generation to this one, based on the same data. And that's all true. It just may not apply to you. Every user has his or her own workload, so the best that benchmarks can do is give an indication of what a person might achieve.
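As a rough aid for reading such claims, the sketch below converts an "X percent improvement" figure into a speedup factor and into the time a fixed job would take. It assumes the percentage means X percent more throughput than the baseline (so 300 percent is read as 4x), which is one common reading and not something the quoted wording confirms. The function names and the eight-hour job are hypothetical.

```python
def speedup_factor(percent_improvement: float) -> float:
    """Throughput multiplier implied by 'X percent more performance'.

    Assumption: X percent *more* than the baseline, so 0 -> 1.0x and 100 -> 2.0x.
    """
    return 1.0 + percent_improvement / 100.0


def hours_for_fixed_job(old_hours: float, percent_improvement: float) -> float:
    """Time the same job would take if the workload scaled like the benchmark."""
    return old_hours / speedup_factor(percent_improvement)


# Percentages quoted in the article; the 8-hour job is a made-up illustration.
claims = {
    "vs. four-year-old two-socket system (SPECfp_rate_base2006)": 300.0,
    "vs. previous generation (same data)": 80.0,
    "Xeon W vs. E5-1680 v2 workstation (average)": 87.0,
}

for label, pct in claims.items():
    print(f"{label}: {speedup_factor(pct):.2f}x, "
          f"8 h job -> {hours_for_fixed_job(8.0, pct):.2f} h")
```

A workload that is only partly parallel, or that is limited by I/O, will see less than the benchmark factor, which is why these numbers are an indication and not a promise.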
However, over the years, I have yet to hear people say they didn't get their money's worth by getting a new workstation. The math is simple: Do more, or better, work in the same time, and weigh that against the cost of an engineer doing the work.

Furthermore, the generational differences are impressive and illustrate what you can do when you make billions of tiny transistors available to computer architects. However, as mentioned above, it's the application of all those speedy little transistors that is the real magic and the primary benefit to users and organizations.

Final Thoughts

Workstations don't break and aren't cheap, so they don't get replaced every year, or even every other year. In fact, they seldom get replaced more often than every three or four years, and only then if there is a significant improvement in an application and/or the hardware.

Although Moore's law has been fairly predictable over the last 40 years, with the move to 14nm processes, there is more being accomplished than just clock speedups. With a smaller feature size, more transistors can be stuffed into a chip. When that is done, more functions and faster, wider communications are realized, as well as specialized capabilities such as security, AI, and power management.

Intel has always been a leader in process technology and is, therefore, in a perfect place to recognize and exploit the inherent opportunities of compute density and throughput. The Skylake processor is the latest instantiation of that skill, and the users are the beneficiaries.

Jon Peddie is president of Jon Peddie Research, a Tiburon, CA-based consultancy specializing in graphics and multimedia that also publishes JPR's "TechWatch."
COMPARISON OF INTEL PROCESSORS

Entry Workstation (Greenlow with Kaby Lake CPU)
  CPU TDP (with IVR): Up to 80W
  Socket: FCLGA1151
  Scalability: 1S
  Cores: Up to 4C (w/ GT2 & GT0) with Intel HT Technology
  Memory: 2 channels DDR4; UDIMM ECC, SODIMM ECC; 2400; 2DPC
  PCIe: PCIe 3.0 (2.5, 5.0, 8.0 GT/s); 16 lanes; bifurcation support: x16, x8; PCH: up to 20 lanes
  PCH: Intel C230 series chipset (Skylake PCH): SATA Gen3, 6 lanes; USB 3.1 Gen 1, up to 10 lanes; DMI x4 Gen3

Professional Workstation (Basin Falls with Skylake-W CPU)
  CPU TDP (with IVR): 140W
  Socket: CLGA2066
  Scalability: 1S
  Cores: Up to 18C with Intel HT Technology
  Memory: 4 channels DDR4; RDIMM, LRDIMM; up to 2666MT/s
  PCIe: PCIe 3.0; 48 lanes; bifurcation support: x16, x8, x4; PCH: 24 lanes, 16 ports (6 controllers)
  PCH: Intel C422 chipset (Kaby Lake): SATA Gen3, 8 ports; USB 3.1 Gen 1, 10 ports; DMI x4 Gen3

Expert Workstation (Purley with Skylake CPU)
  CPU TDP (with IVR): 70-205W
  Socket: FCLGA3647
  Scalability: 2S
  Cores: Up to 28C per socket with Intel HT Technology
  Memory: DDR4, 6 channels per socket, up to 2 DIMMs per channel; RDIMM & LRDIMM support; up to 2666MT/s
  PCIe: PCIe 3.0; 48 lanes per socket; bifurcation support: x16, x8, x4; PCH: up to 20 ports
  PCH: Intel C620 series chipset (Lewisburg): SATA Gen3, up to 14 ports; USB 3.1 Gen 1, up to 10 ports; USB 2.0, up to 14 ports; DMI x4 Gen3
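The thread counts quoted in the article follow from the core counts in this comparison. The sketch below is a minimal illustration that assumes every core runs two hardware threads under Intel HT Technology, and it takes the article's figure of 2 TFLOPS per 28-core socket at face value. The class, field, and constant names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    sockets: int
    max_cores_per_socket: int
    threads_per_core: int = 2  # assumption: Intel HT Technology enabled

    @property
    def max_threads(self) -> int:
        """Hardware threads available with every socket fully populated."""
        return self.sockets * self.max_cores_per_socket * self.threads_per_core


# Socket and core counts are taken from the comparison above.
PLATFORMS = [
    Platform("Entry (Greenlow)", sockets=1, max_cores_per_socket=4),
    Platform("Professional (Basin Falls)", sockets=1, max_cores_per_socket=18),
    Platform("Expert (Purley)", sockets=2, max_cores_per_socket=28),
]

# Figure quoted in the article for one 28-core socket; not measured here.
TFLOPS_PER_28_CORE_SOCKET = 2.0

for p in PLATFORMS:
    print(f"{p.name}: up to {p.max_threads} threads")

expert = PLATFORMS[-1]
expert_cpu_tflops = expert.sockets * TFLOPS_PER_28_CORE_SOCKET
print(f"Expert dual-socket CPU peak: about {expert_cpu_tflops:.0f} TFLOPS")
```

The dual-socket result matches the article's 112 threads and its "approaching 4 TFLOPS" for the CPUs alone; the 16 TFLOPS system figure additionally depends on the add-in compute board, which is not modeled here.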
