Geek Alert! The following article contains material of a geeky nature and should not be read by persons who are sensitive to discussions which are written by nerds or geeks.
One of the computing developments making a more widespread impact at NAB this year is Nvidia’s Fermi technology, which is being harnessed to process image data. You may have encountered this technology before in high-end graphics cards, possibly without realising it. Fermi is Nvidia’s Graphics Processing Unit (GPU) architecture and, in this case, it is being utilised for its massively parallel processing power and not necessarily as a graphics card to drive external displays.
Nvidia now manufacture two lines of GPU cards for professional use: their Quadro line and their Tesla line. The former is used both for its processing power and for driving external displays, whereas the latter is used primarily for its computational power and may not have connectors for external displays. You might reasonably ask what good a graphics card is if it has no connectors to drive a display. The answer is the now widespread use of GPU processing power to calculate and process a wide range of data (not necessarily image data), an approach known as General Purpose GPU computing (or GPGPU).
Nvidia have developed CUDA, a parallel computing platform and programming model (programmed largely through extensions to C), which enables software developers to take advantage not only of the processing power of the host computer’s Central Processing Unit (CPU) but also of that of the GPU. It is not uncommon these days to find multi-core processors at the heart (or perhaps brain) of even the most humble of computers. The power of these processors can be augmented by the resources of a GPU-equipped graphics card to process certain types of data. This can accelerate software that, for example, calculates the vectors required for ray tracing in 3D visualisation, scales video imagery, analyses air traffic flow or performs many other historically processor-intensive (read time-consuming) tasks.
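For the curious, here is a minimal sketch of the data-parallel pattern that GPGPU code tends to follow: one small function (a "kernel") is applied independently to every element of the data, so the work can be spread over as many workers as are available. This is plain Python running on the CPU, not CUDA, and it is only meant to show the shape of the idea; the names scale_pixel and launch are hypothetical, and Python threads will not deliver GPU-like speed.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_pixel(i, src, dst, gain):
    """Hypothetical 'kernel': compute one output pixel from one input pixel."""
    dst[i] = min(255, int(src[i] * gain))  # clamp to the 8-bit range


def launch(kernel, n, workers, *args):
    """Run kernel(i, *args) for every i in range(n), split across workers.

    Each worker handles indices start, start + workers, start + 2 * workers...
    so no two workers ever write to the same element.
    """
    def run_chunk(start):
        for i in range(start, n, workers):
            kernel(i, *args)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(run_chunk, range(workers)))


src = [0, 64, 128, 200, 255]   # made-up 8-bit pixel values
dst = [0] * len(src)
launch(scale_pixel, len(src), 4, src, dst, 1.5)
print(dst)  # [0, 96, 192, 255, 255]
```

The point to notice is that scale_pixel never looks at its neighbours. Work of that kind is what maps well onto hundreds of GPU cores, whereas a calculation in which each step depends on the previous one would not benefit in the same way.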
In talking both to representatives at the Nvidia booth and to software development companies at the NAB show, it is clear that, for the Media and Entertainment industry, GPU technology is being used to massively reduce the time taken to render imagery of various types. Where previously massive clusters of render nodes would be employed to process calculation tasks in parallel, employing GPU-equipped servers or workstations reduces the number of nodes needed for a given task, cuts power consumption and processing time, and increases throughput. The secret to optimising the handling of material processed by imaging software lies in knowing which calculations and manipulations are best handled by traditional CPUs and which are better handled by GPUs. One developer I spoke to today referred to this knowledge as his “secret sauce”, and in his opinion it positioned him to hugely decrease the amount of time taken to process image sequences used by Video-on-Demand service providers.
Nvidia Quadro and Tesla boards can be used in conjunction with each other to deliver workstations that are highly augmented with GPU processing power. In the following picture taken at the SuperMicro stand, the machine uses a Quadro 6000 board to provide both display-driving capability and GPU processing power, and multiple Tesla boards to further up the ante in the GPU processing department. Both Nvidia and SuperMicro representatives explained that equipping a workstation like this requires some forethought to ensure that the unit has sufficient ventilation fans and power supply capacity. The heat produced and the power required by these additional cards will quickly overwhelm a machine of average specification.
For those interested in the numbers, the Tesla cards in the above illustration contain 448 processing cores and 6 GB of optimised memory per board! To give you some idea of how quickly these boards can perform their calculations, each Tesla board is capable of processing double precision (or 64-bit) floating-point calculations at a peak of 515 GigaFLOPS, that is, 515 × 10^9 Floating-Point Operations Per Second! (To give some perspective, a six-core Intel Core i7 PC processor can achieve around 109 GFLOPS performing double precision calculations.)
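To put those figures side by side, the short calculation below works out the ratio of the quoted peaks and the peak per core. It uses only the numbers given above; the function names are hypothetical, the three-board example is an assumed configuration and not a count taken from the machine on show, and peak figures say nothing certain about what a real application will achieve.

```python
# Peak double precision figures as quoted in the article.
TESLA_PEAK_GFLOPS = 515.0   # per Tesla board
TESLA_CORES = 448           # processing cores per Tesla board
CPU_PEAK_GFLOPS = 109.0     # six-core Intel Core i7


def peak_ratio(boards=1):
    """Ratio of combined Tesla peak to the quoted CPU peak."""
    return boards * TESLA_PEAK_GFLOPS / CPU_PEAK_GFLOPS


def gflops_per_core():
    """Peak double precision GFLOPS per Tesla processing core."""
    return TESLA_PEAK_GFLOPS / TESLA_CORES


print(round(peak_ratio(), 2))          # 4.72
print(round(gflops_per_core(), 2))     # 1.15
print(round(peak_ratio(boards=3), 1))  # 14.2 (assumed three-board machine)
```

In other words, on paper a single board offers a little under five times the double precision peak of the CPU quoted above, and the advantage scales with the number of boards only if the software can keep them all busy.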
Nvidia Tesla and Quadro boards can be seen at work in a variety of Media and Entertainment industry applications at NAB this year. These include Adobe’s Mercury Engine used by their Premiere software, Avid’s Media Composer, Symphony and DS, Elemental Technology’s Elemental transcoding engine, Radiant Grid Technology’s RadiantGrid platform, and Root6 Technology’s Content Agent for its H.264 encoding and standards conversion, as well as a host of other video editing, virtual set, transcoding and transwrapping, 3D visualisation and colour correction applications.