NVIDIA TensorRT 3 Dramatically Accelerates AI Inference for Hyperscale Data Centers

Alibaba, Baidu, Tencent, JD.com and Hikvision Adopt NVIDIA TensorRT for Programmable Inference Acceleration

GTC China – NVIDIA today unveiled new NVIDIA® TensorRT™ 3 AI inference software that sharply boosts the performance and slashes the cost of inferencing from the cloud to edge devices, including self-driving cars and robots.

The combination of TensorRT 3 with NVIDIA GPUs delivers ultra-fast and efficient inferencing across all frameworks for AI-enabled services — such as image and speech recognition, natural language processing, visual search and personalized recommendations. TensorRT and NVIDIA Tesla® GPU accelerators are up to 40 times faster than CPUs(1) at one-tenth the cost of CPU-based solutions.(2)

“Internet companies are racing to infuse AI into services used by billions of people. As a result, AI inference workloads are growing exponentially,” said NVIDIA founder and CEO Jensen Huang. “NVIDIA TensorRT is the world’s first programmable inference accelerator. With CUDA programmability, TensorRT will be able to accelerate the growing diversity and complexity of deep neural networks. And with TensorRT’s dramatic speed-up, service providers can affordably deploy these compute intensive AI workloads.”

More than 1,200 companies have already begun using NVIDIA’s inference platform across a wide spectrum of industries to discover new insights from data and deploy intelligent services to businesses and consumers. Among them are Amazon, Microsoft, Facebook and Google, as well as leading Chinese enterprise companies such as Alibaba, Baidu, JD.com, iFLYTEK, Hikvision, Tencent and WeChat.

“NVIDIA’s AI platform, using TensorRT software on Tesla GPUs, is an outstanding technology at the forefront of enabling SAP’s growing requirements for inferencing,” said Juergen Mueller, chief innovation officer at SAP. “TensorRT and NVIDIA GPUs make real-time service delivery possible, with maximum machine learning performance and versatility to meet our customers’ needs.”

“JD.com relies on NVIDIA GPUs and software for inferencing in our data centers,” said Andy Chen, senior director of AI and Big Data at JD. “Using NVIDIA’s TensorRT on Tesla GPUs, we can simultaneously inference 1,000 HD video streams in real time, with 20 times fewer servers. NVIDIA’s deep learning platform provides outstanding performance and efficiency for JD.”

TensorRT 3 is a high-performance optimizing compiler and runtime engine for production deployment of AI applications. It can rapidly optimize, validate and deploy trained neural networks for inference on hyperscale data centers and on embedded or automotive GPU platforms.

It offers highly accurate INT8 and FP16 network execution, which can save data center operators tens of millions of dollars in acquisition and annual energy costs. A developer can use it to take a trained neural network and, in just one day, create a deployable inference solution that runs 3-5x faster than their training framework.
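To make the idea of reduced-precision execution concrete, the following is a minimal sketch of symmetric INT8 quantization in plain Python. It is an illustration only and does not use or reproduce TensorRT's own calibration procedure; the function names and the sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization (illustrative, not TensorRT's method).

    Maps floats to integers in [-127, 127] using one shared scale factor.
    """
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale would work.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 representation."""
    return [q * scale for q in quantized]


# Hypothetical sample weights, chosen only for the demonstration.
weights = [0.8, -1.27, 0.3, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The round-trip error per value is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each value is stored in 8 bits instead of 32, at the price of a rounding error no larger than half the scale factor. Production toolchains typically choose the scale from representative calibration data rather than from the raw maximum used here.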

To further accelerate AI, NVIDIA introduced additional software, including:

  • DeepStream SDK: NVIDIA DeepStream SDK delivers real-time, low-latency video analytics at scale. It helps developers integrate advanced video inference capabilities, including INT8 precision and GPU-accelerated transcoding, to support AI-powered services like object classification and scene understanding for up to 30 HD streams in real time on a single Tesla P4 GPU accelerator.
  • CUDA 9: The latest version of CUDA®, NVIDIA’s accelerated computing software platform, speeds up HPC and deep learning applications with support for NVIDIA Volta architecture-based GPUs, up to 5x faster libraries, a new programming model for thread management and updates to debugging and profiling tools. CUDA 9 is optimized to deliver maximum performance on Tesla V100 GPU accelerators.

Inference for the Data Center
Data center managers constantly balance performance and efficiency to keep their server fleets at maximum productivity. Tesla GPU accelerated servers can replace over a hundred hyperscale CPU servers for deep learning inference applications and services, freeing up precious rack space, reducing energy and cooling requirements, and reducing cost as much as 90 percent.

NVIDIA Tesla GPU accelerators provide the optimal inference solution — combining the highest throughput, best efficiency and lowest latency on deep learning inference workloads to power new AI-driven experiences.

Inference for Self-Driving Cars and Embedded Applications
With NVIDIA’s unified architecture, deep neural networks on every deep learning framework can be trained on NVIDIA DGX™ systems in the data center, and then deployed into all types of devices — from robots to autonomous vehicles — for real-time inferencing at the edge.

TuSimple, a startup developing autonomous trucking technology, increased inferencing performance by 30 percent after TensorRT optimization. In June, the company successfully completed a 170-mile Level 4 test drive from San Diego to Yuma, Arizona, using NVIDIA GPUs and cameras as the primary sensor. The performance gains from TensorRT allow TuSimple to analyze additional camera data, and add new AI algorithms to their autonomous trucks, without sacrificing response time.

Keep Current on NVIDIA

Subscribe to the NVIDIA blog, follow us on Facebook, Google+, Twitter, LinkedIn and Instagram, and view NVIDIA videos on YouTube and images on Flickr.


NVIDIA’s (NASDAQ: NVDA) invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots and self-driving cars that can perceive and understand the world. More information at http://nvidianews.nvidia.com/.

