
NVIDIA Announces TensorRT 5 and TensorRT Inference Server


September 12, 2018 – At GTC Japan, NVIDIA announced the latest version of the TensorRT high-performance deep learning inference optimizer and runtime.

TensorRT 5 adds support for the new Turing architecture, along with new optimizations and INT8 APIs that achieve up to 40x faster inference over CPU-only platforms.

This latest version dramatically speeds up inference for recommenders, neural machine translation, speech, and natural language processing applications.

TensorRT 5 Highlights:

  • Speeds up inference by 40x over CPUs for models such as translation, using mixed precision on Turing Tensor Cores
  • Optimizes inference models with new INT8 APIs (a minimal builder sketch follows this list)
  • Supports Xavier-based NVIDIA Drive platforms and the NVIDIA DLA accelerator for FP16
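
For context on the mixed-precision and INT8 items above, the sketch below shows roughly what an engine build looks like with the TensorRT Python API. The ONNX model file, batch size, and calibrator are illustrative assumptions, not details from the announcement.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Parse a trained model into a TensorRT network definition.
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network()
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("model.onnx", "rb") as f:  # hypothetical model file
    parser.parse(f.read())

builder.max_batch_size = 8             # illustrative batch size
builder.max_workspace_size = 1 << 30   # 1 GiB scratch space for layer tactics

# Enable FP16 kernels where the hardware supports them
# (e.g. Turing Tensor Cores).
builder.fp16_mode = True

# INT8 additionally requires a calibrator fed with representative inputs:
# builder.int8_mode = True
# builder.int8_calibrator = my_calibrator  # hypothetical calibrator object

engine = builder.build_cuda_engine(network)
```

With these flags set, TensorRT chooses kernels per layer from the allowed precisions, falling back to higher precision where a faster low-precision kernel is unavailable.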

TensorRT 5 will be available to members of the NVIDIA Developer Program.

The TensorRT Inference Server is a containerized microservice that maximizes GPU utilization and runs multiple models from different frameworks concurrently on a single node. It leverages Docker and Kubernetes to integrate seamlessly into DevOps architectures.
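
Because the server ships as a standard container exposing an HTTP endpoint, clients and orchestration layers can probe it directly before sending inference requests. The sketch below assumes the server's default HTTP port (8000) and its health/status routes; the host name is a deployment-specific placeholder, and the routes should be checked against the server's documentation.

```python
import requests  # third-party HTTP client (pip install requests)

# Hypothetical deployment address; 8000 is assumed as the default HTTP port.
SERVER = "http://localhost:8000"

# Liveness and readiness probes, suitable as Kubernetes health checks.
live = requests.get(SERVER + "/api/health/live")
ready = requests.get(SERVER + "/api/health/ready")
print("live:", live.status_code, "ready:", ready.status_code)

# Server status, including which models are loaded and ready to serve.
status = requests.get(SERVER + "/api/status")
print(status.text)
```

A readiness route like this is what a Kubernetes deployment would typically wire into its readinessProbe, which is how the server slots into the DevOps workflows mentioned above.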

