Tensor Streaming Processor breaks 21,700 inferences per second

Published in Graphics

by Nick Farrell on 08 January 2020

It just can't stop interfering

Groq, the inventor of the Tensor Streaming Processor (TSP) architecture, today announced that its processor has achieved 21,700 inferences per second (IPS) for ResNet-50 v2 inference.

Groq said that its level of inference performance exceeds that of other commercially available neural network architectures, with throughput that more than doubles the ResNet-50 score of the incumbent GPU-based architecture. ResNet-50 is an inference benchmark for image classification and is often used as a standard for measuring the performance of machine learning accelerators.

Earlier this week, the Linley Group released its latest Microprocessor Report titled “Groq Rocks Neural Networks,” which concludes that Groq’s “TSP stands out in both peak performance and ResNet-50 throughput,” and that “Groq’s deep-learning accelerator is the fastest available on the merchant market”.

Groq co-founder and CEO Jonathan Ross said: “These ResNet-50 results are a validation that Groq’s unique architecture and approach to machine learning acceleration delivers substantially faster inference performance than our competitors. These real-world proof points, based on industry-standard benchmarks and not simulations or hardware emulation, confirm the measurable performance gains for machine learning and artificial intelligence applications made possible by Groq’s technologies.”

Significantly, the Groq platform doesn’t require large batch sizes for optimal inference processing performance, with the TSP architecture achieving peak throughput even at batch size 1, when processing a single image at a time. The ability to quickly and efficiently process small batch sizes is especially important to minimize latency in real-time applications. While Groq’s architecture is up to 2.5 times faster than GPU-based platforms at large batch sizes, its tensor streaming processor is up to 17 times faster at batch size 1.
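To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above. Two assumptions are ours and not Groq's: that the "up to" speed-ups apply to the 21,700 IPS figure, and that the helper names (mean_interval_us, implied_baseline_ips) are purely illustrative. Note too that the reciprocal of throughput is the average gap between finished inferences, which is not necessarily the latency of any single image on a pipelined chip.

```python
# Back-of-the-envelope arithmetic from the figures quoted in the article.
# Assumption: the "up to" speed-up factors are measured against the same
# 21,700 IPS result; the article does not state the GPU baselines directly.

GROQ_IPS = 21_700  # ResNet-50 inferences per second, as announced


def mean_interval_us(ips: float) -> float:
    """Average time between completed inferences, in microseconds."""
    return 1e6 / ips


def implied_baseline_ips(ips: float, speedup: float) -> float:
    """Throughput a competitor would have if `ips` is `speedup` times faster."""
    return ips / speedup


if __name__ == "__main__":
    print(f"One result every {mean_interval_us(GROQ_IPS):.1f} microseconds on average")
    print(f"Implied GPU rate, large batches: {implied_baseline_ips(GROQ_IPS, 2.5):.0f} IPS")
    print(f"Implied GPU rate, batch size 1:  {implied_baseline_ips(GROQ_IPS, 17):.0f} IPS")
```

On those assumptions the TSP finishes an image roughly every 46 microseconds, and the implied GPU rate falls from about 8,700 IPS at large batch sizes to under 1,300 IPS at batch size 1, which is why the gap widens so sharply for real-time work.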

With the Groq architecture providing more than a 2x performance advantage over GPU-based solutions, engineering managers can deploy processing platforms that offer twice the inference performance without doubling infrastructure costs. Reducing the number of deployed systems can also save datacenter space, lower power usage, and decrease system complexity overall.

Last modified on 08 January 2020
