
Intel Xeon AWS AI can translate 4X faster than Nvidia

09 May 2018


Software optimizations to blame

There's an unexpected twist in this tangled tale, as Intel has managed to show that for machine translation, which uses RNNs, the Intel Xeon Scalable processor outperforms Nvidia V100-based systems by four times.

To bring you up to speed, an RNN is a recurrent neural network, a class of artificial neural network in which connections between units form a directed graph along a sequence.
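For the curious, here is a minimal sketch of what "recurrent" means, as a single RNN step in plain Python with NumPy. All the names here (W_x, W_h, rnn_step) are ours for illustration, not anything from Intel's demo.

```python
import numpy as np

hidden_size, input_size = 4, 3
rng = np.random.default_rng(0)
W_x = rng.standard_normal((hidden_size, input_size)) * 0.1  # input weights
W_h = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # recurrent weights
b = np.zeros(hidden_size)

def rnn_step(x, h):
    # The new hidden state depends on the current input AND the
    # previous hidden state - that feedback edge is what makes the
    # connection graph "recurrent" along the sequence.
    return np.tanh(W_x @ x + W_h @ h + b)

h = np.zeros(hidden_size)
for x in rng.standard_normal((5, input_size)):  # a toy sequence of 5 inputs
    h = rnn_step(x, h)
print(h)  # final hidden state summarising the sequence
```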

Intel ran a demo in its Oregon labs using the AWS Sockeye Neural Machine Translation (NMT) model with Apache MXNet and the Intel Math Kernel Library (Intel MKL).

Believe it or not, the library makes all the difference. Previously, deep learning training and inference on CPUs took an unnecessarily long time because the software was not written to take full advantage of the hardware's features and functionality.
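As a rough illustration of what that means in practice, the snippet below times a large matrix multiply (the workhorse operation of neural nets) in MXNet on the CPU. It assumes an MKL-enabled MXNet build is installed - at the time Intel shipped one on PyPI as mxnet-mkl - which is exactly where the optimized-library speedup shows up.

```python
# Assumes the MKL-enabled build, e.g.: pip install mxnet-mkl
import time
import mxnet as mx

a = mx.nd.random.uniform(shape=(2048, 2048))
b = mx.nd.random.uniform(shape=(2048, 2048))
mx.nd.dot(a, b).wait_to_read()  # warm-up run

start = time.time()
for _ in range(10):
    c = mx.nd.dot(a, b)
c.wait_to_read()  # MXNet is asynchronous; force completion before stopping the clock
print("avg matmul time: %.3f s" % ((time.time() - start) / 10))
```

Running the same script against a plain (non-MKL) CPU build is the simplest way to see how much of the performance is down to the software, not the silicon.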

Intel Xeon beats Nvidia V100 by 4X

For machine translation, which uses RNNs, the Intel Xeon Scalable processor outperforms the Nvidia V100 by 4X on the AWS Sockeye Neural Machine Translation (NMT) model with Apache MXNet when the Intel Math Kernel Library (Intel MKL) is used.

Previously, deep learning training and inference on CPUs took an unnecessarily long time because the software was not written to take full advantage of the hardware features and functionality. That is no longer the case. The Intel Xeon Scalable processor with optimized software has demonstrated enormous performance gains for deep learning compared to the previous generations without optimized software.

The Intel Xeon v3 processor, better known to Fudzilla readers as Haswell, gains up to 198x for inference and 127x for training, as measured with GoogleNet v1 for inference and AlexNet for training using Intel Optimized Caffe.

Video of the live demo

These gains apply to various types of models, including multi-layer perceptrons (MLPs), convolutional neural networks (CNNs), and the various types of recurrent neural networks (RNNs). The performance gap between GPUs and CPUs for deep learning training and inference has narrowed, and for some workloads CPUs now even have an advantage over GPUs.

Intel has much more detail in its blog, which includes a video of the live demo in which Vivian Janecek from Intel's data center marketing and Sowmya Bobba, an Intel machine learning engineer, run it on camera. You can clearly see that the Intel-based system scores 93 sentences per second, while the Nvidia V100 machine scores 22 sentences per second.
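For context, a sentences-per-second figure like that is just translated sentences divided by wall-clock time. A trivial sketch of such a measurement, where translate_batch is a hypothetical stand-in for the actual Sockeye translation call:

```python
import time

def measure_throughput(translate_batch, sentences):
    # translate_batch is a hypothetical stand-in for the real
    # translation call; we only measure wall-clock time around it.
    start = time.time()
    translate_batch(sentences)
    elapsed = time.time() - start
    return len(sentences) / elapsed  # sentences per second

# e.g. measure_throughput(my_translate_fn, list_of_test_sentences)
```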

The experiments were conducted on servers at Amazon Web Services (AWS) with the publicly available Apache MXNet framework. It is important to mention that the framework is neutral, maintained by neither Intel nor Nvidia. The benchmark used is AWS Sockeye, an open-source project for NMT.

We advise you to check out the blog and watch the video - it should definitely interest any data scientist.
