Published in PC Hardware

Microsoft shows off Milan-X benchmarks

by on 09 November 2021

Large caches make it too sexy for Milan, Tokyo and Japan

Software King of the World has spilt the beans on Milan-X benchmarks to show the performance uplift that 3D V-Cache brings to the table.

Tom's Hardware pointed out that Vole will tap into Milan-X to power its Azure HBv3 series VMs, which are based on a pair of EPYC 7V73X processors. Each processor delivers up to 64 Zen 3 cores, for a total of 128 cores per server. However, eight cores per server are reserved for the Azure hypervisor.

Microsoft offers its customers five configurations with different core counts: 120, 96, 64, 32 and 16 cores. The EPYC 7V73X has a peak clock speed of up to 3.5 GHz.

According to Microsoft, Milan-X features up to 768MB of L3 cache (standard L3 plus 3D V-Cache) per chip, so a dual-socket configuration delivers up to 1.5GB of L3 cache per system or, in Microsoft's case, per VM. How much of that each core gets depends on the VM size.

For example, the 16-core VM has access to 96MB of L3 per core, whereas the 32-core configuration drops to 48MB per core. At any rate, Milan-X represents a 3x increase in L3 cache over current Milan chips and a 6x increase over the previous-generation Rome processors.
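The per-core figures above are simple division. The sketch below assumes, as the 16- and 32-core examples suggest, that every HBv3 VM size sees the full 2 x 768MB of L3 and that it is split evenly across the VM's cores; the function and constant names are hypothetical and only illustrate the arithmetic.

```python
# Hypothetical sketch: per-core L3 share for each HBv3 VM size.
# Assumption: every VM size sees the full 2 x 768MB of L3, split evenly
# across its cores (consistent with the 16- and 32-core figures quoted).
L3_PER_SOCKET_MB = 768
SOCKETS = 2
VM_CORE_COUNTS = [120, 96, 64, 32, 16]


def l3_per_core_mb(cores: int) -> float:
    """Return the L3 capacity (MB) available per core for a VM of this size."""
    total_mb = L3_PER_SOCKET_MB * SOCKETS  # 1536MB, i.e. roughly 1.5GB per VM
    return total_mb / cores


for cores in VM_CORE_COUNTS:
    print(f"{cores:>3} cores: {l3_per_core_mb(cores):.1f} MB of L3 per core")
```

Under that assumption the 120-core VM gets 12.8MB per core, while the 16-core VM gets 96MB per core.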

The Azure HBv3's other hardware is unchanged. There is still 448GB of memory with 350 GBps of bandwidth (measured with STREAM TRIAD). Two 900GB NVMe SSDs provide high-speed storage, with read and write speeds of up to 6.9 GBps and 2.9 GBps respectively, and a Mellanox ConnectX-6 NIC provides 200 Gbps Ethernet connectivity.
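For readers unfamiliar with STREAM TRIAD, the kernel itself is tiny: a[j] = b[j] + scalar * c[j]. STREAM counts two reads and one write of 8-byte doubles per element, i.e. 24 bytes, and divides by the elapsed time. The Python sketch below only illustrates the kernel and that byte accounting; Python is far too slow to measure real memory bandwidth (the actual benchmark is C or Fortran, usually with OpenMP), and the function names are hypothetical.

```python
# Illustrative sketch of the STREAM TRIAD kernel and its bandwidth accounting.
# Not a benchmark: pure Python cannot saturate memory bandwidth.


def triad(a, b, c, scalar):
    """STREAM TRIAD kernel: a[j] = b[j] + scalar * c[j]."""
    for j in range(len(a)):
        a[j] = b[j] + scalar * c[j]


def triad_bandwidth_gbps(n_elements, seconds, bytes_per_element=8):
    """Bandwidth as STREAM counts it: 2 reads + 1 write per element, in GB/s (1e9)."""
    bytes_moved = 3 * bytes_per_element * n_elements
    return bytes_moved / seconds / 1e9


a = [0.0] * 3
triad(a, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0], 2.0)
print(a)
# Hypothetical timing: one pass over 1e9 doubles per array in 0.08 s.
print(f"{triad_bandwidth_gbps(1_000_000_000, 0.08):.1f} GB/s")
```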

Microsoft said that AMD's larger cache boosts effective memory bandwidth and cuts effective latency. Workloads such as computational fluid dynamics (CFD), explicit finite element analysis (FEA), weather simulation and EDA RTL simulation will benefit from Milan-X's generous L3 cache. By contrast, workloads bound by peak FLOPS, clock speed or memory capacity gain little from a large L3 cache. These include molecular dynamics, EDA full-chip design, EDA parasitic extraction and implicit finite element analysis.

Milan-X (EPYC 7V73X) showed between 42 and 50 per cent lower memory latency than Milan (EPYC 7V13). That is one of the biggest relative improvements in memory latency since memory controllers moved onto the processor.

Vole said the larger cache allows higher cache hit rates, so the effective latency, a blend of L3 and DRAM latencies, improves in real-world use. Because of the way AMD stacks the extra L3 cache, the L3 latency distribution has widened. Nonetheless, Microsoft believes Milan-X's L3 latency should be in the same ballpark as Milan's.
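The "combination of L3 and DRAM latencies" can be pictured with a simple two-level model: effective latency is the hit-rate-weighted average of L3 latency and DRAM latency. The sketch below assumes that textbook model; the latency and hit-rate values are placeholders, not Microsoft's measurements, and the function name is hypothetical.

```python
# Hypothetical two-level model: effective latency = weighted average of
# L3 latency (on a hit) and DRAM latency (on a miss).
# The numbers used below are placeholders, not measured HBv3 values.


def effective_latency_ns(l3_hit_rate, l3_latency_ns, dram_latency_ns):
    """Blend L3 and DRAM latencies by L3 hit rate."""
    if not 0.0 <= l3_hit_rate <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    return l3_hit_rate * l3_latency_ns + (1.0 - l3_hit_rate) * dram_latency_ns


# A bigger L3 mostly moves the hit rate; the per-level latencies stay similar.
smaller_cache = effective_latency_ns(0.50, 20.0, 100.0)
larger_cache = effective_latency_ns(0.75, 20.0, 100.0)
print(smaller_cache, larger_cache)
```

The point of the model is that a modestly slower or more variable L3 can still lower the effective figure, as long as enough extra accesses hit in cache instead of going to DRAM.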

Milan-X puts up around 358 GBps of throughput on the STREAM TRIAD benchmark. The result is identical to that of a conventional dual-socket Milan server paired with DDR4-3200 memory in a one-DIMM-per-channel setup.
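To put 358 GBps in context, the theoretical peak of a dual-socket DDR4-3200 setup can be worked out from channel count and transfer rate. The sketch assumes eight DDR4 channels per Milan socket and a 64-bit (8-byte) data bus per channel, ignoring ECC bits; the helper name is hypothetical.

```python
# Sketch: theoretical DDR4 peak bandwidth versus the reported STREAM TRIAD figure.
# Assumptions: 8 channels per socket, 8-byte data bus per channel, no ECC overhead.


def ddr_peak_gbps(mt_per_s, channels_per_socket, sockets, bus_bytes=8):
    """Theoretical peak in GB/s (1e9 bytes): transfers/s x bytes x channels x sockets."""
    return mt_per_s * 1e6 * bus_bytes * channels_per_socket * sockets / 1e9


peak = ddr_peak_gbps(3200, channels_per_socket=8, sockets=2)
efficiency = 358 / peak  # 358 GBps is the STREAM TRIAD figure quoted above
print(f"peak {peak:.1f} GB/s, STREAM TRIAD reaches {efficiency:.1%} of it")  # 409.6, 87.4%
```

Reaching the high-80s per cent of theoretical peak is typical for STREAM TRIAD on a well-tuned system, which is why the Milan-X and Milan numbers land in the same place: the DRAM subsystem is the same.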

Thanks to AMD's 3D V-Cache, Milan-X's scaling efficiency was off the charts. Using the Ansys Fluent 2021 R1 benchmark with the f1_racecar_140 model as a point of reference, Milan-X demonstrated a scaling efficiency of up to 200% when comparing 64 VMs to one VM. In other words, 64 HBv3 VMs with Milan-X finish the job in roughly half the time that perfectly linear scaling from a single HBv3 instance would predict. At the end of the day, customers get a 50% reduction in VM costs alongside a 127x faster time to solution.
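The efficiency and cost figures follow from the 127x speedup on 64 VMs. The sketch assumes scaling efficiency is speedup divided by VM count, and that cost is proportional to VM-hours (a flat per-VM hourly price); the function names are hypothetical.

```python
# Sketch: superlinear scaling arithmetic for the Fluent f1_racecar_140 result.
# Assumption: cost scales with VM-hours at a flat per-VM price.


def scaling_efficiency(speedup, n_vms):
    """Speedup relative to ideal linear scaling (1.0 means perfectly linear)."""
    return speedup / n_vms


def relative_vm_cost(speedup, n_vms):
    """VM-hours relative to one VM: n VMs each running 1/speedup of the time."""
    return n_vms / speedup


eff = scaling_efficiency(127, 64)
cost = relative_vm_cost(127, 64)
print(f"efficiency {eff:.1%}, cost {cost:.1%} of a single-VM run")
```

With these inputs the efficiency comes out just under 200% and the VM-hours just over 50% of a single-VM run, matching the rounded figures Microsoft quoted.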

Last modified on 09 November 2021