Alibaba Cloud handles vast volumes of images from its many third-party vendors.
Reviewing these images creates a huge AI inference workload. To reduce operating expenses, Alibaba Cloud sought a cost-effective alternative processor to detect harmful or unwanted text embedded in tens of millions of images every day.
A large portion of today’s internet traffic consists of images. Some images contain harmful or unwanted text, such as unpaid advertisements, which undermines the paid advertising business. To maintain a consistent experience on e-commerce sites, images must be reviewed, and that review creates a large AI inference workload.
Alibaba historically ran Yolo-v2 Tiny with the Float32 data type on GPUs to understand the content of tens of millions of images every day. Because the architecture was not well optimized, the GPUs achieved only limited throughput in queries per second (QPS), which drove up power costs and server footprint. To reduce operating expenses, Alibaba looked for a more cost-effective alternative to GPUs for detecting harmful or unwanted text.
Using Xilinx FPGAs, the Alibaba Cloud FaaS (FPGA as a Service) team ran the Yolo-v2 Tiny model at Int16 precision and achieved superior QPS with accuracy similar to the GPU implementation. Applying similar optimizations, inspired by the FaaS work, lets a GPU reach comparable QPS; however, the Xilinx solution remains much more cost-effective per image because the GPU solution has a much higher total cost of ownership (TCO). The Alibaba FaaS team also used Vitis AI to speed up development.
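Moving from Float32 to Int16 means mapping each floating-point tensor onto 16-bit integers plus a scale factor. The sketch below shows one common scheme, symmetric per-tensor quantization, using NumPy. It is an illustration only: the case study does not describe how Vitis AI or the FaaS team actually quantized the model, and the function names and the random "weights" tensor are hypothetical.

```python
import numpy as np

INT16_MAX = 32767  # symmetric range [-32767, 32767], leaving -32768 unused

def quantize_int16(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Hypothetical helper: symmetric per-tensor quantization of a float array to int16."""
    max_abs = float(np.max(np.abs(x)))
    # Map the largest magnitude onto INT16_MAX; guard against an all-zero tensor.
    scale = max_abs / INT16_MAX if max_abs > 0 else 1.0
    # Divide in float64 so rounding is the only significant source of error.
    q = np.clip(np.round(x.astype(np.float64) / scale), -INT16_MAX, INT16_MAX)
    return q.astype(np.int16), scale

def dequantize_int16(q: np.ndarray, scale: float) -> np.ndarray:
    """Hypothetical helper: recover approximate float values from int16 codes."""
    return q.astype(np.float64) * scale

# Stand-in for one convolution layer's Float32 weights (random, not from the real model).
rng = np.random.default_rng(0)
weights = rng.standard_normal((64, 3, 3, 3)).astype(np.float32)

q, scale = quantize_int16(weights)
max_error = float(np.max(np.abs(dequantize_int16(q, scale) - weights)))
print(q.dtype, scale, max_error)
```

With round-to-nearest, the worst-case reconstruction error is half of `scale`, which is why 16-bit integers can track Float32 results closely; real toolchains typically also calibrate activations, which this sketch omits.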
Built on Alibaba Cloud FaaS, powered by Xilinx 16nm Virtex UltraScale+ FPGAs, and the Xilinx Vitis AI development kit (formerly called MLSuite), the solution delivered a 75 per cent saving in Alibaba's total cost of ownership.
A single Xilinx UltraScale+ FPGA processes hundreds of images per second, a 3.5X performance improvement over the initial GPU implementation.
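To see why per-device throughput and TCO together determine cost per image, the arithmetic can be sketched as below. All inputs are hypothetical: the case study gives no exact daily volume, baseline QPS, device cost, or service lifetime, so the 30 million images per day and 100 QPS baseline are illustrative assumptions, with only the 3.5X ratio taken from the text.

```python
import math

SECONDS_PER_DAY = 86_400

def devices_needed(images_per_day: int, qps_per_device: float) -> int:
    """Accelerators required to keep up with a daily image volume at a sustained QPS."""
    return math.ceil(images_per_day / (qps_per_device * SECONDS_PER_DAY))

def cost_per_image(tco_per_device: float, qps_per_device: float, lifetime_days: float) -> float:
    """Per-device TCO spread over every image that device processes in its lifetime."""
    return tco_per_device / (qps_per_device * SECONDS_PER_DAY * lifetime_days)

# Hypothetical inputs for illustration only; not figures from Alibaba or Xilinx.
images_per_day = 30_000_000             # "tens of millions" per day
baseline_qps = 100.0                     # assumed initial per-device throughput
improved_qps = baseline_qps * 3.5        # applies the 3.5X speedup quoted above

print(devices_needed(images_per_day, baseline_qps))   # devices at baseline throughput
print(devices_needed(images_per_day, improved_qps))   # devices after the 3.5X speedup
```

Because cost per image divides TCO by throughput, a higher-QPS device can cut cost per image even at the same hardware price, and a lower TCO per device compounds that saving.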