Cerebras Systems is a California-based upstart that has just released Cerebras Inference, a chip it claims is 20 times faster than anything Nvidia can offer.
Cerebras boasts that its chips deliver groundbreaking performance in AI inference, processing inputs at speeds reportedly 20 times faster than Nvidia's. This is down to the direct integration of memory and processing power, which enables faster data retrieval and processing without the delays of inter-chip data transfers.
The company claims its chips are particularly suited to enterprises that need extremely fast processing of large AI models, such as those used in natural language processing and deep learning inference tasks. Its system, it says, is aimed at organisations looking to minimise latency and process large volumes of data in real time.
According to ZDNet, Cerebras has concocted the Wafer Scale Engine, now in its third iteration, which powers the new Cerebras Inference. This colossal chip integrates 44GB of SRAM, doing away with the need for external memory and removing the bottlenecks that plague GPUs.
By solving the memory bandwidth conundrum, Cerebras Inference can spit out 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B.
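To see why memory bandwidth is the sticking point, here is a rough back-of-the-envelope sketch of what those token rates imply, assuming 16-bit weights and that every parameter is streamed once per generated token for a single request with no batching. The figures are illustrative assumptions, not Cerebras or Nvidia specifications.

```python
# Rough roofline sketch: the weight-read bandwidth implied by a given
# single-stream token rate, assuming 16-bit weights and that every
# parameter is read once per generated token (no batching).
# These are illustrative assumptions, not vendor specs.

def implied_bandwidth_tb_s(params_billion: float, tokens_per_s: float,
                           bytes_per_param: int = 2) -> float:
    """Bytes of weights streamed per second, expressed in TB/s."""
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return bytes_per_token * tokens_per_s / 1e12

for name, params, rate in [("Llama 3.1 8B", 8, 1800),
                           ("Llama 3.1 70B", 70, 450)]:
    print(f"{name}: ~{implied_bandwidth_tb_s(params, rate):.0f} TB/s of weight reads")

# Prints roughly 29 TB/s for the 8B model and 63 TB/s for the 70B model,
# far beyond the ~3 TB/s of HBM bandwidth on a single current GPU, which
# is why keeping the weights in on-chip SRAM changes the picture.
```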
The Wafer Scale Engine is built on a single, massive wafer. The latest iteration packs approximately four trillion transistors and integrates its 44GB of SRAM directly on-chip. Cerebras aims to create the largest and most powerful chip capable of storing and processing AI models directly on the wafer, significantly reducing latency in AI computations.
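For a sense of scale, a quick capacity check shows which models would even fit in 44GB of on-chip SRAM. The 16-bit precision and the single-wafer framing below are assumptions for illustration; how Cerebras actually places or partitions larger models is not detailed here.

```python
# Quick capacity check: would a model's weights fit in 44GB of on-chip
# SRAM at 16-bit precision? Precision and the one-wafer framing are
# illustrative assumptions; bigger models would have to be split somehow.

SRAM_GB = 44

def weights_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights in gigabytes."""
    return params_billion * 1e9 * bytes_per_param / 1e9

for name, params in [("Llama 3.1 8B", 8), ("Llama 3.1 70B", 70)]:
    size = weights_gb(params)
    verdict = "fits on one wafer" if size <= SRAM_GB else "needs more than one wafer"
    print(f"{name}: ~{size:.0f} GB of weights -> {verdict}")

# ~16 GB for the 8B model (fits), ~140 GB for the 70B model (does not),
# so the larger model would have to be spread across more silicon.
```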
Nvidia has been the king of GPU solutions for AI and deep learning, but if Cerebras can get anywhere with its tech, it could upend the market dynamics. AMD and Intel, both heavyweights in the chip industry, might also feel the heat as Cerebras chips start carving out a niche in high-performance AI tasks.