Chinese outfit comes up with a new DeepSeek Coder

20 June 2024


Open saucy ChatGPT rival

The Chinese AI company DeepSeek, known for its ChatGPT rival trained on a massive 2 trillion English and Chinese tokens, has just unveiled DeepSeek Coder V2. This new release is an open-source code language model that uses a mixture of experts (MoE) approach.
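
A rough idea of what "mixture of experts" means in practice: instead of pushing every token through the whole network, a small learned router activates only a handful of expert sub-networks per token. The toy PyTorch layer below is a sketch of that routing idea only; the class name, expert count, and top-k value are invented for illustration and are not DeepSeek's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy mixture-of-experts layer: a learned router sends each token to its top-k experts."""

    def __init__(self, dim, num_experts=8, top_k=2):
        super().__init__()
        # Each "expert" is a small feed-forward block; only a few run per token.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])
        self.router = nn.Linear(dim, num_experts)  # one score per expert, per token
        self.top_k = top_k

    def forward(self, x):
        # x: (num_tokens, dim)
        scores = self.router(x)                            # (num_tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # keep only the k best experts
        weights = F.softmax(weights, dim=-1)               # normalise their gate weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(1) * expert(x[mask])
        return out

tokens = torch.randn(16, 64)              # 16 tokens of width 64
print(ToyMoELayer(64)(tokens).shape)      # torch.Size([16, 64])
```

The payoff of this design is that compute per token scales with the few experts actually selected, not with the model's total parameter count.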

Building on the success of last month's DeepSeek-V2, Coder V2 shines in coding and mathematical tasks, supporting over 300 programming languages.

DeepSeek claims the model surpasses leading closed-source models like GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro, boasting that it is the first open model to achieve such a milestone and that it ranks well above Llama 3-70B and similar models.

DeepSeek Coder V2 also retains strong general reasoning and language processing abilities.

DeepSeek, established just last year with the ambitious goal of exploring the enigma of AGI (Artificial General Intelligence) with a sense of wonder, has quickly become a prominent figure in China's AI landscape, alongside the likes of Alibaba's Qwen, 01.AI, and Baidu.

Within its first year, DeepSeek made several models publicly available, including the DeepSeek Coder series. The original version, featuring up to 33 billion parameters, performed well in benchmarks, offering project-level code completion and infilling. However, it was limited to 86 programming languages and a 16K context window.
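
Infilling here means fill-in-the-middle (FIM) prompting: the model is shown the code before and after a gap and asked to produce the missing middle. The sketch below uses the sentinel tokens listed on the original DeepSeek Coder model card, but treat the exact token strings as assumptions and check the tokenizer's special tokens before relying on them.

```python
# Fill-in-the-middle prompt layout as described on the original DeepSeek Coder
# model card; the sentinel tokens are assumptions and should be verified
# against the tokenizer's special tokens.
prefix = "def quicksort(xs):\n    if len(xs) <= 1:\n        return xs\n"
suffix = "\n    return quicksort(left) + [pivot] + quicksort(right)"

# The model is asked to generate the missing middle (pivot choice and partition).
fim_prompt = f"<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"
print(fim_prompt)
```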

The outfit said that the latest V2 iteration improves upon this, supporting 338 languages and a 128K context window, so it can handle more complex coding challenges.

In tests against the MBPP+, HumanEval, and Aider benchmarks, which measure code generation, editing, and problem-solving in large language models, DeepSeek Coder V2 scored 76.2, 90.2, and 73.7, respectively, outperforming most other models, including GPT-4 Turbo, Claude 3 Opus, Gemini 1.5 Pro, Codestral, and Llama-3 70B. It showed similar prowess in the mathematical benchmarks MATH and GSM8K, with only GPT-4o slightly outscoring it in some areas.

DeepSeek Coder V2 is available under the MIT license, permitting academic research and commercial use without restrictions. Users can download the model in both 16B and 236B sizes, in instruct and base versions, from Hugging Face.
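
For the curious, loading the downloaded model looks roughly like the sketch below, which uses the standard Hugging Face transformers workflow. The repo id is the 16B "Lite" instruct listing on the hub at the time of writing (the 236B repo drops the "Lite"), and the trust_remote_code flag is an assumption based on DeepSeek's custom model code.

```python
# A sketch, not gospel: repo id and trust_remote_code are assumptions based on
# DeepSeek's Hugging Face listings; device_map="auto" needs the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Write a quicksort in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```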

DeepSeek also offers API access to the models on a pay-as-you-go basis. For those interested in experiencing the model firsthand, DeepSeek provides an interactive chatbot feature to explore the capabilities of DeepSeek Coder V2.
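
DeepSeek's API follows the OpenAI chat-completions format, so the standard openai Python client can be pointed at it. The base URL and model name below are taken from DeepSeek's published documentation at the time of writing and should be treated as assumptions.

```python
# Minimal sketch of calling DeepSeek's pay-as-you-go API with the OpenAI client.
# Base URL and model id are assumptions based on DeepSeek's published docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # issued from the DeepSeek platform
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-coder",               # assumed model id for Coder V2
    messages=[{"role": "user", "content": "Explain what a 128K context window buys a coding model."}],
)
print(response.choices[0].message.content)
```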
