The French AI startup Mistral AI announced Devstral 2, a 123-billion-parameter open-weights coding model designed to function as part of an autonomous software engineering agent. The model posted a 72.2 percent SWE-bench Verified score, ranking among the top open-weights coding models.
Alongside the model, Mistral rolled out Mistral Vibe, a CLI that lets developers interact with the Devstral family directly from the terminal. It can scan directory structures, inspect Git status to preserve context, modify multiple files, and run shell commands autonomously. The company released the CLI under the Apache 2.0 license.
SWE-bench Verified comprises 500 real software-engineering tasks drawn from GitHub issues in Python projects; the model must read each issue description, navigate the codebase, and produce a patch that passes the tests. Industry insiders say major AI players watch the benchmark closely, even though it tends to overrepresent simpler bug fixes.
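To put the headline score in concrete terms, the short Python sketch below converts a percentage on a 500-task benchmark into the number of tasks it implies. The function names are illustrative, and the task count it prints is derived arithmetically from the reported 72.2 percent; it is not a figure published by Mistral.

```python
# Sketch: what a SWE-bench Verified percentage implies in whole tasks.
# TOTAL_TASKS comes from the article; the helper names are hypothetical.
TOTAL_TASKS = 500


def implied_resolved(score_percent: float, total: int = TOTAL_TASKS) -> int:
    """Return the number of resolved tasks implied by a percentage score."""
    return round(score_percent / 100 * total)


def resolved_rate(resolved: int, total: int = TOTAL_TASKS) -> float:
    """Return the percentage of tasks resolved."""
    return 100 * resolved / total


if __name__ == "__main__":
    resolved = implied_resolved(72.2)  # Devstral 2's reported score
    print(f"Implied resolved tasks: {resolved} of {TOTAL_TASKS}")
    print(f"Round trip: {resolved_rate(resolved):.1f} percent")
```

Because each task counts for 0.2 percentage points on a 500-task set, small differences between models on the leaderboard correspond to only a handful of tasks.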
In parallel with Devstral 2, Mistral released Devstral Small 2, a 24-billion-parameter model that scores 68 percent on SWE-bench. It is designed to run locally on consumer hardware, including laptops without internet access. Both versions support a 256,000-token context window, large enough to take in a medium-sized codebase. Devstral Small 2 is licensed under Apache 2.0, and Devstral 2 under a modified MIT license.
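For a rough sense of what a 256,000-token window holds, the Python sketch below estimates whether a directory of source files would fit. It assumes about four characters per token, a common rule of thumb rather than a property of Devstral's actual tokenizer, and the function names and the reserve for the model's reply are illustrative choices.

```python
import math
import os

CONTEXT_WINDOW_TOKENS = 256_000  # context size reported in the article
CHARS_PER_TOKEN = 4  # ASSUMPTION: rough heuristic, not Devstral's tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_repo_tokens(root: str, extensions: tuple = (".py",)) -> int:
    """Sum the approximate token counts of matching files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    total += estimate_tokens(handle.read())
    return total


def fits_in_context(token_count: int, reserve: int = 16_000) -> bool:
    """Check the estimate against the window, keeping a hypothetical
    reserve of tokens free for instructions and the model's output."""
    return token_count + reserve <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    tokens = estimate_repo_tokens(".")
    print(f"Estimated tokens: {tokens}, fits: {fits_in_context(tokens)}")
```

Under that assumption the window corresponds to roughly one megabyte of source text, so an accurate count would require the model's own tokenizer.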
The company frames Devstral 2 as a step toward more capable autonomous software engineering, though observers caution that benchmarks are not fully predictive of real-world performance. Still, the release signals how open-weights models are narrowing the gap with proprietary rivals, while highlighting ongoing debates around reliability and safety in AI-assisted coding.