Mistral has introduced a new open-source model specifically designed for software engineering agents: Devstral-Small-2505. At 24 billion parameters, this agentic LLM is optimized for tasks like navigating codebases, editing files across projects, and automating software development workflows. Despite its relatively compact size, it achieves a leading 46.8% on the SWE-Bench Verified benchmark, outperforming both Claude 3.5 Haiku and GPT-4.1-mini under the same test conditions.
Fine-tuned from Mistral-Small-3.1, Devstral supports a context window of 128,000 tokens—a vital capability for managing large codebases or multi-file operations. The model was trained in collaboration with All Hands AI, and the results place it as the top-performing open-source model on SWE-Bench Verified to date. Importantly, it uses the Tekken tokenizer with a vocabulary size of 131k, offering high tokenization efficiency for code-related input.
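To make the 128k-token window concrete, here is a minimal sketch of budgeting a multi-file input against it. The ~3.5 characters-per-token ratio is a generic heuristic for code, not a measured property of the Tekken tokenizer; for exact counts you would run the model's own tokenizer.

```python
# Rough check of whether a set of source files fits in Devstral's
# 128k-token context window. CHARS_PER_TOKEN is an assumed average
# for code, not a Tekken-specific figure.
CONTEXT_WINDOW = 128_000
CHARS_PER_TOKEN = 3.5  # heuristic; varies by language and tokenizer

def estimated_tokens(text: str) -> int:
    """Estimate token count from character length."""
    return int(len(text) / CHARS_PER_TOKEN)

def fits_in_context(files: dict, reserve: int = 8_000) -> bool:
    """Check that all file contents fit, reserving room for the response."""
    total = sum(estimated_tokens(src) for src in files.values())
    return total <= CONTEXT_WINDOW - reserve

repo = {"app.py": "print('hello')\n" * 200, "util.py": "x = 1\n" * 50}
print(fits_in_context(repo))
```

The `reserve` margin leaves headroom for the model's own output, which shares the same window.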
Devstral’s design centers on “agentic coding,” meaning it’s built to power autonomous systems that carry out software engineering tasks without constant supervision. It is fully open-source under the Apache 2.0 License and efficient enough to run locally on a single RTX 4090 or even a Mac with 32GB of RAM.
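The agentic pattern described above can be sketched as a simple propose-test-iterate loop. This is an illustrative skeleton only, not OpenHands' actual implementation; `propose_edit` and `run_tests` are hypothetical stand-ins for the model call and the sandboxed test runner.

```python
# Minimal sketch of an "agentic coding" loop: the model proposes an edit,
# the harness applies it and runs the tests, and the failure log is fed
# back until the tests pass or a step budget is exhausted.
from typing import Callable, Tuple

def agent_loop(
    propose_edit: Callable[[str], str],            # feedback -> new source
    run_tests: Callable[[str], Tuple[bool, str]],  # source -> (passed, log)
    max_steps: int = 5,
) -> bool:
    feedback = "initial task description"
    for _ in range(max_steps):
        source = propose_edit(feedback)
        passed, log = run_tests(source)
        if passed:
            return True
        feedback = log  # the failure log drives the next iteration
    return False

# Toy stand-ins: the "model" fixes the code on its second attempt.
attempts = iter(["return 1", "return 2"])
ok = agent_loop(
    propose_edit=lambda fb: next(attempts),
    run_tests=lambda src: (src == "return 2", f"test failed for: {src}"),
)
print(ok)
```

Real harnesses add sandboxing, tool schemas, and richer feedback, but the core control flow is this loop.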
For local usage, the model supports deployment via vLLM, Ollama, Transformers, LMStudio, and mistral-inference, with GGUF-format weights available for easy integration into quantized inference pipelines. For agent orchestration, the recommended setup involves the OpenHands runtime, which provides a Dockerized environment to facilitate tool use, sandboxing, and iteration loops.
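Since vLLM exposes an OpenAI-compatible HTTP API, talking to a locally served Devstral amounts to posting a standard chat-completions body. The sketch below builds such a payload using only the standard library; the model identifier and endpoint are assumptions that must match whatever checkpoint you actually serve.

```python
# Build a request body for a locally served Devstral instance. vLLM's
# OpenAI-compatible server accepts standard chat-completion payloads;
# the model name below is an assumption and must match the served checkpoint.
import json

def chat_payload(prompt: str, model: str = "mistralai/Devstral-Small-2505") -> dict:
    """Assemble an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a software engineering agent."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,   # low temperature for more deterministic code edits
        "max_tokens": 1024,
    }

body = chat_payload("Fix the failing test in utils/parser.py")
print(json.dumps(body, indent=2))
# POST this to the server's /v1/chat/completions endpoint
# (by default http://localhost:8000 when serving with vLLM).
```

The same payload shape works against any of the OpenAI-compatible runtimes listed above, which is what makes swapping backends cheap.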
Fine-tuning and experimentation are also streamlined with Unsloth Dynamic 2.0, enabling fast and accurate quantization and adaptation workflows. The Devstral ecosystem includes examples for building complete applications—like a FastAPI + React to-do app—through guided prompts, showcasing how quickly you can prototype and deploy with the agent.
This release reflects a growing trend toward task-specialized LLMs, where performance is measured not just in tokens per second or raw benchmark scores, but in how efficiently and safely a model can operate as part of a broader development agent. Devstral’s focus on real-world developer tasks, long-context support, and extensibility makes it a strong contender for integration into modern AI-driven coding systems.