New model outperforms OpenAI and Salesforce with a more efficient solution
Qodo, the quality-first AI coding platform, today announced the release of Qodo-Embed-1-1.5B, a new code embedding model that outperforms OpenAI's offering and leads its class while being a fraction of the size: 1.5 billion parameters as opposed to 7 billion. The model sets a new standard for efficiency in code understanding, enabling AI systems to better process and work with code at any scale. With the ability to run on low-cost GPUs, it makes advanced code search and retrieval capabilities accessible to development teams of all sizes.
Code embedding models are essential to how AI systems understand and work with large-scale codebases – they enable accurate code search, help AI assistants retrieve relevant context, and ultimately allow AI coding agents to understand existing complex codebases. While much attention has focused on code-generating AI, the ability to accurately search and understand existing code remains crucial for both AI systems and human developers. These embedding models power everything from finding similar code patterns to enabling retrieval-augmented generation (RAG) systems that ground AI responses in real codebases.
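To illustrate the retrieval pattern these models enable, the sketch below ranks code snippets against a plain-English query by cosine similarity. A trivial bag-of-words vectorizer stands in for a real embedding model (in practice, the dense vectors would come from a model such as Qodo-Embed-1-1.5B); the snippets and query are invented for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: a bag-of-words token count.
    # A model like Qodo-Embed-1-1.5B would return a dense learned vector instead.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def search(query, snippets):
    # Embed the query and every snippet, then rank snippets by similarity --
    # the same retrieval step a RAG system performs before generation.
    q = embed(query)
    return sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)

snippets = [
    "def read_file(path): return open(path).read()",
    "def add(a, b): return a + b",
    "def connect(host, port): ...",
]
results = search("open and read a file from a path", snippets)
print(results[0])  # the file-reading snippet ranks first
```

In a production system, the only change is the `embed` function: each snippet is embedded once and stored in a vector index, and queries are embedded at search time with the same model.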
Qodo-Embed-1-1.5B stands out for its exceptional efficiency-to-performance ratio. Scoring 68.53 on CoIR (the Code Information Retrieval Benchmark), it surpasses larger competitors, including OpenAI’s text-embedding-3-large model (65.17), as well as models of comparable size, such as Salesforce’s SFR-Embedding-2_R (67.41). Qodo-Embed-1-7B, Qodo’s larger model, likewise outperforms models of its size class, scoring 71.5. CoIR is the industry’s most comprehensive benchmark for code retrieval capabilities across multiple programming languages and search tasks. The efficiency of Qodo’s smaller model is crucial for large-scale embedding tasks, enabling teams to process and search through extensive codebases without requiring massive computing resources.
“While powerful new LLMs like OpenAI-o3 and DeepSeek-R1 are making headlines for reasoning and thinking capabilities, real-world development tasks require more than just logic from AI—they need to retrieve, interpret, and contextualize code,” said Itamar Friedman, CEO of Qodo. “By focusing on code understanding and developer workflows, we’re creating AI that doesn’t just suggest code, but understands the entire software engineering context.”
The model’s performance stems from high-quality synthetic training examples, generated from permissively licensed open-source code, that enable it to better represent the relationships between code and natural-language descriptions. This allows for more accurate code search when users make queries in plain English, which has been a weak point for existing models.
Qodo’s code embedding model is available on HuggingFace, with the 1.5B parameter version released under the OpenRAIL++-M license and additional model sizes under custom licensing terms. The model will also be available through NVIDIA’s NIM platform and AWS SageMaker JumpStart, making it easily accessible to enterprise development teams.