IBM just open sourced these generative AI coding models

IBM branding and logo pictured on the company pavilion during the Mobile World Congress 2024 in Barcelona, Spain, on February 28, 2024.
(Image credit: Getty Images)

IBM has open sourced a set of generative AI models for coding, which it said offer the right combination of features to support enterprise developers.

The tech giant is releasing a series of models for code generation tasks, trained with code written in 116 programming languages.

IBM said there are plenty of use cases for tools built on these models, ranging from agents that could write code for developers to tools that can explain why code isn’t working, and how to fix it.

“Many of the other quotidian but essential tasks that are part of a developer’s day — from generating unit tests, to writing documentation or running vulnerability tests — could be automated with these models,” the company said.

Tools for developers trained on software code have been one of the breakout successes of generative AI. These tools aim to improve the efficiency of developers by offering coding advice and even proposing actual snippets of code. Within a few years, three-quarters of developers will be using these assistants, according to Gartner.

IBM already has its own suite of generative AI coding assistants – the watsonx Code Assistant (WCA) family which includes WCA for Ansible Lightspeed for IT automation, and WCA for IBM Z for application modernization. WCA for Z, for example, uses IBM’s 20-billion parameter Granite LLM to help developers turn COBOL applications into services for IBM mainframes.

IBM is now open-sourcing four variations of the IBM Granite code model.

These models range in size from 3 billion to 34 billion parameters and are optimized for enterprise software development workflows. They cover tasks including code generation, fixing, and explanation, with use cases ranging from application modernization to running on devices with limited memory.

IBM is confident about model performance

The company said these Granite code models consistently match state-of-the-art performance among open source LLMs currently available.

The models are available on Hugging Face, GitHub, watsonx.ai, and RHEL AI. The underlying base code models are the same as the ones used to train WCA.

The generative AI market has been expanding rapidly, but it’s not always easy for business users to find the right way forward.

Many of the biggest LLMs have now grown to tens or even hundreds of billions of parameters in scale.

While these might be good if you are looking to build a chatbot that can cover a wide range of subjects, these models are computationally expensive to train and run, IBM said.

“For enterprises, massive models can become unwieldy for more specific tasks, full of irrelevant information and running up high inferencing costs.”

On top of this, many enterprises have been reluctant to adopt LLMs for commercial purposes because the licensing of these models is often unclear. The details of how the models were trained, and how the data was cleaned and filtered for things like hate, abuse, and profanity, are also often unknown.

Flexible capabilities

Ruchir Puri, chief scientist at IBM Research, said that for many enterprise use cases, the 8B Granite code model variant released by IBM will be the right combination of weight, cost to run, and capability.

“With generative systems built on Granite models, developers can create new ways to translate legacy codebases like COBOL into more modern languages like Java. It’s one of the major uses for code models that IBM saw when first diving into the world of AI for code, and remains one of the most important,” the company said.

These Granite code models are released under the Apache 2.0 license.

During testing on benchmarks including HumanEvalPack, HumanEvalPlus, and RepoBench, IBM said it saw strong performance on code synthesis, fixing, explanation, editing, and translation across most major programming languages, including Python, JavaScript, Java, Go, C++, and Rust.

“Our models can outperform some twice their size, such as with Code Llama, and while some other models may perform slightly better in some tasks like code generation, no one model could perform at a high level at generation, fixing, and explanation — apart from Granite,” it said.

IBM has also released a full technical paper detailing the code models. It said there are important gaps in the current field of LLMs for code, especially in the context of enterprise software development.

“First, while very large, generalist LLMs can achieve excellent coding performance, their size makes them expensive to deploy. Smaller code-focused models can achieve excellent code generation performance in a smaller and more flexible package, but performance in coding tasks beyond generation (e.g. fixing and explanation) can lag behind code generation performance,” the researchers wrote.

The researchers said they plan to release updates to these models to improve their performance, and in the near future to release long-context as well as Python- and Java-specialized model variants.

Steve Ranger

Steve Ranger is an award-winning reporter and editor who writes about technology and business. Previously he was the editorial director at ZDNET and the editor of silicon.com.