Four things you need to know about GitHub's AI model training policy – including how to opt out


GitHub has announced plans to begin using customer interaction data to train AI models.

In a blog post detailing the move last week, GitHub’s chief product officer (CPO) Mario Rodriguez said the policy change will take effect from 24 April.

Rodriguez said the move will enable the company to provide more intuitive AI capabilities for developers using the platform.

“By participating you’ll help our models better understand development workflows, deliver more accurate and secure code pattern suggestions, and improve their ability to help you catch potential bugs before they reach production,” he wrote.

Rodriguez noted that the company has already incorporated Microsoft interaction data to fine-tune its model training, and that this has so far delivered marked improvements. Expanding the scheme to other users will allow GitHub to do this at scale.

“The improvements we’ve seen by incorporating Microsoft interaction data indicate we can improve model performance for a more diverse range of use cases by training on real-world interaction data,” Rodriguez said.

So what do GitHub users need to know about the policy change?

What data will GitHub use to train AI models?

According to Rodriguez, the customer interaction data the company plans to use spans a range of areas, including:

Outputs accepted or modified by the user
Inputs sent to GitHub Copilot (including code snippets)
Code context surrounding your cursor position
Comments and documentation you write
File names, repository structure, and navigation patterns
Interactions with Copilot features (including chat and inline suggestions)
User feedback on suggestions (thumbs up/down ratings)

GitHub emphasized that some data types will not be used in AI model training, including interaction data from Copilot Business, Copilot Enterprise, or enterprise-owned repositories.

Similarly, the company won’t use content from issues, discussions, or private repositories “at rest”, although there is a caveat here.

“We use the phrase ‘at rest’ deliberately because Copilot does process code from private repositories when you are actively using Copilot,” Rodriguez explained.

“This interaction data is required to run the service and could be used for model training unless you opt out.”

GitHub insists that data gathered as part of the program won’t be shared with third-party AI model providers or “other independent service providers”.

Notably, the company said that data “may be shared” with GitHub affiliates, meaning companies in its broader corporate family, including Microsoft.

In an FAQ section linked to the original blog post, GitHub did note that it may “engage service providers to assist with model training” on its behalf, albeit on the condition that the data is used “only for providing services to GitHub”.

Which GitHub plans are affected?

The policy change from GitHub applies to specific subscription plans, with users on the aforementioned Business and Enterprise plans exempt.

Student and teacher accounts for GitHub Copilot are also exempt.

Customers who will be affected include those on Copilot Free, Pro, and Pro+ accounts, although they can still opt out.

How to opt out

Opting out of the program is fairly straightforward, and Rodriguez confirmed that users who previously opted out of data collection for product improvements will have those preferences retained.

“Your choice is preserved, and your data will not be used for training unless you opt in,” he wrote.

Users looking to opt out are advised to visit /settings/copilot/features. Under the “Privacy” section, they can then disable the option labelled “Allow GitHub to use my data for AI model training”.


Ross Kelly is ITPro's News & Analysis Editor, responsible for leading the brand's news output and in-depth reporting on the latest stories from across the business technology landscape. Ross was previously a Staff Writer, during which time he developed a keen interest in cyber security, business leadership, and emerging technologies.

He graduated from Edinburgh Napier University in 2016 with a BA (Hons) in Journalism, and joined ITPro in 2022 after four years working in technology conference research.

For news pitches, you can contact Ross at ross.kelly@futurenet.com, or on Twitter and LinkedIn.