GitHub debuts Copilot tool making it easier to give credit to developers

published 4 August 2023

GitHub has revealed a new feature for its Copilot pair programmer that will inform developers when code suggested to them by AI is taken from public repositories.

The company said the move is intended to give developers greater transparency when using Copilot’s code suggestions.

“Some want to learn from others’ work, others may want to take a dependency rather than introduce new app logic, and still others want to give or receive credit for similar work. Whatever the reason, it’s nice to know when similar code is out there,” it said.

Currently in private beta, Copilot’s new ‘code referencing’ feature will check code suggestions - including surrounding code of approximately 150 characters - against all public code on the GitHub platform. Any matches, along with repository information, will be shown to the developer.

It is then up to the developer to accept the code suggestion - including information about matches - or block the suggestion.
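
As a rough illustration of the check-then-decide flow described above, the Python sketch below compares a suggestion against a tiny in-memory index of public snippets. Everything in it is an assumption made for illustration: the `Match` record, `find_matches`, `review`, the whitespace normalization and the index itself are hypothetical, and GitHub has not published how its own matching works. The sketch also treats the roughly 150-character figure as a sliding window size, which is only one possible reading.

```python
from __future__ import annotations

from dataclasses import dataclass

WINDOW = 150  # approximate figure from the article, used here as a window size


@dataclass(frozen=True)
class Match:
    repo: str     # hypothetical repository name
    license: str  # hypothetical license identifier


def normalize(code: str) -> str:
    """Collapse whitespace runs so formatting differences do not hide a match."""
    return " ".join(code.split())


def find_matches(suggestion: str, public_index: dict[str, list[Match]]) -> list[Match]:
    """Naive scan: report every index entry that shares a WINDOW-sized chunk
    (or the whole text, if shorter) with the normalized suggestion."""
    text = normalize(suggestion)
    if not text:
        return []
    windows = {text[i:i + WINDOW] for i in range(max(1, len(text) - WINDOW + 1))}
    hits: list[Match] = []
    for snippet, matches in public_index.items():
        snip = normalize(snippet)
        if any(w in snip for w in windows):
            hits.extend(matches)
    return hits


def review(suggestion: str, public_index: dict[str, list[Match]], block_on_match: bool):
    """Return (suggestion or None, matches): the developer either keeps the
    suggestion together with its match information or blocks it."""
    matches = find_matches(suggestion, public_index)
    if matches and block_on_match:
        return None, matches
    return suggestion, matches
```

Calling `review(code, index, block_on_match=False)` keeps the suggestion and returns the matching repositories alongside it, while `block_on_match=True` drops any suggestion that has a match.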

GitHub estimates that matches will occur in less than 1% of Copilot suggestions, although that figure depends on context.

“In the context of an existing application with surrounding code, we almost never see a match,” it said. “But in an empty or nearly empty file, we see matches far more often.”

Code referencing could go some way toward easing concerns among developers and organizations about the legal issues that could arise from using Copilot’s suggestions.

The release will be seen as an acknowledgment by GitHub that developers want to know when suggestions match public code.

It comes amid scrutiny of the company over the training of Copilot on publicly available code, including public GitHub repositories.

GitHub continues to fight a class action lawsuit that was filed in US federal court in San Francisco last year.

The complainants’ case centers on allegations that the company committed “software piracy on an unprecedented scale” by allowing Copilot to suggest code snippets from publicly available open source repositories.

The complaint notes that open source projects are covered by a number of different licenses, which require developers to correctly attribute the project’s authors and copyright when using its code in other applications.

GitHub faces allegations that, by suggesting code snippets through Copilot, it is stripping out the licenses that protect the open source projects the code came from, violating their terms of reuse.

GitHub and Microsoft, which owns the platform, have filed multiple motions to dismiss the case, the most recent in June 2023. An earlier motion was rejected, and the latest will be heard in court in September.

Why do developers need code referencing?

GitHub Copilot was launched in 2021 and is designed to assist developers by suggesting blocks of code based on context.

GitHub describes the tool as the world’s first at-scale AI pair programmer and says it was trained on billions of lines of public code.

Recent figures from GitHub suggest it has been used by more than 1 million developers and more than 27,000 organizations to build code faster.

However, GitHub has acknowledged that developers would also like to know when suggestions made by Copilot match public code.

It has been possible to block matching code since 2022. However, in November last year, Ryan Salva, vice president of product at GitHub, said: “While useful in some contexts, blocking matching suggestions doesn’t address all use cases”.

With code referencing, the developer is shown the matching code, the repos where the code appears, and the license governing each repo.

Since the same code might appear in multiple places with occasionally conflicting licenses, developers can decide for themselves on attribution rather than having the matches blocked from the outset.
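
To make the attribution decision concrete, the short Python sketch below groups a set of matches by license and flags the case where the same code turns up under more than one. The function names, the `(repo, license)` tuple shape and the sample repositories are all hypothetical. This is not Copilot’s data format, only an illustration of the kind of information the developer is shown.

```python
from collections import defaultdict


def group_by_license(matches):
    """matches: iterable of (repo, license) pairs -> {license: sorted repo list}."""
    grouped = defaultdict(list)
    for repo, lic in matches:
        grouped[lic].append(repo)
    return {lic: sorted(repos) for lic, repos in grouped.items()}


def has_mixed_licenses(matches):
    """True when the same code appears under more than one license."""
    return len({lic for _, lic in matches}) > 1


# Hypothetical matches for a single suggestion
sample = [("octo/alpha", "MIT"), ("octo/beta", "GPL-3.0"), ("octo/gamma", "MIT")]
print(group_by_license(sample))
print(has_mixed_licenses(sample))
```

When `has_mixed_licenses` returns true, the choice of which project to credit, or whether to use the code at all, stays with the developer.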

Richard Speed is an expert in databases, DevOps and IT regulations and governance. He was previously a Staff Writer for ITPro, CloudPro and ChannelPro, before going freelance. He first joined Future in 2023 having worked as a reporter for The Register. He has also attended numerous domestic and international events, including Microsoft's Build and Ignite conferences and both US and EU KubeCons.

Prior to joining The Register, he spent a number of years working in IT in the pharmaceutical and financial sectors.