The Free Software Foundation (FSF) has launched a call for white papers on GitHub Copilot. The submitted papers are meant to analyze the effects of the machine learning assistant on the free software community, which raise numerous questions. The blog post announcing the call promises that the organization will read all submitted white papers and pay a reward of $500 for each one it publishes.
At the same time, the post makes it clear that, from the FSF's point of view, Copilot is “unacceptable and unjust”, since using it requires running Microsoft's Visual Studio or Visual Studio Code, which the FSF does not consider free/libre software. It is worth noting that the source code editor Visual Studio Code is free of charge and largely open source, but far from free software as the FSF understands the term.
Copilot as a pair programmer
The white papers are not meant to deal with the tooling, however, but with the open questions about using machine learning (ML) as a coding aid. The Copilot service, introduced in June, helps developers write code; GitHub calls it an “AI Pair Programmer” that, as in pair programming, suggests improvements and additions to the source code.
The technical basis is the Codex ML system developed by OpenAI, which converts natural language into source code. For example, Copilot tries to generate suitable source code from a comment such as
// Get average runtime of successful runs in seconds
It also creates boilerplate code such as getters and setters, adds repetitive definitions and suggests suitable unit tests.
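To make the comment-to-code idea concrete, the following hand-written TypeScript sketch shows the kind of function an assistant might propose for the comment quoted above. It is not actual Copilot output; the `Run` type, its field names and the function name are assumptions made purely for illustration.

```typescript
// Hypothetical record type; the field names are assumptions for this sketch.
interface Run {
  status: "success" | "failed";
  durationMs: number; // runtime in milliseconds
}

// Get average runtime of successful runs in seconds
function averageRuntimeInSeconds(runs: Run[]): number {
  // Only successful runs count towards the average.
  const successful = runs.filter((run) => run.status === "success");
  if (successful.length === 0) {
    return 0; // avoid dividing by zero when nothing succeeded
  }
  const totalMs = successful.reduce((sum, run) => sum + run.durationMs, 0);
  // Convert the mean from milliseconds to seconds.
  return totalMs / successful.length / 1000;
}
```

Even for a function this small, the developer still has to check details the comment leaves open, such as the time unit of the input and the result for an empty list.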
Copyright and legal aspects
Copilot draws its “knowledge” from numerous openly accessible repositories, without GitHub having explicitly asked those responsible for them. This scraping of code quickly drew accusations, including on Twitter. The FSF blog post does not attack GitHub Copilot directly in this regard, but the questions it poses let its intention shine through and are rhetorically effective.
According to the FSF, the call for white papers was prompted by a flood of inquiries about the Foundation's position on the open questions surrounding Copilot. Developers want to know whether training an artificial neural network on their software can be considered fair use. Conversely, those interested in using Copilot are probably wondering whether elements copied from GitHub repositories, such as code snippets, could lead to copyright infringement. Activists also ask whether it is fundamentally unfair to build a commercial service on top of their work.
Copyright questions in particular come up again and again in connection with machine learning applications. Before such systems can create anything, they must first be trained. What source code repositories are for Copilot, texts are for language models such as OpenAI's Generative Pre-trained Transformer 3 (GPT-3). Similar questions are likely to arise in image generation, for example with DALL-E, which was also developed by OpenAI and is based on GPT-3.
GitHub is well aware of the problem and addresses a few points in the FAQ at the end of the Copilot page. According to the FAQ, large parts of the ML community consider training on publicly available data to be fair use. However, since this is new territory, GitHub says it is interested in a discussion with developers on copyright and other issues in order to develop appropriate standards for training ML models.
A questionnaire as a template
The Free Software Foundation leaves the specific formulation of the answers to those who contribute white papers on the topic. The papers should address the following questions, among others:
- Does training based on public repositories violate copyright law? Is it fair use?
- How likely is it that Copilot’s output will lead to actionable claims for violations of GPL-licensed works?
- How can developers ensure that code to which they hold the copyright is protected against violations generated by Copilot?
- Does Copilot violate the AGPL (GNU Affero General Public License) when it learns from AGPL-licensed code?
- Is the trained AI / ML model protected by copyright, and if so: who owns the copyright?
The call for white papers runs until August 23, and contributions should be submitted to the email address firstname.lastname@example.org. Papers should be no more than 3,000 words long and be aimed at the free software movement where possible, though the organization will also consider texts written for lawyers.
The Free Software Foundation intends to review the submissions by September 20 and then notify authors whether it will accept their papers for publication. Further details and the complete questionnaire can be found on the FSF blog.