Claire Wortsman is an IPilogue Writer and a 2L JD Candidate at Osgoode Hall Law School.
Is GitHub Copilot a Copyright Infringer?
At the end of June, GitHub CEO Nat Friedman announced the launch of a technical preview of GitHub Copilot. Much like the predictive text and search features we see in messaging, email applications and search engines, Copilot makes instant suggestions to users as they type. These suggestions can range from a line of code to an entire function.
The Guardian’s UK Technology Editor Alex Hern identified a couple of simple tasks that programmers can now hand off to Copilot, including sending a valid request to Twitter’s API (application programming interface) and pulling the time in hours and minutes from a system clock. Although the big-eyed Copilot mascot may look innocent, Hern also identified some behaviours that are a little less helpful. These range from allegedly violating copyright (the subject of much debate on forums) to leaking functional API keys (which can give access to an app’s otherwise inaccessible databases).
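For readers who do not code, the clock task gives a sense of how small these hand-offs are. The sketch below is my own illustration in Python, not actual Copilot output, and the function name current_hours_minutes is made up for the example.

```python
from datetime import datetime


def current_hours_minutes(now=None):
    """Return the time as an "HH:MM" string.

    If no datetime is passed in, the system clock is read.
    """
    if now is None:
        now = datetime.now()  # read the local system clock
    return now.strftime("%H:%M")  # 24-hour hours and minutes


if __name__ == "__main__":
    print(current_hours_minutes())
```

A function this short and this common is the kind of code that many programmers would write in nearly identical form.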
On infringing copyright, GitHub’s staff machine-learning engineer Albert Ziegler published a research paper assuring users that while “Copilot can quote a body of code verbatim … it rarely does so, and when it does, it mostly quotes code that everybody quotes, and mostly at the beginning of a file, as if to break the ice.”
While Ziegler’s use of the word “mostly” may not reassure those fearing copyright infringement, his paper highlights two details that might. First, Copilot suggests verbatim code only about 0.1% of the time. Second, GitHub plans to integrate a duplication search into the user interface. A duplication search would check suggestions for overlap with Copilot’s training set, flag snippets that are duplicated directly, and identify where they originate.
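GitHub has not, to my knowledge, published how its duplication search would work, so the following is only a rough sketch of the general idea, assuming a simple comparison of overlapping runs of tokens. The function names, the window size, and the tiny two-file "training set" are all hypothetical.

```python
def token_windows(code, size):
    """Split code on whitespace and return every run of `size` consecutive tokens."""
    tokens = code.split()
    return {
        " ".join(tokens[i:i + size])
        for i in range(len(tokens) - size + 1)
    }


def find_duplicates(suggestion, corpus, size=6):
    """Return the names of corpus files that share at least one token run
    of length `size` with the suggestion.

    `corpus` maps a (hypothetical) source name to that file's text.
    """
    windows = token_windows(suggestion, size)
    return sorted(
        name
        for name, text in corpus.items()
        if windows & token_windows(text, size)
    )


if __name__ == "__main__":
    # Hypothetical stand-in for a training set.
    corpus = {
        "repo_a/util.py": "def add(a, b):\n    return a + b\n",
        "repo_b/io.py": "def read(path):\n    return open(path).read()\n",
    }
    suggestion = "def add(a, b):\n    return a + b"
    print(find_duplicates(suggestion, corpus, size=4))
```

Even this toy version shows why the threshold matters: a small window flags everyday boilerplate, while a large one flags only long verbatim passages, which is closer to what a copyright analysis would care about.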
Intellectual property law professor Andres Guadamuz argues that Copilot, as it stands, does not infringe copyright. This is because Copilot would copy small snippets of commonly used code, which are unlikely to amount to substantial reproduction or to meet the threshold of originality necessary for copyright protection. Guadamuz explains that machine learning (ML) training is “increasingly considered to be fair use in the US and fair dealing under data mining exceptions in other countries.”
On the question of which country’s law governs GitHub’s activities, Internet, telecoms, and tech lawyer Neil Brown suggests “a reasonable chance that GitHub will claim that its service is provided by GitHub, Inc., which is established in the USA, such that [any other country’s] law is irrelevant.”
What About Copyleft?
Some licensing agreements contain “copyleft” obligations. Copyleft allows for the use, modification, and distribution of a work, or a portion of it, on the condition that the resulting work is bound by the same license. Some disapprove of code licensed under GNU’s General Public License (GPL) being included in Copilot’s training set, given that Copilot is a commercial work and the GPL has copyleft obligations. However, Guadamuz explains that under GPL v3, this obligation only arises where the copying is substantial enough to warrant copyright permission. As previously mentioned, Copilot’s activities likely do not meet this standard.
What Comes Next?
Profiting off the work of others without remuneration or their consent goes against the spirit of copyright protection. But what if using the work of others to train a commercial product results in a tool like Copilot that lowers barriers to coding and permits a wider audience to engage in the creation process? After all, encouraging innovation should be one of the primary functions of any copyright regime. The opinions, and possible legal decisions, that follow in the wake of Copilot’s launch, and the launch of similar ML features, will reveal what we value about copyright law and the direction it takes as technological complications arise.
The buzz surrounding Copilot does not mark the first time an autocomplete feature has landed a company in hot water. In the 2018 High Court of Australia case of Trkulja v Google LLC, the plaintiff argued that Google’s autocomplete predictive search suggestions were defamatory. Although no final conclusion was reached, I anticipate that we will see more definitive cases emerge as autocomplete and predictive text tools, whether suggesting text or code, continue to develop and more instances of potential defamation and IP infringement take place.