We Didn’t Steal Your Code, It Was The AI!
GitHub Copilot is a very interesting tool with a bit of a larcenous streak. It is a programming aid that turns natural-language descriptions into working code, and it supports a variety of languages, including Python, JavaScript, TypeScript, Ruby and Go. It is built on OpenAI Codex, a model trained on billions of lines of public code to perform this trick. The problem seems to be that nobody trained it to read licenses.
The code Copilot was trained on came from a variety of open source repositories, and since Microsoft bought GitHub back in 2018, it is easy to guess where much of it came from. There is a small problem, however, which has led to what could become a rather large lawsuit.
A lot of the code it trained on, and liberally reuses when translating natural language into code, is covered by the GPL, Apache, MIT and other open source licenses. Those licenses require that the author be attributed when the code is reused, yet Copilot provides no attribution, even when the suggested snippets are longer than 150 characters. To make matters worse, some of the code it absorbed contained secrets that had been published to public repositories but were never meant for general consumption.
It will be interesting to see how this plays out.