Recently, I've been trying to find out whether it's legal to use tools like GitHub Copilot or Amazon CodeWhisperer on my customer's code. To be honest, I still don't know. I expected a simple "Yes, you can!", but it isn't that easy.
I'm no legal expert. This text is a layman's assessment of a legal issue. Please do not take it at face value. I've written it to ignite thought and prompt questions. It's by no means legal advice. But no matter what you make of it, you're still responsible, and I urge you to consult your lawyer. Copyright law is a hornet's nest.
One of the disturbing things to consider is that these AI assistants have been developed with American law in mind, but that's not necessarily the law that applies to you. If you don't live in the United States, your local law applies. So the key takeaway of this article is: before using GitHub Copilot, Amazon CodeWhisperer, or any other AI assistant on your customer's code, ask your lawyer, your customer, your customer's lawyer, and your boss.
AI assistants are great tools
Don't get me wrong: I love my AI coding assistants. I've only just started to explore the possibilities, and it's just great. All the tedious, repetitive work that has been driving me away from Java is a thing of the past. Just one example: everybody but me loves the builder pattern. To me, it's bloatware: tons and tons of code for a benefit I don't believe in. AI coding assistants are a game-changer here. All of a sudden, I noticed that I don't mind using the builder pattern. I just hate implementing it. If an assistant does all the tedious work, I'm okay with it.
I haven't checked yet whether any of the AI coding assistants can write a builder for an arbitrary class. But that's not the point. At the moment, the capabilities of generative AI models are growing at a breathtaking pace. What's science fiction today is tomorrow's reality. Maybe my AI assistant can't do that today, but I'm sure it's only a matter of time. My guess is one or two years.
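If you haven't met it yet, here is roughly what a hand-written builder looks like. The `Customer` class and its three fields are hypothetical, chosen only to show how much boilerplate the pattern adds even for a tiny class. It's a minimal sketch of the classic nested-static-builder variant, not code produced by any of the assistants mentioned above.

```java
import java.util.Objects;

// Hypothetical domain class, used only to illustrate builder boilerplate.
final class Customer {
    private final String name;        // required
    private final String email;       // optional
    private final int loyaltyPoints;  // optional, defaults to 0

    // Private constructor: callers have to go through the builder.
    private Customer(Builder builder) {
        this.name = builder.name;
        this.email = builder.email;
        this.loyaltyPoints = builder.loyaltyPoints;
    }

    String getName() { return name; }
    String getEmail() { return email; }
    int getLoyaltyPoints() { return loyaltyPoints; }

    // The builder repeats every field and adds one chainable method per field.
    static final class Builder {
        private String name;
        private String email;
        private int loyaltyPoints;

        Builder name(String name) {
            this.name = name;
            return this; // returning this enables method chaining
        }

        Builder email(String email) {
            this.email = email;
            return this;
        }

        Builder loyaltyPoints(int loyaltyPoints) {
            this.loyaltyPoints = loyaltyPoints;
            return this;
        }

        Customer build() {
            // Check the required field once, when the object is created.
            Objects.requireNonNull(name, "name is required");
            return new Customer(this);
        }
    }
}

class BuilderDemo {
    public static void main(String[] args) {
        Customer customer = new Customer.Builder()
                .name("Ada")
                .email("ada@example.com")
                .build();
        System.out.println(customer.getName() + " " + customer.getLoyaltyPoints());
    }
}
```

The call site reads nicely, which is the benefit fans of the pattern point to. The price is that every field is declared twice and touched in several places, which is exactly the kind of repetitive code I'd happily leave to an assistant.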
The Darwinist point of view
After using AI coding assistants for several months, I can say they make me more productive. Sometimes they're annoying, but surprisingly often I can accept their suggestions as they are, and those suggestions frequently span multiple lines.
That, in turn, means Darwinian selection kicks in: in the long run, companies that embrace AI assistants will outrun every company that doesn't.
And that's a problem. At the moment, there's uncertainty (to put it mildly) about whether it's legal to use an AI assistant when developing code whose copyright belongs to your customer. If you use an AI assistant carelessly, you might get sued. If you don't use one, you won't get sued, but you're more expensive than your competitors.
So I'm afraid the law can't stop AI coding assistants. But it can cause developers a lot of trouble.
To my surprise, most articles on the internet are about a topic I didn't even consider a problem: whether the suggestions themselves infringe someone else's copyright. Call me naive, but GitHub has recently cared a lot about the copyright of the source code it hosts, so I didn't expect it to be careless. Nonetheless, rumor has it that GitHub Copilot sometimes suggests copyright-protected code.
GitHub employees tend to say they've trained their AI under "fair use". It's up to the user to double-check whether a code suggestion is subject to copyright.
Of course, that's an ivory-tower idea. It would require developers to run a plagiarism check on every suggestion they accept. That, in turn, leads me to the next issue.
Sending your customer's intellectual property to a server
Running a plagiarism check usually means sending your source code to a server. That's just the way most plagiarism checks work.
However, if the code is your customer's intellectual property, that's a no-go unless they've agreed to it beforehand.
Here's the catch: using an AI assistant means constantly sending the code around your cursor to the server. Over time, the server can collect your entire code base.
The vendors of AI assistants diligently assure you that they care a lot about end-to-end encryption. That's good, but encryption only protects your source code on its way to the server; you still can't know what happens to it once it arrives.
Sending your customer's intellectual property to an American server
It's worse if you're living in the European Union. The Safe Harbour agreement is a thing of the past, and its successor, the Privacy Shield, has been declared invalid by the European Court of Justice. That means you can't simply send personal data to the United States. Source code usually isn't personal data, but this example shows you should be careful about what you send across the Atlantic.
What about your log files? Or your password files?
It gets worse. If you're like me, you open log files or even files containing credentials in your IDE all the time. AI coding assistants constantly send the content of your open files to the server. So you have to configure the AI assistant to exclude such files.
That's possible in theory, but I doubt it's the way to go. Sooner or later, one of your developers forgets to configure the AI assistant before opening a file with sensitive data. So, in practice, it's better to assume every file you open in your IDE will be sent to the AI assistant's server.
In a nutshell, that amounts to sharing your project with a party that may be trustworthy now. But we've all read the stories about stolen databases. You can't predict what's going to happen to your intellectual property. Unless you're working on an open-source project, don't do that. Better to wait until offline AI assistants are available, which may happen sooner than you think. The other day, I saw a live demo of a generative AI model running in a browser. It was eating battery power like crazy, but even so, it was a glimpse of the future.
Wrapping it up
At the moment, I'm pretty clueless. Ignoring AI coding assistants is not an option, but using them opens a Pandora's box of legal issues.
For what it's worth, I've been using an AI coding assistant in my open-source projects, but not at work. As a side effect, work feels a lot more tedious by comparison, while open source is as much fun as it has always been.
My current solution is to ask ChatGPT carefully edited, anonymized questions about how to implement an algorithm. That's pretty cool: now I'm an almost productive Python programmer. However, the carbon footprint seems to be enormous. Official numbers are hard to find, but I've read estimates that using a generative AI model several times an hour consumes more energy than a light bulb. Come to think of it, GitHub Copilot and Amazon CodeWhisperer send such a query every few keystrokes. Maybe we should think twice before using them.
Interview with Steffen Brandt about GitHub Copilot and data protection (podcast in German)