If you use AI-generated code, you currently cannot claim copyright on it in the US. If you fail to disclose/disclaim exactly which parts were not written by a human, you forfeit your copyright claim on *the entire codebase*.
This means copyright notices and even licenses folks are putting on their vibe-coded GitHub repos are unenforceable. The AI-generated code, and possibly the whole project, becomes public domain.
Source: https://www.congress.gov/crs_external_products/LSB/PDF/LSB10922/LSB10922.8.pdf
-
@jamie A copyrighted work that isn't registered is still copyrighted. It's not "in the public domain."
Registration, in the U.S., allows for certain copyright enforcement actions that can't be taken for unregistered works. But whether or not a work is registered has no bearing on whether it is copyrighted vs. in the public domain.
@jik In other parts of this thread, this is being discussed. I was limited on space, so I took shortcuts. What I meant is that, in order to enforce your copyright, you need to prove you own the copyright. Registering it is the single most effective way to do that.
If you can't register your copyright, you (effectively) can't enforce it.
If you can't enforce your copyright, your copyright vs public domain is a distinction without a meaningful difference.
I couldn't fit all that in the post.
-
@Lapizistik In the US, courts have determined (for now, at least) that training an AI model on copyrighted works is considered "fair use". So it's basically legalized copyright laundering. Even code released under the GPL loses its infectiousness when laundered through an LLM.
I'd be very interested to see what other countries do around that, because it would determine which models are legal to use where.
@jamie
This is not my point. Even if it _is_ “fair use”: if the LLM produces a 1:1 copy (minus some renamed variables) of some relevant piece of code, it is not producing something “new”. As a human I can learn from any code (copyrighted or not), but I cannot just take the code, rename some variables, and publish it as my own creation. I would lose in court.¹ So technically, if you use an LLM to produce code for you, you need to check whether any relevant piece of it is a copy of anything that exists.
Clean room implementation requires the programmer to not have seen the original code but only the requirements.
__
¹ Otherwise you could just take any piece of copyrighted code, rename variables, and say it is yours because an LLM has produced it.
-
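The check described above — whether generated code is a near-copy of existing code up to renamed identifiers — can be sketched very naively in Python by comparing token streams with identifiers normalized away. This is purely illustrative; real code-similarity tools (e.g. MOSS-style fingerprinting) are far more robust, and the thresholds courts apply are a legal question, not a tokenizer question.

```python
import io
import keyword
import token
import tokenize

def canonical(source: str) -> list[tuple[int, str]]:
    """Reduce Python source to a token stream in which every identifier is
    replaced by a positional placeholder (v0, v1, ...), so two snippets that
    differ only in variable names canonicalize to the same stream."""
    names: dict[str, str] = {}
    stream: list[tuple[int, str]] = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == token.NAME and not keyword.iskeyword(tok.string):
            # First-seen order picks the placeholder, so renaming is invisible.
            stream.append((tok.type, names.setdefault(tok.string, f"v{len(names)}")))
        elif tok.type in (token.NEWLINE, token.NL, token.INDENT,
                          token.DEDENT, token.ENDMARKER, tokenize.COMMENT):
            continue  # ignore layout and comments
        else:
            stream.append((tok.type, tok.string))
    return stream

original = "def add(x, y):\n    return x + y\n"
renamed  = "def plus(left, right):\n    return left + right\n"
changed  = "def add(x, y):\n    return x - y\n"

print(canonical(original) == canonical(renamed))  # True  (only names differ)
print(canonical(original) == canonical(changed))  # False (operator differs)
```

Note the sketch deliberately ignores whitespace and comments, which is exactly why a "renamed variables" copy still matches.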
@christianschwaegerl @jamie @Azuaron @fsinn
Yes. Any "direct quoting" of copyrighted works, as text files on a disk, for example, would > only be a bunch of numbers < too. ASCII, Unicode, UTF-8, etc. are ways of encoding text into numbers, and displaying text representations (glyphs) of them later.
So LLMs hold "indirect" and maybe "abstract" (or not) numbers related to the copyrighted works. Not sure how that will or should work out, from a legal perspective.
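The encoding point is easy to see concretely. A minimal Python sketch showing the same short string as Unicode code points and as the UTF-8 bytes that would actually sit on disk:

```python
# Every character is stored as a number; encodings just choose the mapping.
text = "Hi ©"

code_points = [ord(c) for c in text]      # abstract Unicode code points
utf8_bytes = list(text.encode("utf-8"))   # concrete byte values on disk

print(code_points)  # [72, 105, 32, 169]
print(utf8_bytes)   # [72, 105, 32, 194, 169]  (© takes two bytes in UTF-8)

# ASCII has no © at all, so a strict ASCII encode fails outright.
try:
    text.encode("ascii")
except UnicodeEncodeError:
    print("not representable in ASCII")
```

So a "direct quote" on disk and an LLM's weights are both numbers; the legal question is how directly those numbers map back to the protected expression.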
@JeffGrigg @christianschwaegerl @jamie @fsinn I think this is missing the point and the law (at least, US copyright law).
I buy a book. I then own that book. I can cut that book into individual pages. I can scan all those pages into my computer. I can have an image-to-text algorithm convert the text in the images into an ebook. I can do this to a billion books. I can run whatever algorithms I want on the text of those books. I can store the resulting text of my algorithms on my computer, in any format.
This is all legal, for both me and for any company. Copyright does not prevent use of a work after it has been sold, "use" meaning just about anything--short of distributing the work.
Because what copyright protects against is the reproduction and distribution of copyrighted works. For AI companies, that "distribution" doesn't happen until somebody puts a prompt into the AI, and receives back a result. That result is the distribution. To sue an AI company for copyright infringement, you would have to have a result that infringes on your copyright, and you would have to prove that the AI company was more than just a tool that the prompter used to infringe your copyright.
For the Disney example, if somebody prompted, "Darth Vader in a lightsaber duel with Mickey Mouse," it would be an uphill battle to prove the AI company is responsible for that instead of just the prompter. The argument that the AI company would make is that the prompter clearly used the AI as a tool to make infringing work, but just like you can't sue Adobe if someone used Photoshop to make the same image, you can't sue the AI company because someone used it as a tool to infringe copyright.
Now, I don't find that a wholly persuasive argument because of the, frankly, complicity in the creation that AI has that Photoshop doesn't, but that's definitely the argument they would make, and judges have seemed receptive to that and similar (and even worse) arguments.
As far as I'm concerned, the original point of this thread proves that the AI company should be mostly-to-wholly responsible, even if the prompter was deliberately asking for infringing works. After all, AI-generated work is not copyrightable because it is not human created, it is computer created.
If it's not human created, how can the human be responsible for the infringement?
If it is computer created, then isn't the computer's owner responsible for the infringement?
After all, if I ask a digital artist to create me "Darth Vader in a lightsaber duel with Mickey Mouse," and they do, the digital artist is on the hook for that infringement. They reproduced the work, and they distributed it. There is a "prompter" and a "creator" in both scenarios; it seems illogical that if the "creator" is a human, they're responsible, but if the "creator" is a computer, they aren't responsible.
This is, per @pluralistic, "It's not a crime, I did it with an app!" Why we let apps get away with crimes we'd never tolerate from people, I don't know. But that's where we are.
-
@jamie @starr This was a big deal for authors in the Anthropic suit: those whose works had not been registered for whatever reason prior to the infringement were excluded from the settlement because they would only have been entitled to at most a few dollars in lost royalties, a fact-bound question not conducive to class action and for which they could not be awarded fees. (Foreign authors are understandably angry about this.)
@jamie @starr (The Berne Convention allows this because the formalities are only required to file suit, so it's no different under the convention from having to present any other form of documentary proof before a court. Copyright law in general was built on a centuries-old threat model of "infringer produces 10,000 copies of one work" and not "infringer produces one copy of 10,000 works" let alone the millions in various pirated e-book collections.)
-
@jamie Right. You didn't have enough space. You couldn't have, oh, I dunno, posted correct information on multiple posts. You know, like the multiple posts on which you posted the incorrect information.
*plonk*
-
@Azuaron @JeffGrigg @jamie @fsinn @pluralistic For a start, you bought the book. I doubt AI hyperscalers have met that minimum requirement. Secondly, you buy the book for your private use, not for commercial purposes. Thirdly, you describe reproduction for private purposes. Reproduce and sell, and you infringe. Fourth, you don’t use the book to instruct a machine to paraphrase the content, produce quotes and false quotes, and to write in the style of the author in an infinite number of cases.
-
"forfeit your copyright claim on *the entire codebase*" seems very unlikely since they're reissuing copyright on the human-authored parts of one of the works mentioned in your post:
> Because the current registration for the Work does not disclaim its Midjourney-generated content, we intend to cancel the original certificate issued to Ms. Kashtanova and issue a new one covering only the expressive material that she created.
-
@wollman @jamie this is all really interesting. Sounds like I misunderstood the importance of registering. I had thought that as long as you could prove that you had created a work, you were good. And I had recently read an article about someone tracking down a lost pilot for a sitcom to the LOC where they were able to watch it, so I had assumed that was how it generally worked.
-
@starr @jamie Audiovisual works being relatively easy to display in the library and also a part of the national cultural heritage, the LOC does tend to require deposit, but it's up to the librarians to decide whether they will add a copy to the nation's collection. I should read up on what they do for computer games, where the "work as a whole" may not even exist in one place or be in any way functional offline.
-
@christianschwaegerl @JeffGrigg @jamie @fsinn @pluralistic You're making a bunch of different arguments now. The topic at hand was, "Is it copyright infringement to make and have an AI model trained on millions of books?" The answer is no. This is wholly legal.
Storing copyrighted work is legal.
Modifying copyrighted work is legal.
Storing modified copyrighted work is legal.
It doesn't matter if they have a model that is literally just plain text of every book, or if the model is a series of mathematical weights that go into an algorithm. It's already legal to have and modify copyrighted works.
What becomes illegal is reproducing and distributing copyrighted material.
No, whether it was for "commercial" or "non-commercial" purposes doesn't matter when determining if something is infringing.
No, whether it was "sold" or "distributed for free" doesn't matter when determining if something is infringing.
"What about Fair Use?" Fair use is an affirmative defense. That means that you acknowledge you are infringing, but it's an allowed type of infringement. It's still an infringement, you just don't get punished for it.
But, as already stated, nothing is infringement until there's a distribution. Without a distribution, no further analysis is needed. When a distribution occurs, it is the distribution that is analyzed to determine if it is infringing, and, if so, if there is a fair use defense. Everything that happens prior to the distribution is irrelevant when determining if an infringement has occurred, as long as the accused infringer acknowledges they have the copyrighted work (which AI companies always acknowledge).
There is one further step, because it is illegal to make a tool that is for copyright infringement. The barrier to prove this is so high, though. As long as a tool has any non-infringing uses--and we must acknowledge AI can generate non-infringing responses--then it won't be nailed with being a "tool for copyright infringement". This has to be, like, "Hey, I made a cracker for DRM, it can only be used to crack DRM. It literally can't do anything legal."
Even video game emulators haven't been hit with being "tools for copyright infringement" because there are legitimate uses for them (personal backup, archival, etc.), even though everyone knows they're 99% for infringement.
-
@Azuaron @JeffGrigg @jamie @fsinn @pluralistic So if somebody invents a gun that simultaneously produces soap bubbles, shooting someone is ok? I doubt it.
You’re trying to normalise LLMs with analogies to mundane private behaviour. That’s fundamentally flawed.
LLMs have new characteristics and capabilities. There hasn’t been a machine before that could churn out one million versions of a novel in the style of a contemporary author, or art by living creators, in no time after being fed their work.
-
@jamie so proprietary projects that are made with LLMs can be leaked legally, since there's no copyright for them?
@SRAZKVT It’s a bit more complicated than that for reasons other than copyright (mentioned in my next couple posts in the thread). TL;DR: you may still have to defend it even if they can’t enforce copyright, and they may also have other grounds for lawsuit.
-
@christianschwaegerl @JeffGrigg @jamie @fsinn @pluralistic Buddy, I'm not trying to "normalize" anything, especially not LLMs. I'm telling you how the law works. I never said the law was good.
@Azuaron @JeffGrigg @jamie @fsinn @pluralistic It’s wide open how existing law will be interpreted and applied here, and which new laws will be created to capture the novelty of the technology. The Anthropic case is interesting. A large number of court cases will proceed and the differences between a private book purchase and an all-purpose multi-billion content production technology will hopefully be apparent to judges.
-
FWIW I'm not a lawyer and I'm not recommending that you do this.
Even if companies have no legal standing on copyright, their legal team will try it. It *will* cost you money.
But man, oh man, I'm gonna have popcorn ready for when someone inevitably pulls this move.
@jamie It would be so weird if people think wholesale copyright violation at a global scale to train a model is acceptable. But then, what? Individual project-sized chunks of output ARE copyrightable? But then if you hoover up all of those projects on GitHub to train the next AI, including supposedly private and copyrighted AI-generated repos? That’s fair?
A bizarre situation. Goldilocks copyright. “This data is too small to copyright.” “This data is too big to copyright” “But this data is just right.”
-
It'll be interesting to see what happens when a company pisses off an employee to the point where that person creates a public repo containing all the company's AI-generated code. I guarantee what's AI-generated and what's human-written isn't called out anywhere in the code, meaning the entire codebase becomes public domain.
While the company may have recourse based on the employment agreement (which varies in enforceability by state), I doubt there'd be any on the basis of copyright.
@jamie Also of note, AI booster companies like Microsoft have a specific interest in hiding how much of their code is human-written/curated, in order to overstate the usefulness of their AI tools - seems like that might come back to bite them on the ass...

-
@jamie I can imagine the law being quickly changed to come to the aid of the vibe-coders.
-
@jamie I would not like to be in the position of trying to explain that to a judge though.
But I don't see the argument that, because there is some AI-generated code in something, the copyright on the human-generated code would necessarily be void.