If you use AI-generated code, you currently cannot claim copyright on it in the US.

Azuaron

@christianschwaegerl @JeffGrigg @jamie @fsinn @pluralistic You're making a bunch of different arguments now. The topic at hand was, "Is it copyright infringement to make and have an AI model trained on millions of books?" The answer is no. This is wholly legal.

Storing copyrighted work is legal.

Modifying copyrighted work is legal.

Storing modified copyrighted work is legal.

It doesn't matter if they have a model that is literally just plain text of every book, or if the model is a series of mathematical weights that go into an algorithm. It's already legal to have and modify copyrighted works.

What becomes illegal is reproducing and distributing copyrighted material.

No, whether it was for "commercial" or "non-commercial" purposes doesn't matter when determining if something is infringing.

No, whether it was "sold" or "distributed for free" doesn't matter when determining if something is infringing.

"What about Fair Use?" Fair use is an affirmative defense. That means that you acknowledge you are infringing, but it's an allowed type of infringement. It's still an infringement, you just don't get punished for it.

But, as already stated, nothing is infringement until there's a distribution. Without a distribution, no further analysis is needed. When a distribution occurs, it is the distribution that is analyzed to determine if it is infringing, and, if so, if there is a fair use defense. Everything that happens prior to the distribution is irrelevant when determining if an infringement has occurred, as long as the accused infringer acknowledges they have the copyrighted work (which AI companies always acknowledge).

There is one further step, because it is illegal to make a tool that is for copyright infringement. The barrier to prove this is so high, though. As long as a tool has any non-infringing uses--and we must acknowledge AI can generate non-infringing responses--then it won't be nailed with being a "tool for copyright infringement". This has to be, like, "Hey, I made a cracker for DRM, it can only be used to crack DRM. It literally can't do anything legal."

Even video game emulators haven't been hit with being "tools for copyright infringement" because there are legitimate uses for them (personal backup, archival, etc.), even though everyone knows they're 99% for infringement.

Christian Schwägerl

@Azuaron @JeffGrigg @jamie @fsinn @pluralistic So if somebody invents a gun that simultaneously produces soap bubbles, shooting someone is ok? I doubt it.
You’re trying to normalise LLMs with analogies of profane private behaviour. That’s fundamentally flawed.
LLMs have new characteristics, capabilities. There hasn’t been a machine before that could churn out one million versions of a novel in the style of a contemporary author or art by living creators in no time after being fed their work.

Jamie Gaskins

@SRAZKVT It’s a bit more complicated than that for reasons other than copyright (mentioned in my next couple posts in the thread). TL;DR: you may still have to defend it even if they can’t enforce copyright, and they may also have other grounds for lawsuit.

Azuaron

@christianschwaegerl @JeffGrigg @jamie @fsinn @pluralistic Buddy, I'm not trying to "normalize" anything, especially not LLMs. I'm telling you how the law works. I never said the law was good.

Christian Schwägerl

@Azuaron @JeffGrigg @jamie @fsinn @pluralistic It’s wide open how existing law will be interpreted and applied here, and which new laws will be created to capture the novelty of the technology. The Anthropic case is interesting. A large number of court cases will proceed and the differences between a private book purchase and an all-purpose multi-billion content production technology will hopefully be apparent to judges.

Paco (2026: New) Hope

@jamie It would be so weird if people think wholesale copyright violation at a global scale to train a model is acceptable. But then, what? Individual project-sized chunks of output ARE copyrightable? But then if you hoover up all of those projects on GitHub to train the next AI, including supposedly private and copyrighted AI-generated repos? That’s fair?

A bizarre situation. Goldilocks copyright. “This data is too small to copyright.” “This data is too big to copyright” “But this data is just right.”

vekkq

@jamie @Yuki this aligns with "You'll own nothing and be happy"

JWcph, Radicalized By Decency

@jamie Also of note, AI booster companies like Microsoft have a specific interest in hiding how much of their code is human-written/curated, in order to overstate the usefulness of their AI tools - seems like that might come back to bite them on the ass...

Bruno Nicoletti

@jamie I can imagine the law being quickly changed to come to the aid of the vibe-coders.

Quinn Norton

@jamie I would not like to be in the position of trying to explain that to a judge though.

..But I don't see the argument that because there is some AI generated code in something that it would necessarily void the copyright on the human generated code.