Take scraping. Companies like Clearview will tell you that scraping is legal under copyright law. They’ll tell you that training a model with scraped data is also not a copyright infringement. They’re right.
I love Cory’s writing, but while he does a masterful job of defending scraping, and makes a good argument that in most cases, it’s laws other than Copyright that should be the battleground, he does, kinda, trip over the main point.
The main point is that training models on creative works and then selling access to the derivative “creative” works those models output very much falls within the domain of copyright - landing on one side or the other of a grey line we usually call “fair use”, which hasn’t really been tested in courts.
Let’s take two absurd extremes to make the point. Say I train an LLM directly on Marvel movies, and then sell movies (or maybe movie scripts) that are almost identical to existing Marvel movies (maybe with a few key names and features altered). I don’t think anyone would argue that is not a derivative work, or that falls under “fair use.” However, if I used literature to train my LLM to be able to read, and used that to read street signs for my self-driving car, well, yeah, that might be something you could argue is “fair use” to sell. It’s not producing copy-cat literature.
I agree with Cory that scraping, per se, is absolutely fine, and so is re-distributing the results in ways that are in the public interest or fall under “fair use”, but it’s hard to justify the slop machines as not a copyright problem.
In the end, yeah, fuck both sides anyway. Copyright was extended too far and used for far too much, and the AI companies are absolute thieves. I have no illusions that this type of court case will do anything more than shift wealth from one robber baron to another; it won’t help artists and authors.
Say I train an LLM directly on Marvel movies, and then sell movies (or maybe movie scripts) that are almost identical to existing Marvel movies (maybe with a few key names and features altered). I don’t think anyone would argue that is not a derivative work, or that falls under “fair use.”
I think you’re failing to differentiate between a work, which is protected by copyright, vs a tool which is not affected by copyright.
Say I use Photoshop and Adobe Premiere to create a script and movie which are almost identical to existing Marvel movies. I don’t think anyone would argue that is not a derivative work, or that falls under “fair use”.
The important part here is that the subject of this sentence is ‘a work which has been created which is substantially similar to an existing copyrighted work’. This situation is already covered by copyright law. If a person draws a Mickey Mouse and tries to sell it then Disney will sue them, not their pencil.
Specific works are copyrighted and copyright laws create a civil liability for a person who creates works that are substantially similar to a copyrighted work.
Copyright doesn’t allow publishers to go after Adobe because a person used Photoshop to make a fake Disney poster. This is why things like BitTorrent can legally exist despite being used primarily for copyright violation. Copyright laws apply to people and the works that they create.
A generated Marvel movie is substantially similar to a copyrighted Marvel movie and so copyright law applies to it. A diffusion model is not substantially similar to any copyrighted work by Disney and so copyright laws don’t apply here.
They don’t want copyright power to expand further. And I agree with them, despite hating AI vendors with a passion.
For an understanding of the collateral damage, check out How To Think About Scraping by Cory Doctorow.
@FauxLiving @Jason2357
I take a bold stand on the whole topic:
I think AI is a big scam (pattern matching has nothing to do with intelligence!).
And this scam might end like the dot-com bubble in the late 90s (https://en.wikipedia.org/wiki/Dot-com_bubble), including the huge economic impact, because too many people have invested in an “idea”, not in a proven technology.
And as with the dot-com bubble, once the AI bubble has been cleaned up, machine learning and vector databases will stay forever (maybe some other parts of the tech too).
Neither needs copyright changes, because they will never try to be one solution for everything. Like a small model to transform text to speech … like a small model to translate … like a full-text search using a vector DB to index all local documents …
Like a small tool to summarize text.
I agree, and I think your points line up with Doctorow’s other writing on the subject. It’s just hard to cover everything in one short essay.
Ahhh, it makes more sense now. Thank you!
Let’s give them this one last win. For spite.