We shouldn’t be affording companies the ability to profit off other people’s creations without their consent, and despite its intentions, that's basically how current copyright law works.
A long-form response to the concerns, comments, and general principles many people raised in the post about authors suing companies creating LLMs.
I don't know what the authors are complaining about. All the AI is doing is trawling through a lexicon of words and rearranging them into an order that will sell books. It's exactly what authors do. This is about money.
In the article I explain that it is not exactly what authors do: reading and writing are inherently human activities, and the consumption and processing of massive amounts of data (far more than a human with a photographic memory could process in a hundred million lifetimes) is a completely different process.
I also point out that I don't have a problem with LLMs as a concept, and I'm actually excited about what they can do, but that they are inherently different from humans and should be treated as such by the law.
My main point is that authors should have the ability to decree that they don't want their work used as training data for megacorporations to profit from without their consent.
So, yes in a way it is about money, but the money in question being the money OpenAI and Meta are making off the backs of millions of unpaid and often unsuspecting people.
I think it's an interesting topic, thanks for the article.
It does start to raise some interesting questions. If an author doesn't want their book to be ingested by an LLM, then what is acceptable? Should all LLMs now be ignorant of that work? What about summaries or reviews of that work?
What if, from a summary of a book, an LLM could extrapolate what's in the book? Or write a book similar to the original? Does that become a new work, or does it still run into the issue of copyright?
I do fear that copyright laws will muddy the waters and slow down the development of LLMs, and have a greater impact than any government standards ever will!
I'm all for muddy waters and slow development of LLMs at this juncture. The world is enough of a capitalist horrorshow and so far all this tech provides is a faster way to accelerate the already ridiculously wide class divide. Just my cynical luddite take of the day...
I wish people would stop comparing AI to human beings. An AI using the product of your labor without your consent to emulate the characteristics of your work is not the same as an actual human being studying someone's works for inspiration or to learn.
If you trained an AI to write a Sarah Silverman book, that would be unethical, so I don't understand why it's ok to do the same thing in just a more dispersed way. You're still profiting off someone else's work (sometimes their life's work) without any kind of compensation.
This fervor to shove the human being to the side in favor of AI is really dehumanizing and doesn't serve to foster creativity but to stifle and clip it for the profit of some company's bottom line. It's the worst aspect of capitalism made even more efficient, now not just stealing people's physical labor but scraping off the intangible qualities that define an individual to dump them into a machine. It's horrific.
They’re “complaining” about unique qualities of their art being used, without consent, to create new things which ultimately de-value their original art.
It’s a debate to be had, I’m not clearly in favour of either argument here, but it’s quite obvious what they’re upset with.
If it's a debate to be had then it's something that should have been debated hundreds of years ago when copyright was first invented, because every author or artist re-uses the "unique qualities" of other peoples' art when making their own new stuff.
There's the famous "good authors copy, great authors steal" quote, but I rather like the related one by C. E. M. Joad mentioned in that article: "the height of originality is skill in concealing origins."
I think the main difference between derivative/inspired works created by humans and those created by AI is the presence of "creative effort." This is something that humans can do, but narrow AI cannot.
Even bland statements humans make about nonfiction facts have some creativity in them, even if the ideas themselves are non-copyrightable (e.g., I cannot copyright the fact that the Declaration of Independence was signed in 1776; however, the exact way I present this fact can be copyrightable — a timeline, chart, table, or passage of text could all qualify).
"Creative effort" is a hard thing to pin down, since "effort" alone does not qualify (e.g., I can't copyright a phone directory even if I spent a lot of effort collecting names and numbers, since simply listing names and numbers in alphabetical order isn't particularly creative or original). I don't think there's really a bright-line test for what counts as "creative," but it doesn't take a lot. Randomness doesn't qualify either (e.g., I can't just pick a random stone out of a stream and declare copyright on it, even if it's a very unique-looking rock).
Narrow AI is ultimately just a very complex algorithm built from training data. This oversimplifies a lot of the steps involved, but there isn't anything "creative" or "subjective" in how an LLM produces passages of text. At most, I think you could say that the developers of the AI have copyright over the code used to make that AI. The outputs of some functional AI might be copyrightable by its developers, but I don't think any machine-learning AI's output would really qualify if the AI is the sole source of the work.
Personally, I think that the results of what an AI like Midjourney or ChatGPT creates would fall under public domain. Most of the time, it's removed enough from the source material that it's not really derivative anymore. However, I think if someone were to prompt one of these AI to create a work that explicitly mimics that of an author or artist, that could be infringement.
IANAL, this is just one random internet user's opinion.