"Copyright today covers virtually every sort of human expression" and cannot be avoided.
Apparently, stealing other people's work to create a product for money is now "fair use" according to OpenAI, because they are "innovating" (stealing). Yeah. Move fast and break things, huh?
"Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials," wrote OpenAI in the House of Lords submission.
OpenAI claimed that the authors in that lawsuit "misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence."
There's not a musician who hasn't heard other songs before. Not a painter who hasn't seen other paintings. No comedian who hasn't heard jokes. No writer who hasn't read books.
AI haters are not applying the same standards to humans that they do to generative AI. Obviously this is not to say that AI can't plagiarize. If it's spitting out sentences that are direct quotes from an article someone wrote before and doesn't disclose the source then yeah that is an issue. There is, however, a limit after which the output differs enough from the input that you can't claim it's stealing, even if it perfectly mimics the style of someone else.
Just because DALL-E creates pictures that have a Getty Images watermark on them doesn't mean the picture itself is a direct copy from their database. If anything, it's the use of the logo that's the issue, not the picture.
That sucks for the creators, of course, but if AI creates better content, that's where people will go. That's a big if, though, especially in the near future.
AI haters are not applying the same standards to humans that they do to generative AI
I don't think it should go unquestioned that the same standards should apply. No human is able to look at billions of creative works and then create a million new works in an hour. There's a meaningfully different level of scale here, and so it's not necessarily obvious that the same standards should apply.
If it’s spitting out sentences that are direct quotes from an article someone wrote before and doesn’t disclose the source then yeah that is an issue.
A fundamental issue is that LLMs simply cannot do this. They can query a webpage, find a relevant chunk, and spit that back at you with a citation, but it is simply impossible for them to generate a response to a query, realize that they've generated a meaningful amount of copyrighted material, and disclose its source, because they literally do not know the source. This is not a fixable issue unless the fundamental approach to these models changes.