"The machine did what we asked!!!" is an increasingly common and consistently stupid headline.
Absolutely nobody promised the draw-anything machine would invent perfectly unique images, unburdened by any prior intellectual property concerns. How the fuck would it. We fed it every image on the internet, with labels; no kidding it knows what Darth Vader looks like. And it'll emit images of him in a Star Wars-y context, with other related elements. That's what it's for.
Ask for a deer and you get a forest.
Ask for trees and it should pick a particular kind of tree, rather than making up some whole new species that's never been seen. Ask for images of a cartoon family, and yeah, sometimes that will be the Simpsons. What else was it supposed to show you? A lot of example trees are oak. A lot of example cartoons are Bart.
When you explicitly name a character and a movie, you have negative room to complain that the machine did its fucking job. Cutesy indirect descriptions are no better. You know damn well which cartoon families have yellow skin, and which sci-fi series have laser swords. Don't type "tree with needles instead of leaves" and go "Ah-HA! Spruce!"
Generative AI is based on "predicting" and generating the next token. Tune it one way and it will regurgitate its training data exactly. Tune it the other way and the words it comes up with are nonsense. Tune it just right and it comes up with something that seems creative.
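That tuning knob is usually called temperature. Here's a minimal sketch of how it works; the vocabulary and scores below are made up for illustration, not from any real model:

```python
import math
import random

def sample_next_token(logits, temperature):
    """Pick the next token from raw model scores (logits).

    Low temperature: nearly always the single most likely token
    (regurgitation). High temperature: nearly uniform (nonsense).
    Somewhere in between reads as "creative".
    """
    # Scale the scores by temperature, then softmax into probabilities.
    scaled = [score / temperature for score in logits]
    m = max(scaled)                       # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample a token index according to those probabilities.
    return random.choices(range(len(logits)), weights=probs)[0]

# Toy vocabulary and made-up scores for the word after "the dark".
vocab = ["side", "tower", "knight", "banana"]
logits = [3.0, 2.5, 2.4, 0.1]

rng = random.Random(0)
random.seed(0)
cold = [vocab[sample_next_token(logits, 0.1)] for _ in range(5)]
hot = [vocab[sample_next_token(logits, 10.0)] for _ in range(5)]
print(cold)  # near-deterministic: overwhelmingly "side"
print(hot)   # near-uniform: could be anything, even "banana"
```

Same scores, same table of probabilities underneath; the only thing that changes between "photocopier" and "word salad" is how sharply the sampling concentrates on the top entry.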
The problem is that the training data is always in there somewhere. It can't generate something in the style of Shakespeare without containing Shakespeare as a reference. That's probably fine for Shakespeare, which is out of copyright, but if it contains, say, Stephen King's entire collected works, that's another issue.
If a human writer read all of Stephen King's books and then tried to write in the style of King, that would be OK, but only because a human can't memorize everything King has written word-for-word. When a human reads King, they don't build up a database of "probable next word frequency"; instead they build heuristics about how he approaches dialogue, how he reveals character, how he builds tension, and so on. They may remember an especially memorable line or two, but even written down word-for-word, the bits they remember probably wouldn't be enough to count as copyright infringement on their own.
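A "database of probable next word frequency" is easy to sketch as a bigram table, and the sketch makes the point: this approach is literal memorization, not heuristics. The corpus below is placeholder text, not actual King:

```python
import random
from collections import Counter, defaultdict

# Build a "probable next word" table from a corpus: for each word,
# count which words follow it.
corpus = (
    "the dog stood at the end of the hall and "
    "the dog would not move from the hall"
).split()

next_word = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    next_word[a][b] += 1

def generate(start, n, rng):
    """Walk the table, always sampling a recorded successor."""
    out = [start]
    for _ in range(n):
        options = next_word.get(out[-1])
        if not options:
            break  # dead end: this word never had a successor
        words, counts = zip(*options.items())
        out.append(rng.choices(words, weights=counts)[0])
    return " ".join(out)

print(generate("the", 8, random.Random(1)))
```

Every two-word sequence this generator can emit already appears verbatim in its training text; with a small corpus it just replays the source. Heuristics ("he builds tension slowly", "dialogue reveals character") are what's left when you *can't* store a table like this.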
I would bet that we've come too far to completely scrap generative AI. Too many billions have been invested, and the companies have too much political power. So, the question is whether there will be significant changes to copyright law. On one side of that fight will be the trillions of dollars behind the entertainment industry. On the other side of that fight will be the trillions of dollars behind the tech industry. Of course, individual artists will be trampled in the process.
A network containing much of its training set is broken.
Deep networks do find heuristics. That's what all the layers are for. That's why it takes abundant training, instead of abundant storage. We already had computers that can give you the next word of a Stephen King novel... they're called e-books.
Tune AI just right and it'll know that Stephen King writes horror, in English - having distilled both concepts from raw data. Grammar is a demonstration of novel output. The fact these things can conjugate a verb (or count fingers on a hand) is deep magic. There are hints of them being able to do math, which you'd think is trivial for a supercomputer, except it'd have to be doing math roughly the same way you do math.
Anyway: generative LLMs should ideally contain about as much original data per subject as that subject's Wikipedia article. Key names, general premise, relevant dates, and then enough labels to cobble together some kind of bootleg.
The trouble comes from people making question-answering LLMs, which for obvious reasons are supposed to contain all the details necessary to pass a pop quiz. This is fundamentally at odds with making shit up. (It's also not very good at answering questions, so they should really focus on training a network that can evaluate text instead of training a network on that text.)
Image AI seems entirely focused on making shit up, which makes the blatant overfitting in MidJourney a head-scratcher. Knowing what Darth Vader looks like is a non-event. Everyone knows what Darth Vader looks like, and everyone knows he correlates strongly with laser-swords. Even being able to draw vaguely cinematic frames is whatever, because it turns out a lot of things look like a lot of other things. But some of those Dune examples are trying to pass a pop quiz. That's just incorrect behavior.
The draw-anything machine should absolutely be able to draw frames that look like they're from Denis Villeneuve's adaptation. Key words: look like. Floppy hair, muted colors, recognizable specific actors, sure. Probably even matching the framing of one shot or other, because again, movies look like movies. But if any specific frame is simply being reproduced, the process has gone wrong. That's simply not what it's for.
It seems, though, that in the long run the line between a human reading Shakespeare and coming up with their own version and a computer doing the same will get thinner and thinner. After all, we are really just biological computers. One could imagine a computer "thinking" of things the same "way" that we do. What then?
One could imagine a computer “thinking” of things the same “way” that we do.
One can imagine it, but that's been the impossible nut to crack ever since the first computers. People have been saying that artificial intelligence (what we now want to call AGI instead) is five years away since the 1970s, if not earlier.
The new generative systems seem intelligent, but they're just really good at predicting the next word. There's no consciousness there. As good as LLMs are, they can't plan for the future. They don't have goals.
The only interesting twist here is that consciousness / free will might not really exist, at least not in the form most people think of it. So, maybe LLMs are closer to being "thinking" computers not because they're getting closer to consciousness / free will, but because we're starting to realize free will was an illusion all along.
I'd be careful with the plagiarism argument. People uploaded content to Meta/reddit and tons of other sites whose terms of use allow them to make commercial use of the content you upload, and its derivatives. Not sure whether those terms have been challenged in court, but Meta and others have a massive database of images they can use to train their AI and reuse commercially. Artists knew it was going to happen when they started posting content on Insta.
We do still have an issue, though: being able to generate Sailor Moon or the Simpsons means they have copyrighted data in their training dataset, and that's a serious risk for free/libre models (while Meta is big enough to tell Toei Animation to fuck off).
Fascinating! I often wondered if corporations used hyper-specific prompts in an effort to get an image as close as possible to the original so they could blame the image generator for plagiarism (then sue them and naturally get a crap ton of money from doing so), but the prompts used here seem very generic, yet bear an uncanny resemblance to these screencaps.
There is some debate about the ethics of it, but supposedly there should be no legal problem with using copyrighted images for a dataset so long as the outputs are transformative (i.e. don't resemble any one image too closely). I wonder if there's anything the developers can do to prevent it, or if it's just something an image model will inevitably do.