AI Models Show Signs of Falling Apart as They Ingest More AI-Generated Data

Ffs, neural networks and LLMs have their place and can be useful, but setting up datacentres that snort up the entire internet indiscriminately to create a glorified chatbot that spews data that may or may not be correct is insane.

Another problem I've realized today, is the proliferation of data that was originally hallucinated by AI.

I was discussing an issue on a software with a coworker and he asked an AI for help configure around it. He then sent me "apparently we can try changing this setting to this value". I told him to first validate if that setting really existed because AI tends to make up things like that when it's what you would want to hear and running a test would take us 20~30 minutes.

He found some discussions about that setting not working as people expected. "ok at least it exists then" and we tried it. It didn't work. I later cloned the source of that software and checked, the setting didn't exist - ever.

I love that you even specifically said, "Yea, let's check to make sure that setting exists to begin with." To which instead of actually fucking checking, they proceed to google more about the setting and use someone else's 'discussion' online of it not working as proof that it does exist, even though they were likely having that discussion because the setting didn't exist.
This is also how I can tell this story is 100% true.
I don't miss working support at all and am reminded of it like this daily
- The benefit of working with open source code bases or being able to check the source for existing features.
  It's very common for there to be hidden settings, with open source one can look at the code based but with closed source a search may be one's only hope.

AI ingesting AI slop and falling apart is not dissimilar to boomers ingesting rightwing slop and falling apart.

I predicted this. It is similar to a photocopy of a photocopy that eventually ends up a mess of garbage noise.

Garbage in, garbage out.

Who could have possibly predicted that?

The recycling industry begs to differ. Well, exceptions prove the rule.
- Depends what materials you're recycling. Glass and plastic both require virgin material, else you'd get garbage out.
This has always been true, but LLMs have expedited the process by taking the garbage out and sticking it right back into the input.

If I had the money and a computer able to handle the amount of stuff I'd be throwing at it with a local model, I would have a giant website full of AI generated nonsense purely for the purpose of letting AI gobble it up to help the AI incest problem.

Imagine if a whole metric ton of "websites" did this. The thieving AI companies would either have to start blocking all of these sites or deal with an issue they don't wanna because they're too stingy and will probably just have their AI try ( and fail ) to fix the problem.

Better yet, feed the nonsense to the crawlers that ignore robots.txt.
- That was certainly cool. Now I wish I could use tools like that nepenthes on my neocities page.

Good. Eat yourself you technological prion disease.

This is a rare insult and I like it

Good. Poison the AI well. Rot this shit to the ground.

It's very tempting to have schadenfreude about this failure but also disgusting that so much has been invested in it that should have been put to better use.

It's just another example of a system whose narrow definition of success is taking human and environmental value and using it to extract more. It's not aimed at solving worthwhile problems or making things better, which is why people are becoming more miserable and the planet is getting wrecked.

You could say that it's the system we live in which is the AI, feeding on itself and becoming more sick.

The schadenfreude is what we’re here for! We can’t do anything about the waste of investors’ money. They could’ve spent it all on fireworks instead. That probably would’ve been more fun!
As for the system? I prefer not to think about it. Too much systemic thinking is bad for mental health. Much better to enjoy some schadenfreude and save your serious thinking energy for things you have the power to change, especially where they can make life better for you and those around you.
- I agree with all your points! What I will add though is that what we think of as 'investors money' is actually value that has been extracted from the environment and from workers.
In my case it's not so much schadenfreude as just wanting this nightmare era to end as quickly as possible. The sooner this LLM shit dies the sooner we can start to recover and move on, in terms of stopping the senseless waste of water and energy and maybe starting to rebuild some kind of useful internet.

Aww boo hoo, did someone generate a degenerative feedback loop? Yeah? Did someone make a big ol' oppsy whoopsy that's gunna accelerate in hallucinations and slop as it collapses in on itself? How's the coded version of a microphone whine going to go, you silly buttholes?

People are putting AI generated pitfalls to guard their content.
They reference nonsense links that usually cannot even be seen by normal users, the AI reads the pages and finds more garbage links even as more are generated by the site.
- It's just so unfortunate that, in causing AI to delve down these winding paths, to propagate these slopfest feedback loops, the computers that are running the AI are burning real resources, polluting our atmosphere.
  Unfortunate is not the right word to describe the deep lament I feel, to cause such destruction for so little, if any, gain at all. My heart is heavy with regret for us all. Not just you and I, but for beast, bird, plant as well. Such a shame.

So, reading this article, it's not about model collapse, but about RAG - letting the AI model google the question essentially. The problem is, the first 10 pages of google search results are all low effort adfarming slop sites, because of course it is, which is making the answers from the AI worse, as these slop sites often have incorrect or otherwise unproofed articles, which biases the AI to fork out the wrong answer.

I'm sure the major AI services will try and fix this with some slop site detection routines.

I'm sure the major AI services will try and fix this with some slop site detection routines.
Which will be run by AI 🙃
- Don't forget! A lot of the slop on those first few pages of results is AI-generated.
  Ouroboros is a very apt moniker for this phenomena.
Mad AI Disease.
some slop site detection routines.
Why would they? I mean how are their incentives different from that of the search engine operators themselves?
I can see a future when the internet is degraded to a point where if you try to find out how to peel an apple, you will get back word salad and 25 different porn ads.
Yesterday, there was the usual slew of artificial computer-generated news stories on YouTube about GM closing down all factories in north america (happens about once a month).
Well I typed in "is GM closing down in the US" in Google and the Gemini generated answer said "Yes, GM has announced the closure of all plants in the US" and put up those fake YT videos as reference....
I’m sure the major AI services will try and fix this with some slop site detection routines.
They already do this through data determination routines in LLMs, unfortunately they suffer from the same type of infection as the data itself.
- You probably would get better results from literally any other AI Gemini is routinely the worst. I don't know what Google are playing at surely they could actually put some real effort into this but they just seem to be doing it in the most naive way possible.
  It comes to something when the Chinese are been the most innovative.
I'm sure the major AI services will try and fix this with some slop site detection routines.
No they will not, because this will harm their short term bottom line, which is always, "add short term value for the shareholder"
- Unless the shareholder also owns the search slop site, it's competition to revenue and in good interest to filter out.
- Plus, it's not an easy task.

The silver lining of AI slop filling the WWW

I've been predicting this for a while now and people kept telling me I was wrong. Prepare for dot com burst two, electric boogaloo.

I hope it crashes but what if the market completely embraces feels-based economics and just says that incomprehensible AI slop noise is what customers crave? Maybe CEOs will interpret AI gibberish output in much the same way as ancient high priests made calls by sifting through the entrails of sacrificed animals. Tesla meme stock is evidence that you can defy all known laws of economic theory and still just coast by.

Oh no! I HOPE us Taxpayers can Bail Out these AI Companies when they go Under! AFTER ALL we CUT my Child's LIFESAVING MEDICATION so I KNOW we have the Funds to Help these Poor Billionaire CEOS!

I can't afford groceries now! I'm sure all those billionaires will help us out now that they've got a little but more though.
volume::normalize(that)
Help these Poor Billionaire CEOS!
Right, self-made billionaires for whom the way to success was already paved by subsidies. Yes, those surely need help to "build" absolutely pointless non-working projects that are supposed to "save humanity". That's great. /$

There is a solution to this. Make a **perfect ** AI detecting tool. The only way I can think of is through adding a tag to every bit of AI-generated data,

Though it could easily be removed from text, I guess.And no, training AI to recognize AI will never work. Also every model would have to join this, or it won't work.

Related XKCD

LOL you're suggesting people already doing something unbelievably stupid should do something smart to compensate.
- Not stupid, greedy.
Also people won't be able to pass AI work off as their own if it is labeled as such. Cheating and selling slop is the chief use for AI so any tag or watermark will be removed on the vast majority of stuff.
There's also liability. If your AI generates code that's used to program something important and a lot of people are injured or die, do you really want a tag that can be traceable to back the company to be on the evidence? Or slapped all over the child sex abuse images that their wonderful invention is churning out?

Human society does the same thing.

We're ok when we talk about what we saw.

Less so when we talk about what somebody else saw.

Crazier and crazier when we talk about what somebody said about what somebody said about what somebody saw. Which is arguably the internet.

fill up your free cloud services with ai generated info. i mean thousand text file. like "how to make homemade butterfly". all of them will scrap by ai.

Cue Price is Right failure trombone.

Hah! I heard that in my head!

makes me think about the human centipede

Him: Ugh I don't feel so good after all this data.

Her: Data is a nutritious source of information. However, ingesting too much data can trigger some unpleasant side effects. Here's what you can do to alleviate some of the symptoms:

Drink water
Lie down and rest
Listen to the sounds of nature

Is there anything esle I can do for you?

How much money was invested in reminding us that if the snake starts eating its tail it's eating itself?

Good.

The great news is that these ponze schemes will either collapse or spend the next decade trying to fix it by creating algorithms which detect AI content so as to filter it out.

AI dementia

Human cent-ipad.png

That is so much better than their attempt (the "Lord of the Flies for AI" byline). Captures the essence of the problem better than the ~~capitalism~~ cannibalism metaphor does, as well.
EDIT: That has to have been one of my favourite Freudian ADHD word-confusion typos I accidentally made there
Cuttlefish or asparagus?

Tragic and funny at the same time. As if consuming all of Reddit hadn’t already irreparably skewed things and that was still real people doing Reddit things. Now, released, it’s eating itself. This self-poisoning model seemed inevitable.

Snake eating its own ass.

Oroboros can have a little peice of oroboros, as a treat.
much hotter of a mental image than this deserves

good news everybody!

AI, the one currently used for actual productive work by scientific researchers, healthcare specialists, energy development, manufacturing, agriculture and such, is poised to be able to handle about 20% of all human related work by 2040.

By 2043, it will be able to handle 100% of any human related work in the fields. The takeoff is merely 3 years

It's fine if you guys want to live in a little mental bubble where this doesn't happen

But I'd suggest you start getting ready for what comes next.

Oh boy I can't wait for our currently robust social safety net and already existent universal basic income to allow us to live a life pursuing the things that make us happy, rather then multi-billionaires firing everyone and the world becoming a plutocracy where the average person struggles to get even the bare minimum.
You sound exactly like Christian doomsday cultists screaming about the end times. I'll believe it when I see it.
Should cite sources for this if you want it taken seriously.
- The source is a research paper that the AI I community have been going on about for a few days now. I can't link to it right now because I'm at work but I'll update when I can.
  But if you Google for it you will find it as it's been a fairly hot topic the last few days.
you were promised it will. You paid out the ass for it. Money's gone, that shit ain't happening :)
And what comes next is?
Death?
Oddly specific but okay
I think you should post sources for your claims. This sounds stupidly wrong. Are you American?
How exactly?

you realize what this means, right?

who is causing all the backwashed data? the peasants.

who is training the models? the peasants.

who benefits the most from AI? the oligarchy.

I bet in a year or two, access to AI will be cost prohibitive and will be illegal to host without an expensive license.

how does this benefit the oligarchy you ask?

because the oligarchy is the government now, and AI needed the support of the peasants to get infrastructure up and running well enough to run on its own.

they're just going to use AI to oppress the peasants and ensure they know their place as slave labor.

congrats everyone who supported AI by praising and promoting it as a solution, you fucked yourself.

We have accessible, open-source AI models - your predictions won't come to pass.
who is causing all the backwashed data? the peasants.
No, actually, it's the shitty slop sites. I mean they are usually not made by Big Tech, but it is also not your rando Twitter posts either.
I bet in a year or two, access to AI will be cost prohibitive and will be illegal to host without an expensive license.
I can run a Chinese model on my sub-1000 EUR GPU right now and generate all the word salad I want. I know, I know, they will make better models. But that's the point, if they lock away better models, all the slop will be made with the worse models.
The point is, all this means is that you can't infinitely train AI on random internet content, and the value of social media as an AI training data source is going down since they are also getting infected with slop. This is actually a good thing, because one way SaaS models could have gotten better than freely hostable ones is by having access to data that is not openly accessible.
These news mean that data they could have used as a differentiator is a pile of hot shit.
- you're a peasant and don't even realize it because you're not a part of the "club". same as all those slop sites. they aren't part of the club and so they're lowly peasants.
  there were talks of making those Chinese models illegal. not much harder to just say anyone that's not in the club can't have one either, and if you're caught you go to jail.
How would they stop going to a different country where the AI license doesnt exist?
- ever heard of the pirate bay? they certainly got fisted by the long arm of American Jurisprudence even though they weren't in the US...