It's best not to dwell on it
ChatGPT is a tool. Use it for tasks where the cost of verifying that the output is correct is less than the cost of doing the task by hand.
Honestly, I've found it best for quickly reformatting text and other content. It should live and die as a clerical tool.
Which is exactly why every time I see big tech companies making another stupid implementation of it, it pisses me off.
LLMs like ChatGPT are fundamentally word-probability machines. They predict the probability of words based on context (or, absent any context, just the general probability). When given notes, for instance, they have all the context and knowledge they need; all they have to do is predict the most statistically probable way of reformatting the existing data into a better structure. Literally the perfect use case for the technology.
Even in similar contexts that don't immediately seem like "text reformatting," it's extremely handy. For instance, Linkwarden can auto-tag your bookmarks, based on a predetermined list you set, using the context of each page fed into a model running via Ollama. Great feature, very useful.
Yet somehow, every tech company manages to use it in every way except that when developing products with it. It's so discouraging to see.
You're still doing it by hand to verify in any scientific capacity. I only use ChatGPT for philosophical hypotheticals involving the far future. We're both wrong, but it's fun for the back and forth.
Talking with an AI model is like talking with that one friend who's always high and thinks they know everything. But they have a wide enough range of interests that they can actually piece together an idea (most of the time wrong) about any subject.
Isn't this called "the Joe Rogan experience"?
I am sorry to say I can frequently be this friend...
I feel this hard with the New York Times.
99% of the time, I feel like it covers subjects adequately. It might be a bit further right than me, but for a general US source, I feel it’s rather representative.
Then they write a story about something happening to low-income US people, and it's just social and logical salad. From their reporting, it appears as though they analyze data instead of talking to people. Statisticians will tell you, and this is subtle: conclusions made at one level of detail cannot be generalized to another level of detail. Looking at data without talking with people is fallacious for social issues. The NYT needs to understand this; meanwhile, they are horrifically insensitive, bordering on destructive at times.
“The jackboot only jumps down on people standing up”
Then I read the next story and I take it as credible without much critical thought or evidence. Bias is strange.
There is a name for this: Gell-Mann amnesia effect
“Wet sidewalks cause rain”
Pretty much. I never really thought about the causal link being entirely reversed; I assumed the chain of reasoning was broken or mediated by some factor they missed, which yes, definitely happens. But now I can definitely think of instances where it's totally flipped.
Very interesting read, thanks for sharing!
Can you give me an example of how conclusions at one level of detail can't be generalised to another level? I can't quite understand it.
Perhaps the textbook example is Simpson's paradox.
This article goes through a couple of cases where the conclusions are naively supported by the statistics, but when you correctly separate the data, those conclusions reverse themselves.
Another relevant issue is Aggregation Bias. This article has an example where conclusions about a population hold inversely with individuals of that population.
And the last one I can think of is the MAUP (modifiable areal unit problem), which deals with the fact that statistics are very sensitive to whatever process is used to divvy up a space. This is commonly referenced in spatial statistics but has broader implications, I believe.
This is not to say that you can never generalize, and indeed, often a big goal of statistics is to answer questions about populations using only information from a subset of individuals in that population.
All Models Are Wrong, Some are Useful
The argument I was making is that the NYT will authoritatively make conclusions without taking into account the individual, looking only at the population level, and not only is that oftentimes dubious, sometimes it's actively detrimental. They don't seem to me to do their due diligence in mitigating the risk that comes with such dubious assumptions, hence the cynic in me left that Hozier quote.
If the standard is replicating human-level intelligence and behavior, making up shit just to get you to go away about 40% of the time kind of checks out. In fact, I bet it hallucinates less, and is wrong less often, than most people you work with.
And it just keeps improving over time. People shit all over ai to make themselves feel better because scary shit is happening.
My kid sometimes makes up shit and completely presents it as facts. It made me realize how many made up facts I learned from other kids.
I did a Google search to find out how much I pay for water; the water department where I live bills by the MCF (1,000 cubic feet). The AI Overview told me an MCF was one million cubic feet. It's a unit of measurement. It's not subjective, not an opinion, and AI still got it wrong.
Shouldn't it be kcf? Or tcf if you're desperate to avoid standard prefixes?
I just think you need an abbreviations chart.
Yeah, that's an odd one. My city does water by the gallon, which is much more reasonable.
If it's being designed to answer questions, then it should simply be an advanced search engine that points to actual researched content.
The way it acts now, it's trying to be an expert based on "something a friend of a friend said," and that makes it confidently wrong far too often.
I use chatgpt as a suggestion. Like an aid to whatever it is that I’m doing. It either helps me or it doesn’t, but I always have my critical thinking hat on.
Same. It's an idea generator. I asked what kind of pie I should make. I saw one I liked and then googled a real recipe.
I needed a SQL query for work. It gave me different methods of optimization. I then googled those methods, implemented, and tested it.
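The "tested it" step can be as simple as comparing query plans before and after the change. A minimal sketch using Python's built-in sqlite3 module (the table, column, and index names here are made up for illustration; the suggested optimization in this toy case is just an index):

```python
# Sketch: verify an LLM-suggested index actually changes the query plan.
# All names (orders, customer_id, idx_orders_customer) are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")

query = "SELECT * FROM orders WHERE customer_id = 42"

# Before: the plan's detail column reports a full-table SCAN.
before = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# Apply the suggested optimization, then check the plan again.
con.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

# After: the detail column reports a SEARCH using the new index.
after = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(before)
print(after)
```

The same before/after check works for rewritten joins or subqueries, not just indexes; the point is that the model's suggestion is cheap to verify mechanically.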
One thing I have found it to be useful for is changing the tone of what I write.
I tend to write very clinically because my job involves a lot of that style of writing. I have started asking ChatGPT to rephrase what I write in a softer tone.
Not for everything, but for example when I'm texting my girlfriend, who is feeling insecure. It has helped me a lot! I always read through it to make sure it did not change any of the meaning or add anything, but so far it has been pretty good at changing the tone.
I also use it to rephrase emails at work to make them sound more professional.
I do that in reverse, lol. Except I'm also not a native speaker. "Rephrase this, it should sound more scientific".
Most of my searches have to do with video games, and I have yet to see any of those AI generated answers be accurate. But I mean, when the source of the AI's info is coming from a Fandom wiki, it was already wading in shit before it ever generated a response.
I’ve tried it a few times with Dwarf Fortress, and it always hallucinated horribly wrong instructions on how to do something.
I have frequently seen GPT give a wrong answer to a question, get told that it's incorrect, and the bot fights with me and insists I'm wrong. And on other, less serious matters, I've seen it immediately fold and take any answer I give it as "correct."
come on guys, the joke is right there.... 60% of the time it works, every time!
Exactly my thoughts.
I mainly use it for fact-checking sources from the internet and looking for bias. I double-check everything, of course. Beyond that, it's good for rule checking for MTG Commander games, and for deck building. I mainly use it for its search function.
Does ChatGPT have ADHD?
same with every documentary out there
Exactly this is why I have a love/hate relationship with just about any LLM.
I love it most for generating code samples (small enough that I can manually check them, not entire files/projects) and re-writing existing text, again small enough to verify everything. Common theme being that I have to re-read its output a few times, to make 100% sure it hasn't made some random mistake.
I'm not entirely sure we're going to resolve this without additional technology outside of the LLM itself.
I love that this mirrors the experience of experts on social media like reddit, which was used for training chatgpt...
it's much older than reddit https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect
i was going to post this, too.
The Gell-Mann amnesia effect is a cognitive bias describing the tendency of individuals to critically assess media reports in a domain they are knowledgeable about, yet continue to trust reporting in other areas despite recognizing similar potential inaccuracies.
Also common in news. There’s an old saying along the lines of “everyone trusts the news until they talk about your job.” Basically, the news is focused on getting info out quickly. Every station is rushing to be the first to break a story. So the people writing the teleprompter usually only have a few minutes (at best) to research anything before it goes live in front of the anchor. This means that you’re only ever going to get the most surface level info, even when the talking heads claim to be doing deep dives on a topic. It also means they’re going to be misleading or blatantly wrong a lot of the time, because they’re basically just parroting the top google result regardless of accuracy.
One of my academic areas of expertise way back in the day (late '80s and early '90s) was the so-called "Mitochondrial Eve" and "Out of Africa" hypotheses. The absolute mangling of this shit by journalists even at the time was migraine-inducing, and it's gotten much worse in the decades since then. It hasn't helped that subsequent generations of scholars have mangled the whole deal even worse. The only advice I can offer people is that if the article (scholastic or popular) contains the word "Neanderthal" anywhere, just toss it.
There’s an old saying along the lines of “everyone trusts the news until they talk about your job.”
This is something of a selection bias. Generally speaking, if you don't trust a news broadcast then you won't watch it. So of course you're going to be predisposed to trust the news sources you do listen to. Until the news source bumps up against some of your prior info/intuition, at which point you start experiencing skepticism.
This means that you’re only ever going to get the most surface level info, even when the talking heads claim to be doing deep dives on a topic.
Investigative journalism has historically been a big part of the industry. You do get a few punchy "If it bleeds, it leads" hit pieces up front, but the Main Story tends to be the result of some more extensive investigation and coverage. I remember my home town of Houston had Marvin Zindler, a legendary beat reporter who would regularly put out interconnected 10-15 minute segments that offered continuous coverage on local events. This was after a stint at a municipal Consumer Fraud Prevention division that turned up numerous health code violations and sales frauds (he was allegedly let go by an incoming sheriff with ties to the local used car lobby, after Zindler exposed one too many odometer scams).
But investigative journalism costs money. And it's not "business friendly" from a conservative corporate perspective, which can cut into advertising revenues. So it is often the first line of business to be cut when a local print or broadcast outlet gets bought up and turned over for syndication.
That doesn't detract from a general popular appetite for investigative journalism. But it does set up an adversarial economic relationship between journals that do carry investigative reports and those more focused on juicing revenues.
First off, the beauty of these two posts being beside each other is palpable.
Second, as you can see on the picture, it's more like 60%
No it's not. If you actually read the study, it's about AI search engines correctly finding and citing the source of a given quote, not general correctness, and not just the plain model
Read the study? Why would i do that when there's an infographic right there?
(thank you for the clarification, i actually appreciate it)
I've been using o3-mini mostly for ffmpeg command lines. And a bit of sed. And it hasn't been terrible; it's a good way to learn stuff I can't decipher from the man pages. Not sure what else it's good for, tbh, but at least I can test and understand what it's doing before running the code.
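For example, the kind of one-liners I mean, and how I sanity-check them before touching real files (the ffmpeg flags below are real options, but the filenames are placeholders; the ffmpeg line is left commented out so the snippet runs anywhere):

```shell
# Extract a 10-second clip starting at 1:00 without re-encoding
# (-ss seeks, -t limits duration, -c copy skips the transcode):
#   ffmpeg -ss 00:01:00 -i input.mp4 -t 10 -c copy clip.mp4

# sed: replace only the first "foo" on each line (no /g flag).
# Easy to test on a throwaway string first:
echo "foo foo" | sed 's/foo/bar/'   # prints "bar foo"
```

That test-on-a-string habit is the whole trick: cheap to run, and it catches a hallucinated flag before it matters.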
In my experience plain old googling still better.
Totally didn't misread that as 'ffmpreg' nope.
I'm not judging. The LLM might though.
Are you me? I've been doing the exact same thing this week. How creepy.
we just had to create a new instance for coder7ZybCtRwMc, we'll merge it back soon
This, but for tech bros.
I just use it to write emails: I declare the facts to the LLM and tell it to write an email based on them and the context of the email. Works pretty well, but it doesn't really sound like something I wrote; it adds too much emotion.
That sounds like more work than just writing the email to me
Yeah, that has been my experience so far. LLMs take as much or more work vs the way I normally do things.
This is what LLMs should be used for. People treat them like search engines and encyclopedias, which they definitely aren't
Deepseek is pretty good tbh. The answers sometimes leave out information in a way that is misleading, but targeted follow up questions can clarify.
I think that AI has now reached the point where it can deceive people, even if it's not equal to humanity.
Oof let's see, what am I an expert in? Probably system design - I work at (insert big tech) and run a system design club there every Friday. I use ChatGPT to bounce ideas and find holes in my design planning before each session.
Does it make mistakes? Not really. It has a hard time getting creative with nuanced examples (i.e., if you ask it to "give practical examples where the time/accuracy tradeoff in Flink is important," it can't come up with more than one or two truly distinct examples), but it's never wrong.
The only times it's blatantly wrong is when it hallucinates due to lack of context (or oversaturated context). But you can kind of tell something doesn't make sense and prod followups.
Tl;dr funny meme, would be funnier if true
That's not been my experience with it. I'm a software engineer, and when I ask it stuff it usually gives plausible answers, but there is always something wrong. For example, it will recommend old, outdated libraries, or patterns that look like they would work, but when you try them out you figure out they are set up differently now or didn't even exist.
I have been using Windsurf to code recently and I'm liking it, but it makes some weird choices sometimes and is way too eager to code, so it spits out a ton of code you need to review. It would be easy to get it to generate a bunch of spaghetti code that mostly works but isn't maintainable by a person out of the box.
I ask AI shitbots technical questions and get wrong answers daily. I said this in another comment, but I regularly have to ask it if what it gave me was actually real.
Like, asking copilot about Powershell commands and modules that are by no means obscure will cause it to hallucinate flags that don't exist based on the prompt. I give it plenty of context on what I'm using and trying to do, and it makes up shit based on what it thinks I want to hear.
This, but for Wikipedia.
Edit: Ironically, the down votes are really driving home the point in the OP. When you aren't an expert in a subject, you're incapable of recognizing the flaws in someone's discussion, whether it's an LLM or Wikipedia. Just like the GPT bros defending the LLM's inaccuracies because they lack the knowledge to recognize them, we've got Wiki bros defending Wikipedia's inaccuracies because they lack the knowledge to recognize them. At the end of the day, neither one is a reliable source for information.
The obvious difference being that Wikipedia has contributors cite their sources, and can be corrected in ways that LLMs are flat out incapable of doing
Really curious about anything Wikipedia has wrong though. I can start with something an LLM gets wrong constantly if you like
Do not bring Wikipedia into this argument.
Wikipedia is the Library of Alexandria, and the amount of effort people put into keeping Wikipedia pages as accurate as possible should make every LLM supporter ashamed of how inaccurate their models are if they use Wikipedia as training data.
TBF, as soon as you move out of the English language, the oversight of a million pairs of eyes gets patchy fast. I have seen credible reports about Wikipedia pages in languages spoken by, say, less than 10 million people, where certain elements can easily control the narrative.
But hey, some people always criticize Wikipedia as if there were some actually 100% objective alternative out there, and that I disagree with.
What topics are you an expert on and can you provide some links to Wikipedia pages about them that are wrong?
If this were true, which I have my doubts, at least Wikipedia tries and has a specific goal of doing better. AI companies largely don't give a hot fuck as long as it works good enough to vacuum up investments or profits
Why don't you then go and fix those, citing high-quality sources? Are there none?
There's an easy way to settle this debate. Link me a Wikipedia article that's objectively wrong.
I will wait.
This, but for all media.