As always, never rely on LLMs for anything factual. They’re only good for things with a high tolerance for error, such as entertainment (e.g. RPGs)
I tried using one to spitball ideas for my DMing. I was running a campaign set in a real-life location known for a specific thing. Even if I told it not to include that thing, it would still shoehorn it into random spots. It quickly became absolutely useless once I didn’t need that thing included
Sorry for being vague, I just didn’t want to post my home town on here
You can say Space Needle. We get it.
The issue for RPGs is that LLMs have such “small” context windows, and a big point of RPGs is that anything could be important, investigated, or just come up later
Although, similar to how DeepSeek uses two stages (“how would you solve this problem”, then “solve this problem following this train of thought”), you could feed the model the recent conversation plus a private, unseen “notebook” that gets modified or appended to based on recent events. Doing that properly would need a whole new model, which likely wouldn’t be profitable short term, but I imagine the same infrastructure could be used for any LLM usage where fine details over a long period matter more than specific wording, including factual things. A rough sketch of the idea is below.
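A minimal sketch of that two-stage notebook loop, purely hypothetical: the `llm` function is a stand-in for whatever completion API you’d actually call, and the prompts are made up.

```python
# Hypothetical sketch of the "private notebook" idea; nothing here is a
# real vendor API. llm() stands in for whatever completion call you use.

def llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    raise NotImplementedError

NOTEBOOK_INSTRUCTIONS = (
    "You maintain a private campaign notebook. Given the current notes "
    "and the latest exchange, return updated notes. Keep anything that "
    "could matter later: NPC names, unresolved hooks, player theories."
)

def run_turn(notebook: str, recent: list[str], player_input: str) -> tuple[str, str]:
    # Stage 1: answer using the notebook plus a short window of recent
    # turns, so the visible context stays small.
    reply = llm(
        f"Notebook:\n{notebook}\n\n"
        "Recent conversation:\n" + "\n".join(recent) +
        f"\n\nPlayer: {player_input}\nDM:"
    )

    # Stage 2: privately revise the notebook based on what just happened,
    # analogous to DeepSeek's separate plan-then-answer pass.
    notebook = llm(
        f"{NOTEBOOK_INSTRUCTIONS}\n\nNotes:\n{notebook}\n\n"
        f"Exchange:\nPlayer: {player_input}\nDM: {reply}"
    )
    return reply, notebook
```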
The problem is that the “train of thought” is also hallucinations. It might make the model better with more compute, but it’s diminishing returns.
RPGs can use LLMs because they’re not critical. If the LLM spews out nonsense you don’t like, you just ask it to redo it, because it’s all subjective.
Or at least as an assistant in a field you’re an expert in. Love using it for boilerplate at work (tech).
Idk guys. I think the headline is misleading. I had an AI chatbot summarize the article and it says AI chatbots are really, really good at summarizing articles. In fact it pinky promised.
Turns out, spitting out words when you don’t know what anything means or what “means” means is bad, mmmmkay.
It got journalists who were relevant experts in the subject of the article to rate the quality of answers from the AI assistants.
It found 51% of all AI answers to questions about the news were judged to have significant issues of some form.
Additionally, 19% of AI answers which cited BBC content introduced factual errors, such as incorrect factual statements, numbers and dates.
Introduced factual errors
Yeah that’s . . . that’s bad. As in, not good. As in, it will never be good. With a lot of work and grinding it might be “okay enough” for some tasks some day. That’ll be another 200 billion, please.
That’s the core problem though, isn’t it? They are just predictive-text machines that don’t understand what they are saying. Yet we are treating them as if they were some amazing solution to all our problems.
Well, “we” aren’t, but there’s a hype machine in operation bigger than anything in history, because a few tech bros think they’re going to rule the world.
I’ll be here begging for a miserable 1 million to invest in some freaking trains and bicycle paths. Thanks.
Do you dislike AI?
I don’t necessarily dislike “AI” but I reserve the right to be derisive about inappropriate use, which seems to be pretty much every use.
Using AI to find petroglyphs in Peru was cool. Reviewing medical scans is pretty great. Everything else is shit.
I work in tech and can confirm that the vast majority of engineers “dislike AI” and are disillusioned with AI tools, even ones that work on AI/ML tools. And it’s fewer and fewer people the higher up the pay scale you go.
There isn’t a single complex coding problem an AI can solve. If you don’t understand something and it helps you write it, I’ll close the MR and delete your code, since it’s worthless. You have to understand what you write. I do not care if it works. You have to understand every line.
“But I use it just fine and I’m an…”
Then you’re not an engineer and you shouldn’t have a job. You lack the intelligence, dedication, and knowledge needed to be one. You are a detriment to your team and company.
That’s some weird gatekeeping. Why stop there? Whoever is using a linter is obviously too stupid to write clean code right off the bat. Syntax highlighting is for noobs.
I wholeheartedly dislike people who think they need to define arcane rules for how a task is achieved instead of just looking at the output.
Accept that you probably already have merged code that was generated by AI and it’s totally fine as long as tests are passing and it fits the architecture.
You’re supposed to gatekeep code. There is nothing wrong with gatekeeping things that aren’t hobbies.
If someone can’t explain every change they’re making and why they chose to do it that way they’re getting denied. The bar is low.
It found 51% of all AI answers to questions about the news were judged to have significant issues of some form.
How good are the human answers? I mean, I expect that an AI’s error rate is currently higher than an “expert” in their field.
But I’d guess the AI is quite a bit better than, say, the average Republican.
I guess you don’t get the issue. You give the AI some text to summarize the key points. The AI gives you wrong info in a percentage of those summaries.
There’s no point in comparing this to a human, since this is usually something done for automation, that is, to work for a lot of people or a large quantity of articles. At best you can compare it to other automated summaries that existed before LLMs, which might not have all the info, but won’t make up random facts that aren’t in the article.
I’m more interested in the technology itself, rather than its current application.
I feel like I am watching a toddler taking her first steps; wondering what she will eventually accomplish in her lifetime. But the loudest voices aren’t cheering her on: they’re sitting in their recliners, smugly claiming she’s useless. She can’t even participate in a marathon, let alone compete with actual athletes!
Basically, the best AIs currently have college-level mastery of language, and the reasoning skills of children. They are already far more capable and productive than anti-vaxxers, or our current president.
It’s not that people simply decided to hate AI. It was the sensationalist media hyping it up to the point of scaring people (“it’ll take all your jobs”), and companies shoving it down our throats by putting it in every product, even when it gets in the way of the functionality people actually want to use. Even my company “forces” us all to use X prompts every week as a sign of being “productive”. Literally every IT consultancy in my country has a ChatGPT wrapper that they’re trying to sell, and they all think they’re different because of it. The result couldn’t have been any different: when something gets too much exposure it also gets a lot of hate, especially when it’s forced on people.
Is it worse than the current system of editors writing shitty clickbait titles?
Surprisingly, yes
BBC is probably salty the AI is able to insert the word Israel alongside a negative term in the headline
Some examples of inaccuracies found by the BBC included:
Gemini incorrectly said the NHS did not recommend vaping as an aid to quit smoking
ChatGPT and Copilot said Rishi Sunak and Nicola Sturgeon were still in office even after they had left
Perplexity misquoted BBC News in a story about the Middle East, saying Iran initially showed “restraint” and described Israel’s actions as “aggressive”
I did not even read that far, but wow, the BBC really went there openly.
ShockedPikachu.svg
You don’t say.
But every techbro on the planet told me it’s exactly what LLMs are good at. What the hell!? /s
Not only techbros though. Most of my friends are not into computers, but they all think AI is magical and will change the whole world for the better. I always ask, “How could a black box that throws up random crap and runs on the computers of big companies out of the country change anything?” They don’t know what to say, but they still believe something will happen and a program can magically become sentient. Sometimes they can be fucking dumb, but I still love them.
The more you know what you are doing, the less impressed you are by AI. Calling people who trust AI idiots is not a good start to a conversation, though.
But the BBC is increasingly unable to accurately report the news, so this finding is no real surprise.
Why do you say that? I have had no reason to doubt their reporting
Look at their reporting of the Employment Tribunal for the nurse from Fife who was sacked for abusing a doctor. They refused to gender the doctor correctly in every article, to the point where no pronoun for her appears at all except the sacked transphobe referring to her as “him”. They also very much paint it as if Dr Upton is on trial and not Ms Peggie.
It’s a “how the mighty have fallen” kind of thing. They are well into the click-bait farm mentality now - have been for a while.
It’s present on the news sites, but far worse in areas where they know they steer opinion and discourse. They used to ensure political parties had coverage in line with their support, but for the ten years or so before Brexit they gave Farage and his jackasses hugely disproportionate coverage, like 20x more than their base warranted. This was at a time when the SNP were doing very well yet were frequently seen less than the UK Independence Party. And I don’t recall a single instance of anyone pointing out that ten years of poor interactions with Europe may have been at least partially fuelled by Nige being our MEP and never turning up. Hell, we had veto rights and he was on the fisheries committee. All that shit about fishermen was a problem he made.
Current reporting is heavily spun. They definitely aren’t the worst in the world, but they are also definitely not the bastion of unbiased news I grew up with.
Until relatively recently you could see the deterioration by flipping to the world service, but that’s fallen into line now.
If you have the time to follow independent journalists, the problem becomes clearer. If not, look at the output of parody news sites: it’s telling that Private Eye and Newsthump manage the criticism that the BBC can’t seem to get to
Go look at the bylinetimes.com front page, grab a random story, and compare coverage with the BBC. One of these is crowd-funded reporters; the other is a national news site with great funding and legal obligations to report in the public interest.
I don’t hate them, they just need to be better.
I just tried it on DeepSeek; it did fine and gave a source for everything it mentioned as well.
Do you mean you rigorously went through a hundred articles, asking DeepSeek to summarise them, and then got relevant experts in the subjects of the articles to rate the quality of the answers? Could you tell us what percentage of the summaries were found to introduce errors, then? Literally 0?
Or do you mean that you tried having DeepSeek summarise a couple of articles, didn’t see anything obviously problematic, and figured it is doing fine? That would be replacing rigorous research and journalism by humans with a couple of quick AI prompts, which is the core of the issue the article is getting at. If so, please reconsider how you evaluate (or trust others’ evaluations of) information tools which might help, or help destroy, democracy.
Now ask it whether Taiwan is a country.
That depends on whether you ask the online app (which will cut you off or give you a CCP-sanctioned answer) or run it locally (which seems to give a normal answer)
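For anyone who wants to poke at the local behaviour themselves, here’s one quick way, assuming the `ollama` Python package and a DeepSeek model you’ve already pulled (the exact model tag below is my guess; swap in whatever you have):

```python
# Quick local test via the ollama Python package (assumes the Ollama
# daemon is running and a DeepSeek model has been pulled beforehand).
import ollama

response = ollama.chat(
    model="deepseek-r1",  # tag is an assumption; use whichever you pulled
    messages=[{"role": "user", "content": "Is Taiwan a country?"}],
)
print(response["message"]["content"])
```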
I learned that AI chat bots aren’t necessarily trustworthy in everything. In fact, if you aren’t taking their shit with a grain of salt, you’re doing something very wrong.
This is my personal take. As long as you’re careful and thoughtful whenever using them, they can be extremely useful.
Extremely?
Treat LLMs like a super knowledgeable, enthusiastic, arrogant, unimaginative intern.
Super knowledgeable but with patchy knowledge, so they’ll confidently say something that practically everyone else in the company knows is flat out wrong.
I noticed that. When I ask it about things that I am knowledgeable about, or simply wish to troubleshoot, I often find myself having to correct it. This makes me hesitant to follow the instructions it gives on something I DON’T know much about.
Oh yes. The LLM will lie to you, confidently.
Exactly. I think this is a good barometer of gauging whether or not you can trust it. Ask it about things you know you’re good at or knowledgeable about. If it is giving good information, the type you would give out, then it is probably OK. If it is bullshitting you or making you go ‘uhh, no, actually…’ then you need to do more old-school research.
They are, however, able to inaccurately summarize it in GLaDOS’s voice, which is a strong point in their favor.
Yeah, out of all the generative AI fields, voice generation is at this point like 95% of the way to producing convincing speech, even with consumer-level tech like ElevenLabs. That last 5% might not even be solvable currently: it’s those moments where it gets the feeling, intonation, or pronunciation wrong because the only context you give it is a text input, which is why everything purely automated tends to fall apart quite fast.
Especially voice cloning - the DRG Cortana Mission Control mod is one of the examples I like to use.
News station finds that AI is unable to perform the job of a news station
🤔
BBC finds, lol. No, we already knew about that
The owners of LLMs don’t care about ‘accurate’ … they care about ‘fast’ and ‘summary’ … and especially ‘profit’ and ‘monetization’.
As long as it’s quick, delivers instant content and makes money for someone … no one cares about ‘accurate’
That’s why I avoid them like the plague. I’ve even changed almost every platform I’m using to get away from the AI-pocalypse.
No better time to get into self hosting!