As always, never rely on LLMs for anything factual. They’re only good for things with a high tolerance for error, such as entertainment (e.g. RPGs)
I tried using one to spitball ideas for my DMing. I was running a campaign set in a real-life location known for a specific thing. Even if I told it not to include that thing, it would still shoehorn it into random spots. It quickly became absolutely useless once I didn’t need that thing included
Sorry for being vague, I just didn’t want to post my home town on here
You can say Space Needle. We get it.
The issue for RPGs is that LLMs have such “small” context windows, and a big point of RPGs is that anything could be important, investigated, or just come up later
Although, similar to how DeepSeek uses two stages (“how would you solve this problem”, then “solve this problem following this train of thought”), you could feed the model the recent conversation plus a private, unseen “notebook” that gets modified or appended to based on recent events. Doing that properly would need a whole new model, which likely wouldn’t be profitable short term, but I imagine the same infrastructure could be used for any LLM usage where fine details over a long period matter more than specific wording, including factual things. A rough sketch of the idea is below.
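A minimal sketch of that two-stage notebook loop, purely hypothetical: the `llm` function is a stand-in for whatever completion API you’d actually call, and the prompts are made up.

```python
# Hypothetical sketch of the "private notebook" idea; nothing here is a
# real vendor API. llm() stands in for whatever completion call you use.

def llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    raise NotImplementedError

NOTEBOOK_INSTRUCTIONS = (
    "You maintain a private campaign notebook. Given the current notes "
    "and the latest exchange, return updated notes. Keep anything that "
    "could matter later: NPC names, unresolved hooks, player theories."
)

def run_turn(notebook: str, recent: list[str], player_input: str) -> tuple[str, str]:
    # Stage 1: answer using the notebook plus a short window of recent
    # turns, so the visible context stays small.
    reply = llm(
        f"Notebook:\n{notebook}\n\n"
        "Recent conversation:\n" + "\n".join(recent) +
        f"\n\nPlayer: {player_input}\nDM:"
    )

    # Stage 2: privately revise the notebook based on what just happened,
    # analogous to DeepSeek's separate plan-then-answer pass.
    notebook = llm(
        f"{NOTEBOOK_INSTRUCTIONS}\n\nNotes:\n{notebook}\n\n"
        f"Exchange:\nPlayer: {player_input}\nDM: {reply}"
    )
    return reply, notebook
```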
The problem is that the “train of thought” is also hallucinations. It might make the model better with more compute, but it’s diminishing returns.
RPGs can use LLMs because they’re not critical. If the LLM spews out nonsense you don’t like, you just ask it to redo it, because it’s all subjective.
Or at least as an assistant in a field you’re an expert in. Love using it for boilerplate at work (tech).
Idk guys. I think the headline is misleading. I had an AI chatbot summarize the article and it says AI chatbots are really, really good at summarizing articles. In fact it pinky promised.
Turns out, spitting out words when you don’t know what anything means or what “means” means is bad, mmmmkay.
It got journalists who were relevant experts in the subject of the article to rate the quality of answers from the AI assistants.
It found 51% of all AI answers to questions about the news were judged to have significant issues of some form.
Additionally, 19% of AI answers which cited BBC content introduced factual errors, such as incorrect factual statements, numbers and dates.
Introduced factual errors
Yeah that’s . . . that’s bad. As in, not good. As in, it will never be good. With a lot of work and grinding it might be “okay enough” for some tasks some day. That’ll be another 200 billion, please.
That’s the core problem though, isn’t it? They are just predictive-text machines that don’t understand what they are saying. Yet we are treating them as if they were some amazing solution to all our problems.
Well, “we” aren’t, but there’s a hype machine in operation bigger than anything in history, because a few tech bros think they’re going to rule the world.
I’ll be here begging for a miserable 1 million to invest in some freaking trains and bicycle paths. Thanks.
Do you dislike AI?
I don’t necessarily dislike “AI” but I reserve the right to be derisive about inappropriate use, which seems to be pretty much every use.
Using AI to find petroglyphs in Peru was cool. Reviewing medical scans is pretty great. Everything else is shit.
I work in tech and can confirm that the vast majority of engineers “dislike AI” and are disillusioned with AI tools, even ones that work on AI/ML tools. And it’s fewer and fewer people the higher up the pay scale you go.
There isn’t a single complex coding problem an AI can solve. If you don’t understand something and it helps you write it, I’ll close the MR and delete your code, since it’s worthless. You have to understand what you write. I do not care if it works. You have to understand every line.
“But I use it just fine and I’m an…”
Then you’re not an engineer and you shouldn’t have a job. You lack the intelligence, dedication, and knowledge needed to be one. You are a detriment to your team and company.
That’s some weird gatekeeping. Why stop there? Whoever is using a linter is obviously too stupid to write clean code right off the bat. Syntax highlighting is for noobs.
I wholeheartedly dislike people who think they need to define arcane rules for how a task is achieved instead of just looking at the output.
Accept that you probably already have merged code that was generated by AI and it’s totally fine as long as tests are passing and it fits the architecture.
You’re supposed to gatekeep code. There is nothing wrong with gatekeeping things that aren’t hobbies.
If someone can’t explain every change they’re making and why they chose to do it that way they’re getting denied. The bar is low.
It found 51% of all AI answers to questions about the news were judged to have significant issues of some form.
How good are the human answers? I mean, I expect that an AI’s error rate is currently higher than an “expert” in their field.
But I’d guess the AI is quite a bit better than, say, the average Republican.
I guess you don’t get the issue. You give the AI some text to summarize the key points. The AI gives you wrong info in a percentage of those summaries.
There’s no point in comparing this to a human, since this is usually something done for automation, that is, to work for a lot of people or a large quantity of articles. At best you can compare it to other automated summaries that existed before LLMs, which might not have all the info, but won’t make up random facts that aren’t in the article.
I’m more interested in the technology itself, rather than its current application.
I feel like I am watching a toddler taking her first steps; wondering what she will eventually accomplish in her lifetime. But the loudest voices aren’t cheering her on: they’re sitting in their recliners, smugly claiming she’s useless. She can’t even participate in a marathon, let alone compete with actual athletes!
Basically, the best AIs currently have college-level mastery of language, and the reasoning skills of children. They are already far more capable and productive than anti-vaxxers, or our current president.
It’s not that people simply decided to hate AI. It was the sensationalist media hyping it up to the point of scaring people (“it’ll take all your jobs”), and companies shoving it down our throats by putting it in every product, even when it gets in the way of the functionality people actually want to use. Even my company “forces” us all to use X prompts every week as a sign of being “productive”. Literally every IT consultancy in my country has a ChatGPT wrapper that they’re trying to sell, and they all think they’re different because of it. The result couldn’t have been any different: when something gets too much exposure it also gets a lot of hate, especially when it’s forced on people.
Is it worse than the current system of editors writing shitty clickbait titles?
Surprisingly, yes
BBC is probably salty the AI is able to insert the word Israel alongside a negative term in the headline
Some examples of inaccuracies found by the BBC included:
Gemini incorrectly said the NHS did not recommend vaping as an aid to quit smoking
ChatGPT and Copilot said Rishi Sunak and Nicola Sturgeon were still in office even after they had left
Perplexity misquoted BBC News in a story about the Middle East, saying Iran initially showed “restraint” and described Israel’s actions as “aggressive”
I did not even read that far, but wow, the BBC really went there openly.
ShockedPikachu.svg
You don’t say.
But every techbro on the planet told me it’s exactly what LLMs are good at. What the hell!? /s
Not only techbros though. Most of my friends are not into computers, but they all think AI is magical and will change the whole world for the better. I always ask, “How could a black box that throws up random crap and runs on the computers of big companies out of the country change anything?” They don’t know what to say, but they still believe something will happen and a program can magically become sentient. Sometimes they can be fucking dumb, but I still love them.
The more you know what you are doing, the less impressed you are by AI. Calling people who trust AI idiots is not a good start to a conversation, though.
But the BBC is increasingly unable to accurately report the news, so this finding is no real surprise.
Why do you say that? I have had no reason to doubt their reporting
Look at their reporting of the Employment Tribunal for the nurse from Fife who was sacked for abusing a doctor. They refused to gender the doctor correctly in every article, to the point where no pronoun for her appears at all except the sacked transphobe referring to her as “him”. They also very much paint it as if Dr Upton is on trial and not Ms Peggie.
It’s a “how the mighty have fallen” kind of thing. They are well into the click-bait farm mentality now - have been for a while.
It’s present on the news sites, but far worse in areas where they know they steer opinion and discourse. They used to ensure political parties had coverage in line with their support, but for the ten years or so before Brexit they gave Farage and his jackasses hugely disproportionate coverage, like 20x more than their base warranted. This was at a time when the SNP were doing very well yet were frequently seen less than the UK Independence Party. And I don’t recall a single instance of anyone pointing out that ten years of poor interactions with Europe may have been at least partially fuelled by Nige being our MEP and never turning up. Hell, we had veto rights and he was on the fisheries committee. All that shit about fishermen was a problem he made.
Current reporting is heavily spun. They definitely aren’t the worst in the world, but they are also definitely not the bastion of unbiased news I grew up with.
Until relatively recently you could see the deterioration by flipping to the world service, but that’s fallen into line now.
If you have the time to follow independent journalists, the problem becomes clearer. If not, look at the output of parody news sites: it’s telling that Private Eye and Newsthump manage the criticism that the BBC can’t seem to get to
Go look at the bylinetimes.com front page, grab a random story, and compare coverage with the BBC. One of these is crowd-funded reporters; the other is a national news site with great funding and legal obligations to report in the public interest.
I don’t hate them, they just need to be better.
I just tried it on DeepSeek; it did fine and gave a source for everything it mentioned as well.
Do you mean you rigorously went through a hundred articles, asking DeepSeek to summarise them, and then got relevant experts in the subjects of the articles to rate the quality of the answers? Could you tell us what percentage of the summaries were found to introduce errors, then? Literally 0?
Or do you mean that you tried having DeepSeek summarise a couple of articles, didn’t see anything obviously problematic, and figured it is doing fine? That would be replacing rigorous research and journalism by humans with a couple of quick AI prompts, which is the core of the issue the article is getting at. If so, please reconsider how you evaluate (or trust others’ evaluations of) information tools which might help, or help destroy, democracy.
Now ask it whether Taiwan is a country.
That depends on whether you ask the online app (which will cut you off or give you a CCP-sanctioned answer) or run it locally (which seems to give a normal answer)
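For anyone who wants to poke at the local behaviour themselves, here’s one quick way, assuming the `ollama` Python package and a DeepSeek model you’ve already pulled (the exact model tag below is my guess; swap in whatever you have):

```python
# Quick local test via the ollama Python package (assumes the Ollama
# daemon is running and a DeepSeek model has been pulled beforehand).
import ollama

response = ollama.chat(
    model="deepseek-r1",  # tag is an assumption; use whichever you pulled
    messages=[{"role": "user", "content": "Is Taiwan a country?"}],
)
print(response["message"]["content"])
```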
I learned that AI chat bots aren’t necessarily trustworthy in everything. In fact, if you aren’t taking their shit with a grain of salt, you’re doing something very wrong.
This is my personal take. As long as you’re careful and thoughtful whenever using them, they can be extremely useful.
Extremely?
Treat LLMs like a super knowledgeable, enthusiastic, arrogant, unimaginative intern.
Super knowledgeable but with patchy knowledge, so they’ll confidently say something that practically everyone else in the company knows is flat out wrong.
I noticed that. When I ask it about things that I am knowledgeable about, or simply wish to troubleshoot, I often find myself having to correct it. This makes me hesitant to follow the instructions it gives on something I DON’T know much about.
Oh yes. The LLM will lie to you, confidently.
Exactly. I think this is a good barometer of gauging whether or not you can trust it. Ask it about things you know you’re good at or knowledgeable about. If it is giving good information, the type you would give out, then it is probably OK. If it is bullshitting you or making you go ‘uhh, no, actually…’ then you need to do more old-school research.
They are, however, able to inaccurately summarize it in GLaDOS’s voice, which is a strong point in their favor.
Yeah, out of all the generative AI fields, voice generation is at this point like 95% of the way to producing convincing speech, even with consumer-level tech like ElevenLabs. That last 5% might not even be solvable currently: it’s those moments where it gets the feeling, intonation, or pronunciation wrong because the only context you give it is a text input, which is why everything purely automated tends to fall apart quite fast.
Especially voice cloning - the DRG Cortana Mission Control mod is one of the examples I like to use.
News station finds that AI is unable to perform the job of a news station
🤔
BBC finds, lol. No, we already knew about that
The owners of LLMs don’t care about ‘accurate’ … they care about ‘fast’ and ‘summary’ … and especially ‘profit’ and ‘monetization’.
As long as it’s quick, delivers instant content and makes money for someone … no one cares about ‘accurate’
That’s why I avoid them like the plague. I’ve even changed almost every platform I’m using to get away from the AI-pocalypse.
No better time to get into self hosting!