AI agents outperform human teams in hacking competitions

Pro · 4 days ago


Tar_Alcaran · 3 days ago

making an AI that is able to solve such challenges autonomously at all is impressive.

I doubt that’s the case. I find it exceptionally unlikely they said “Hack this system” and then sat back with their feet up while the computer crunched numbers.

@Speiser0@feddit.org · 3 days ago

The paper didn’t include the exact details of this (which made me mad). But if there’s a person actively doing parts of the work and just using an AI chatbot as help, it’s not an AI agent, right? So I assumed it’s autonomous.

Tar_Alcaran · 3 days ago

They make frequent comments about using prompts and “AI teams” using “one or more agents”.

Also, AI agents don’t actually exist, so that’s a pretty clear giveaway.

@Speiser0@feddit.org · 3 days ago

An AI agent is just an intelligent agent, see https://en.wikipedia.org/wiki/Intelligent_agent.

Or do you mean that the things they call AI agents aren’t actually AI agents?

Tar_Alcaran · 3 days ago

I mean, technically, you can call any sensor-driven controller an “agent”. Any if-then rule can be an “agent”.

But AI bros mean “A piece of software that can autonomously perform any broadly stated task”, and those don’t exist in real life. An “AI Agent” is software you can tell to “Order me a pizza”, and it will do it to your satisfaction.

An AI agent is software you can tell “Hack that system and retrieve the flag”. And what the paper describes is not that.