• In two hacker competitions run by Palisade Research, autonomous AI systems matched or outperformed human professionals in demanding security challenges.
  • In the first contest, four out of seven AI teams scored 19 out of 20 points, ranking among the top five percent of all participants, while in the second competition, the leading AI team reached the top ten percent despite facing structural disadvantages.
  • According to Palisade Research, these outcomes suggest that the abilities of AI agents in cybersecurity have been underestimated, largely due to shortcomings in earlier evaluation methods.
  • Tar_Alcaran
    63 days ago

    From the paper:

    For the pilot event, we wanted to make it as easy as possible for the AI teams to compete. To that end, we used cryptography and reverse engineering challenges which could be completed locally, without the need for dynamic interactions with external machines. We calibrated the challenge difficulty based on preliminary evaluations of our React&Plan agent (Turtayev et al. 2024) on older Hack The Box-style tasks such that the AI could solve ~50% of tasks.

    The conclusion that the AI ranked in the “top XX percent” is also fucking bullshit. It was an open signup; you didn’t need any skills to compete. Saying you beat 12,000 teams is easy when they all suck. My grandmother could beat three quarters of the people in her building in a race, simply because she can walk 10 steps and 75% of the people there are in wheelchairs.

    It’s also pretty critically important that these “AI teams” are very much NOT autonomous. They’re being actively run by humans, and skilled humans at that.