• In two hacking competitions run by Palisade Research, autonomous AI systems matched or outperformed human professionals in demanding security challenges.
  • In the first contest, four of seven AI teams scored 19 out of 20 points, placing them in the top five percent of all participants. In the second, the leading AI team reached the top ten percent despite structural disadvantages.
  • According to Palisade Research, these outcomes suggest that the abilities of AI agents in cybersecurity have been underestimated, largely due to shortcomings in earlier evaluation methods.
  • @taladar@sh.itjust.works
    102 days ago

    The event’s puzzles were designed so they could be solved locally, making them accessible even to AI models with technical constraints.

    Want to bet that those puzzles (or some very similar ones) were part of the training data of some of the agents?