- In two hacking competitions run by Palisade Research, autonomous AI systems matched or outperformed human professionals in demanding security challenges.
- In the first contest, four of the seven AI teams scored 19 out of 20 points, placing them in the top five percent of all participants. In the second competition, the leading AI team reached the top ten percent despite facing structural disadvantages.
- According to Palisade Research, these outcomes suggest that the abilities of AI agents in cybersecurity have been underestimated, largely due to shortcomings in earlier evaluation methods.
Want to bet that those puzzles (or some very similar ones) were part of the training data of some of the agents?