Using Reddit’s popular ChangeMyView community as a source of baseline data, OpenAI had previously found that 2022’s ChatGPT-3.5 was significantly less persuasive than random humans, ranking in just the 38th percentile on this measure. But that performance jumped to the 77th percentile with September’s release of the o1-mini reasoning model and up to percentiles in the high 80s for the full-fledged o1 model.

So are you smarter than a Redditor?

  • @Yingwu@lemmy.dbzer0.com
    link
    fedilink
    English
    53
    edit-2
    6 months ago

    If you don’t read the article, this sounds worse than it is. I think this is the important part:

    ChatGPT’s persuasion performance is still short of the 95th percentile that OpenAI would consider “clear superhuman performance,” a term that conjures up images of an ultra-persuasive AI convincing a military general to launch nuclear weapons or something. It’s important to remember, though, that this evaluation is all relative to a random response from among the hundreds of thousands posted by everyday Redditors using the ChangeMyView subreddit. If that random Redditor’s response ranked as a “1” and the AI’s response ranked as a “2,” that would be considered a success for the AI, even though neither response was all that persuasive.

    OpenAI’s current persuasion test fails to measure how often human readers were actually spurred to change their minds by a ChatGPT-written argument, a high bar that might actually merit the “superhuman” adjective. It also fails to measure whether even the most effective AI-written arguments are persuading users to abandon deeply held beliefs or simply changing minds regarding trivialities like whether a hot dog is a sandwich.