I love to show that kind of shit to AI boosters. (In case you’re wondering, the numbers were chosen randomly and the answer is incorrect.)

They go “waaa waaa, it’s not a calculator,” and then I can point out that it got the leading 6 digits and the last digit correct, which is a lot better than it did on the “softer” parts of the test.

  • Redex
    English
    -419 days ago

    Depending on the task, it can significantly improve the quality of the output, but it doesn’t help with everything. It’s more useful for stuff that has to be reasoned about in multiple iterations, not for something that has a direct answer.

    • @Architeuthis
      English
      619 days ago

      Except not really, because even if “stuff that has to be reasoned about in multiple iterations” were a distinct category of problems, reasoning models by all accounts hallucinate a whole bunch more.