• UnseriousAcademic
    1 year ago

    I remember one time in a research project I switched out the tokeniser to see what impact it might have on my output. Spent about a day re-running and the difference was minimal. I imagine it’s wholly the same thing.

    *Disclaimer: I don’t actually imagine it is wholly the same thing.
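The experiment described above — swap the tokeniser, re-run, compare — can be sketched with a toy example. This is purely illustrative (whitespace vs. character tokenisers are my stand-ins, not whatever the commenter actually swapped): the two tokenisers give the model very different-length sequences, but both are lossless encodings of the same underlying text.

```python
# Toy sketch of a tokeniser swap (hypothetical tokenisers, not the
# commenter's actual setup): tokenise the same text two ways and
# compare what the downstream model would see.

def whitespace_tokenise(text):
    """Split on whitespace -- coarse, word-level tokens."""
    return text.split()

def char_tokenise(text):
    """Split into individual characters -- fine-grained tokens."""
    return list(text)

text = "the tokeniser makes bugger all difference"

ws = whitespace_tokenise(text)
ch = char_tokenise(text)

# Very different sequence lengths for the same input...
print(len(ws), len(ch))  # → 6 41

# ...but both encodings preserve the text exactly, so the model is
# trained on the same information either way.
assert " ".join(ws) == text
assert "".join(ch) == text
```

The point the toy makes: changing the tokeniser changes the packaging, not the information content of the training data.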

    • David GerardOPMA
      1 year ago

      there’s a research result that the precise tokeniser makes bugger all difference, it’s almost entirely the data you put in

      because LLMs are lossy compression for text