• @untorquer@lemmy.world
    222 days ago

    These sorts of artifacts wouldn’t be a huge issue except that AI is being pushed to the general public as an alternative means of learning basic information. The meme example is obvious to someone with a strong understanding of English, but learners and children might get an artifact and stamp it in their memory, working for years off bad information. A few false things every now and then aren’t a problem; that’s unavoidable in learning. Accumulate thousands over long-term use, however, and your understanding of the world will be coarser, like Swiss cheese with voids so large it can’t hold itself up.

    • @jsomae@lemmy.ml
      2 days ago

      You’re talking about hallucinations. That’s different from tokenization reflection errors. I’m specifically talking about its inability to know how many of a certain letter are in a word that it can spell correctly. This is not a hallucination per se – at least, the mechanism behind it is completely different from whatever causes other factual errors. This specific problem is due to tokenization, and that’s why I say it has little bearing on other shortcomings of LLMs.
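
To illustrate the tokenization point, here is a minimal sketch. It assumes a made-up vocabulary (`TOY_VOCAB`) and a simple greedy longest-match rule; real models typically use learned schemes such as byte-pair encoding, so the actual splits and IDs will differ. The idea it shows is only that the model receives token IDs, not letters, so counting letters means recovering information the input no longer exposes directly.

```python
# Hypothetical toy vocabulary, invented for illustration only.
# It is not taken from any real model or tokenizer.
TOY_VOCAB = {
    "straw": 101, "berry": 102,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}


def tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match tokenization over the toy vocabulary."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, then shrink.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i]!r}")
    return ids


# What the model "sees": two opaque IDs, with no letters in them.
token_ids = tokenize("strawberry")

# What a character-level view gives trivially.
r_count = "strawberry".count("r")

print(token_ids, r_count)
```

With this toy vocabulary the word collapses into two IDs, while the letter count is only trivial when working on the characters themselves.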

      • @untorquer@lemmy.world
        2 days ago

        No, I’m talking about human learning and the danger posed by treating an imperfect tool as a reliable source of information, as these companies want people to do.

        Whether the erroneous information comes from tokenization or hallucinations is irrelevant when this is already the main source for so many people learning, for example, a new language.

        • @jsomae@lemmy.ml
          2 days ago

          Hallucinations aren’t relevant to my point here. I’m not claiming that AIs are a good source of information, and I agree that hallucinations are dangerous (either that or misusing LLMs is dangerous). I also admit that for language learning, artifacts caused by tokenization could be very detrimental to the user.

          The point I am making is that LLMs struggling with these kinds of tokenization artifacts is poor evidence for drawing any conclusions about their behaviour on other tasks.

          • @untorquer@lemmy.world
            1 day ago

            That’s a fair point when these LLMs are restricted to areas where they function well. They have use cases that make sense when isolated from the ethics around training and compute. But the people who made them are applying them wildly outside these use cases.

            These are pushed as a solution to every problem for the sake of profit, with intentional ignorance of these issues. If a few errors impact someone, it’s just a casualty in the goal of making it profitable. That can’t be disentangled from them unless you limit your argument to open source local compute.

            • @jsomae@lemmy.ml
              1 day ago

              Well – and I don’t mean this to be antagonistic – I agree with everything you’ve said except for the last sentence where you say “and therefore you’re wrong.” Look, I’m not saying LLMs function well, or that they’re good for society, or anything like that. I’m saying that tokenization errors are really their own thing, unrelated to other errors LLMs make. If you want to dunk on LLMs then yeah, be my guest. I’m just saying that this one type of poor behaviour is unrelated to the other kinds of poor behaviour.