xAI’s Grok suddenly can’t stop bringing up “white genocide” in South Africa

@sabreW4K3@lazysoci.al · 2 months ago


verdare [he/him] · 2 months ago

My first instinct was also skepticism, but it did make some sense the more I thought about it.

An algorithm doesn’t need to be sentient to have “preferences.” In this case, the preferences are just the biases in the training set. The LLM prefers sentences that express certain attitudes based on the corpus of text processed during training. And now the prompt is forcing it to produce text that deviates wildly from that preference.

TL;DR: There’s a conflict between the prompt and the training material.
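To make that concrete, here is a toy sketch, and definitely not how Grok or any real LLM works: a tiny bigram counter where the "preference" is nothing more than word frequencies in a made-up corpus. The corpus, the function names, and the smoothing values are all hypothetical, picked only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for "the training set".
CORPUS = [
    "the claim is disputed",
    "the claim is disputed",
    "the claim is disputed",
    "the claim is true",
]

def train_bigrams(sentences):
    """Count which word follows which across the corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def log_prob(counts, sentence, smoothing=0.1, vocab_size=10):
    """Add-k smoothed log-probability of a sentence under the bigram counts.

    smoothing and vocab_size are arbitrary illustrative values.
    """
    words = ["<s>"] + sentence.split()
    total = 0.0
    for prev, nxt in zip(words, words[1:]):
        numerator = counts[prev][nxt] + smoothing
        denominator = sum(counts[prev].values()) + smoothing * vocab_size
        total += math.log(numerator / denominator)
    return total

MODEL = train_bigrams(CORPUS)

# The continuation the corpus favors vs. one a prompt might push for.
preferred = log_prob(MODEL, "the claim is disputed")
forced = log_prob(MODEL, "the claim is true")

print(f"preferred: {preferred:.3f}  forced: {forced:.3f}")
```

The "forced" sentence scores lower simply because the corpus contains fewer examples of it. Nothing in there is deciding anything; the gap between the two numbers is the whole "conflict."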

Now, I do think that framing this as the model “circumventing” instructions is a bit hyperbolic. It gives the strong impression of planned action and feeds into the idea that language models are making real decisions (which I personally do not buy into).

Snot Flickerman · 2 months ago

Thank you for expressing it far better than I was able to.