
Imnimo t1_j9x01v0 wrote

Overfitting is just one among many possible optimization failures. While these models might over-memorize portions of the training data, they're also badly underfit in many other respects (as evidenced by their frequent inability to answer questions humans would find easy).

If Bing is so well-optimized that it has learned these strange outputs as some sort of advanced behavior to succeed at the LM or RLHF objectives, why is it so weak in so many other respects? Is simulating personalities so much more valuable, or so much easier, than simple multi-step reasoning, which these models struggle with terribly?

1

Hyper1on t1_j9y3vz1 wrote

I mean, I don't see how you get a plausible explanation of BingGPT from underfitting either. As you say, models are underfit on some types of data, but I think the key here is the finetuning procedure, whether standard supervised finetuning or RLHF, which optimises for a particular type of dialogue data in which the model is asked to act as an "Assistant" to a human user.

Part of the reason I suspect my explanation is right is that ChatGPT and BingGPT were almost certainly finetuned on large amounts of dialogue data collected from interactions with users. Yet most of the BingGPT failure modes that made the media are not of the form "we asked it to solve this complex reasoning problem and it failed horribly"; they instead come from prompts which are very much in distribution for dialogue data, such as asking the model what it thinks about X, or asking it to pretend it is Y, and you would expect the model to have seen dialogues which start similarly. I find underfitting on this data to be quite an unlikely explanation.

3