Imnimo t1_j9x01v0 wrote

Overfitting is just one among many possible optimization failures. While these models might over-memorize portions of training data, they're also badly underfit in many other respects (as evidenced by their frequent inability to answer questions humans would find easy).

If Bing is so well-optimized that it has learned these strange outputs as some sort of advanced behavior to succeed at the LM or RLHF tasks, why is it so weak in so many other respects? Is simulating personalities either so much more valuable or so much easier than simple multi-step reasoning, which these models struggle terribly with?

1

Imnimo t1_j9ux0jn wrote

Well, I don't really think this is a semantic disagreement. I'm using their definition of the term.

If the issue is the danger of an AI arms race, what does a poorly-trained model have to do with it? Isn't the danger supposed to be that the model will be too strong, not too weak?

1

Imnimo t1_j9upa4x wrote

My point is that this isn't even misalignment in the first place. No more than an Imagenet classifier with 40% accuracy is misaligned. Misalignment is supposed to be when a model's learned objective is different from the human designer's objective. In their desperation to see threats everywhere, EZ et al resort to characterizing poor performance as misalignment.

1

Imnimo t1_j9rvl16 wrote

No, a lot of his arguments strike me as similar to arguments from the 1800s about how some social trend or another spells doom in a generation or two. And then his followers spend their time confusing "Bing was mean to me" with "Bing is misaligned" (as opposed to "Bing is bad at its job") and start shouting "See? See? Alignment is impossible and it's already biting us!"

14

Imnimo t1_iu50biy wrote

I don't use that sort of thing as part of a normal process, but I did run into a situation where I had an image dataset with small objects on potentially distracting backgrounds. Regular old CAM helped me check whether, on misclassified images, the model was finding the right object and just not understanding what it was, or missing the object altogether (it was mostly the former).

5
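The CAM check described above can be sketched minimally. This assumes the standard CAM setup (a network whose head is global average pooling followed by a single linear layer); the function name and shapes are illustrative, not from the original comment:

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Compute a class activation map for one class.

    feature_maps: (C, H, W) activations from the last conv layer
    fc_weights:   (num_classes, C) weights of the GAP -> linear head
    Returns an (H, W) map of spatial evidence for class_idx.
    """
    # Weight each feature map by its connection to the target class, then sum.
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=1)  # (H, W)
    cam = np.maximum(cam, 0.0)      # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()       # normalize to [0, 1] for visualization
    return cam
```

Upsampling the returned map to the input resolution and overlaying it on the image shows whether the high-activation region sits on the small object or on the distracting background, which is exactly the distinction the comment is drawing.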