AndromedaAnimated t1_j3cc5tg wrote

Although this is not my preferred approach to alignment (I am more interested in emergent moral abilities and the importance of choice), I love this article. It’s a new approach, and that is always good.

I do see a danger hidden in it, though: think of "deceptive alignment". My "prophecy" here is that models that favor "harmlessness" over "moral choice" will be prone to deception.
