AndromedaAnimated t1_j3cc5tg wrote on January 7, 2023 at 3:32 PM

Although this is not my preferred approach to alignment (I am more interested in emergent moral abilities and the importance of choice), I love this article. It's a new approach, and that is always good.

I do see a danger hidden in it, though: think of "deceptive alignment". My "prophecy" here is that models that favor "harmlessness" over "moral choice" will be prone to deception.