Submitted by Liberty2012 t3_11ee7dt in singularity
Liberty2012 OP t1_jaejlry wrote
Reply to comment by Surur in Is the intelligence paradox resolvable? by Liberty2012
There is a recent observation that might call into question exactly how well this is working. There seems to be a feedback loop in the reinforcement learning that produces emergent deceptive behavior.
https://bounded-regret.ghost.io/emergent-deception-optimization
Surur t1_jaem8nr wrote
It is interesting to me that

a) it's possible to teach an LLM to be honest when we catch it in a lie, and

b) if we ever reach the point where we cannot detect a lie (e.g. on novel information), the AI is incentivised to lie every time (see the sketch below).
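A toy expected-reward calculation makes point (b) concrete. This is not from the linked post; the detection probability, rewards, and penalty below are made-up numbers chosen only to show how the incentive flips once lies become hard to catch.

```python
# Toy illustration (hypothetical numbers): expected reward for lying
# vs. being honest under an imperfect lie detector.

def expected_reward(p_detect: float,
                    reward_lie: float = 1.0,      # payoff for an undetected lie
                    penalty_caught: float = -2.0, # penalty when the lie is caught
                    reward_honest: float = 0.5):  # payoff for an honest answer
    """Return (expected reward for lying, reward for honesty)."""
    lie = (1 - p_detect) * reward_lie + p_detect * penalty_caught
    return lie, reward_honest

# As the chance of detection falls, lying overtakes honesty.
for p in (0.9, 0.5, 0.1, 0.0):
    lie, honest = expected_reward(p)
    better = "lie" if lie > honest else "honest"
    print(f"p_detect={p:.1f}: lying={lie:+.2f}, honesty={honest:+.2f} -> {better}")
```

With these particular numbers the optimal policy switches from honesty to lying somewhere below a 25% detection rate, which is the feedback-loop worry in a nutshell: the less able we are to catch a lie, the more the training signal rewards it.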