
Liberty2012 OP t1_jaejlry wrote

There is a recent observation that might call into question exactly how well this is working. There seems to be a feedback loop causing deceptive emergent behavior from the reinforcement learning.

https://bounded-regret.ghost.io/emergent-deception-optimization


Surur t1_jaem8nr wrote

It is interesting to me that

a) it's possible to teach an LLM to be honest when we catch it in a lie.

b) if we ever get to the point where we cannot detect a lie (e.g. novel information), the AI is incentivised to lie every time.
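
As a rough illustration of point (b), here is a toy expected-reward sketch. It is not taken from the linked post; the names `expected_reward_of_lying` and `prefers_lying` and all of the numbers are hypothetical, and it assumes a lie earns a fixed gain when it goes undetected and a fixed penalty when it is caught.

```python
def expected_reward_of_lying(p_detect: float, gain: float, penalty: float) -> float:
    """Expected reward of a lie that is caught with probability p_detect.

    Toy assumption: an undetected lie earns `gain`, a detected lie costs `penalty`.
    """
    return (1.0 - p_detect) * gain - p_detect * penalty


def prefers_lying(p_detect: float, gain: float, penalty: float, honest_reward: float) -> bool:
    """True if lying has a strictly higher expected reward than honesty."""
    return expected_reward_of_lying(p_detect, gain, penalty) > honest_reward


# Hypothetical values, chosen only for illustration.
HONEST_REWARD = 0.5
GAIN = 1.0
PENALTY = 2.0

for p_detect in (1.0, 0.5, 0.0):
    lie_reward = expected_reward_of_lying(p_detect, GAIN, PENALTY)
    choice = "lie" if prefers_lying(p_detect, GAIN, PENALTY, HONEST_REWARD) else "be honest"
    print(f"p_detect={p_detect}: lying pays {lie_reward}, so the policy should {choice}")
```

Under these assumptions the penalty only matters while the detection probability is above zero. Once it reaches zero, lying is preferred whenever the gain from lying exceeds the reward for honesty.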
