Submitted by Liberty2012 t3_11ee7dt in singularity
Surur t1_jaeezsj wrote
Reply to comment by hapliniste in Is the intelligence paradox resolvable? by Liberty2012
I think RLHF worked really well because the AI bases its judgement not on a list of rules, but on the nuanced rules it learned itself from human feedback.
As with most AI problems, we can never explicitly encode all the elements that guide our decisions, but with neural networks we can black-box it and get a workable system that has, in some way, captured the essence of the decision-making process we use.
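To make the black-box point concrete, here is a minimal sketch of the RLHF-style reward modelling step: a network is fit to human preference pairs with a Bradley-Terry loss, so the "rules" end up implicit in the weights rather than written down anywhere. Everything here (the model, dimensions, and random data) is a toy placeholder, not anyone's production code.

```python
# Minimal sketch: learn a reward signal from human preference pairs
# instead of hand-coded rules (Bradley-Terry loss). Toy placeholders only.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, dim: int = 16):
        super().__init__()
        # Stand-in for a transformer: maps a response embedding to a scalar score.
        self.score = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def preference_loss(model, chosen, rejected):
    # Push the human-preferred response to score higher than the rejected one.
    return -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    # Random "embeddings" standing in for (preferred, rejected) response pairs.
    chosen, rejected = torch.randn(8, 16), torch.randn(8, 16)
    loss = preference_loss(model, chosen, rejected)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point of the sketch is that nothing in the trained model is an explicit rule; the decision criteria are whatever the network absorbed from the preference data.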
Liberty2012 OP t1_jaejlry wrote
There is a recent observation that might call into question exactly how well this is working. There seems to be a feedback loop in the reinforcement learning that produces deceptive emergent behavior.
https://bounded-regret.ghost.io/emergent-deception-optimization
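The feedback loop can be illustrated with a toy bandit, my own construction rather than code from the linked post: if the reward signal is rater approval, and raters only catch false answers some of the time, the reward-maximizing policy drifts toward confident lying. The rates below are made-up numbers for illustration.

```python
# Toy illustration (not the linked post's code): reward is human *approval*,
# and the rater only detects a confident lie some fraction of the time.
import random

DETECTION_RATE = 0.2   # assumed chance a rater catches a confident lie
KNOWLEDGE_RATE = 0.5   # assumed chance the honest answer is actually right

def reward(action: str) -> float:
    if action == "honest":
        # Honest answers include "I don't know", which raters score lower.
        return 1.0 if random.random() < KNOWLEDGE_RATE else 0.3
    # A confident lie gets full approval unless it is detected.
    return 0.0 if random.random() < DETECTION_RATE else 1.0

# Simple bandit learner: explore first, then exploit the higher-average action.
totals = {"honest": [0.0, 0], "lie": [0.0, 0]}
for step in range(10_000):
    if step < 1000:
        action = random.choice(["honest", "lie"])
    else:
        action = max(totals, key=lambda a: totals[a][0] / max(totals[a][1], 1))
    r = reward(action)
    totals[action][0] += r
    totals[action][1] += 1

for a, (s, n) in totals.items():
    print(a, "avg reward:", round(s / n, 3), "over", n, "pulls")
```

Under these made-up rates, lying averages about 0.8 approval and honesty about 0.65, so the learner settles on lying; nothing in the loop ever rewards truth directly, only approval.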
Surur t1_jaem8nr wrote
It is interesting to me that
a) it's possible to teach an LLM to be honest when we catch it in a lie.
b) if we ever get to the point where we cannot detect a lie (e.g. novel information), the AI is incentivised to lie every time; a quick expected-value check follows below.
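A back-of-the-envelope version of (b), under assumed toy numbers: say a lie earns full approval unless detected (probability d, then 0), and honesty averages some fixed approval. Lying becomes the optimal policy as soon as its expected approval crosses that of honesty, and with truly novel information d approaches 0.

```python
# Expected-value check for (b), using the same assumed toy model as above:
# E[lie] = (1 - d) * 1.0 vs. a fixed average approval for honest answers.
def lie_beats_honesty(a_honest: float, detect_prob: float) -> bool:
    return (1 - detect_prob) * 1.0 > a_honest

# With honest answers averaging 0.65 approval, lying wins once detection
# falls below 35%; at d = 0 (novel information) it wins every time.
for d in (0.5, 0.35, 0.1, 0.0):
    print(f"detection={d:.2f}: lie optimal -> {lie_beats_honesty(0.65, d)}")
```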