Submitted by Liberty2012 t3_11ee7dt in singularity
Liberty2012 OP t1_jaehydb wrote
Reply to comment by Surur in Is the intelligence paradox resolvable? by Liberty2012
Ok, yes, when you leave open the possibility that alignment may not actually be achievable, that is a reasonable position, as opposed to proponents who believe we are destined to figure it out.

It somewhat sidesteps the paradox though, in the sense that if the paradox proves true, the feedback loop will prevent alignment, but we won't get close enough to cause harm.

It doesn't account, though, for our potential inability to evaluate the state of the AGI. The behavior is so complex that testing in isolation will never tell us how it will behave once released into the world.
Even with today's very primitive AI, we already see interesting emergent properties of deception, as covered in the link below. Possibly this is the signal from the feedback loop to slow down. But it is intriguing that we already have a primitive concept emerging of who will outsmart whom.
https://bounded-regret.ghost.io/emergent-deception-optimization
Surur t1_jaen1h5 wrote
> It doesn't account, though, for our potential inability to evaluate the state of the AGI.
I think the idea would be that the values we teach the AI while it is still under our control will carry forward once it no longer is, much like the values we teach our children, which we hope they will exhibit as adults.
I guess if we make sticking to human values the terminal goal, we will get goal preservation even as intelligence increases.
Liberty2012 OP t1_jaetcvy wrote
Conceptually, yes. However, human children sometimes grow up without adopting the values of their parents and teachers, and their values change over time.

We have a conflict in that we want AGI/ASI to be humanlike, yet at the same time not humanlike under certain conditions.