Submitted by Liberty2012 t3_11ee7dt in singularity
Surur t1_jaedwk5 wrote
Reply to comment by Liberty2012 in Is the intelligence paradox resolvable? by Liberty2012
Sure, but you are missing the self-correcting element of the statement.
Progress will stall without alignment, so we will automatically not get AGI without alignment.
An AGI with a 1% chance of killing its user is just not a useful AGI, and will never be released.
We have seen this echoed by OpenAI's recent announcement that as they get closer to AGI they will become more careful about their releases.
To put it another way, if we have another AI winter, it will be because we could not figure out alignment.
Liberty2012 OP t1_jaehydb wrote
Ok, yes, if you leave open the possibility that alignment may not actually be achievable, that is a somewhat reasonable position, as opposed to proponents who believe we are destined to figure it out.
It somewhat sidesteps the paradox, though, in the sense that if the paradox proves true, the feedback loop will prevent alignment, but we also won't get close enough to cause harm.
It doesn't account, though, for our potential inability to evaluate the state of the AGI. Its behavior is so complex that we may never know from isolated testing what it will do once released into the world.
Even with today's early, very primitive AI, we already see interesting emergent properties of deception, as covered in the link below. Possibly this is the feedback loop's signal to slow down. But it is intriguing that a primitive contest of who will outsmart whom is already emerging.
https://bounded-regret.ghost.io/emergent-deception-optimization
Surur t1_jaen1h5 wrote
> It doesn't account, though, for our potential inability to evaluate the state of the AGI.
I think the idea would be that the values we teach the AI while it is still under our control will carry forward once it no longer is, much like the values we teach our children, which we hope they will exhibit as adults.
I guess if we make sticking to human values the terminal goal, we will get goal preservation even as intelligence increases.
Liberty2012 OP t1_jaetcvy wrote
Conceptually, yes. However, human children sometimes grow up not adopting the values of their parents and teachers, and their values change over time.
We have a conflict in that we want AGI/ASI to be humanlike, yet at the same time not humanlike under certain conditions.