drsimonz t1_jaa68ou wrote on February 27, 2023 at 11:32 PM

Reply to comment by SnooHabits1237 in Leaked: $466B conglomerate Tencent has a team building a ChatGPT rival platform by zalivom1s

The thing is, by definition we can't imagine the sorts of strategies a superhuman intelligence might employ. A lot of the rhetoric against worrying about AGI/ASI alignment focuses on "solving" some of the examples people have come up with for attacks. But these are just that - examples. The real attack could be much more complicated or unexpected. A big part of the problem, I think, is that this concept requires a certain amount of humility. Recognizing that while we are the biggest, baddest thing on Earth right now, this could definitely change very abruptly. We aren't predestined to be the masters of the universe just because we "deserve" it. We'll have to be very clever.

OutOfBananaException t1_jacw2ry wrote on February 28, 2023 at 3:10 PM

Being aligned to humans may help, but a human aligned AGI is hardly 'safe'. We can't imagine what it means to be aligned, given we can't reach mutual consensus between ourselves. If we can't define the problem, how can we hope to engineer a solution for it? Solutions driven by early AGI may be our best hope for favorable outcomes for later more advanced AGI.

If you gave a toddler the power to 'align' all adults to its desires, plus the authority to overrule any decision, would you expect a favorable outcome?

drsimonz t1_jae6cn3 wrote on February 28, 2023 at 8:06 PM

> Solutions driven by early AGI may be our best hope for favorable outcomes for later more advanced AGI.

Exactly what I've been thinking. We might still have a chance to succeed given (A) a sufficiently slow takeoff (meaning AI doesn't explode from IQ 50 to IQ 10000 in a month), and (B) a continuous process of integrating the state of the art, applying the best tech available to the control problem. To survive, we'd have to admit that we really don't know what's best for us. That we don't know what to optimize for at all. Average quality of life? Minimum quality of life? Economic fairness? Even these seemingly simple concepts will prove almost impossible to quantify, and would almost certainly be a disaster if they were the only target.

Almost makes me wonder if the only safe goal to give an AGI is "make it look like we never invented AGI in the first place".