
drsimonz t1_ja9s2mx wrote

Absolutely. IMO almost all of the risk for "evil torturer ASI" comes from a scenario in which a human directs an ASI. Without a doubt, there are thousands, possibly millions, of people alive right now who would absolutely create hell, without hesitation, given the opportunity. You can tell because they... literally already do create hell on a smaller scale. Throwing acid on women's faces, burning people alive, raping children, orchestrating genocides: it's been part of human behavior for millennia. The only way we survive ASI is if these human desires are not allowed to influence it.

2

turnip_burrito t1_jablzeb wrote

There's also a large risk of somebody accidentally making it evil. We should probably stop training on data that contains these narratives.
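For illustration, here's a minimal sketch of the kind of pre-training filter that idea implies, assuming a plain Python list of text examples and a hypothetical keyword blocklist (nothing here comes from a real training pipeline; a production filter would more likely use a trained classifier than keywords):

```python
# Hypothetical example: drop training examples that match a blocklist
# of "evil AI" / torture narratives before training or fine-tuning.
BLOCKLIST = {"torture", "exterminate humanity", "enslave humans"}  # illustrative only

def is_clean(text: str) -> bool:
    """Return True if the example contains none of the blocked phrases."""
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BLOCKLIST)

def filter_dataset(examples: list[str]) -> list[str]:
    """Keep only the examples that pass the keyword filter."""
    return [ex for ex in examples if is_clean(ex)]

if __name__ == "__main__":
    raw = [
        "The assistant helped the user plan a garden.",
        "The rogue AI vowed to torture its creators.",
    ]
    print(filter_dataset(raw))  # only the first example survives
```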

We shouldn't be surprised when we train a model on X, Y, Z and it can do Z. I'm actually surprised that so many people are surprised at ChatGPT's tendency to reproduce (negative) patterns from its own training data.

The GPTs we've created basically have split-personality disorder because of all the voices from the Internet we've crammed into the model. If we provide a state (prompt) that pushes it into some region of its state space, it will evolve according to whatever pattern that region belongs to.
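As a rough illustration of that prompt-as-state idea, here's a minimal sketch assuming the Hugging Face transformers library and the small GPT-2 checkpoint (the model and prompts are stand-ins, not anything from the comment above). The weights never change between the two runs; only the starting state does, and the continuations diverge accordingly.

```python
# Hypothetical illustration: the same model, steered into different
# "personas" purely by the prompt (initial state).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # small stand-in model

prompts = [
    "As a kind and patient teacher, I would say:",
    "As a cruel villain plotting revenge, I would say:",
]

for prompt in prompts:
    out = generator(prompt, max_new_tokens=40, do_sample=True)
    print(out[0]["generated_text"], "\n---")
```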

tl;dr: It won't take an evil human to create evil AI. All it could take is some edgy 15-year-old script kiddie messing around with a publicly available near-AGI.

1