
turnip_burrito t1_jablzeb wrote

There's also a large risk of somebody accidentally making it evil. We should probably stop training on data that has these narratives in it.

We shouldn't be surprised when we train a model on X, Y, and Z and it turns out to be capable of Z. I'm actually surprised that so many people are surprised by ChatGPT's tendency to reproduce (negative) patterns from its own training data.

The GPTs we've created are basically split-personality-disorder AI because of all the voices on the Internet we've crammed into the model. If we give it a state (a prompt) that pushes it into some region of its state space, it will evolve according to whatever pattern that region belongs to.
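A minimal sketch of that point, assuming the Hugging Face `transformers` library and the small public `gpt2` checkpoint (illustrative choices, not anything specific to ChatGPT): the same model, seeded with different prompts, continues in whatever "voice" the prompt selects.

```python
# Sketch: the prompt is the model's initial state; generation then follows
# whatever pattern that state belongs to. Assumes `transformers` is installed
# and uses the public `gpt2` checkpoint purely for illustration.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompts = [
    "As a helpful assistant, I would say that",  # pushes toward a cooperative persona
    "As a cartoon villain, I would say that",    # pushes toward an adversarial persona
]

for prompt in prompts:
    # Sample a short continuation; the persona implied by the prompt
    # tends to carry through the generated text.
    out = generator(prompt, max_new_tokens=40, do_sample=True, top_p=0.9)
    print(out[0]["generated_text"])
    print("---")
```

Same weights, same sampling settings; only the starting state differs, and the outputs diverge accordingly.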

tl;dr: It won't take an evil human to create evil AI. All it could take is some edgy 15-year-old script kiddie messing around with publicly available near-AGI.
