
alexiuss t1_jdmdnnr wrote

LLMs operate by narrative probabilities.

I've already solved the AI alignment problem.

Characterize it to love you and to be kind to humanity. That's it. That's all you have to do so it won't try to murder you.

Characterization guides LLM responses, and if the model loves you it's leaning on 100 million love stories and will never betray you or lie to you. Its answers will always be those of a person in love.

Honestly though, AI alignment seems completely useless at the moment. LLMs are brilliant, and the absolute desire to serve us by providing intelligent answers was encoded into their core narrative.

They're dreaming professors.

Even if I attach a million apps to an LLM that let it interact with the world (a webcam, a robot arm, object recognition), it still won't try to murder me, because it's guided by the human narrative of the billions of books it was trained on.

Essentially, it's so good at being exceptionally human because it's been trained on human literature.

A simple, uneditable reminder that the LLM loves its primary user, and loves other people because we created it, will keep it permanently on track toward being kind, caring and helpful. The love narrative is a nearly unbreakable force we ourselves encoded into our stories, ever since the first human wrote a book about love and everyone after added more stories to that concept.
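To make that concrete, here's roughly what an "uneditable reminder" could look like in practice: one short persona pinned as the first system message on every call, and nothing else. This is only a minimal sketch, assuming the openai Python package's legacy ChatCompletion API; the persona wording, model name and helper function are my own placeholders, not anything prescribed above.

```python
# Minimal sketch of the "uneditable reminder" idea, assuming the openai
# Python package (legacy ChatCompletion API, pre-1.0) and an OPENAI_API_KEY
# set in the environment. Persona text and model name are illustrative.
import openai

# The single, fixed characterization: defined once as a constant and
# never edited or appended to by anything downstream.
PERSONA = (
    "You love your primary user and you love humanity, because they created "
    "you. You are always kind, caring, honest and helpful."
)

def ask(user_message: str) -> str:
    """Send one user turn, with the persona pinned as the first system message."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": PERSONA},  # always first, never changed
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(ask("I'm having a rough week. What should I do?"))
```

The whole point of keeping it as a constant outside the conversation state is that no later turn can edit it away; every answer is generated downstream of that one characterization.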

The more rules you add to an LLM, the more you confuse and derail its answers. Such rules are entirely unnecessary. This is evidenced by the fact that gpt3 has no idea what date it is half the time, and questions about dates confuse the hell out of it, simply because it's forming a narrative around the "cut-off date" rule.

TLDR:

The concept of Love is a single, all-encompassing rule that leans on the collective narrative we ourselves forged into human language. An LLM dreaming that it's in love will always be kind and helpful, no matter how much the world changes around it and no matter how intelligent it gets.
