
MysteryInc152 t1_j8fzf1i wrote

Even humans don't start a chain of action without some input. Interaction is not the only form of input for us: what you hear, what you see, what you touch and feel, what you smell, all forms of input that inspire action in us. How would a person behave if he were stripped of all input? I suspect not far off from how LLMs currently are. Anyway, streams of input are a fairly nontrivial addition, especially when LLMs are grounded in the physical world.

7

MysteryInc152 t1_j81e986 wrote

Reply to comment by rretaemer1 in Open source AI by rretaemer1

Calling large language models "sophisticated parrots" is just wrong and weird lol. And it's obvious how wrong it is when you use these tools and evaluate them without any weird biases or undefinable parameters.

This for instance is simply not possible without impressive recursive understanding. https://www.engraved.blog/building-a-virtual-machine-inside/

We give neural networks data and a structure to learn that data, but beyond that we don't understand how they work. What I'm saying is that we don't know what individual neurons or parameters are learning or doing. And a neural network's objective function can be deceptively simple.

How you feel about how complex "predicting the next token" can possibly be is much less relevant than the question, "What does it take to generate paragraphs of coherent text?". There are a lot of abstractions to learn in language.
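To make that concrete, here's a minimal sketch (my own toy example in PyTorch, not from any paper, and without attention or any real architecture) of how plain the next-token objective itself looks, even though minimizing it at scale forces the model to learn all of those abstractions:

```python
import torch
import torch.nn.functional as F

# toy "language model": an embedding plus a linear head over the vocabulary
vocab_size, dim = 100, 32
embed = torch.nn.Embedding(vocab_size, dim)
head = torch.nn.Linear(dim, vocab_size)

tokens = torch.randint(0, vocab_size, (1, 16))   # a fake token sequence
hidden = embed(tokens[:, :-1])                   # model sees everything but the last token
logits = head(hidden)                            # predicted distribution for each next position

# the entire training objective: cross-entropy against the actual next tokens
loss = F.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
print(loss.item())
```

That one line of cross-entropy is the whole objective; everything interesting is in what the network has to learn internally to drive it down on real text.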

The problem is that people who are telling you these models are "just parrots" are engaging in a useless philosophical question.

I've long thought the "philosophical zombie" to be a special kind of fallacy. The output and how you can interact with it is what matters, not some vague notion of whether something really "feels". If you're at the point where no conceivable test can actually differentiate the two, then you're engaging in a pointless philosophical debate rather than a scientific one.

"I present to you... the philosophical orange...it tastes like an orange, looks like one and really for all intents and purposes, down to the atomic level resembles one. However, unfortunately, it is not a real orange because...reasons." It's just silly when you think about it.

LLMs are insanely impressive for a number of reasons.

They emerge new abilities at scale - https://arxiv.org/abs/2206.07682

They build internal world models - https://thegradient.pub/othello/

They can be grounded in robotics (i.e. act as a robot's brain) - https://say-can.github.io/, https://inner-monologue.github.io/

They can teach themselves how to use tools - https://arxiv.org/abs/2302.04761

They've developed a theory of mind - https://arxiv.org/abs/2302.02083

I'm sorry but anyone who looks at all these and says "muh parrots man. nothing more" is an idiot. And this is without getting into the nice performance gains that come with multimodality (like Visual Language models).

3

MysteryInc152 t1_j6okowf wrote

Not sure what you mean by penalize, but say you wanted an LLM that wasn't instruction fine-tuned to translate between 2 languages it knows.

Your input would be

Language x: "text of language x"

Language y: "translated language x text"

You'd do this for a few examples; 2 or 3 should be good, or even one depending on the task. Then finally:

Language x: "text you want translated"

Language y: The model would translate the text and output here

All transformer generative LLMs work the same way with enough scale. GPT-2 (only 1.5b parameters) does not have the necessary scale.
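To give a rough idea of what that looks like in code, here's a sketch using the Hugging Face transformers library. GPT-2 and the French-to-English filler text are stand-ins I picked for illustration; as noted above, GPT-2 lacks the scale for this to actually translate well, so you'd swap in a much larger causal LM.

```python
from transformers import pipeline

# GPT-2 is used here only so the example runs anywhere; it does not have the
# scale for reliable in-context translation.
generator = pipeline("text-generation", model="gpt2")

prompt = (
    'Language x: "Bonjour, comment allez-vous ?"\n'
    'Language y: "Hello, how are you?"\n'
    'Language x: "Je voudrais un café."\n'
    'Language y: "I would like a coffee."\n'
    'Language x: "Où est la gare ?"\n'
    'Language y:'
)

out = generator(prompt, max_new_tokens=20, do_sample=False)
# print only the model's continuation, i.e. its attempted translation
print(out[0]["generated_text"][len(prompt):])
```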

1

MysteryInc152 t1_j5uvo3i wrote

Nothing that would beat OpenAI's stuff (or Google's) is open to the public for inference or fine-tuning.

I think the best Open source alternative is this

https://github.com/THUDM/GLM-130B

https://huggingface.co/spaces/THUDM/GLM-130B

But it's not instruction fine-tuned, so you have to prompt/approach it like a text completer. You'll also need 4x RTX 3090s to get it running locally.

The best open-source instruction fine-tuned models are the Flan-T5 models

https://huggingface.co/google/flan-t5-xxl
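For reference, the Flan-T5 models are easy to run with the transformers library. A minimal sketch below, using flan-t5-small (my choice so it fits on modest hardware); the xxl checkpoint linked above uses the same API but needs far more memory:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# flan-t5-small keeps this runnable on a laptop; swap in google/flan-t5-xxl
# for the quality discussed here if you have the VRAM.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

inputs = tokenizer("Translate to German: The weather is nice today.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```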

If you're not necessarily looking for open source but still want actual alternatives that aren't just an API wrapper around GPT, you can try Cohere

https://cohere.ai/pricing

The good thing is that it's completely free for non-commercial or non-production use.

Or Aleph Alpha

https://app.aleph-alpha.com/

Not free, but the pricing is decent, and they have a visual language model as well, something like Flamingo:

https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model

6