MysteryInc152
MysteryInc152 t1_j8oj9qx wrote
Reply to [R] RWKV-4 14B release (and ChatRWKV) - a surprisingly strong RNN Language Model by bo_peng
Fantastic work. Thanks for doing this. Good luck scaling to 24b. I hope more people catch on, because the lack of a context length limit is a game changer.
MysteryInc152 t1_j8fzf1i wrote
Reply to comment by el_chaquiste in Altman vs. Yudkowsky outlook by kdun19ham
Even humans don't start a chain of action without some input. Interaction is not the only form of input for us. What you hear, what you see, what you touch and feel, what you smell: all forms of input that inspire action in us. How would a person behave if he were stripped of all input? I suspect not far off from how LLMs currently are. Anyway, streams of input are fairly non-trivial, especially when LLMs are grounded in the physical world.
MysteryInc152 t1_j85rgjx wrote
Reply to Where are all the multi-modal models? by ReadSeparate
Recently, two papers were released that deal with making frozen LLMs multimodal (with code and models released).
Blip-2 - https://arxiv.org/abs/2301.12597 https://huggingface.co/spaces/Salesforce/BLIP2
And fromage - https://arxiv.org/abs/2301.13823 https://github.com/kohjingyu/fromage
MysteryInc152 t1_j83uty8 wrote
Reply to comment by adt in Where are all the multi-modal models? by ReadSeparate
Only the 17b and 30b models are multimodal. Still pretty good though for sure.
We also have some recent advances that ground frozen language models to images. Namely BLIP-2 and fromage.
MysteryInc152 t1_j81hoqz wrote
Reply to comment by Grotto-man in Open source AI by rretaemer1
Here's an improved version of what I just linked. https://inner-monologue.github.io/.
Can't really speak on the quantum computers bit. Don't know how helpful they would be.
MysteryInc152 t1_j81e986 wrote
Reply to comment by rretaemer1 in Open source AI by rretaemer1
Calling large language models "sophisticated parrots" is just wrong and weird lol. And it's obvious how wrong it is when you use these tools and evaluate them without any weird biases or undefinable parameters.
This for instance is simply not possible without impressive recursive understanding. https://www.engraved.blog/building-a-virtual-machine-inside/
We give neural networks data and a structure to learn that data, but beyond that, we don't understand how they work. What I'm saying is that we don't know what individual neurons or parameters are learning or doing. And a neural network's objective function can be deceptively simple.
How you feel about how complex "predicting the next token" can possibly be is much less relevant than the question, "What does it take to generate paragraphs of coherent text?". There are a lot of abstractions to learn in language.
The problem is that people who are telling you these models are "just parrots" are engaging in a useless philosophical question.
I've long thought the "philosophical zombie" to be a special kind of fallacy. The output and how you can interact with it are what matter, not some vague notion of whether something really "feels". If you're at the point where no conceivable test can actually differentiate the two, then you're engaging in a pointless philosophical debate rather than a scientific one.
"I present to you... the philosophical orange...it tastes like an orange, looks like one and really for all intents and purposes, down to the atomic level resembles one. However, unfortunately, it is not a real orange because...reasons." It's just silly when you think about it.
LLMs are insanely impressive for a number of reasons.
They emerge new abilities at scale - https://arxiv.org/abs/2206.07682
They build internal world models - https://thegradient.pub/othello/
They can be grounded in robotics (i.e. act as a robot's brain) - https://say-can.github.io/, https://inner-monologue.github.io/
They can teach themselves how to use tools - https://arxiv.org/abs/2302.04761
They've developed a theory of mind - https://arxiv.org/abs/2302.02083
I'm sorry but anyone who looks at all these and says "muh parrots man. nothing more" is an idiot. And this is without getting into the nice performance gains that come with multimodality (like Visual Language models).
MysteryInc152 t1_j81b73v wrote
Reply to comment by Grotto-man in Open source AI by rretaemer1
Well, robotics still has a ways to go to replicate the flexibility and maneuverability of the human body, but... nothing, really. They've already been combined, with promising results. See here - https://say-can.github.io/
MysteryInc152 t1_j7lghig wrote
Reply to comment by astrange in [N] Google: An Important Next Step On Our AI Journey by EducationalCicada
>No they're not. ChatGPT doesn't do anything, it just responds to you
Yes they are and you can get it to "do things" easily
MysteryInc152 t1_j7lg6rm wrote
Reply to comment by drooobie in [N] Google: An Important Next Step On Our AI Journey by EducationalCicada
I think he's basically saying AI's like chatGPT just output text at the base level. But that's really also a moot point anyway. You can plug in LLMs to be a sort of middle-man interface.
MysteryInc152 t1_j7ja39c wrote
Reply to comment by Cheap_Meeting in [D] List of Large Language Models to play with. by sinavski
I believe the fine-tuning dataset matters as well as the model but I guess we'll see. I think they plan on fine-tuning.
The set used to tune OPT doesn't contain any chain of thought.
MysteryInc152 t1_j7g83pw wrote
Reply to comment by Cheap_Meeting in [D] List of Large Language Models to play with. by sinavski
GLM-130B is really really good. https://crfm.stanford.edu/helm/latest/?group=core_scenarios
I think some instruction tuning is all it needs to match the text-davinci models
MysteryInc152 t1_j7fymgb wrote
Reply to comment by Cheap_Meeting in [D] List of Large Language Models to play with. by sinavski
don't think so
MysteryInc152 t1_j7c1kwr wrote
GLM-130b https://huggingface.co/spaces/THUDM/GLM-130B
Cohere's models https://cohere.ai/
Aleph Alpha's models https://app.aleph-alpha.com/
MysteryInc152 t1_j6p0ipa wrote
Reply to comment by Zetsu-Eiyu-O in [D] Generative Model FOr Facts Extraction by Zetsu-Eiyu-O
Sure
MysteryInc152 t1_j6okowf wrote
Reply to comment by Zetsu-Eiyu-O in [D] Generative Model FOr Facts Extraction by Zetsu-Eiyu-O
Not sure what you mean by "penalize", but say you wanted an LLM that wasn't instruction fine-tuned to translate between two languages it knows.
Your input would be
Language x: "text of language x"
Language y: "translated language x text"
You'd do this for a few examples. 2 or 3 should be good. Or even one depending on the task. Then finally
Language x: "text you want translated"
Language y: The model would translate the text and output here
All transformer generative LLMs work the same way with enough scale. GPT-2 (only 1.5b parameters) does not have the necessary scale.
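That prompt format can be sketched as plain string construction. A minimal sketch; the French/English example pairs below are made up for illustration, and you'd swap in real parallel text for your languages:

```python
# Few-shot translation prompt for a base (non-instruction-tuned) LLM.
# The example pairs here are invented placeholders.
examples = [
    ("Bonjour le monde", "Hello world"),
    ("Merci beaucoup", "Thank you very much"),
]

def build_prompt(query: str) -> str:
    lines = []
    for fr, en in examples:
        lines.append(f'French: "{fr}"')
        lines.append(f'English: "{en}"')
    # Leave the final pair open so the model fills in the translation.
    lines.append(f'French: "{query}"')
    lines.append("English:")
    return "\n".join(lines)

prompt = build_prompt("Bonne nuit")
print(prompt)
```

You'd then send that string to whatever base model you're using and read its completion as the translation.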
MysteryInc152 t1_j6o62z6 wrote
This is what In-context learning is for.
Giving the model a few examples of a text input and a corresponding fact extraction is all that's necessary.
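As a sketch of what that could look like, here's a hypothetical few-shot fact-extraction prompt. The demo passages and the (subject, relation, object) tuple format are made up; any consistent format the model can imitate would do:

```python
# Few-shot fact-extraction prompt for a base LLM.
# Demonstration passages and the tuple format are invented for illustration.
demos = [
    ("Marie Curie won the Nobel Prize in Physics in 1903.",
     "(Marie Curie, won, Nobel Prize in Physics, 1903)"),
    ("The Eiffel Tower is located in Paris.",
     "(Eiffel Tower, located_in, Paris)"),
]

def extraction_prompt(text: str) -> str:
    blocks = [f"Text: {passage}\nFacts: {facts}" for passage, facts in demos]
    blocks.append(f"Text: {text}\nFacts:")  # the model completes this line
    return "\n\n".join(blocks)

prompt = extraction_prompt("Alan Turing was born in London.")
print(prompt)
```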
MysteryInc152 t1_j6jkmus wrote
Reply to comment by currentscurrents in [N] OpenAI has 1000s of contractors to fine-tune codex by yazriel0
The human brain has trillions of synapses (the closest biological equivalent to parameters), is multimodal, and has been fine-tuned by evolution.
MysteryInc152 t1_j60vz8p wrote
Reply to comment by FallUpJV in Few questions about scalability of chatGPT [D] by besabestin
OpenAI's models are still undertrained as well.
MysteryInc152 t1_j5uvo3i wrote
Reply to comment by Kamimashita in [D]Are there any known AI systems today that are significantly more advanced than chatGPT ? by Xeiristotle
Nothing that would beat OpenAI's stuff (i.e. Google's stuff) is open to the public for inference or finetuning.
I think the best Open source alternative is this
https://github.com/THUDM/GLM-130B
https://huggingface.co/spaces/THUDM/GLM-130B
But it's not finetuned for instructions, so you have to prompt/approach it like a text completer. Also, you'll need 4x 3090s to get it running locally.
The best open source instruction finetuned models are the flan t5 models
https://huggingface.co/google/flan-t5-xxl
If you're not necessarily looking for open source but still want actual alternatives that aren't just an API wrapper around GPT, you can try Cohere.
Good thing is that it's completely free for non-commercial or non-production use.
Or Aleph Alpha.
Not free, but the pricing is decent and they have a visual language model as well, something like Flamingo.
https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model
MysteryInc152 t1_j5tits4 wrote
Reply to [D]Are there any known AI systems today that are significantly more advanced than chatGPT ? by Xeiristotle
Google has a few systems that would beat current public SOTA models. PaLM/Minerva/Med-PaLM is the best, but Flamingo and Chinchilla/Sparrow would also best ChatGPT.
Dunno about anything from Meta. They have open-source GPT models released, but they're not as good as OpenAI's stuff.
MysteryInc152 t1_j5eyfnm wrote
Reply to [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
Any watermark that couldn't easily be bypassed (paraphrasing, switching out every nth word, etc.) would cripple the output of the model. In fact, even simple watermarks could have weird effects on output.
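For context, a toy sketch of the kind of statistical watermark usually proposed (a "green list" scheme in the spirit of Kirchenbauer et al.; the ten-word vocabulary is made up, and a real scheme would operate on the model's full token vocabulary):

```python
import hashlib
import random

# Toy vocabulary standing in for a real tokenizer's vocab.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat", "rug", "fast"]

def green_list(prev_token: str, frac: float = 0.5) -> set:
    """Deterministically partition the vocab, seeded by the previous token."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16) % (2 ** 32)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(len(VOCAB) * frac)))

def green_fraction(tokens: list) -> float:
    """Detector: how often each token falls in its predecessor's green list."""
    if len(tokens) < 2:
        return 0.0
    hits = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1)
```

Watermarked generation biases sampling toward each step's green list, so watermarked text scores near 1.0 while natural text hovers around `frac`. Swapping out every nth word drags the score back toward chance, which is exactly the bypass mentioned above.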
MysteryInc152 t1_j50ym7g wrote
Reply to comment by Daos-Lies in [D] Inner workings of the chatgpt memory by terserterseness
There's a repo that actually uses embeddings for long term conversations you can try out.
MysteryInc152 t1_j50pw6e wrote
Reply to comment by IntelArtiGen in [D] Inner workings of the chatgpt memory by terserterseness
With embeddings, it should theoretically not have a hard limit at all. But experiments here suggest a sliding context window of 8096.
https://mobile.twitter.com/goodside/status/1598874674204618753?t=70_OKsoGYAx8MY38ydXMAA&s=19
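A toy sketch of that embeddings idea, with a letter-frequency vector standing in for a real embedding model (a real system would call something like a sentence-transformer instead):

```python
import math

def embed(text: str) -> list:
    # Stand-in embedding: letter-frequency vector over a-z.
    # Replace with a call to an actual embedding model in practice.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

memory = []  # (text, embedding) pairs from earlier turns

def remember(text: str) -> None:
    memory.append((text, embed(text)))

def recall(query: str, k: int = 2) -> list:
    q = embed(query)
    ranked = sorted(memory, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

On each new user turn you'd `recall` the most similar past turns and prepend them to the prompt, so relevant history survives even once the conversation has scrolled past the context window.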
MysteryInc152 t1_j8p2jrd wrote
Reply to comment by farmingvillein in [R] RWKV-4 14B release (and ChatRWKV) - a surprisingly strong RNN Language Model by bo_peng
That's fair. We won't know for sure till it's tested.