MysteryInc152 OP t1_j95u3t2 wrote
Reply to comment by Professor_Entropy in [D] Toolformer implementation using only few-shot prompting by MysteryInc152
Seems like something a chain-of-thought example in the pre-prompt would fix, rather than a deficiency in the approach itself.
Also, eliminating arithmetic errors doesn't mean you'd eliminate logical/reasoning errors.
MysteryInc152 OP t1_j95rp8c wrote
Reply to comment by Taenk in [D] Toolformer implementation using only few-shot prompting by MysteryInc152
MysteryInc152 OP t1_j95r8ni wrote
Reply to comment by ilovethrills in [D] Toolformer implementation using only few-shot prompting by MysteryInc152
Much simpler approach compared to LangChain (and this is self-supervised), but they attempt to do the same thing.
MysteryInc152 OP t1_j94ep4b wrote
Reply to comment by blueSGL in [D] Toolformer implementation using only few-shot prompting by MysteryInc152
Yup. That's pretty much it lol
MysteryInc152 OP t1_j93saow wrote
In-context learning is overpowered lol. Really is implicit finetuning.
This is the prompt template:
>You are an AI assistant with several tools available to you. The tools are the following:
>
>MATH: This allows you to evaluate mathematical expressions using the math.js library.
>
>NOW: This returns the current date and time. You must not pass any arguments to this tool!
>
>SEARCH: This returns a web search result for the given string argument.
>
>DO NOT USE TOOLS WITHIN TOOLS! KEEP ALL TOOL CALLS SEPARATE FROM EACH OTHER!
>
>User: What is 10 times 14?
>
>Assistant: 10 * 14 is [MATH(10 * 14) -> 140] 140.
>
>User: How old is Donald Trump?
>
>Assistant: Donald Trump is [SEARCH(Donald Trump age) -> 76] 76 years old.
>
>User: What day is it?
>
>Assistant: It is [NOW() -> 13/02/2023, 20:01:35] the 13th of February 2023.
>
>User: Has Avatar the way of water been released yet?
>
>Assistant: [SEARCH(Avatar the way of water release date) -> 22.11.2022] Avatar: The way of water was released on the 22nd of november 2022. Today is [NOW() -> DATE_TODAY] the 13th of February 2023. Therefore, [MATH(2023 > 2022) -> true] it was released last year.
>
>User: Where was the inventor of the light bulb born, and what is the last country his home country was at war with?
>
>Assistant: Thomas Edison, the inventor of the lightbulb, was born in [SEARCH(Thomas Edison birthplace) -> Milan, Ohio] Milan, Ohio. The last country the United States was at war with was [SEARCH(last country US at war with) -> Iraq] Iraq.
>
>User: USER_INPUT
>
>Assistant:
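For anyone wondering how the loop around this prompt might work: here's a rough sketch of the idea, not the exact code linked above. Generation stops when the model opens a tool call and emits "->", the tool runs, the result gets spliced back in, and the model continues from there. The `generate` function, the restricted eval for MATH, and the `web_search` stub are all stand-ins I'm using for illustration.

```python
import re
from datetime import datetime

def math_eval(expr: str) -> str:
    # Stand-in for the math.js evaluation the prompt describes; a restricted
    # eval is enough for expressions like "10 * 14" or "2023 > 2022".
    return str(eval(expr, {"__builtins__": {}}, {}))

def web_search(query: str) -> str:
    # Placeholder search backend; a real implementation would call a search API.
    return "SEARCH_RESULT"

def run_tool(name: str, arg: str) -> str:
    if name == "MATH":
        return math_eval(arg)
    if name == "NOW":
        return datetime.now().strftime("%d/%m/%Y, %H:%M:%S")
    if name == "SEARCH":
        return web_search(arg)
    raise ValueError(f"unknown tool: {name}")

# Matches a tool call the model has opened but not yet filled in,
# e.g. "[MATH(10 * 14) -> " sitting at the end of the partial generation.
PENDING_CALL = re.compile(r"\[(MATH|NOW|SEARCH)\((.*?)\)\s*->\s*$")

def answer(user_input: str, prompt_template: str, generate) -> str:
    # `generate(text, stop)` is an assumed LLM call that continues `text` and
    # returns the continuation up to and including the first stop string it hits.
    text = prompt_template.replace("USER_INPUT", user_input)
    while True:
        text += generate(text, stop=["->", "\nUser:"])
        match = PENDING_CALL.search(text)
        if match is None:
            break  # no open tool call left, the answer is finished
        text += " " + run_tool(match.group(1), match.group(2)) + "]"
    return text
```

The same loop extends to any new tool: add another few-shot example to the prompt and another branch in `run_tool`.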
Submitted by MysteryInc152 t3_115x1it in MachineLearning
MysteryInc152 t1_j8wx6tx wrote
Not very necessary. An LLM's brain might be static itself, but the connections it makes between neurons are very much dynamic. That's why in-context learning is possible. LLMs already mimic meta-learning and fine-tuning when you few-shot.
MysteryInc152 t1_j8ppoiq wrote
Reply to comment by swegmesterflex in [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
I'd rather at least the basic senses (vision as well as audio) be pretrained in as well. We know from multimodal chain-of-thought and from the scaling laws for generative mixed-modal language models that multimodal models far outperform single-modal models at the same data and scale. You won't get that kind of performance gain by offloading those basic senses to outside tools.
MysteryInc152 t1_j8p2jrd wrote
Reply to comment by farmingvillein in [R] RWKV-4 14B release (and ChatRWKV) - a surprisingly strong RNN Language Model by bo_peng
That's fair. We won't know for sure till it's tested.
MysteryInc152 t1_j8oj9qx wrote
Reply to [R] RWKV-4 14B release (and ChatRWKV) - a surprisingly strong RNN Language Model by bo_peng
Fantastic work, thanks for doing this. Good luck scaling to 24b. I hope more people catch on, because the lack of a context-length limit is a game changer.
MysteryInc152 t1_j8fzf1i wrote
Reply to comment by el_chaquiste in Altman vs. Yudkowsky outlook by kdun19ham
Even humans don't start a chain of action without some input. Interaction is not the only form of input for us: what you hear, what you see, what you touch and feel, what you smell are all forms of input that inspire action in us. How would a person behave if he were stripped of all input? I suspect not far off from how LLMs currently are. Anyway, streams of input are fairly non-trivial, especially when LLMs are grounded in the physical world.
MysteryInc152 t1_j85rgjx wrote
Reply to Where are all the multi-modal models? by ReadSeparate
Recently, two papers were released that deal with making frozen LLMs multimodal (with code and models released).
Blip-2 - https://arxiv.org/abs/2301.12597 https://huggingface.co/spaces/Salesforce/BLIP2
And fromage - https://arxiv.org/abs/2301.13823 https://github.com/kohjingyu/fromage
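If you'd rather try BLIP-2 locally than through the demo space, recent versions of Hugging Face transformers ship it. Something along these lines should work, assuming enough GPU memory (the 2.7B OPT checkpoint is just the smallest one; exact API details may vary by version):

```python
# Quick local test of BLIP-2 with Hugging Face transformers.
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Ask a question about the image; the frozen OPT decoder produces the answer.
inputs = processor(images=image, text="Question: how many cats are there? Answer:",
                   return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True).strip())
```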
MysteryInc152 t1_j83uty8 wrote
Reply to comment by adt in Where are all the multi-modal models? by ReadSeparate
Only the 17b and 30b models are multimodal. Still pretty good though, for sure.
We also have some recent advances that ground frozen language models to images, namely BLIP-2 and fromage.
MysteryInc152 t1_j81thry wrote
Reply to comment by Grotto-man in Open source AI by rretaemer1
MysteryInc152 t1_j81hoqz wrote
Reply to comment by Grotto-man in Open source AI by rretaemer1
Here's an improved version of what I just linked. https://inner-monologue.github.io/.
Can't really speak on the quantum computers bit. Don't know how helpful they would be.
MysteryInc152 t1_j81e986 wrote
Reply to comment by rretaemer1 in Open source AI by rretaemer1
Calling large language models "sophisticated parrots" is just wrong and weird lol. And it's obvious how wrong it is when you use these tools and evaluate them without any weird biases or undefinable parameters.
This, for instance, is simply not possible without impressive recursive understanding: https://www.engraved.blog/building-a-virtual-machine-inside/
We give neural networks data and a structure with which to learn that data, but beyond that, we don't understand how they work. What I'm saying is that we don't know what individual neurons or parameters are learning or doing. And a neural network's objective function can be deceptively simple.
How you feel about how complex "predicting the next token" can possibly be is much less relevant than the question, "What does it take to generate paragraphs of coherent text?". There are a lot of abstractions to learn in language.
The problem is that people who are telling you these models are "just parrots" are engaging in a useless philosophical debate.
I've long thought the "philosophical zombie" to be a special kind of fallacy. The output and how you can interact with it are what matter, not some vague notion of whether something really "feels". If you're at the point where no conceivable test can actually differentiate the two, then you're engaging in a pointless philosophical debate rather than a scientific one.
"I present to you... the philosophical orange... it tastes like an orange, looks like one, and really, for all intents and purposes, down to the atomic level, resembles one. However, unfortunately, it is not a real orange because... reasons." It's just silly when you think about it.
LLMs are insanely impressive for a number of reasons.
They show emergent abilities at scale - https://arxiv.org/abs/2206.07682
They build internal world models - https://thegradient.pub/othello/
They can be grounded in robotics (i.e. act as a robot's brain) - https://say-can.github.io/, https://inner-monologue.github.io/
They can teach themselves how to use tools - https://arxiv.org/abs/2302.04761
They've developed a theory of mind - https://arxiv.org/abs/2302.02083
I'm sorry but anyone who looks at all these and says "muh parrots man. nothing more" is an idiot. And this is without getting into the nice performance gains that come with multimodality (like Visual Language models).
MysteryInc152 t1_j81b73v wrote
Reply to comment by Grotto-man in Open source AI by rretaemer1
Well, robotics still has a ways to go to replicate the flexibility and maneuverability of the human body, but... nothing really. They've already been combined, to promising results. See here - https://say-can.github.io/
MysteryInc152 t1_j7lghig wrote
Reply to comment by astrange in [N] Google: An Important Next Step On Our AI Journey by EducationalCicada
>No they're not. ChatGPT doesn't do anything, it just responds to you
Yes they are, and you can get it to "do things" easily.
MysteryInc152 t1_j7lg6rm wrote
Reply to comment by drooobie in [N] Google: An Important Next Step On Our AI Journey by EducationalCicada
I think he's basically saying AIs like ChatGPT just output text at the base level. But that's really a moot point anyway. You can plug in LLMs as a sort of middleman interface.
MysteryInc152 t1_j7ja39c wrote
Reply to comment by Cheap_Meeting in [D] List of Large Language Models to play with. by sinavski
I believe the fine-tuning dataset matters as well as the model, but I guess we'll see. I think they plan on fine-tuning.
The set used to tune OPT doesn't contain any chain-of-thought data.
MysteryInc152 t1_j7g83pw wrote
Reply to comment by Cheap_Meeting in [D] List of Large Language Models to play with. by sinavski
GLM-130B is really really good. https://crfm.stanford.edu/helm/latest/?group=core_scenarios
I think some instruction tuning is all it needs to match the text-davinci models
MysteryInc152 t1_j7fymgb wrote
Reply to comment by Cheap_Meeting in [D] List of Large Language Models to play with. by sinavski
don't think so
MysteryInc152 t1_j7c1kwr wrote
GLM-130b https://huggingface.co/spaces/THUDM/GLM-130B
Cohere's models https://cohere.ai/
Aleph Alpha's models https://app.aleph-alpha.com/
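For the hosted ones, the flow is basically the same everywhere: get an API key, send a prompt, read back the completion. Rough example against Cohere's Python SDK (the model name and parameters here are placeholders; check their docs for the current ones):

```python
# Minimal text-generation call with Cohere's Python SDK (pip install cohere).
import cohere

co = cohere.Client("YOUR_API_KEY")
response = co.generate(
    prompt="Write a one-sentence summary of what a language model is.",
    model="xlarge",      # placeholder model name
    max_tokens=100,
    temperature=0.7,
)
print(response.generations[0].text)
```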
MysteryInc152 t1_j6p0ipa wrote
Reply to comment by Zetsu-Eiyu-O in [D] Generative Model FOr Facts Extraction by Zetsu-Eiyu-O
Sure
MysteryInc152 t1_j96eaav wrote
Reply to comment by zesterer in Proof of real intelligence? by Destiny_Knight
Your argument and position are weird, and that meme is very cringe. You're not a genius for being idiotically reductive.
The problem here is the same as with everyone else who takes this idiotic stance. We have definitions for reasoning and understanding that you decide to misconstrue in favor of your ill-defined and vague assertions.
You think it's not reasoning? Cool. Then rigorously define what you mean by reasoning and design tests that comprehensively evaluate both models and people on it. If you can't do this, then you really have no business speaking on whether a language model can reason and understand or not.