What. The. ***k. [less than 1B parameter model outperforms GPT 3.5 in science multiple choice questions]
Submitted by Destiny_Knight t3_118svv7 in singularity
Yes, and it does it at only 0.4% of the size of GPT-3, possibly small enough to run on a single graphics card.
It uses language and pictures together instead of just language.
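Back-of-the-envelope, assuming the 0.4% figure is relative to GPT-3's 175B parameters (my math, not from the paper):

```python
# 0.4% of GPT-3's 175B parameters
gpt3_params = 175e9
small_model_params = gpt3_params * 0.004
print(f"{small_model_params / 1e9:.1f}B parameters")  # ~0.7B, i.e. under 1B like the title says
```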
Fucking wow!
Yeah it's fucking nuts.
What is the "catch" here? It sounds too good to be true
The catch is that it only outperforms large models in a narrow domain of study. It's not a general purpose tool like the really large models. That's still impressive though.
Can it be fine-tuned?
You can tune it on another dataset and probably get good results, but you need a nice, high-quality dataset to work with.
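Not the exact recipe from the paper, just a rough idea of what "tune it on another dataset" looks like with Hugging Face; the checkpoint and the toy example here are placeholders:

```python
# Minimal fine-tuning sketch (placeholder checkpoint and toy data, not the paper's setup).
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Trainer, TrainingArguments)

model_name = "google/flan-t5-small"   # stand-in; swap in whatever checkpoint you're tuning
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# This is where the "nice, high-quality dataset" part actually matters.
raw = Dataset.from_dict({
    "question": ["Which gas do plants absorb during photosynthesis?"],
    "answer":   ["carbon dioxide"],
})

def preprocess(batch):
    inputs = tokenizer(batch["question"], truncation=True)
    inputs["labels"] = tokenizer(text_target=batch["answer"], truncation=True)["input_ids"]
    return inputs

tokenized = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=3),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```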
I’m working on one that’s trained on JFK speeches and Bachelorette data to help people with conversation skills.
I can't tell if this is a joke or real
It’s real. Gonna launch after GME moons
Sounds like a viable AI implementation to me. I'll be your angel investor and throw some Doge your way or something.
I don't think that's true, but I do believe it was fine-tuned on that specific dataset to achieve the SOTA result they did.
It chooses the correct answer from multiple choices, so it isn't actually comparable to ChatGPT.
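Multiple choice is a much easier setup: you can just score each option under the model and take the most likely one, no open-ended generation involved. Rough sketch with a placeholder model (gpt2 here, not the model from the post; it also assumes the prompt tokenizes as a clean prefix of the full string):

```python
# Multiple-choice answering = score each option's likelihood and take the argmax.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

question = "Which planet is known as the Red Planet?"
options = ["Venus", "Mars", "Jupiter"]

def option_score(question, option):
    prompt = f"Question: {question}\nAnswer:"
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + " " + option, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, :prompt_ids.shape[1]] = -100             # mask the prompt, score only the option
    with torch.no_grad():
        loss = model(full_ids, labels=labels).loss     # mean NLL over the option tokens
    return -loss.item()                                # higher = model finds the option more likely

print(max(options, key=lambda o: option_score(question, o)))
```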
Where can I get one? I'll take 20
Around 4 GB of VRAM, maybe 2 GB, to run it.
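Those numbers roughly check out if you just count bytes per weight and ignore activations/overhead (my own estimate, assuming ~700M params from the 0.4% figure above):

```python
# Weights-only VRAM estimate, assuming ~700M parameters (0.4% of 175B).
params = 0.7e9
for precision, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{precision}: {params * bytes_per_param / 1e9:.1f} GB")
# fp32: 2.8 GB, fp16: 1.4 GB, int8: 0.7 GB -- activations and framework overhead come on top
```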
That’s so cool!! That’s how humans remember things, too
amazing.
does that prove that parameters aren't everything?
It was shown recently that for LLMs ~0.01% of parameters explain >95% of performance.
But more parameters allow for broader knowledge, right? A 6-20B model can't have knowledge as broad as a 100B+ model, right?
At this point we don't really know what the bottleneck is. More params are an easy-ish way to capture more knowledge if you have the architecture and the $$... but there are a lot of other techniques available that increase the efficiency of the parameters.
Yes, but how many parameters do you actually need to store all the knowledge you realistically need? Maybe a few billion parameters are enough to store the basics of every concept known to man, and more specific details could be stored in an external file that the neural net accesses with API calls.
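That's basically retrieval augmentation. Toy sketch of the idea, with a made-up in-memory "knowledge store" standing in for the external file or API:

```python
# Toy retrieval augmentation: keep specific facts outside the model,
# look up the relevant ones, and paste them into the prompt.
KNOWLEDGE_STORE = {                        # stand-in for an external file or lookup API
    "photosynthesis": "Plants convert CO2 and water into glucose using light energy.",
    "mitochondria": "Mitochondria produce ATP through cellular respiration.",
}

def retrieve(question: str) -> list[str]:
    # Crude keyword match; a real system would use embeddings or a search API.
    return [fact for key, fact in KNOWLEDGE_STORE.items() if key in question.lower()]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("What happens during photosynthesis?"))
```

The small model only needs general reasoning ability; the specifics ride in through the context.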
You mean like a LoRA?
We already knew parameters aren't everything, or else we'd just be using really large feedforward networks for everything. Architecture, data, and other tricks matter too.
It's more than small enough to run on a single graphics card.
[deleted]