Recent comments in /f/deeplearning
BrotherAmazing t1_jdn10wi wrote
I can’t speak directly to the question posed, but I have often observed people/groups that either:
1. Overparametrize the model and then use regularization as needed to avoid overfitting.
2. Underparametrize a “baseline” prototype, then work their way up to a larger model until it meets some performance requirement on accuracy, etc.
Time and time again I have seen approach 2 lead to far smaller models that train and run much faster, and that sometimes yield better test set results than approach 1, depending on the data available during training. I have, of course, seen approach 1 perform better than approach 2 at times, but if you have an accuracy requirement and ramp up model complexity in approach 2 until you meet or exceed it, you still meet your requirement and end up with a smaller model that is faster to train and run.
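Something like this rough sketch of approach 2, where the toy dataset, the candidate widths, and the 0.95 accuracy target are purely illustrative assumptions:

```python
# Sketch of "approach 2": ramp up model capacity until a validation-accuracy
# requirement is met, then stop at the smallest model that passes.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def build_mlp(hidden: int, n_in: int = 20, n_out: int = 2) -> nn.Module:
    return nn.Sequential(nn.Linear(n_in, hidden), nn.ReLU(), nn.Linear(hidden, n_out))

def train_and_eval(model, train_loader, val_loader, epochs: int = 10) -> float:
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

# Toy data standing in for a real task.
X = torch.randn(2000, 20)
y = (X[:, 0] + X[:, 1] > 0).long()
train_loader = DataLoader(TensorDataset(X[:1500], y[:1500]), batch_size=64, shuffle=True)
val_loader = DataLoader(TensorDataset(X[1500:], y[1500:]), batch_size=256)

target_acc = 0.95
for hidden in [8, 16, 32, 64, 128, 256]:          # grow capacity step by step
    model = build_mlp(hidden)
    acc = train_and_eval(model, train_loader, val_loader)
    print(f"hidden={hidden:4d}  val_acc={acc:.3f}")
    if acc >= target_acc:
        break                                      # keep the smallest model that meets the requirement
```

You stop at the first (smallest) model that clears the bar, which is what tends to produce the small, fast models mentioned above.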
howtorewriteaname t1_jdmy2ik wrote
This could definitely work, provided you have the right data, and enough of it. I believe that is the biggest challenge for this kind of model, more than the learning method.
Readityesterday2 t1_jdmvd9w wrote
It’s gonna be like operating systems: some closed source and some open source. Transformers are open sourced and not exactly rocket science. Plus, with Alpaca and Dolly, anyone can have an LLM on their desktop doing narrow-domain tasks like responding to customer queries or even coding. With your copilot in an air-gapped environment, you can shove in corporate secrets.
Imagine a cybersecurity company with founders with immense knowledge in the domain. They transfer their knowledge to an internally hosted model.
Now any junior associate can generate pro-level strategy or find solutions, making the whole company a monster in productivity.
That’s where we are headed. And your Uptrain will be critical in starting the revolution.
I’ll gladly pay Uptrain for a premium training data set.
Fledgeling t1_jdmv8pp wrote
No, we're finding that with new tuning techniques, improved datasets, and some multimodal systems, we can get our much smaller models to perform just as well as the big boys at many complex tasks. This is a huge area of research right now and is also showing us that our large models themselves have huge growth potential we have yet to unlock.
The main benefit I've seen from this is not needing to run these massive things across multiple nodes or multiple GPUs, which makes building a massive inference service much easier.
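As a hedged example of one such tuning technique, here is roughly what a LoRA setup with the peft library can look like; the base model name and hyperparameters below are placeholders, not something specified here:

```python
# LoRA: adapt a base model by training only small low-rank adapter matrices,
# which keeps the trainable parameter count (and hardware footprint) tiny.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in for a small base model
config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,            # low-rank adapter settings (illustrative)
    target_modules=["c_attn"],                        # attention projections in GPT-2
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()                    # tiny fraction of the total params
```

Because only the adapters train and the whole thing fits on one device, you avoid exactly the multi-node/multi-GPU serving headache mentioned above.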
suflaj t1_jdmr9t8 wrote
Reply to comment by whispering-wisp in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
You can, but not on a fresh chat or without an unpatched jailbreak method. Also, I think you are referring to ChatGPT, which is very different from GPT-4. Most researchers haven't even had the time to check out GPT-4, given it's behind a paywall and limited to 25 requests per 3 hours.
nixed9 t1_jdmpbks wrote
Reply to comment by BellyDancerUrgot in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
I don’t think this is accurate. Are you sure you were using GPT-4? It’s leaps and bounds better than text-davinci-003, which was ChatGPT 3.5.
StrippedSilicon t1_jdmhdac wrote
Reply to comment by BellyDancerUrgot in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
You are wrong; it does well on problems completely outside of its training data. There's a good look here: https://arxiv.org/abs/2303.12712
It's obviously not just memorizing; it has some kind of "understanding" to be able to do this.
whispering-wisp t1_jdmfbpk wrote
Reply to comment by suflaj in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
One of the researchers found a little while ago that you could get GPT to hallucinate that it opened URLs and was reading or summarizing content. Some of it was RNG.
I believe that, at least for the URLs, it was fixed, and it is more consistent about telling you it doesn't have a live feed.
whispering-wisp t1_jdmeu88 wrote
It entirely depends on the task. If you are very general, like OpenAI's GPT? Yes, it needs a ton.
If you are training it for one specific task, then no, probably not.
Are you happy with it being a little more robotic? Then again, you can drastically cut down on things.
Right tool for the job.
R_K_J-DK OP t1_jdmbcs2 wrote
Reply to comment by Praise_AI_Overlords in Has anyone tried to use deep learning with CNNs on satellite images to predict malaria risk (or other similar diseases)? by R_K_J-DK
I also know that crossing a mosquito with a mountain climber is impossible, because you can't cross a vector with a scaler!
R_K_J-DK OP t1_jdm8u8o wrote
Reply to comment by Praise_AI_Overlords in Has anyone tried to use deep learning with CNNs on satellite images to predict malaria risk (or other similar diseases)? by R_K_J-DK
I know that in malaria-ridden areas, Muslims are not required to remove shoes when entering their praying buildings, because mosque-y toe control is essential.
Better_Nebula_9790 t1_jdm51jn wrote
Reply to Has anyone tried to use deep learning with CNNs on satellite images to predict malaria risk (or other similar diseases)? by R_K_J-DK
Yes, I know that someone used satellite images to detect slave camps for their master's thesis.
hemphock t1_jdm2zn4 wrote
Reply to comment by Praise_AI_Overlords in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Please forgive me, Mr. Praise_AI_Overlords, but I also think you might have a biased opinion lol
fysmoe1121 t1_jdm2mtw wrote
deep double descent => bigger is better
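For anyone unfamiliar, here is a minimal sketch of the double descent phenomenon being referenced, using random ReLU features and a minimum-norm least-squares fit; the toy data and feature counts are made up for illustration:

```python
# Test error rises as the model approaches the interpolation threshold
# (features ≈ training points), then drops again as the model keeps growing.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 10

def make_data(n):
    X = rng.normal(size=(n, d))
    y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=n)   # nonlinear target + noise
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

for n_feat in [10, 50, 90, 100, 110, 200, 1000, 5000]:
    W = rng.normal(size=(d, n_feat)) / np.sqrt(d)          # fixed random projection
    f_tr, f_te = np.maximum(X_tr @ W, 0.0), np.maximum(X_te @ W, 0.0)  # random ReLU features
    beta, *_ = np.linalg.lstsq(f_tr, y_tr, rcond=None)     # minimum-norm least squares
    mse = np.mean((f_te @ beta - y_te) ** 2)
    print(f"features={n_feat:5d}  test MSE={mse:.3f}")
```

The error typically spikes near features ≈ n_train and then comes back down as capacity keeps growing, hence "bigger is better".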
Praise_AI_Overlords t1_jdm1iu6 wrote
Reply to Has anyone tried to use deep learning with CNNs on satellite images to predict malaria risk (or other similar diseases)? by R_K_J-DK
Ummmm....
Do you know anything about malaria?
Praise_AI_Overlords t1_jdm0p4g wrote
Reply to comment by hemphock in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
They have no idea what they are talking about.
JasonRDalton t1_jdlyrz3 wrote
Reply to comment by R_K_J-DK in Has anyone tried to use deep learning with CNNs on satellite images to predict malaria risk (or other similar diseases)? by R_K_J-DK
There you go! That sounds great. I bet you’ll do well. Maybe if you find some animal habitat models, population density, etc., you can augment further. Would love to hear how it performs.
R_K_J-DK OP t1_jdlxxnu wrote
Reply to comment by JasonRDalton in Has anyone tried to use deep learning with CNNs on satellite images to predict malaria risk (or other similar diseases)? by R_K_J-DK
We are also giving our model other data. So far we give it satellite images, land cover, temperature, and precipitation. The hypothesis behind the satellite images is that we can capture some nuances with them that have been lost in the land cover data.
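Roughly, a multi-input setup like that might look like the following sketch; all layer sizes, input shapes, and names are illustrative assumptions, not the actual architecture:

```python
# A small CNN branch for satellite image patches, concatenated with tabular
# covariates (land cover, temperature, precipitation) before a classifier head.
import torch
import torch.nn as nn

class MalariaRiskNet(nn.Module):
    def __init__(self, n_tabular: int = 3, n_classes: int = 2):
        super().__init__()
        self.cnn = nn.Sequential(                     # image branch: 3x64x64 patches
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),    # -> (batch, 32)
        )
        self.head = nn.Sequential(                    # fuse image + tabular features
            nn.Linear(32 + n_tabular, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, image, tabular):
        return self.head(torch.cat([self.cnn(image), tabular], dim=1))

# usage with dummy tensors
model = MalariaRiskNet()
logits = model(torch.randn(8, 3, 64, 64), torch.randn(8, 3))
print(logits.shape)  # torch.Size([8, 2])
```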
suflaj t1_jdlruqe wrote
Reply to comment by BellyDancerUrgot in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Could you share those questions it supposedly hallucinated on? I have not seen it hallucinate EVER on new chats, only when the hallucination was based on that chat's history.
> Of course it aces tests and leetcode problems it was trained on.
It does not ace leetcode. This statement casts doubt on your ability to objectively evaluate it.
> How do you even get an unbiased estimate of test error?
First you need to define unbiased. If unbiased means no direct dataset leak, then the existing evaluation is already done like that.
> Doesn’t even begin to approach the footholds of AGI imo.
Seems like you're getting caught up in the AI effect. We do not know if associative memory is insufficient to reach AGI.
> No world model. No intuition.
Similarly, we do not know if those are necessary for AGI. Furthermore, I would dare you to define intuition, because depending on your answer, DL models inherently have that.
FirstOrderCat t1_jdlrsxv wrote
Reply to comment by BellyDancerUrgot in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
The first and/or most powerful AGI will likely be closed and owned by a corporation.
JasonRDalton t1_jdlqni1 wrote
Reply to comment by FesseJerguson in Has anyone tried to use deep learning with CNNs on satellite images to predict malaria risk (or other similar diseases)? by R_K_J-DK
it can’t identify phenomena that don’t appear in the scene at all.
Jaffa6 t1_jdlk3j9 wrote
There was a paper a while back (Chinchilla?) that indicated that for the best results, model size and the amount of data you give it should grow proportionally, and that many then-SotA models were undertrained in terms of how much data they were given. You might find it interesting.
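As a back-of-the-envelope illustration of that proportionality: the roughly 20-tokens-per-parameter ratio is an approximation of the Chinchilla result, and the model sizes below are just examples.

```python
# Compute-optimal training wants data that scales with model size,
# roughly ~20 training tokens per parameter under the Chinchilla heuristic.
TOKENS_PER_PARAM = 20

for n_params in [1e9, 10e9, 70e9, 175e9]:
    tokens = TOKENS_PER_PARAM * n_params
    print(f"{n_params/1e9:6.0f}B params -> ~{tokens/1e12:.2f}T training tokens")
# e.g. a 175B-parameter model would want ~3.5T tokens, far more than the
# ~300B tokens GPT-3 was actually trained on (hence "undertrained").
```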
But as a tangent, I think ML focuses too much on chasing accuracy. You see it constantly in SotA papers where they're claiming things like "We improved our GLUE score by 0.1 compared to SotA, and all it took was spending the GDP of Switzerland on electricity and GPUs!"
And it's still a model that hallucinates way too much, contains bias, and just generally isn't worth all that time, money, and pollution.
hemphock t1_jdlk30h wrote
Reply to comment by BellyDancerUrgot in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
> LLMs are not the route to AGI as they exist today.
What makes you say this? I'm new to the field.
BellyDancerUrgot t1_jdlfsuq wrote
Reply to comment by FirstOrderCat in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Except now it’s closedAI and most of the papers they release are laughably esoteric. I know the world will catch up within months to whatever they pioneer but it’s just funny seeing this happen after they held a sanctimonious attitude for so long.
Appropriate_Ant_4629 t1_jdnliik wrote
Reply to comment by BellyDancerUrgot in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
> chatgpt and I’ve had a 50-50 hit rate wrt good results or hallucinated bullshit with both of them
Which just suggests they're not large enough yet to memorize/encode enough of the types of content you're interested in.