Recent comments in /f/deeplearning

BellyDancerUrgot t1_jdldmda wrote

Funny, cuz I keep seeing people rave like madmen over GPT-4 and ChatGPT, and I’ve had about a 50-50 hit rate between good results and hallucinated bullshit with both of them. Like it isn’t even funny. People think it’s going to replace programmers and doctors, meanwhile it can’t do basic shit like cite the correct paper.

Of course it aces tests and leetcode problems it was trained on. It was trained on basically the entire internet. How do you even get an unbiased estimate of test error?

Doesn’t mean it isn’t impressive. It’s just one huge block of really good associative memory. Doesn’t even begin to approach the footholds of AGI imo. No world model. No intuition. Just memory.

15

cameldrv t1_jdldgo8 wrote

I think fine-tuning has its place, but I don't think you're going to be able to replicate the results of a 175B-parameter model with a 6B one, simply because the 175B model empirically just holds so much more information.

If you think about it from an information-theory standpoint, all of that specific knowledge has to be encoded in the model somewhere. With 8-bit weights, a 6B-parameter model is only about 6 GB. Even with incredible data compression, I don't think you can fit anywhere near the amount of human knowledge that's in GPT-3.5 into that amount of space.
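
A quick back-of-the-envelope sketch of that gap (my own illustration, just parameter count times bytes per weight; real checkpoints also carry optimizer state, activations, etc.):

```python
# Raw weight storage only: parameter count * bytes per weight.

def model_size_gb(n_params: float, bits_per_weight: int) -> float:
    """Storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

for name, n_params in [("6B", 6e9), ("175B", 175e9)]:
    for bits in (32, 16, 8):
        print(f"{name} model @ {bits}-bit weights: {model_size_gb(n_params, bits):.0f} GB")

# 6B   @ 8-bit ->   6 GB
# 175B @ 8-bit -> 175 GB, i.e. roughly 29x more room to store "facts"
```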

10

JasonRDalton t1_jdkt4a4 wrote

Object detection in an RGB image doesn’t seem like the right approach for the malaria census. What is the phenomenon you’re looking for? You can’t ‘see’ malaria on the ground, so instead how about looking for conditions that would indicate higher mosquito levels: stagnant-water breeding areas, suitable temperature ranges, lack of mosquito predators, low wind speeds, population density, lots of outdoor living, etc. You’ll need some multispectral data, but you’ll get better prediction results.
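
A hedged sketch of that kind of pipeline, assuming you've already sampled environmental covariates at each surveyed location (the file and column names below are hypothetical) and just regress the malaria rate on them:

```python
# Sketch: predict malaria rate from environmental covariates instead of
# detecting anything directly in RGB imagery. Column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("site_covariates.csv")  # one row per surveyed location

features = [
    "ndwi_mean",           # water index, proxy for stagnant-water breeding sites
    "surface_temp_c",      # temperature range suitable for mosquitoes
    "wind_speed_ms",       # low wind favors mosquito activity
    "population_density",
    "pct_outdoor_housing",
]
X, y = df[features], df["malaria_rate"]

model = RandomForestRegressor(n_estimators=300, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("cross-validated R^2:", scores.mean())
```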

2

j-solorzano t1_jdk2kod wrote

If it works on CPU but not on GPU, even though the GPU should have more memory, the only difference I can think of is garbage-collection timing. Try calling the garbage collector every epoch. Also, note that you have a GRU, which retains tensors: the hidden state it returns can keep old computation graphs alive unless you detach it.
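
A minimal sketch of both suggestions in PyTorch (the model and data here are toy stand-ins, not from the thread):

```python
# Detach the GRU hidden state so old graphs can be freed, and explicitly
# collect garbage at the end of each epoch.
import gc
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True).to(device)
head = nn.Linear(64, 1).to(device)
optimizer = torch.optim.Adam(list(gru.parameters()) + list(head.parameters()))
criterion = nn.MSELoss()

for epoch in range(3):
    hidden = None
    for _ in range(100):  # stand-in for a real DataLoader
        x = torch.randn(16, 10, 32, device=device)
        y = torch.randn(16, 1, device=device)

        if hidden is not None:
            hidden = hidden.detach()        # drop graphs from earlier steps
        out, hidden = gru(x, hidden)
        loss = criterion(head(out[:, -1]), y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    gc.collect()                            # Python-side garbage collection
    if device == "cuda":
        torch.cuda.empty_cache()            # return cached blocks to the driver
```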

1

R_K_J-DK OP t1_jdjgdqm wrote

The malaria labels are downloaded with the Malaria Atlas Project R package: https://cran.r-project.org/web/packages/malariaAtlas/index.html

Each data point is a location and the malaria rate in that area (though note that in some cases the data is not very accurate).

2

MisterManuscript t1_jdhtan0 wrote

Reply to comment by Rishh3112 in Cuda out of memory error by Rishh3112

You probably have a memory leak somewhere in your training loop. Either that, or your model or batch size is way too big and occupies a lot of VRAM.

Addendum: there's a difference between RAM and VRAM (your GPU's RAM). I hope the 14GB you're talking about is VRAM and not the RAM of your AWS VM.
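
A quick way to check which pool you're actually looking at, as a sketch using PyTorch's CUDA queries plus psutil for system RAM (assumes a single GPU at index 0):

```python
# Report GPU VRAM vs. system RAM so the two aren't confused.
import psutil
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU total VRAM:     {props.total_memory / 1e9:.1f} GB")
    print(f"VRAM allocated now: {torch.cuda.memory_allocated(0) / 1e9:.1f} GB")
    print(f"VRAM reserved now:  {torch.cuda.memory_reserved(0) / 1e9:.1f} GB")

print(f"System RAM total:   {psutil.virtual_memory().total / 1e9:.1f} GB")
```

Watching `nvidia-smi` while the job runs tells the same story for the GPU side.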

1