Recent comments in /f/deeplearning
BellyDancerUrgot t1_jdldmda wrote
Reply to comment by FirstOrderCat in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Funny, cuz I keep seeing people rave like madmen over GPT-4 and ChatGPT, and I've had a 50-50 hit rate with both of them between good results and hallucinated bullshit. Like, it isn't even funny. People think it's going to replace programmers and doctors, meanwhile it can't do basic shit like cite the correct paper.
Of course it aces tests and leetcode problems it was trained on. It was trained on basically the entire internet. How do you even get an unbiased estimate of test error?
Doesn’t mean it isn’t impressive. It’s just one huge block of really good associative memory. Doesn’t even begin to approach the footholds of AGI imo. No world model. No intuition. Just memory.
cameldrv t1_jdldgo8 wrote
I think that fine-tuning has its place, but I don't think you're going to be able to replicate the results of a 175B parameter model with a 6B one, simply because the larger model empirically just holds so much more information.
If you think about it from an information-theory standpoint, all of that specific knowledge has to be encoded in the model somewhere. If you're using 8-bit weights, your 6B parameter model is 6 GB. Even with incredible data compression, I don't think you can fit anywhere near the amount of human knowledge that's in GPT-3.5 into that amount of space.
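Back-of-the-envelope, in case anyone wants the arithmetic (nothing here is model-specific, just bytes = parameters × bits / 8):

```python
# Raw weight storage only, ignoring activations and optimizer state.
def model_size_gb(n_params: float, bits_per_weight: int) -> float:
    return n_params * bits_per_weight / 8 / 1e9

print(model_size_gb(6e9, 8))    # 6B params   @ 8-bit ->   6.0 GB
print(model_size_gb(175e9, 8))  # 175B params @ 8-bit -> 175.0 GB, ~29x the capacity
```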
BellyDancerUrgot t1_jdld7w4 wrote
LLMs are not the route to AGI as they exist today.
And no, I don't think fine-tuning by itself is enough of a reason to choose some other model over the top LLMs. For most LLM-focused work, API calls to the big ones will be enough.
FirstOrderCat t1_jdl9uwf wrote
GPT is cool because it memorized lots of info, so you can retrieve it quickly. An LM's memorization ability likely depends on its size.
FesseJerguson t1_jdl5xd4 wrote
Personally, I see networks of LLMs cooperating, with a sort of director that controls them, as the most powerful "AI" (rough sketch below). But small LLMs that are field experts will definitely have their place.
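A toy sketch of that shape; the director, the experts, and the keyword routing rule are all hypothetical stand-ins, not a real API:

```python
from typing import Callable

# Hypothetical field-expert "LLMs" -- stand-ins for real model calls.
EXPERTS: dict[str, Callable[[str], str]] = {
    "medicine": lambda q: f"[medical-expert answer to: {q}]",
    "code":     lambda q: f"[code-expert answer to: {q}]",
    "general":  lambda q: f"[generalist answer to: {q}]",
}

def director(query: str) -> str:
    """Route a query to a field expert. A real director could itself be an
    LLM; a keyword check stands in for that routing decision here."""
    q = query.lower()
    if any(w in q for w in ("symptom", "diagnosis", "dosage")):
        return EXPERTS["medicine"](query)
    if any(w in q for w in ("python", "bug", "stack trace")):
        return EXPERTS["code"](query)
    return EXPERTS["general"](query)

print(director("Why does my Python loop leak memory?"))
```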
FesseJerguson t1_jdl5bdt wrote
Reply to comment by JasonRDalton in Has anyone tried to use deep learning with CNNs on satellite images to predict malaria risk (or other similar diseases)? by R_K_J-DK
To be fair, you could just throw the "RGB" data at it and eventually it would conclude those things, and *possibly find more*, which is the most exciting thing about ML, right?
Rishh3112 OP t1_jdl42av wrote
Reply to comment by j-solorzano in Cuda out of memory error by Rishh3112
Sure, I'll try calling the garbage collector every epoch. Thanks.
JasonRDalton t1_jdkt4a4 wrote
Reply to Has anyone tried to use deep learning with CNNs on satellite images to predict malaria risk (or other similar diseases)? by R_K_J-DK
Object detection in an RGB image doesn't seem like the right approach for the malaria census. What is the phenomenon you're looking for? You can't 'see' malaria on the ground, so instead how about looking for conditions that would indicate higher mosquito levels: stagnant-water breeding areas, appropriate temperature ranges, lack of mosquito predators, low wind speeds, population density, lots of outdoor living, etc. You'll need some multispectral data, but you'll have better prediction results.
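For instance, standing water tends to pop out in a water index like NDWI = (green − NIR) / (green + NIR). A rough numpy sketch; the band layout here is made up, so check your sensor's documentation:

```python
import numpy as np

def ndwi(green: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """McFeeters' Normalized Difference Water Index; high values suggest open water."""
    return (green - nir) / (green + nir + 1e-8)  # epsilon avoids divide-by-zero

# Hypothetical multispectral tile, shape (bands, H, W); which band is green
# vs. near-infrared depends entirely on the sensor/product you download.
tile = np.random.rand(4, 256, 256).astype(np.float32)
water_mask = ndwi(tile[1], tile[3]) > 0.3  # threshold is scene-dependent
```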
j-solorzano t1_jdk2kod wrote
Reply to Cuda out of memory error by Rishh3112
If it works on CPU but not GPU, even though the GPU should have more memory, the only difference I can think of is garbage-collection timing. Try calling the garbage collector every epoch. Also, note that you have a GRU, which retains tensors across steps.
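Something like this, as a sketch (PyTorch; the model/loader names are placeholders, and `h.detach()` is what stops the GRU's hidden state from pinning old graphs):

```python
import gc
import torch

def train_one_epoch(model, loader, optimizer, criterion, device):
    model.train()
    h = None  # GRU hidden state, carried across batches
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        out, h = model(x, h)
        h = h.detach()  # cut the autograd graph so earlier steps can be freed
        loss = criterion(out, y)
        loss.backward()
        optimizer.step()
    gc.collect()              # collect unreferenced Python-side tensors
    torch.cuda.empty_cache()  # return cached blocks to the CUDA allocator
```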
verboseEqualsTrue t1_jdjjtpz wrote
Reply to Has anyone tried to use deep learning with CNNs on satellite images to predict malaria risk (or other similar diseases)? by R_K_J-DK
Like this? https://arxiv.org/abs/2110.14144 Also, TF Hub used to have a few models for SAR data, I believe.
paperswithcode is bound to have something as well
BarriJulen t1_jdjgyeg wrote
Reply to Has anyone tried to use deep learning with CNNs on satellite images to predict malaria risk (or other similar diseases)? by R_K_J-DK
Oh, interesting. Well, maybe you can look at CNNs used for image regression in areas like predicting CO2 or other emissions from satellite imagery and try to apply some of those techniques.
R_K_J-DK OP t1_jdjgdqm wrote
Reply to comment by BarriJulen in Has anyone tried to use deep learning with CNNs on satellite images to predict malaria risk (or other similar diseases)? by R_K_J-DK
The malaria labels are downloaded with the Malaria Atlas Project R package: https://cran.r-project.org/web/packages/malariaAtlas/index.html
Data points are a location and the malaria rate in that area (but note that in some cases the data is not very accurate).
goxdin t1_jdjdwcw wrote
Reply to Cuda out of memory error by Rishh3112
Confirm your video RAM; if NVIDIA, run `nvidia-smi`.
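You can also check from inside PyTorch, e.g.:

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1e9:.1f} GB total")
    print(f"allocated: {torch.cuda.memory_allocated(0) / 1e9:.2f} GB")
```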
BarriJulen t1_jdjdkfr wrote
Reply to Has anyone tried to use deep learning with CNNs on satellite images to predict malaria risk (or other similar diseases)? by R_K_J-DK
And how are you going to get that data from satellite images? Just curious.
stuv_x t1_jdj5l6h wrote
Reply to Cuda out of memory error by Rishh3112
Make sure you're putting the model into evaluation mode during validation and zeroing your optimiser.
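Roughly, a minimal validation sketch (placeholder names):

```python
import torch

@torch.no_grad()  # no graph is built, so activations aren't kept for backward
def validate(model, loader, criterion, device):
    model.eval()  # disables dropout, uses running batch-norm statistics
    total = 0.0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        total += criterion(model(x), y).item()  # .item() drops the tensor
    model.train()  # restore training mode afterwards
    return total / len(loader)
```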
_vb__ t1_jdiwjqk wrote
Reply to comment by Rishh3112 in Cuda out of memory error by Rishh3112
Are you calling the zero_grad method on your optimizer in every step of your training loop?
Rishh3112 OP t1_jdihvhj wrote
Reply to comment by what_if___420 in Cuda out of memory error by Rishh3112
Even reducing the batch size to 1 produces the same error.
boosandy t1_jdi7fiy wrote
Reply to Cuda out of memory error by Rishh3112
Use smaller batch size
what_if___420 t1_jdi72rw wrote
Reply to Cuda out of memory error by Rishh3112
Reduce batch size
Rishh3112 OP t1_jdhti46 wrote
Reply to comment by MisterManuscript in Cuda out of memory error by Rishh3112
My model is down below, and my batch size is just 8.
MisterManuscript t1_jdhtan0 wrote
Reply to comment by Rishh3112 in Cuda out of memory error by Rishh3112
You probably have a memory leak somewhere in your training loop. Either that, or your model or batch size is way too big and occupies a lot of vRAM.
Addendum: there's a difference between RAM and vRAM (your GPU's RAM). I hope the 14 GB you're talking about is vRAM and not the RAM of your AWS VM.
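One classic leak of that kind, as a hypothetical example (not necessarily what's happening here): accumulating the loss tensor itself instead of its value, which keeps every step's autograd graph alive:

```python
def run_epoch(model, loader, criterion, optimizer):
    total_loss = 0.0
    for x, y in loader:
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        # Leak: `total_loss += loss` would keep every step's graph in vRAM.
        total_loss += loss.item()  # a plain Python float releases the graph
    return total_loss / len(loader)
```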
Rishh3112 OP t1_jdhsvfv wrote
Reply to comment by MisterManuscript in Cuda out of memory error by Rishh3112
Using AWS, and it has 14 GB of RAM.
MisterManuscript t1_jdhrqhl wrote
Reply to Cuda out of memory error by Rishh3112
What GPU are you using? How much vRAM does it have?
humpeldumpel t1_jdhpl0w wrote
Reply to comment by trajo123 in Cuda out of memory error by Rishh3112
And also make use of the model's training and validation modes (`model.train()` / `model.eval()`).
FirstOrderCat t1_jdldywl wrote
Reply to comment by BellyDancerUrgot in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
I think people are amazed by the speed of progress. OpenAI got $10B in funding, they built a strong team, and they can keep expanding the system with the missing components.