Recent comments in /f/deeplearning

BellyDancerUrgot t1_jdldmda wrote

Funny, cuz I keep seeing people rave like madmen over GPT-4 and ChatGPT, and I’ve had about a 50-50 hit rate between good results and hallucinated bullshit with both of them. Like it isn’t even funny. People think it’s going to replace programmers and doctors, meanwhile it can’t do basic shit like cite the correct paper.

Of course it aces tests and leetcode problems it was trained on. It was trained on basically the entire internet. How do you even get an unbiased estimate of test error?

Doesn’t mean it isn’t impressive. It’s just one huge block of really good associative memory. Doesn’t even begin to approach the footholds of AGI imo. No world model. No intuition. Just memory.

15

cameldrv t1_jdldgo8 wrote

I think fine-tuning has its place, but I don't think you're going to be able to replicate the results of a 175B-parameter model with a 6B one, simply because the 175B model empirically just holds so much more information.

If you think about it from an information-theory standpoint, all of that specific knowledge has to be encoded in the model somewhere. With 8-bit weights, a 6B-parameter model is only about 6 GB. Even with incredible data compression, I don't think you can fit anywhere near the amount of human knowledge that's in GPT-3.5 into that amount of space.
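
A quick back-of-the-envelope sketch of that gap (my own illustration, just parameter count times bytes per weight; real checkpoints also carry optimizer state, activations, etc.):

```python
# Raw weight storage only: parameter count * bytes per weight.

def model_size_gb(n_params: float, bits_per_weight: int) -> float:
    """Storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

for name, n_params in [("6B", 6e9), ("175B", 175e9)]:
    for bits in (32, 16, 8):
        print(f"{name} model @ {bits}-bit weights: {model_size_gb(n_params, bits):.0f} GB")

# 6B   @ 8-bit ->   6 GB
# 175B @ 8-bit -> 175 GB, i.e. roughly 29x more room to store "facts"
```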

10

JasonRDalton t1_jdkt4a4 wrote

Object detection in an RGB image doesn’t seem like the right approach for the malaria census. What is the phenomenon you’re looking for? You can’t ‘see’ malaria on the ground, so instead how about looking for conditions that would indicate higher mosquito levels: stagnant-water breeding areas, suitable temperature ranges, lack of mosquito predators, low wind speeds, population density, lots of outdoor living, etc. You’ll need some multispectral data, but you’ll get better prediction results.
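
A hedged sketch of that kind of pipeline, assuming you've already sampled environmental covariates at each surveyed location (the file and column names below are hypothetical) and just regress the malaria rate on them:

```python
# Sketch: predict malaria rate from environmental covariates instead of
# detecting anything directly in RGB imagery. Column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("site_covariates.csv")  # one row per surveyed location

features = [
    "ndwi_mean",           # water index, proxy for stagnant-water breeding sites
    "surface_temp_c",      # temperature range suitable for mosquitoes
    "wind_speed_ms",       # low wind favors mosquito activity
    "population_density",
    "pct_outdoor_housing",
]
X, y = df[features], df["malaria_rate"]

model = RandomForestRegressor(n_estimators=300, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("cross-validated R^2:", scores.mean())
```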

2

j-solorzano t1_jdk2kod wrote

If it works on CPU but not on GPU, even though the GPU should have more memory, the only difference I can think of is garbage-collection timing. Try calling the garbage collector every epoch. Also, note that you have a GRU, which retains tensors: the hidden state it returns can keep old computation graphs alive unless you detach it.
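
A minimal sketch of both suggestions in PyTorch (the model and data here are toy stand-ins, not from the thread):

```python
# Detach the GRU hidden state so old graphs can be freed, and explicitly
# collect garbage at the end of each epoch.
import gc
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True).to(device)
head = nn.Linear(64, 1).to(device)
optimizer = torch.optim.Adam(list(gru.parameters()) + list(head.parameters()))
criterion = nn.MSELoss()

for epoch in range(3):
    hidden = None
    for _ in range(100):  # stand-in for a real DataLoader
        x = torch.randn(16, 10, 32, device=device)
        y = torch.randn(16, 1, device=device)

        if hidden is not None:
            hidden = hidden.detach()        # drop graphs from earlier steps
        out, hidden = gru(x, hidden)
        loss = criterion(head(out[:, -1]), y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    gc.collect()                            # Python-side garbage collection
    if device == "cuda":
        torch.cuda.empty_cache()            # return cached blocks to the driver
```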

1

R_K_J-DK OP t1_jdjgdqm wrote

The malaria labels are downloaded with the Malaria Atlas Project R package: https://cran.r-project.org/web/packages/malariaAtlas/index.html

Each data point is a location and the malaria rate in that area (though note that in some cases the data is not very accurate).

2

MisterManuscript t1_jdhtan0 wrote

Reply to comment by Rishh3112 in Cuda out of memory error by Rishh3112

You probably have a memory leak somewhere in your training loop. Either that, or your model or batch size is way too big and occupies a lot of VRAM.

Addendum: there's a difference between RAM and VRAM (your GPU's RAM). I hope the 14GB you're talking about is VRAM and not the RAM of your AWS VM.
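
A quick way to check which pool you're actually looking at, as a sketch using PyTorch's CUDA queries plus psutil for system RAM (assumes a single GPU at index 0):

```python
# Report GPU VRAM vs. system RAM so the two aren't confused.
import psutil
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU total VRAM:     {props.total_memory / 1e9:.1f} GB")
    print(f"VRAM allocated now: {torch.cuda.memory_allocated(0) / 1e9:.1f} GB")
    print(f"VRAM reserved now:  {torch.cuda.memory_reserved(0) / 1e9:.1f} GB")

print(f"System RAM total:   {psutil.virtual_memory().total / 1e9:.1f} GB")
```

Watching `nvidia-smi` while the job runs tells the same story for the GPU side.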

1