Recent comments in /f/deeplearning

BrotherAmazing t1_jdn10wi wrote

I can’t speak directly to the question posed, but I have often observed people/groups that either:

  1. Overparametrize the model and then use regularization as needed to avoid overfitting

  2. Underparametrize a “baseline” prototype, then work their way up to a larger model until it meets some performance requirement on accuracy, etc.

Time and time again I have seen approach 2 lead to far smaller models that train and run much faster and, depending on the data available during training, sometimes yield better test set results than approach 1. I have, of course, seen approach 1 perform better than approach 2 at times, but if you have an accuracy requirement and ramp up model complexity in approach 2 until you meet or exceed it, you still meet your requirement and end up with a smaller model that is faster to train and run.
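Here is a minimal sketch of what that ramp-up loop in approach 2 can look like in practice (hypothetical PyTorch-style example; the width schedule, accuracy target, and the `train_fn`/`eval_fn` helpers are placeholders, not from any specific project):

```python
import torch.nn as nn

def make_mlp(hidden_width: int) -> nn.Module:
    # Baseline prototype whose capacity is controlled by a single width knob.
    return nn.Sequential(
        nn.Linear(784, hidden_width),
        nn.ReLU(),
        nn.Linear(hidden_width, 10),
    )

def grow_until_good_enough(train_fn, eval_fn, target_acc=0.95,
                           widths=(32, 64, 128, 256, 512)):
    """Approach 2: start small and add capacity only while the requirement is unmet."""
    for width in widths:
        model = make_mlp(width)
        train_fn(model)                 # user-supplied training loop
        acc = eval_fn(model)            # accuracy on a held-out validation set
        if acc >= target_acc:
            return model, width, acc    # smallest model that meets the spec
    return model, width, acc            # otherwise return the largest one tried
```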

4

Readityesterday2 t1_jdmvd9w wrote

It’s gonna be like operating systems: some closed source and some open source. Transformers are open source and not exactly rocket science. Plus, with Alpaca and Dolly, anyone can have an LLM on their desktop doing narrow-domain tasks like responding to customer queries or even coding. With your copilot in an air-gapped environment, you can shove in corporate secrets.

Imagine a cybersecurity company whose founders have immense domain knowledge. They transfer that knowledge to an internally hosted model.

Now any junior associate can generate pro-level strategy or find solutions, making the whole company a productivity monster.

That’s where we are headed. And your Uptrain will be critical in starting the revolution.

I’ll gladly pay Uptrain for a premium training data set.

7

Fledgeling t1_jdmv8pp wrote

No, we're finding that with new tuning techniques, improved datasets, and some multimodal systems, we can get much smaller models to perform just as well as the big boys at many complex tasks. This is a huge area of research right now, and it is also showing us that the large models themselves have huge growth potential we have yet to unlock.

The main benefit I've seen from this is not needing to run these massive things across multiple nodes or multiple GPUs, which makes building a massive inference service much easier.

2

suflaj t1_jdlruqe wrote

Could you share those questions it supposedly hallucinated on? I have not seen it hallucinate EVER on new chats, only when the hallucination was based on that chat's history.

> Of course it aces tests and leetcode problems it was trained on.

It does not ace leetcode. This statement casts doubt on your ability to evaluate it objectively.

> How do you even get an unbiased estimate of test error?

First you need to define unbiased. If unbiased means no direct dataset leak, then the existing evaluation is already done that way.

> Doesn’t even begin to approach the footholds of AGI imo.

Seems like you're getting caught up in the AI effect. We do not know whether associative memory is insufficient to reach AGI.

> No world model. No intuition.

Similarly, we do not know whether those are necessary for AGI. Furthermore, I would dare you to define intuition, because depending on your answer, DL models inherently have it.

10

Jaffa6 t1_jdlk3j9 wrote

There was a paper a while back (Chinchilla?) indicating that, for the best results, model size and the amount of training data should grow proportionally, and that many then-SotA models were undertrained relative to how much data they were given. You might find it interesting.
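For a rough sense of what "proportionally" means there: the Chinchilla result works out to roughly 20 training tokens per parameter at the compute-optimal point. A back-of-the-envelope sketch (the 20x ratio is approximate, and the model sizes below are just illustrative examples):

```python
def chinchilla_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Rough compute-optimal training-token budget per the Chinchilla scaling result."""
    return tokens_per_param * n_params

# Example: roughly how much data a few model sizes would "want" under that rule.
for n_params in (1e9, 10e9, 70e9):   # 1B, 10B, 70B parameters
    print(f"{n_params / 1e9:>4.0f}B params -> ~{chinchilla_tokens(n_params) / 1e9:.0f}B tokens")
```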

But as a tangent, I think ML focuses too much on chasing accuracy. You see it constantly in SotA papers where they're claiming things like "We improved our GLUE score by 0.1 compared to SotA, and all it took was spending the GDP of Switzerland on electricity and GPUs!"

And it's still a model that hallucinates way too much, contains bias, and just generally isn't worth all that time, money, and pollution.

14