
suflaj t1_j5qb32y wrote

There is this: https://www.microsoft.com/en-us/research/blog/%C2%B5transfer-a-technique-for-hyperparameter-tuning-of-enormous-neural-networks/

However, it's unlikely to help in your case. The best thing you can do is a grid search if you know something about the problem, or just a random search otherwise. I prefer random search even if I'm an expert on the problem, ESPECIALLY with ML models.
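To make that concrete, random search is just sampling configurations and keeping the best run. A minimal sketch, where `train_and_evaluate` and the sampling ranges are hypothetical stand-ins for your own setup:

```python
import random

def random_search(train_and_evaluate, n_trials=20):
    """Sample hyperparameters at random and keep the best configuration.

    `train_and_evaluate` is assumed to take a config dict and return a
    validation score where higher is better.
    """
    best_score, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = {
            # Learning rates span several decades, so sample log-uniformly
            "lr": 10 ** random.uniform(-5, -2),
            # Dropout lives in a bounded range, so plain uniform is fine
            "dropout": random.uniform(0.0, 0.5),
        }
        score = train_and_evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score
```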

But I'm curious why it takes so long. You don't have to train on the whole dataset. Take 10% for training and 10% for validation, or less if the dataset is huge. You just need enough data to learn something; the hyperparameters you find that way are a good enough approximation of the optimal ones.
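If you're in PyTorch, carving out those subsets could look something like this. A sketch; the `TensorDataset` is only a placeholder so it runs on its own:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder dataset so the sketch is self-contained; use your real Dataset
full_dataset = TensorDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))

n = len(full_dataset)
n_train, n_val = int(0.10 * n), int(0.10 * n)
n_rest = n - n_train - n_val  # left out of the hyperparameter search entirely

# Fix the seed so every trial sees the same subsets
g = torch.Generator().manual_seed(42)
train_subset, val_subset, _ = random_split(
    full_dataset, [n_train, n_val, n_rest], generator=g
)
```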

Also, it might help to just not tune redundant hyperparameters. Layer sizes are usually redundant, as is almost every hyperparameter in the Adam family of optimizers besides the learning rate and, to a lesser extent, the first momentum. Which ones are you optimizing?

4

NinjaUnlikely6343 OP t1_j5r4d3x wrote

Thanks a lot for the detailed response! Didn't know you could tune on a portion of the dataset and expect approximately what you'd get with the whole set. I'm currently just testing different learning rates, but I thought about having a go at the dropout rate as well.

2

suflaj t1_j5r5bfw wrote

For the learning rate you should just use a good starting point based on the batch size and architecture, and relegate everything else to the scheduler and optimizer. I don't think there's any point messing with the learning rate once you find one that doesn't blow up your model; just use warmup or plateau schedulers to manage it for you after that.
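For instance, a linear warmup is a one-liner with PyTorch's `LambdaLR`. A sketch, where the 500-step warmup length and the placeholder model are just assumptions:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(10, 2)  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

warmup_steps = 500  # assumption; pick what fits your schedule

# Ramp the LR linearly from ~0 up to the base LR, then hold it flat
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps))

for step in range(1000):
    # ... forward pass, loss.backward(), optimizer.step() go here ...
    scheduler.step()  # advance the warmup by one optimization step
```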

Since you mentioned Inception, I believe that unless you are using quite big batch sizes, your starting LR should be the magical 3e-4 for Adam or 1e-2 for SGD. You would then just use a ReduceLROnPlateau scheduler with e.g. a patience of 3 epochs, a cooldown of 2 and a factor of 0.1, and probably employ early stopping if the metric doesn't improve after 6 epochs.
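In PyTorch terms, that recipe might look roughly like this. A sketch, not a full training script: `validate` is a hypothetical function returning your validation loss, and the linear layer stands in for your actual model:

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = torch.nn.Linear(10, 2)  # placeholder for your Inception model
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=3, cooldown=2)

best_loss, epochs_without_improvement = float("inf"), 0
for epoch in range(100):
    # ... one epoch of training goes here ...
    val_loss = validate(model)  # hypothetical: returns validation loss
    scheduler.step(val_loss)    # cuts the LR by 10x after 3 stagnant epochs
    if val_loss < best_loss:
        best_loss, epochs_without_improvement = val_loss, 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= 6:  # the early stopping part
            break
```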

2