Submitted by kaphed t3_124pbq5 in MachineLearning

Looking at some old tables: Table 4, Table 2

Why do the ResNet-152 results vary? E.g., the top-1 error on the ImageNet validation set is 19.38% in the original paper, but 22.2% in the EfficientNet paper.

Normally I would assume these types of results would be copied from the previous publication.




suflaj t1_je1uvo8 wrote

They probably reran the experiments themselves. Also, I believe ResNets went through some changes shortly after release, and they could have used different pretrained weights. AFAIK He et al. never released their weights.

Furthermore, the Wolfram and PyTorch pretrained weights also sit at around 22% top-1 error, so that is probably the correct error rate. Since PyTorch also provides weights that reach about 18% top-1 error with some small adjustments to the training procedure, it is possible the original authors got lucky with the hyperparameters, or employed some techniques they didn't describe in the paper.
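
For anyone who wants to check a reported number against a set of pretrained weights, here is a minimal sketch of how top-1 error is usually computed from model outputs. The function name `top1_error` and the toy scores are illustrative assumptions, not taken from either paper, and the sketch ignores evaluation details such as cropping or resizing, which can also move the number.

```python
def top1_error(logits, labels):
    """Return the percentage of samples whose highest-scoring class
    differs from the true label (hypothetical helper for illustration)."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    wrong = 0
    for scores, label in zip(logits, labels):
        # Index of the largest score is the top-1 prediction.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted != label:
            wrong += 1
    return 100.0 * wrong / len(labels)


# Toy data: 4 samples, 3 classes. Predictions are 1, 0, 2, 0.
toy_logits = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.2, 0.6],
    [0.9, 0.05, 0.05],
]
toy_labels = [1, 1, 2, 0]

print(top1_error(toy_logits, toy_labels))  # 25.0 (one of four is wrong)
```

On the 50,000-image ImageNet validation set, the gap between 19.38% and 22.2% corresponds to roughly 1,400 images classified differently.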


U03B1Q t1_je4xekj wrote

There was an ASE paper that found that even with identical hyperparameters and seeds, networks showed a variance of about 2% due to non-determinism in the parallel computing workflow. If they chose to retrain the model instead of copying the old numbers, this performance discrepancy is in line with that finding.
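
One commonly cited source of this kind of non-determinism is that floating-point addition is not associative, so a parallel reduction that adds the same numbers in a different order can give a slightly different result. The sketch below only illustrates that effect in plain Python; it is an assumption that this mechanism is among those the paper measured, and the helper name `sum_in_order` is made up for the example.

```python
def sum_in_order(values, order):
    """Add the values in the given index order, as one worker schedule might."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Same three numbers, two different accumulation orders.
small = [0.1, 0.2, 0.3]
small_forward = sum_in_order(small, [0, 1, 2])   # (0.1 + 0.2) + 0.3
small_reverse = sum_in_order(small, [2, 1, 0])   # (0.3 + 0.2) + 0.1
print(small_forward == small_reverse)  # False

# With mixed magnitudes the difference is no longer tiny.
mixed = [1e16, 1.0, -1e16]
mixed_a = sum_in_order(mixed, [0, 1, 2])  # the 1.0 is absorbed by 1e16
mixed_b = sum_in_order(mixed, [0, 2, 1])  # the large terms cancel first
print(mixed_a, mixed_b)  # 0.0 1.0
```

Tiny differences like these can compound over many training steps, which is one way two runs with the same seed end up with different final weights.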