Submitted by tysam_and_co t3_10op6va in MachineLearning
tysam_and_co t1_j26t3fs wrote
This is a cool idea, but why do we have to sign in to the website to use cached prompts? Is this just an onboarding marketing post?
tysam_and_co t1_izwgts5 wrote
Great paper. I'd love to play with the concepts in it one day; that would be cool. It looks like a new paradigm, and as is usual for Hinton, it has several decades of thought behind it. I'm a little worried about the more philosophical bent he took near the end -- he's getting older, and from what I can recall he doesn't usually strike that grave a tone in his papers. I hope he, and everyone around him, is well. :') :')
tysam_and_co t1_izviq7y wrote
I believe that it depends. I've played around with this on one of my projects, though not exhaustively, and it seemed like the averaging helped a lot. Here's the line where the global average pooling is defined: https://github.com/tysam-code/hlb-CIFAR10/blob/d683ee95ff0a8dde5e4b9c9d4425f49de7fe9805/main.py#L332
The network is not originally mine, but the pooling is a small step down from 4x4 to 1x1 -- only a 16x reduction in spatial information.
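For context, here's a minimal sketch of that kind of pooling head, assuming a PyTorch-style setup -- illustrative only, not the actual hlb-CIFAR10 code (the real definition is at the link above):

```python
import torch
import torch.nn as nn

# Illustrative only -- not the actual hlb-CIFAR10 code. A classifier head that
# collapses a 4x4 spatial grid down to 1x1 with global average pooling.
class TinyPoolingHead(nn.Module):
    def __init__(self, channels: int = 512, num_classes: int = 10):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # (N, C, 4, 4) -> (N, C, 1, 1)
        self.linear = nn.Linear(channels, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(x)        # average the 16 spatial positions per channel
        x = x.flatten(1)        # (N, C)
        return self.linear(x)   # (N, num_classes)

features = torch.randn(8, 512, 4, 4)             # stand-in for backbone output
logits = TinyPoolingHead()(features)             # shape: (8, 10)
```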
That being said, at higher dimensions neural networks -- convnets in this case -- can operate like loose classifiers over a "bag of textured features", as explored in some smaller research threads around 2016-2019 or so. So once you get to the higher dimensions, you're essentially just "feature voting" anyway, and you don't gain or lose much with the global average pooling.
I'm sure Transformers work in a different kind of way, but they rely on a vastly different inductive bias.
tysam_and_co t1_iv3qkvj wrote
Thanks for the comparisons. Multi-GPU is always an interesting one. Hopefully they get things ironed out; there are things about this architecture that I really do like a lot :)
tysam_and_co OP t1_j6g0mvc wrote
Reply to [R] Train CIFAR10 in under 10 seconds on an A100 (new world record!) by tysam_and_co
Hello everyone,
We're continuing our journey toward training CIFAR10 to 94% in under 2 seconds, carrying on the lovely work that David Page began when he took a single-GPU DAWNBench entry from over 10 minutes down to 24 seconds. Things are getting much, much tighter now, as there is not as much left to trim, but we still have a "comfortable" road ahead, provided enough sweat, blood, and tears are put in to make certain methods work under the (frankly ridiculous) torrent of information being squeezed into this network. Remember, we're breaking 90% having seen each training-set image only 5 times during training. 5 times! And 94% at 10 times. To me, that is hard to believe.
I am happy to answer any questions; please be sure to read the v0.3.0 patch notes if you would like a more verbose summary of the changes we've made to bring this network from ~12.34-12.38 seconds in the last patch to ~9.91-9.96 seconds in the current one. The baseline implementation started at around ~18.1 seconds total, so incredibly we have nearly halved our starting time, and that is within only a few months of the project's start back in October/November of last year.
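For anyone reproducing the timing numbers, here is a rough sketch of how a wall-clock measurement on GPU can be taken, assuming PyTorch with CUDA -- the `train_fn` hook and `main()` entry point are hypothetical stand-ins, not the repo's own benchmark code:

```python
import time
import torch

# Rough sketch of a wall-clock measurement on GPU (not the repo's own benchmark
# code). CUDA kernels launch asynchronously, so we synchronize before reading
# the clock on both ends.
def time_training_run(train_fn) -> float:
    torch.cuda.synchronize()
    start = time.perf_counter()
    train_fn()                                   # e.g. the full training loop
    torch.cuda.synchronize()
    return time.perf_counter() - start

# usage (hypothetical entry point):
# elapsed = time_training_run(lambda: main())
# print(f"training took {elapsed:.2f} s")
```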
Please do ask or say anything that's on your mind; this project hasn't gotten a lot of attention, and I'd love to talk to some like-minded people about it. This is pretty darn cool stuff!
Many thanks,
Tysam&co