JustOneAvailableName
JustOneAvailableName t1_izbnfki wrote
Reply to comment by undefdev in [D] If you had to pick 10-20 significant papers that summarize the research trajectory of AI from the past 100 years what would they be by versaceblues
> What did he claim that he didn't achieve?
Connections to his work are often vague. Yes, his lab tried something in the same extremely general direction. No, his lab did not show that it actually worked, or which part of that broad direction actually worked. So I am not gonna cite Fast Weight Programmers when I want to write about transformers. Yes, Fast Weight Programmers also argued there are more ways to handle variable-sized input than using RNNs. No, I don't think the idea itself is special at all. The main point of Attention Is All You Need was that removing something from the then-mainstream architecture made it faster (or larger) to train while keeping the quality. It was the timing that made it special: it successfully went against the mainstream and they made it work. It was not the idea itself.
JustOneAvailableName t1_iy7bqul wrote
Reply to comment by Tgs91 in [D] What method is state of the art dimensionality reduction by olmec-akeru
PCA is also lossy
> You can get very high quality reconstructions, but it won't be an exact match of the original.
That holds with a GAN (among others); a VAE, for example, gives fuzzy reconstructions.
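A minimal sketch of that lossiness, assuming scikit-learn and random data purely as an illustration:

```python
# Minimal sketch: PCA with fewer components than features cannot
# reconstruct the input exactly, i.e. it is lossy.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))      # 1000 samples, 50 features

pca = PCA(n_components=10)           # keep only 10 of the 50 directions
Z = pca.fit_transform(X)             # project down
X_rec = pca.inverse_transform(Z)     # project back up

print(np.mean((X - X_rec) ** 2))     # reconstruction error > 0
```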
JustOneAvailableName t1_ixv2bz6 wrote
Reply to comment by parabellum630 in [D] Pytorch or TensorFlow for development and deployment? by CodaholicCorgi
ONNX tracers are decent, but not that good: I had to rewrite Wav2vec to get both batch size and input length as dynamic axes.
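Roughly what the export looks like, as a sketch; the toy model below is a stand-in (not the actual Wav2vec rewrite), the relevant bit is `dynamic_axes` in `torch.onnx.export`:

```python
# Sketch: export with both the batch and the length axis marked dynamic.
# The tiny conv model is a placeholder for the real network.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv1d(1, 8, kernel_size=5), nn.ReLU())
dummy = torch.randn(2, 1, 16000)  # (batch, channels, samples)

torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["audio"],
    output_names=["features"],
    dynamic_axes={
        "audio": {0: "batch", 2: "length"},     # both axes symbolic
        "features": {0: "batch", 2: "length"},
    },
)
```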
JustOneAvailableName t1_ixid0yr wrote
Reply to comment by ReginaldIII in [D] Schmidhuber: LeCun's "5 best ideas 2012-22” are mostly from my lab, and older by RobbinDeBank
Schmidhuber would have a way better point if he kept it to quality criticism. His OG papers are very often pretty far from the ideas he tries to take credit for. Not that the highly cited papers aren't a special case of something Schmidhuber also wrote a paper about, but in the same vein you could say that every paper is just a special case of a NN.
JustOneAvailableName t1_iujqrr1 wrote
Reply to comment by alexnasla in [D] When the GPU is NOT the bottleneck...? by alexnasla
> Its 4 sequential layers, Dense+conv1d+lstm+dense
I think this is not enough to saturate the A100. Try to 10x the batch size by just repeating the data. It's useless for training, but it should increase GPU utilization without increasing disk utilization, which is handy to confirm the bottleneck.
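Something like this, as a rough sketch (PyTorch assumed; `model` and the dataloader loop are placeholders, not your actual code):

```python
# Rough sketch: blow up the batch by repeating it, purely to see whether
# GPU utilization goes up without touching the disk / dataloader side.
import torch

def inflate(batch: torch.Tensor, factor: int = 10) -> torch.Tensor:
    # Repeat along the batch dimension; the contents are useless for
    # training, but the compute per step grows by `factor`.
    return batch.repeat(factor, *([1] * (batch.dim() - 1)))

# for batch in dataloader:
#     batch = inflate(batch.cuda(), 10)
#     out = model(batch)   # watch nvidia-smi / GPU utilization here
```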
JustOneAvailableName t1_isjafij wrote
Reply to comment by wrsage in [D] Gpu for machine translation by wrsage
Anyways, good luck on the endeavor. If you need small pointers I am willing to help.
JustOneAvailableName t1_isj9qhv wrote
Reply to comment by wrsage in [D] Gpu for machine translation by wrsage
It could be a gigantic coincidence, but did you happen to decide this based on another comment that I just did?
JustOneAvailableName t1_isiyxi4 wrote
Reply to [D] GPU comparison for ML by denisn03
2080 Ti, unless that 1 GB extra is exactly what you need
JustOneAvailableName t1_iqqovjt wrote
Reply to [D] Gpu for machine translation by wrsage
You could make an impact with lots of data in a low-resource language. You can't make an impact without experience in this area.
The 1060 is absolutely useless for any kind of training; it was a low-tier GPU 6 years ago. The older techniques are fine on a CPU.
JustOneAvailableName t1_izbzbaq wrote
Reply to comment by undefdev in [D] If you had to pick 10-20 significant papers that summarize the research trajectory of AI from the past 100 years what would they be by versaceblues
Because Schmidhuber claiming that transformers are based on his work was a meme for 3-4 years before he actually did it. Like here.
There are hundreds of more relevant papers to cite and read about (linear-scaling) transformers.