graphicteadatasci t1_jb9afw5 wrote
Reply to comment by enjakuro in [R] We found nearly half a billion duplicated images on LAION-2B-en. by von-hust
Really? Because copying all your data once is the same as running over your dataset twice per epoch instead of once, which doesn't sound like it should help. Unless your test data is drawn from the same dataset and the duplication happens before splitting, in which case you would certainly expect metric improvements. Or was this a case of duplicating rare text, in which case it is the opposite of having duplicate images in LAION.
graphicteadatasci t1_ixh2mk4 wrote
Reply to comment by farmingvillein in [R] Human-level play in the game of Diplomacy by combining language models with strategic reasoning — Meta AI by hughbzhang
But they specifically created a model for playing Diplomacy - not a process for building board-game-playing models. With the right architecture and processes they could probably do away with most of that hand-calibration stuff, but the goal here was to create a model that does one thing.
graphicteadatasci t1_itp0p1l wrote
This repo is what you want: https://github.com/YangLing0818/Diffusion-Models-Papers-Survey-Taxonomy
graphicteadatasci t1_is4o6c9 wrote
Reply to comment by TenaciousDwight in [P] Understanding LIME | Explainable AI by Visual-Arm-7375
Well yeah, LIME tells you about an existing model, right? So if multiple features are correlated then a model may effectively drop one of the features, and the explanations will say that the dropped feature has no predictive power while the correlated feature is important. But we can drop the important feature and train an equally good model (maybe even better).
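Something like this toy sketch is what I mean - the data and numbers are made up, just two near-duplicate features and a plain sklearn logistic regression fitted on each one separately:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)      # x2 is almost a copy of x1
y = (x1 + rng.normal(scale=0.5, size=n) > 0).astype(int)

X = np.column_stack([x1, x2])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model_a = LogisticRegression().fit(X_tr[:, [0]], y_tr)   # uses only x1
model_b = LogisticRegression().fit(X_tr[:, [1]], y_tr)   # uses only x2

print("accuracy with x1 only:", model_a.score(X_te[:, [0]], y_te))
print("accuracy with x2 only:", model_b.score(X_te[:, [1]], y_te))
# Both scores come out nearly identical, even though an explainer applied to
# model_a would tell you x2 has no predictive power at all.
```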
graphicteadatasci t1_irqq0jr wrote
Reply to [D] Quantum ML promises massive capabilities, while also demanding enormous training compute. Will it ever be feasible to train fully quantum models? by avialex
But you don't have to do backprop to train a neural network. Even without anything quantum you could do simulated annealing. It's just that SGD is fast and effective.
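As a rough illustration (a toy one-hidden-layer net on made-up sine data, nothing to do with the quantum setting in the post), simulated annealing on the weights looks something like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy regression data: y = sin(x)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X)

def forward(params, X):
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)
    return h @ W2 + b2

def loss(params, X, y):
    return np.mean((forward(params, X) - y) ** 2)

def perturb(params, scale):
    return [p + rng.normal(scale=scale, size=p.shape) for p in params]

params = [rng.normal(scale=0.5, size=s) for s in [(1, 16), (16,), (16, 1), (1,)]]
current = loss(params, X, y)
temperature = 1.0

for step in range(20000):
    candidate = perturb(params, scale=0.05)
    cand_loss = loss(candidate, X, y)
    # accept improvements always, and worse moves with Boltzmann probability
    if cand_loss < current or rng.random() < np.exp((current - cand_loss) / temperature):
        params, current = candidate, cand_loss
    temperature *= 0.9995  # cool down

print("final MSE:", current)
```

No gradients anywhere - it just perturbs the weights and keeps changes that help (or occasionally ones that don't, while the temperature is still high). It works, it's just far slower than SGD on anything non-trivial.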
graphicteadatasci t1_iqzr880 wrote
Reply to comment by jms4607 in [D] Why restrict to using a linear function to represent neurons? by MLNoober
Taylor series are famously bad at generalizing and making predictions on out-of-distribution data. But you are absolutely free to add feature engineering to your inputs. It is very common to take the log of a numeric input, and you always standardize your inputs in some way, either bounding them between 0 and 1 or giving the data mean 0 and std 1. In the same way you could totally look at x*y effects. If you don't have a reason why two particular values should be multiplied with each other, you could try all combinations, feed them to a decision forest or logistic regression, and see if any come out as being very important.
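For example, a sketch of the "try all the pairwise products" idea with sklearn's PolynomialFeatures and a random forest's feature importances - the data here is made up so that the label secretly depends on x0*x1:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
# the label actually depends on the product of features 0 and 1
y = (X[:, 0] * X[:, 1] > 0).astype(int)

poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_inter = poly.fit_transform(X)        # original features + all pairwise products
names = poly.get_feature_names_out(["x0", "x1", "x2", "x3"])

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_inter, y)
for name, imp in sorted(zip(names, forest.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:10s} {imp:.3f}")
# "x0 x1" should float to the top, flagging it as an interaction worth keeping.
```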
graphicteadatasci t1_iqv0m4y wrote
Reply to comment by HjalmarLucius in [D] - Why do Attention layers work so well? Don't weights in DNNs already tell the network how much weight/attention to give to a specific input? (High weight = lots of attention, low weight = little attention) by 029187
This is the one. A DNN may be a universal function approximator, but only if the data and number of parameters are infinite. With infinite data we could learn y as parameters, and when we multiply those parameters with x we get x*y. But we don't have infinite data or infinite parameters, and even if we did we don't have a stable method for training at that scale. So we need other stuff.
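To make the x*y point concrete, here is a tiny numpy sketch (my own toy, not from the thread): a fixed weight matrix can only ever apply the same transform to x, while an attention-style score multiplies one input with another, so the mixing weights depend on both inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8))    # three "query" vectors
y = rng.normal(size=(5, 8))    # five "key" vectors

W = rng.normal(size=(8, 8))    # fixed, learned parameters
linear_out = x @ W             # same transform for every example, no x*y term

scores = x @ y.T / np.sqrt(8)  # input multiplied with input, as in attention
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax
print(weights.shape)           # (3, 5): every weight depends on both x and y
```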
graphicteadatasci t1_jbdt33t wrote
Reply to comment by enjakuro in [R] We found nearly half a billion duplicated images on LAION-2B-en. by von-hust
Yeah, because there are some very nice results on classification models where they removed data that didn't contribute to learning, and it made training both faster and more accurate. But of course I can't remember at all what the paper was called.