graphicteadatasci

graphicteadatasci t1_jb9afw5 wrote

Really? Copying all your data once is the same as running through your dataset twice per epoch instead of once, which doesn't sound like it should help. Unless your test data is drawn from the same dataset and the duplication happens before splitting, in which case you would certainly expect metric improvements. Or was this a case of duplicating rare text, in which case it's the opposite of having duplicate images in LAION?
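
A minimal sketch of that leakage case, assuming scikit-learn's train_test_split (the data is synthetic and the names are illustrative): duplicating before the split puts copies of the same example on both sides.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_rows = 1000
X = rng.normal(size=(n_rows, 5))         # stand-in dataset

# Duplicate every row once *before* splitting, tracking original row ids.
X_dup = np.vstack([X, X])
orig_id = np.tile(np.arange(n_rows), 2)  # row i and its copy share an id

X_tr, X_te, id_tr, id_te = train_test_split(
    X_dup, orig_id, test_size=0.2, random_state=0)

overlap = np.intersect1d(id_tr, id_te)
print(f"{len(overlap)} of {n_rows} original rows appear in both train and test")
# Any nonzero overlap means test metrics are partly measuring memorization.
```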

1

graphicteadatasci t1_ixh2mk4 wrote

But they specifically created a model for playing Diplomacy, not a process for building board-game-playing models. With the right architecture and processes they could probably do away with most of that hand-calibration stuff, but the goal here was to create a model that does one thing.

1

graphicteadatasci t1_is4o6c9 wrote

Well yeah, LIME tells you about an existing model, right? So if multiple features are correlated, a model may effectively drop one of them, and the explanations will say that the dropped feature has no predictive power while its correlated twin is important. But we could drop the "important" feature instead and train an equally good model (maybe even better).
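
Here's a dependency-light sketch of that failure mode, using Lasso coefficients as a stand-in for LIME attributions (LIME explains the fitted model, so it inherits the same blind spot); the data and penalty are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.01, size=500)  # nearly identical to x1
y = 3 * x1 + rng.normal(scale=0.1, size=500)

# L1 regularization makes the model pick one feature and drop the other.
full = Lasso(alpha=0.1).fit(np.column_stack([x1, x2]), y)
print("coefs with both features:", full.coef_)  # one coef lands near zero

# Drop the "important" feature and retrain: performance barely moves.
reduced = Lasso(alpha=0.1).fit(x2.reshape(-1, 1), y)
print("R^2 using only the 'unimportant' feature:",
      reduced.score(x2.reshape(-1, 1), y))
```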

1

graphicteadatasci t1_iqzr880 wrote

Taylor series are famously bad at generalizing and making predictions on out-of-distribution data. But you are absolutely free to add feature engineering on your inputs. It is very common to take the log of a numeric input, and you always standardize your inputs in some way, either bounding them between 0 and 1 or giving them mean 0 and std 1. In the same way you could totally look at x*y effects. If you don't have a reason why two particular values should be multiplied together, you could try all pairwise combinations, feed them to a decision forest or a logistic regression, and see if any come out as very important.
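
A sketch of that brute-force interaction search with scikit-learn; the synthetic x*y target, feature names, and forest size are all illustrative choices, not a fixed recipe.

```python
import numpy as np
from itertools import combinations
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=1000)  # hidden x*y effect

# Append every pairwise product as a candidate feature.
feats, feat_names = [X], [f"x{i}" for i in range(X.shape[1])]
for i, j in combinations(range(X.shape[1]), 2):
    feats.append((X[:, i] * X[:, j]).reshape(-1, 1))
    feat_names.append(f"x{i}*x{j}")
X_aug = np.hstack(feats)

# Rank all features, raw and interaction, by forest importance.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_aug, y)
for name, imp in sorted(zip(feat_names, forest.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")  # x0*x1 should dominate
```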

1

graphicteadatasci t1_iqv0m4y wrote

This is the one. A DNN may be a universal function approximator, but only in the limit of infinite data and infinite parameters. With infinite data we could learn y as parameters, and multiplying those parameters with x would give us x*y. But we don't have infinite data or infinite parameters, and even if we did, we don't have a stable method for training indefinitely. So we need other stuff.
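
To make the point concrete, a small sketch assuming scikit-learn's MLPRegressor (the architecture and input ranges are arbitrary choices): an MLP fit on x*y inside [0, 1] interpolates fine but degrades outside the training range.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1, size=(5000, 2))
y_train = X_train[:, 0] * X_train[:, 1]  # target is plain multiplication

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                   random_state=0).fit(X_train, y_train)

# Compare error inside the training range vs. far outside it.
for lo, hi in [(0, 1), (2, 3)]:
    X_test = rng.uniform(lo, hi, size=(1000, 2))
    err = np.abs(net.predict(X_test) - X_test[:, 0] * X_test[:, 1]).mean()
    print(f"mean abs error on [{lo}, {hi}]: {err:.3f}")
```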

3