Ulfgardleo t1_j7y8hdg wrote
The difference between stats and ML is as large as between math and applied math. They aim to answer vastly different questions. In ML you don't care about identifiability, because you don't care whether there is one gene among 2 million that causes a specific type of cancer. This is not what ML is about. In ML you also very rarely care about tail risk (you should) and care almost nothing about calibration (you really should). And identifiability goes out the window as soon as you use neural networks, which prevents you from interpreting your models.
I-am_Sleepy t1_j7ybb41 wrote
I don’t think ML researchers don’t care about model calibration or tail risk. It just often doesn’t come up in experimental settings.
It also depends on the objective. If your goal is regression or classification, then tail risk and model calibration might be necessary as supporting metrics
But for more abstract use cases such as generative modeling, it is debatable whether tail risk and model calibration actually matter. For example, a GAN can experience mode collapse, such that the generated data isn’t as diverse as the original data distribution. But that doesn’t mean the model is totally garbage either.
Also, I don’t think statistics and ML are totally different, because most statistical fundamentals are also ML fundamentals. And many ML metrics are directly derived from fundamental statistics and/or related fields.
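To make the "calibration as a supporting metric" point concrete, here is a minimal sketch of expected calibration error (ECE) for a binary classifier. The function name, the equal-width binning, and the default of 10 bins are illustrative assumptions, not something anyone in this thread specified.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE for binary predictions (illustrative helper, assumed setup).

    probs  : predicted probability of class 1, shape (N,)
    labels : true labels in {0, 1}, shape (N,)
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Confidence is the probability assigned to the predicted class.
    pred = (probs >= 0.5).astype(int)
    conf = np.where(pred == 1, probs, 1.0 - probs)
    correct = (pred == labels).astype(float)

    # Equal-width confidence bins on [0, 1]; interior edges define the bins.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(conf, edges[1:-1])

    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            # Weight each bin's |accuracy - confidence| gap by its share of samples.
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(ece)
```

A model that says "100% sure" and is right only half the time gets a large ECE even if its accuracy looks acceptable, which is the kind of thing accuracy alone will not show.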
Ulfgardleo t1_j7yd02x wrote
You are right, but the point I was making is that in ML in general those are not of high importance, and this already holds for rather basic questions like:
"For your chosen learning algorithm, under which conditions does it hold that: in expectation over all training datasets of size n, the Bayes risk is not monotonically increasing with n"
One would think that this question is of rather central importance. Yet no one cares, and answering this question is non-trivial for linear classification already. Stats cares a lot about this question. While the math behind both fields is the same (all applied math is a subset of math, except if you ask people who identify as one of the two), the communities have different goals.
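The quantity in that question can at least be estimated empirically. Below is a rough Monte Carlo sketch, assuming a toy setup that is not from this thread: two 1-D Gaussian classes and a nearest-class-mean classifier. The function name `expected_risk` and all parameters are hypothetical. It only estimates the curve; it proves nothing about monotonicity.

```python
import numpy as np

def expected_risk(n, n_datasets=200, n_test=2000, seed=0):
    """Estimate the expected 0-1 risk of a nearest-class-mean classifier,
    averaged over random training sets of size n (toy setup, assumed)."""
    rng = np.random.default_rng(seed)

    # Fixed test set: class 0 ~ N(-1, 1), class 1 ~ N(+1, 1).
    y_test = rng.integers(0, 2, size=n_test)
    x_test = rng.normal(2.0 * y_test - 1.0, 1.0)

    risks = []
    half = n // 2  # n is assumed even, so each class gets n/2 points
    for _ in range(n_datasets):
        # Draw a fresh training set and fit the two class means.
        m0 = rng.normal(-1.0, 1.0, size=half).mean()
        m1 = rng.normal(+1.0, 1.0, size=half).mean()
        # Predict the class whose training mean is closer.
        pred = (np.abs(x_test - m1) < np.abs(x_test - m0)).astype(int)
        risks.append(np.mean(pred != y_test))
    return float(np.mean(risks))

sizes = [2, 4, 8, 16, 32, 64]
curve = [expected_risk(n) for n in sizes]
for n, r in zip(sizes, curve):
    print(f"n={n:3d}  estimated expected risk={r:.3f}")
```

Plotting `curve` against `sizes` gives an empirical learning curve. Whether that curve is guaranteed never to go up for a given learner is exactly the theoretical question above, and a simulation like this cannot settle it.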
BrotherAmazing t1_j86kxmq wrote
You should say “…between pure mathematics and applied math” IMO. Nit-picky, yes, but more accurate.
Ulfgardleo t1_j87y15c wrote
Sorry, that was a mistranslation of how we say it over here.
canbooo t1_j7z0lku wrote
I agree with the size of the difference, yet disagree with the examples, as there is ML research considering all 3 (causal ML, conformal ML/prediction/forecasting, AI safety, reliability, etc.). I think the difference is more like deduction vs. induction in a sense, meaning the process of finding the answers is different. Since I'm finishing pooping on corporate time, I will keep this short.
ML: Data -> Method -> Hypothesis -> Answers
Statistics: Hypothesis -> Method -> Data -> Answers
This may be too simplistic, so please propose a better distinction, but do not postulate that ML does not care about things statistics does.