Ulfgardleo t1_j7y8hdg wrote on February 10, 2023 at 6:33 AM

The difference between stats and ML is as large as that between math and applied math. They aim to answer vastly different questions. In ML you don't care about identifiability, because you don't care whether there is a gene among 2 million that causes a specific type of cancer; that is not what ML is about. In ML you also very rarely care about tail risk (you should) and almost never about calibration (you really should). And identifiability goes out the window as soon as you use neural networks, which prevents you from interpreting your models.

I-am_Sleepy t1_j7ybb41 wrote on February 10, 2023 at 7:07 AM

I don’t think ML researchers don’t care about model calibration or tail risks. It’s just that these often don’t come up in experimental settings.

It also depends on the objective. If your goal is regression or classification, then tail risk and model calibration might be necessary as supporting metrics.
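For concreteness, here is a minimal sketch of one common calibration metric, binned expected calibration error (ECE), for a binary classifier. It assumes the model outputs the probability of the positive class and uses equal-width bins; the function name and the default of 10 bins are illustrative choices, not something anyone in the thread specified.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for a binary classifier (illustrative sketch).

    probs:  predicted probabilities of the positive class, in [0, 1]
    labels: true labels in {0, 1}
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Equal-width bins over [0, 1]; a prediction of exactly 1.0 goes into the last bin.
    bin_idx = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if not mask.any():
            continue
        # Gap between the average predicted probability and the observed
        # positive rate in this bin, weighted by the fraction of samples in it.
        gap = abs(probs[mask].mean() - labels[mask].mean())
        ece += mask.mean() * gap
    return float(ece)


# A constant 0.9 prediction on data that is always negative is badly calibrated.
print(expected_calibration_error([0.9] * 4, [0, 0, 0, 0]))
```

An ECE of 0 means that, within each bin, predicted probabilities match observed frequencies; it says nothing about tail risk, which needs its own metrics.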

But for more abstract use cases such as generative modeling, it is debatable whether tail risk and model calibration actually matter. For example, a GAN can suffer mode collapse, so that the generated data isn’t as diverse as the original data distribution. But that doesn’t mean the model is total garbage either.

Also, I don’t think statistics and ML are totally different, because most statistical fundamentals are also ML fundamentals, and many ML metrics are derived directly from statistics and/or related fields.

Ulfgardleo t1_j7yd02x wrote on February 10, 2023 at 7:28 AM

You are right, but the point I was making is that in ML in general those are not of high importance, and this already holds for rather basic questions like:

"For your chosen learning algorithm, under which conditions does it hold that, in expectation over all training datasets of size n, the risk of the learned model is monotonically non-increasing in n?"

One would think this question is of central importance. Yet no one cares, and answering it is non-trivial even for linear classification. Stats cares a lot about this question. While the math behind both fields is the same (all applied math is a subset of math, unless you ask people who identify with one of the two), the communities have different goals.
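To make the question concrete, here is a small simulation in which the expected risk does go up as n grows. It is a minimal sketch under illustrative assumptions, not the linear classification setting mentioned above: minimum-norm least-squares regression (via the pseudoinverse), isotropic Gaussian features in d = 20 dimensions, a unit-norm linear target and Gaussian label noise. The function name and all parameter values are hypothetical choices for the example.

```python
import numpy as np


def expected_risk(n, d=20, sigma=0.5, trials=200, seed=0):
    """Monte Carlo estimate of E[test MSE] over training sets of size n.

    Illustrative setup (an assumption, not from the thread): x ~ N(0, I_d),
    y = x @ beta + noise with noise ~ N(0, sigma^2), learner = minimum-norm
    least squares.
    """
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)  # unit-norm true coefficients
    risks = []
    for _ in range(trials):
        X = rng.standard_normal((n, d))
        y = X @ beta + sigma * rng.standard_normal(n)
        # pinv gives the minimum-norm least-squares fit, also when n < d.
        beta_hat = np.linalg.pinv(X) @ y
        # With isotropic test inputs, the expected squared error on a fresh
        # point is exactly ||beta_hat - beta||^2 plus the label-noise variance.
        risks.append(np.sum((beta_hat - beta) ** 2) + sigma**2)
    return float(np.mean(risks))


# The estimated risk rises as n approaches d = 20 and falls again afterwards,
# i.e. in this setup more data can hurt.
for n in (10, 15, 20, 30, 40):
    print(n, round(expected_risk(n), 3))
```

The spike near n = d comes from the near-singular square design matrix amplifying label noise; whether a given learner avoids this kind of non-monotone behavior is exactly the sort of condition the question asks about.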

BrotherAmazing t1_j86kxmq wrote on February 12, 2023 at 12:43 AM

You should say “…between pure mathematics and applied math” IMO. Nit-picky, yes, but more accurate.

Ulfgardleo t1_j87y15c wrote on February 12, 2023 at 8:40 AM

Sorry, that was a mistranslation of how we say it over here.

canbooo t1_j7z0lku wrote on February 10, 2023 at 12:34 PM

I agree with the size of the difference, yet disagree with the examples, as there is ML research considering all three (causal ML, conformal prediction/forecasting, AI safety, reliability, etc.). I think the difference is more like deduction versus induction, in the sense that the process of finding the answers differs. Since I'm pooping on corporate time, I will keep this short.

ML: Data -> Method -> Hypothesis -> Answers

Statistics: Hypothesis -> Method -> Data -> Answers

This may be too simplistic, so please propose a better distinction, but do not postulate that ML does not care about the things statistics does.