PredictorX1

PredictorX1 t1_j8tpp5p wrote

This presents online survey data from parents about the sweet foods their children consume (n = 1,135, with a minimum of n = 20 per state). Ignoring the obvious sampling issues, it'd be interesting to see confidence intervals of the mean for each state. Also, I'm not sure why the data is presented twice, per day and per week, since the weekly figure is simply 7 times the daily one.
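
As a rough sketch of what those per-state intervals could look like (Python; the DataFrame and its "state" / "servings_per_day" columns are hypothetical stand-ins for the survey data):

```python
import pandas as pd
from scipy import stats

def state_mean_cis(df: pd.DataFrame, confidence: float = 0.95) -> pd.DataFrame:
    """t-based confidence intervals of the mean, one row per state."""
    rows = []
    for state, grp in df.groupby("state"):
        x = grp["servings_per_day"]
        n, mean, sem = len(x), x.mean(), stats.sem(x)
        lo, hi = stats.t.interval(confidence, n - 1, loc=mean, scale=sem)
        rows.append({"state": state, "n": n, "mean": mean, "ci_low": lo, "ci_high": hi})
    return pd.DataFrame(rows)
```

With only n = 20 in some states, those intervals will be wide, which is arguably the point.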

5

PredictorX1 t1_j6mkzl0 wrote

To be clear, there are neural networks which are "deep", and others which are "shallow" (few hidden layers). From a practical standpoint, the latter have more in common with other "shallow" learning methods (tree-induction, statistical regressions, k-nearest neighbor, etc.) than they do with deep learning.
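
For concreteness, a minimal scikit-learn sketch of the distinction (the layer sizes and models here are arbitrary illustrations, not recommendations):

```python
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

shallow_net = MLPClassifier(hidden_layer_sizes=(32,))              # one hidden layer: "shallow"
deep_net = MLPClassifier(hidden_layer_sizes=(128, 128, 128, 128))  # several hidden layers: "deep"

# In practice, the shallow net sits alongside the other shallow learners:
other_shallow = [KNeighborsClassifier(n_neighbors=5), DecisionTreeClassifier(max_depth=5)]
```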

You're right that many people (especially in the non-technical press) have erroneously used "machine learning" to mean specifically "deep learning", just as they've used "artificial intelligence" to mean "machine learning". Regardless, there are still non-deep machine learning methods and other branches of A.I. In practice, non-deep machine learning represents the overwhelming majority of applications today.

I haven't followed the research as closely in recent years, but I can tell you that, deep learning aside, people have only begun to scratch the surface of machine learning application.

54

PredictorX1 t1_j5rb8gp wrote

>I was in the understanding that two contiguous linear layers in a NN would be no better than only one linear layer.

This is correct: In terms of the functions they can represent, two consecutive linear layers are algebraically equivalent to one linear layer.
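
Concretely, for y = W2(W1 x + b1) + b2, the composition collapses to a single affine map with W = W2 W1 and b = W2 b1 + b2. A quick NumPy check (arbitrary layer sizes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two consecutive linear (affine) layers
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)
x = rng.normal(size=4)

two_layers = W2 @ (W1 @ x + b1) + b2

# The single equivalent layer: W = W2 W1, b = W2 b1 + b2
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)

assert np.allclose(two_layers, one_layer)
```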

1

PredictorX1 t1_j5h5pb5 wrote

The biggest technical challenges I see:

  1. Having enough reference samples from known people
  2. The difference between how people write on Reddit and how they write elsewhere (professional articles, e-mail, etc.: presumably used as reference)
  3. If too many Reddit users are being considered, it may all dissolve into mush (the estimated probabilities would all be low)

3

PredictorX1 t1_j5h3ymz wrote

>With more compute could it be easy to quickly un Mask all the people on Reddit by using text correlations to non masked publicly available text data?

With labeled samples of text, I think it would be pretty easy to come up with a likelihood model giving a reasonable, educated guess at the identity of some Reddit members, and I don't think it would take much computing power.
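
As a toy sketch of the idea (Python/scikit-learn; the reference texts, author names, and comment below are all invented for illustration, and this is not a vetted attribution method):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled reference samples (articles, e-mail, etc.) from known authors
reference_texts = [
    "Quarterly results exceeded expectations across all segments.",
    "The gradient vanishes when the activations saturate.",
    "Please find attached the revised agenda for Monday.",
    "Regularization keeps the weights from growing without bound.",
]
reference_authors = ["alice", "bob", "alice", "bob"]

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # character n-grams are fairly topic-robust
    LogisticRegression(max_iter=1000),
)
model.fit(reference_texts, reference_authors)

# Score an unattributed Reddit comment against each candidate author
reddit_comment = "Honestly, the weights just blow up unless you regularize."
probs = dict(zip(model.classes_, model.predict_proba([reddit_comment])[0]))
```

The scores are only as good as the reference samples, of course; text written for Reddit and text written elsewhere can differ a lot in register.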

2

PredictorX1 t1_j4azldr wrote

No, but the idea is pretty straightforward. Assuming that experts can provide domain knowledge that can be coded as conditions or rules (IF engine_temperature > 95 AND coolant_pressure < 12 THEN engine_status = "CRITICAL"), these can be used to generate 0/1 flags based on existing data to augment the training variables.
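
A minimal sketch of that idea (Python/pandas; the sensor values are made up to mirror the rule above):

```python
import pandas as pd

# Hypothetical sensor readings
df = pd.DataFrame({
    "engine_temperature": [92.0, 97.5, 101.2],
    "coolant_pressure":   [14.0, 11.3, 10.8],
})

# Expert rule coded as a 0/1 flag and appended as an extra training variable:
# IF engine_temperature > 95 AND coolant_pressure < 12 THEN engine_status = "CRITICAL"
df["critical_flag"] = (
    (df["engine_temperature"] > 95) & (df["coolant_pressure"] < 12)
).astype(int)
```

The flag then rides along as one more input column when the model is trained.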

This can be made much more complex by using actual expert systems or fuzzy logic. There are entire sections of the technical library for those. For fuzzy logic, I would recommend:

"The Fuzzy Systems Handbook"

by Earl Cox

ISBN-13: 978-0121942700

3

PredictorX1 t1_j3cacld wrote

>Which is why it's important to not give access to dangerous things into hands of those who could misuse it with catastrophic consequences.

What does "give access" mean, in this context? Information on construction of learning systems is widely available. Also, who decides which people "could misuse it"? You?

1