BrotherAmazing
BrotherAmazing t1_iyazrs4 wrote
Reply to comment by Difficult-Race-1188 in Neural Networks are just a bunch of Decision Trees by Difficult-Race-1188
Again, they can approximate any function or algorithm. This is proven mathematically.
Just because people are confounded by examples of DNNs that don’t seem to do what they want them to do, and just because people do not yet understand how to construct the DNNs that provably exist and can indeed do these things, does not mean DNNs are “dumb” or limited.
Perhaps you are constructing them wrong. Perhaps the engineers are the dumb ones? 🤷🏼
Sometimes people literally argue, in plain English rather than mathematics, that basic mathematically proven results are not true.
If you had a mathematical proof showing that DNNs are equivalent to decision trees, or that they are incapable of performing certain tasks, neat! But if you argue in mere language, without a mathematical proof, that DNNs can’t perform tasks that can be reduced to functions or algorithms, I’m not impressed yet!
BrotherAmazing t1_iyaux7r wrote
A deep neural network can approximate any function (any continuous function on a compact domain, to arbitrary accuracy).
A deep recurrent neural network can approximate any algorithm.
These are mathematically proven facts. Can the same be said about “a bunch of decision trees in hyperspace”? If so, then I would say “a bunch of decision trees in hyperspace” are pretty darn powerful, as are deep neural networks. If not, then I would say the author has made a logical error somewhere along the way in his very qualitative reasoning. Plenty of thought experiments in language with “bulletproof” arguments have led to “contradictions” in the past, only for a subtle logical error to be unveiled once we stop using language and start using mathematics.
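As a purely empirical illustration of the approximation claim (not a proof), here is a minimal sketch; the target function, architecture, and hyperparameters are arbitrary choices on my part:

```python
# Minimal empirical illustration (not a proof) of a feedforward network
# approximating an arbitrary continuous function on a bounded interval.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(5000, 1))
y = np.sin(2 * x).ravel() + 0.3 * x.ravel() ** 2   # arbitrary target function

net = MLPRegressor(hidden_layer_sizes=(64, 64), activation="relu",
                   max_iter=2000, random_state=0)
net.fit(x, y)

x_test = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
y_true = np.sin(2 * x_test).ravel() + 0.3 * x_test.ravel() ** 2
print("max abs error on [-3, 3]:", np.abs(net.predict(x_test) - y_true).max())
```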
BrotherAmazing t1_iw18e2b wrote
Usually, if they share their dataset and problem with you, and you spend just a few hours on it with extensive experience designing and training deep NNs from scratch, you can find something incredibly simple (just normal learning-rate decay, say) as an alternative to gradient clipping that works just as well, showing the clipping was only “crucial” for their setup, not “crucial” in general.
Often you can analyze the dataset to see which mini-batches had gradients exceeding various thresholds, understand which training examples led to the large gradients and why, and pre-process the data to remove the need for clipping. Since the whole thing is nonlinear, that might completely invalidate their other hyperparameters once the training set is “cleaned up”.
Not saying this is what is going on here with this research group, but you’d be amazed how often it is the case that complex trial-and-error is done just to avoid debugging and understanding why the simpler approach that should have worked didn’t.
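For what it’s worth, here is a rough sketch of the kind of gradient audit described above, in PyTorch; `model`, `loss_fn`, `train_loader`, and the threshold are placeholders for whatever setup you actually have:

```python
# Sketch: log which mini-batches produce unusually large gradient norms,
# so the offending training examples can be inspected instead of clipped away.
# `model`, `loss_fn`, and `train_loader` are placeholders for your own setup.
import torch

def audit_gradients(model, loss_fn, train_loader, threshold=10.0):
    suspicious = []
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        model.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        total_norm = torch.norm(
            torch.stack([p.grad.norm() for p in model.parameters()
                         if p.grad is not None]))
        if total_norm.item() > threshold:
            suspicious.append((batch_idx, total_norm.item()))
    return suspicious  # inspect these batches' examples and labels by hand
```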
BrotherAmazing t1_it935fx wrote
Reply to comment by Chefbook in [D] Accurate blogs on machine learning? by likeamanyfacedgod
The Stanford YT-posted lectures from TAs who helped Fei-Fei, like Justin Johnson and Serena Yeung, are great too, but those aren’t “blogs” so much as lectures. Justin’s UMich web page has some cool stuff on it.
BrotherAmazing t1_it92qjb wrote
Reply to comment by acdjent in [D] Accurate blogs on machine learning? by likeamanyfacedgod
Lulz I just posted that as an example of a good blog without having scrolled down to see your response until now. 😆
BrotherAmazing t1_it929ik wrote
Yes, avoid blogs that are basically .com “journalism for hire” crap where people are simply trying to make a buck as a “side gig” (unless the author is mildly famous/respected), and look at blogs like this one, from someone who is an actual researcher with not just a PhD but also experience working a job in AI/ML:
This is just one example. There are many more out there that are good blogs that are mostly accurate (everyone is entitled to a mistake once in a while).
BrotherAmazing t1_isosnle wrote
Reply to comment by redditnit21 in Testing Accuracy higher than Training Accuracy by redditnit21
Then indeed I would try different randomized training/test set splits to rule that out as one step in the debugging.
BrotherAmazing t1_iso8zyl wrote
Reply to comment by No_Slide_1942 in Testing Accuracy higher than Training Accuracy by redditnit21
If you have a different problem where this happens without dropout, then you may indeed want to make sure the training/test split isn’t a “bad” one and do k-fold validation.
The other thing to check would be other regularizers you may be using during training but not at test time that make it harder for the network to do well on the training set; e.g., you can dial down data augmentation if you are using it, and so on.
Things people have touched upon already for the most part, but this is very common to see when using dropout layers.
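If you do want to run the k-fold check, a minimal sketch with scikit-learn might look like this; the logistic-regression classifier is just a stand-in for your own model and training loop, and `X`, `y` are assumed to be NumPy arrays:

```python
# Sketch: k-fold cross-validation to rule out a lucky train/test split.
# The classifier here is a stand-in; swap in your own model/training loop.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

def kfold_gap(X, y, n_splits=5):
    train_accs, test_accs = [], []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True,
                                     random_state=0).split(X):
        clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        train_accs.append(clf.score(X[train_idx], y[train_idx]))
        test_accs.append(clf.score(X[test_idx], y[test_idx]))
    # If test accuracy beats training accuracy in most folds, the split is not
    # the explanation; look at dropout/augmentation instead.
    return np.mean(train_accs), np.mean(test_accs)
```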
BrotherAmazing t1_iso8emo wrote
It’s what other people have already said: This is extremely common to see if you’re using dropout. There’s nothing necessarily wrong here either, and this network might outperform a network (on test data—the data we care about!) that is trained without dropout and gets higher training accuracy.
Here is how you can prove it to yourself:
1. You can keep dropout activated during test time as an experiment and see that the test accuracy, when dropout remains on, does indeed decrease to below the training accuracy.

2. You can keep everything else fixed and just parametrically dial down the dropout percentage in each dropout layer. Usually 0.5 (50%) is a default, but you’ll see for a fixed training/test split that as that parameter goes from 0.5 —> 0.25 —> 0.1 —> 0.05 —> 0, the training accuracy will increase back to be at/above the test accuracy (see the sketch after this list).

3. You can also rule out the possibility that you had a rare split that led to an easy test set and a hard training set by splitting randomly over and over and seeing that this phenomenology is not rare, but the norm across nearly all splits. If 1 and 2 above exhibit behavior consistent with dropout being the reason, though, then I see this last exercise as a waste of time unless you just want to win an argument against someone who insists it is due to a “bad” split. If they really insist that, rather than just proposing it as a possible reason, then they don’t have much real-world experience using dropout! This is very common, nothing is wrong, and it is a telltale sign of dropout.
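A rough PyTorch sketch of experiments 1 and 2; the network, data loader, and dropout rates are illustrative placeholders, not a specific recipe:

```python
# Sketch of experiments 1 and 2 above, in PyTorch. The network, data,
# and dropout rates are illustrative placeholders, not a specific setup.
import torch
import torch.nn as nn

def accuracy(model, loader, dropout_active):
    # Experiment 1: evaluate with dropout left ON (train mode) vs OFF (eval mode).
    model.train() if dropout_active else model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

def make_net(p_drop):
    # Experiment 2: rebuild/retrain with p_drop swept over 0.5 -> 0.25 -> 0.1 -> 0.
    return nn.Sequential(nn.Flatten(),
                         nn.Linear(28 * 28, 256), nn.ReLU(), nn.Dropout(p_drop),
                         nn.Linear(256, 10))
```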
BrotherAmazing t1_isfrx3g wrote
Reply to comment by MTGTraner in [D] Could a ML model be used for Image Compression? by midasp
And you might need ‘N’ decompressors on your PC for ‘N’ files, and the size of those decompressors might be so large that it starts to outweigh the savings in compression. I mean, a decompressor that knows what the text is can magically “decompress” an empty file of size 0 into the original text. lol
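A toy illustration of that point; the “compression” below is an obvious cheat, which is exactly the point:

```python
# Toy illustration: a "compressor" that emits a 0-byte file, because the
# matching decompressor has the original text baked into it. The savings
# are an illusion; the information just moved into the decompressor.
ORIGINAL_TEXT = "the quick brown fox jumps over the lazy dog"

def compress(text: str) -> bytes:
    assert text == ORIGINAL_TEXT   # only "works" for this one file
    return b""                     # 0 bytes!

def decompress(blob: bytes) -> str:
    return ORIGINAL_TEXT           # the payload lives here instead

assert decompress(compress(ORIGINAL_TEXT)) == ORIGINAL_TEXT
```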
BrotherAmazing t1_is8rh9t wrote
Reply to comment by badabummbadabing in [D] Are GAN(s) still relevant as a research topic? or is there any idea regarding research on generative modeling? by aozorahime
I agree with the assessment of adversarial games and their wide-ranging utility, but to be fair, diffusion models can be applied to many tasks in which one requires a generative model; i.e., they’re useful in a much more wide-ranging set of applications than just “generating pretty pictures”.
BrotherAmazing t1_ir3etpb wrote
Reply to comment by porygon93 in A wild question? Why CNNs are not aware of visual quality? [D] by ThoughtOk5558
I think someone didn’t understand what you meant and downvoted, or downvoted because you didn’t define ‘z’ and ‘x’ and so on, but I know what you mean and you’re correct. This is another way of looking at it that is completely right.
p(x) for all these images under a CIFAR-10 world is basically 0, but your CNN is not computing that or factoring it in. It just assumes the input images are good images, then estimates the probability of airplane vs. bird for these nonsense images given that they are not nonsense and that they come from the same pdf as CIFAR-10… which is a very, very false assumption!
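A quick sketch of what I mean, assuming some hypothetical, already-trained CIFAR-10 classifier `model`:

```python
# Sketch: feed pure-noise "images" to a (hypothetical, already trained)
# CIFAR-10 classifier and look at its softmax confidence. The network
# never models p(x); it only outputs p(class | x, "x looks like CIFAR-10").
import torch
import torch.nn.functional as F

def confidence_on_noise(model, n_images=16):
    model.eval()
    noise = torch.rand(n_images, 3, 32, 32)   # p(x) under CIFAR-10 is ~0
    with torch.no_grad():
        probs = F.softmax(model(noise), dim=1)
    top_conf, top_class = probs.max(dim=1)
    # Typically many of these confidences come out surprisingly close to 1.0.
    return top_conf, top_class
```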
BrotherAmazing t1_ir3dmwz wrote
Reply to comment by ThoughtOk5558 in A wild question? Why CNNs are not aware of visual quality? [D] by ThoughtOk5558
Nearly every data-driven approach to regression and purely discriminative classification has this problem, and it’s a problem of trying to extrapolate far outside the domain that you trained/fit the model in. It’s not about anything else.
Your generated images clearly look nothing like CIFAR-10 training images, so it’s not much different than if I fit two Gaussians to 2-D data using samples that all fall within the sphere of radius 1, then send a 2-D feature measurement into my classifier that is a distance of 100 from the origin. Any discriminative classifier that doesn’t have a way to detect outliers/anomalies will likely be extremely confident in classifying this 2-D feature as one of the two classes. We would not say the classifier has a problem not considering “feature quality”; we would say it’s not very sophisticated.
In the real world, in critical problems, CNNs aren’t just fed images like this. Smart engineers have ways to detect whether an image is likely not in the training distribution and throw a flag so no one places confidence in the CNN’s output.
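Here is a minimal sketch of that 2-D thought experiment plus the kind of crude out-of-distribution flag I mean; the data, classifier, and threshold are all illustrative:

```python
# Sketch of the 2-D thought experiment: fit a discriminative classifier on
# data clustered near the origin, then query a point at distance 100. Also
# shows a crude distance-based sanity check of the kind mentioned above.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(scale=0.3, size=(400, 2))
X[200:] += 0.5                              # two overlapping classes near the origin
y = np.repeat([0, 1], 200)

clf = LinearDiscriminantAnalysis().fit(X, y)
far_point = np.array([[100.0, 0.0]])        # wildly outside the training domain
print(clf.predict_proba(far_point))         # near-certain for one class anyway

# Crude out-of-distribution flag: distance from the training data's mean.
dist = np.linalg.norm(far_point - X.mean(axis=0))
if dist > 3 * X.std():
    print("input far outside training domain; do not trust the classifier")
```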
BrotherAmazing t1_iqwgcms wrote
Reply to comment by PassionatePossum in [D] Model not learning data by Imaginary_Carrot4092
You can’t tell if the data is “fairly random” or not just based on that plot, though. Once a blue dot of finite size plots over another, a density of two dots nearly on top of one another will appear identical to the human eye to 10 or 100 or any N > 1 dots plotted almost entirely on top of one another.
Unfortunately, OP doesn’t provide anything close to enough information for anyone here to truly be able to diagnose the problem (what is the theoretical relationship between these inputs/outputs?), or even whether there is a problem; i.e., what would we estimate the Bayes error rate to be for this problem, and what loss would that yield?
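To illustrate the overplotting point, here is a small sketch (synthetic data, not OP’s) where a scatter plot and a 2-D histogram of the same data tell very different stories:

```python
# Sketch: the same scatter plot can hide very different densities. A 2-D
# histogram (or per-bin counts) reveals how many points are stacked up.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2 * x + rng.normal(scale=0.1, size=10_000)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(x, y, s=5)          # overplotted: looks like one uniform blob
ax2.hist2d(x, y, bins=50)       # density shows where points actually pile up
plt.show()
```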
BrotherAmazing t1_iqtiyud wrote
Reply to [D] Types of Machine Learning Papers by Lost-Parfait568
This looks more like a “meme”-tag worthy post than “discussion”.
BrotherAmazing t1_iyeq8zq wrote
Reply to comment by Difficult-Race-1188 in Neural Networks are just a bunch of Decision Trees by Difficult-Race-1188
Interesting—I will have a read when I have time to read and check the math/logic. Thanks!
I do think I am allowed to remain skeptical for now because this was just posted as a pre-print with a single author a month ago and has not been vetted by the community.
Besides, if there is an equivalence between recurrent neural networks, convolutional neural networks, fully connected networks, and policies learned with deep reinforcement learning, regardless of the architecture, how the network is trained, and so on, such that there always exists an equivalent decision tree, then I would say:
1. Very interesting.

2. Decision trees are then more flexible and powerful than we give them credit for; it is not that NNs are less flexible and less powerful than they have been proven to be.

3. What is it about decision trees that makes people not use them in practice for anything too complicated on full motion video, etc.? How does one construct the decision tree “from scratch” via training, except by training the NN first and then building a decision tree that represents the NN (see the sketch below)? I wouldn’t say “they’re the same” from an engineering and practical point of view if one can be trained efficiently and the other cannot, but can only be built once the trained NN already exists.
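To make that last point concrete, here is a rough sketch (with arbitrary toy data and models) of distilling an already-trained network into a decision tree; note the tree is only built by querying the trained net, not trained “from scratch” itself:

```python
# Sketch of the practical point above: the decision tree is built only by
# querying an already-trained network, not learned from scratch on its own.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] * X[:, 1] > 0).astype(int)     # arbitrary nonlinear labels

net = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                    random_state=0).fit(X, y)

# Distill: fit a tree to the *network's* predictions on fresh query samples.
X_query = rng.normal(size=(20000, 10))
tree = DecisionTreeClassifier(random_state=0).fit(X_query, net.predict(X_query))

# Check how faithfully the tree mimics the net on held-out inputs.
X_eval = rng.normal(size=(5000, 10))
print("agreement with the net on fresh inputs:",
      (tree.predict(X_eval) == net.predict(X_eval)).mean())
```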