Is anyone in need of a machine learning protégé? I am looking for a doctorate position in the German- and English-speaking worlds.
My experience is in deep learning, specifically GNNs applied to science problems. I would like to stay in deep learning broadly, but would not mind changing topics to another application or to a more theoretical research project.
I am also interested in theoretical questions, e.g. given a well-defined problem (say, approximating the solution of a PDE), what can we say about its "training difficulty"? Is optimization possible at all (cf. neural tangent kernel analysis)? How do architectures help facilitate optimization? And what solid mathematical foundations can be given to deep learning theory?
I have a strong mathematical background, with knowledge of functional analysis and differential geometry, and I also hold a BSc in Physics adjacent to my main mathematical track.
Last week I also started getting into quantum machine learning (QML) with PennyLane and find the area quite interesting as well.
Please get in touch if you think I could be a good fit for your research group or know of an open position that might match my profile.
Thanks for confirming my suspicions. Do you happen to have a reference for the case where the optimization method influences training in such a way as to inhibit convergence to a better set of minima?
That is what I have already done. So far, the loss just oscillates but remains high, which leads me to believe that either I am not training in the right way, i.e. maybe the gap between the easy and hard training examples is too drastic to bridge, or my model is simply not capable of handling the harder examples.
It's a regression problem, but I already tried something similar: I scaled the loss according to how hard each example is, using a hand-crafted heuristic to measure difficulty, but I did not get good results with it.
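For concreteness, the weighting looked roughly like the minimal sketch below (the names and the `difficulty` scores are placeholders standing in for my hand-crafted heuristic, not my actual code):

```python
import torch.nn.functional as F

def weighted_mse(pred, target, difficulty, alpha=1.0):
    # Assumes pred/target of shape (batch, output_dim) and a per-example
    # difficulty score in [0, 1] from the hand-crafted heuristic.
    per_example = F.mse_loss(pred, target, reduction="none").mean(dim=-1)
    # Upweight harder examples, then renormalize so the overall loss scale stays stable.
    weights = 1.0 + alpha * difficulty
    weights = weights / weights.mean()
    return (weights * per_example).mean()
```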
I have already tried my own version of selective backprop, but thanks for the link; this is exactly what I was looking for. I want to see how other people implement it and whether I did something wrong.
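My version is essentially a top-k simplification of the idea, along these lines (a rough sketch assuming a standard PyTorch regression loop; `model`, `optimizer`, `x`, `y` are placeholders):

```python
import torch
import torch.nn.functional as F

def selective_backprop_step(model, optimizer, x, y, keep_frac=0.25):
    # Forward pass on the full batch; per-example loss (assumes targets of shape (batch, d)).
    pred = model(x)
    per_example = F.mse_loss(pred, y, reduction="none").mean(dim=-1)
    # Backprop only through the hardest (highest-loss) fraction of the batch.
    k = max(1, int(keep_frac * x.shape[0]))
    hard_idx = torch.topk(per_example, k).indices
    loss = per_example[hard_idx].mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

If I remember right, the published version selects examples probabilistically based on recent losses rather than a hard top-k, which is one of the differences I want to check.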
Overfitting on the hard examples is a test I have carried out multiple times already, but not yet on the latest experiments, so thanks for reminding me of it. From this I can at least infer that the model's capacity is definitely too low if I cannot overfit. Even if I can overfit on the hard examples, though, that still does not mean the model can handle easy and hard examples at the same time.
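The check itself is just a tiny loop like this (a sketch with placeholder names, only to make the procedure explicit):

```python
import torch
import torch.nn.functional as F

def overfit_check(model, hard_x, hard_y, steps=2000, lr=1e-3):
    # Try to drive the loss to ~0 on a small, fixed set of hard examples.
    # Failure here points to insufficient capacity (or problematic data/labels).
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for step in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(model(hard_x), hard_y)
        loss.backward()
        opt.step()
        if step % 200 == 0:
            print(step, loss.item())
    return loss.item()
```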
Yes, I already have batch_size=1. I am looking into sharding the model across multiple GPUs now. In my case, not being able to predict on the 1% of super hard examples means that those examples have features the model has not yet learned to understand. The labeling is very close to perfect, with mathematically proven error bounds...
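By sharding I mean plain model parallelism along these lines (the split point and layer sizes are arbitrary and only illustrative; it assumes two visible GPUs):

```python
import torch
import torch.nn as nn

class ShardedNet(nn.Module):
    def __init__(self):
        super().__init__()
        # First half of the network lives on GPU 0, second half on GPU 1.
        self.part1 = nn.Sequential(nn.Linear(128, 1024), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Sequential(nn.Linear(1024, 1)).to("cuda:1")

    def forward(self, x):
        # Move activations across the shard boundary by hand.
        h = self.part1(x.to("cuda:0"))
        return self.part2(h.to("cuda:1"))
```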
> focal loss, hard-example mining
I think these are exactly the keywords that I was missing in my search.
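For anyone finding this later: from a quick look, the binary focal loss amounts to down-weighting easy examples via a (1 - p_t)^gamma factor, roughly like this sketch (classification form; for my regression setting the analogous idea would be to modulate the per-example error by a similar "easiness" factor):

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    targets = targets.float()
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # p_t: model probability for the true class; easy examples have p_t close to 1.
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # The (1 - p_t)^gamma factor shrinks the contribution of easy examples.
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```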