HowdThatGoIn t1_ivi1zzn wrote
Reply to In my deep NN with 3 layer, . In the second iteration of GD, The activation of Layer 1 and Layer 2 output all 0 due to ReLU as all the input are smaller than 0. And L3 output some value with high floating point which is opposite to first forward_ propagation . Is this how it should work ? by Emotional-Fox-4285
I can’t say for certain without the code, but it looks like the loss is being applied to every hidden unit as a single scalar rather than being distributed as a vector according to each unit’s contribution to the loss. Check the shape of your loss gradient as it moves back through each layer.

Edit: also, are you using the total (summed) loss or the mean loss? It should be the mean.
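To make the shape check concrete, here is a minimal NumPy sketch of a correctly shaped backward pass for a 3-layer ReLU network with a mean squared error loss. The layer sizes, variable names, and the MSE loss are assumptions (the original code isn’t posted); the point is that every gradient tensor matches the shape of the layer it belongs to, and the loss gradient carries the 1/m factor from taking the mean.

```python
import numpy as np

# Minimal sketch (assumed shapes/names, not the OP's actual code): a 3-layer
# network with ReLU hidden layers, showing that the backward pass carries a
# per-unit gradient whose shape matches each layer's activations, and that the
# loss gradient is scaled by 1/m (mean loss), not the raw total.

rng = np.random.default_rng(0)
m, n_in, n_h1, n_h2, n_out = 32, 10, 8, 8, 1          # hypothetical sizes

X = rng.normal(size=(m, n_in))
y = rng.normal(size=(m, n_out))

W1, b1 = rng.normal(size=(n_in, n_h1)) * 0.1, np.zeros(n_h1)
W2, b2 = rng.normal(size=(n_h1, n_h2)) * 0.1, np.zeros(n_h2)
W3, b3 = rng.normal(size=(n_h2, n_out)) * 0.1, np.zeros(n_out)

# Forward pass
Z1 = X @ W1 + b1;  A1 = np.maximum(0, Z1)             # (m, n_h1)
Z2 = A1 @ W2 + b2; A2 = np.maximum(0, Z2)             # (m, n_h2)
Z3 = A2 @ W3 + b3                                     # (m, n_out), linear output

# Mean squared error: mean over the batch, so gradients already include 1/m
loss = np.mean((Z3 - y) ** 2)

# Backward pass: each dZ* has the same shape as its layer's output,
# i.e. the loss is distributed per unit, never broadcast as one scalar.
dZ3 = 2 * (Z3 - y) / m                                # (m, n_out)
dW3 = A2.T @ dZ3                                      # (n_h2, n_out)
dA2 = dZ3 @ W3.T                                      # (m, n_h2)
dZ2 = dA2 * (Z2 > 0)                                  # ReLU mask, (m, n_h2)
dW2 = A1.T @ dZ2                                      # (n_h1, n_h2)
dA1 = dZ2 @ W2.T                                      # (m, n_h1)
dZ1 = dA1 * (Z1 > 0)                                  # (m, n_h1)
dW1 = X.T @ dZ1                                       # (n_in, n_h1)

# The shape check suggested above: each gradient matches its layer.
for name, grad, ref in [("dZ3", dZ3, Z3), ("dZ2", dZ2, Z2), ("dZ1", dZ1, Z1)]:
    assert grad.shape == ref.shape, f"{name} shape {grad.shape} != {ref.shape}"
print("loss:", loss, "| all gradient shapes match their layers")
```

If any of those gradients comes out as a scalar or with a mismatched shape, broadcasting will silently apply the same value to every unit, which matches the symptom described in the post.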