Logon1028 OP t1_j0mcyle wrote on December 17, 2022 at 7:33 PM

Reply to comment by Logon1028 in Efficient Max Pooling Implementation by Logon1028

What I ended up doing is using np.indices (multiplied by the stride) to apply a mask to the x and y argmax arrays using an elementwise multiplication. Then I used elementwise division and modulus to calculate the input indexes myself. The only for loop I have in the forward pass now is a simple one for the depth of the input. The backward pass still uses a triple for loop, but I can live with that.

The model I showed in the previous comment now trains in just under 4 minutes. So now I have a roughly 3x performance increase from my original implementation. And I think that is where I am going to leave it.

Thank you for your help. Even though I didn't use all your suggestions directly, it definitely guided me in the right direction. My current implementation is FAR more efficient than any examples I could find online unfortunately.