Submitted by Vae94 t3_z7rn5o in MachineLearning
eeng_ t1_iy82r1q wrote
This is probably obvious to you, but most of the frames in a long video are redundant and provide little additional information. You could easily extract some key frames (e.g. subtract the previous frame from the current frame and apply a fixed threshold), then run your network only on those key frames, and finally ensemble the key-frame predictions into a single label per video.
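A minimal sketch of the frame-differencing idea above, using NumPy arrays as stand-in frames; the function name and threshold value are illustrative, not from any particular library:

```python
import numpy as np

def extract_key_frames(frames, threshold=10.0):
    """Keep frames whose mean absolute pixel difference from the
    previously kept frame exceeds a fixed threshold."""
    key_indices = [0]  # always keep the first frame
    prev = frames[0].astype(np.float32)
    for i in range(1, len(frames)):
        cur = frames[i].astype(np.float32)
        if np.abs(cur - prev).mean() > threshold:
            key_indices.append(i)
            prev = cur
    return key_indices

# Toy "video": 10 identical 8x8 frames with a scene change at frame 5.
video = np.zeros((10, 8, 8), dtype=np.uint8)
video[5:] = 255  # sudden change
print(extract_key_frames(video, threshold=10.0))  # → [0, 5]
```

Per-video classification would then run the network only on the returned indices and average (or vote over) those predictions.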
Vae94 OP t1_iy8fy1f wrote
Yes. Thanks for sanity check!
I was thinking of first coming up with an algorithm to find outliers and then training the LSTM only on those outliers. For that I'd probably need some meta-algorithm and train both the LSTM and the trimming network at the same time.
Does something like this already exist in the literature?