[D] In vision transformers, why do tokens correspond to spatial locations and not channels? Submitted by stecas t3_zymi6r on December 30, 2022 at 1:06 AM in MachineLearning (6 comments)
Unlikely-Video-663 t1_j284flc wrote on December 30, 2022 at 9:16 AM In CNNs you usually already have long-range dependencies channel-wise, since every convolution mixes all input channels at each spatial position - and imho one of the advantages of ViT is allowing long-range spatial information flow as well. So channel-wise tokenization would not improve upon CNNs.. maybe?
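To make the premise of the question concrete, here is a minimal sketch of the standard ViT-style patch embedding (the class name PatchEmbed and the hyperparameters, 224px images, 16x16 patches, 768-dim tokens, are illustrative defaults, not something from this thread). Each token corresponds to one spatial patch, and all input channels of that patch are folded into the token's feature vector, which is exactly the "tokens = spatial locations, channels = features" setup being asked about:

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into non-overlapping patches and project each to a token.

    One token per spatial patch; the patch's pixels across all C channels are
    flattened into that token's embedding, so channels never become tokens.
    """
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A conv with kernel_size == stride == patch_size performs the
        # "flatten patch + linear projection" step in a single op.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                  # x: (B, C, H, W)
        x = self.proj(x)                   # (B, embed_dim, H/ps, W/ps)
        x = x.flatten(2).transpose(1, 2)   # (B, num_patches, embed_dim)
        return x

tokens = PatchEmbed()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768]) -- one token per 16x16 patch
```

Self-attention then operates over those 196 spatial tokens, which is where the long-range spatial information flow mentioned above comes from.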