Sad-Comedian-711 t1_jcqgv1x wrote
Reply to comment by super_deap in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
This approach has been shown to work. Longformer even provides a script that does the conversion for you: https://github.com/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb
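For intuition, here's a minimal sketch of the core idea in that notebook: tiling the pretrained position embedding table so the model accepts longer inputs. The attribute paths (`model.embeddings.position_embeddings`, `model.config`) are assumptions for a BERT-style HuggingFace model; the linked script is the authoritative version and also handles details like RoBERTa's padding offset.

```python
import torch

def extend_position_embeddings(model, new_max_pos):
    """Tile the pretrained position embeddings so the model can attend
    over longer sequences. Sketch only: assumes a BERT-style `model`
    exposing `model.embeddings.position_embeddings`."""
    old_emb = model.embeddings.position_embeddings.weight
    old_max_pos, dim = old_emb.shape

    with torch.no_grad():
        new_emb = old_emb.new_empty(new_max_pos, dim)
        # Copy the pretrained table repeatedly until the new one is full.
        for start in range(0, new_max_pos, old_max_pos):
            end = min(start + old_max_pos, new_max_pos)
            new_emb[start:end] = old_emb[: end - start]

    model.embeddings.position_embeddings = torch.nn.Embedding.from_pretrained(
        new_emb, freeze=False
    )
    model.config.max_position_embeddings = new_max_pos
    return model
```

The copied embeddings then get fine-tuned on long sequences so the model learns to use the extended positions.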
I think for flash attention you don't want Longformer's sliding-window attention, though; you want something like BigBird's block-sparse attention with specific block sizes, since the block structure maps more naturally onto flash kernels.
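For reference, a minimal sketch of calling the PyTorch 2.0 native flash attention the thread is about (shapes and dtype here are illustrative; the flash backend needs fp16/bf16 on CUDA):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim) with a 32k context.
q = torch.randn(2, 8, 32768, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Restrict kernel selection to FlashAttention only, so the call fails
# loudly instead of silently falling back to another backend.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```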