Sad-Comedian-711 t1_jcqgv1x wrote
Reply to comment by super_deap in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
This approach has been shown to work. Longformer even provides a script that does the conversion for you: https://github.com/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb
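For intuition, here's a minimal sketch of the core idea in that notebook: tiling the pretrained position embedding table so the model accepts longer inputs. The attribute paths (`model.embeddings.position_embeddings`, `model.config`) are assumptions for a BERT-style HuggingFace model; the linked script is the authoritative version and also handles details like RoBERTa's padding offset.

```python
import torch

def extend_position_embeddings(model, new_max_pos):
    """Tile the pretrained position embeddings so the model can attend
    over longer sequences. Sketch only: assumes a BERT-style `model`
    exposing `model.embeddings.position_embeddings`."""
    old_emb = model.embeddings.position_embeddings.weight
    old_max_pos, dim = old_emb.shape

    with torch.no_grad():
        new_emb = old_emb.new_empty(new_max_pos, dim)
        # Copy the pretrained table repeatedly until the new one is full.
        for start in range(0, new_max_pos, old_max_pos):
            end = min(start + old_max_pos, new_max_pos)
            new_emb[start:end] = old_emb[: end - start]

    model.embeddings.position_embeddings = torch.nn.Embedding.from_pretrained(
        new_emb, freeze=False
    )
    model.config.max_position_embeddings = new_max_pos
    return model
```

The copied embeddings then get fine-tuned on long sequences so the model learns to use the extended positions.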
I think for flash attention you don't want Longformer's sliding-window attention, though; you want something like BigBird's block-sparse attention with specific block sizes, since the block structure maps more naturally onto flash kernels.
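For reference, a minimal sketch of calling the PyTorch 2.0 native flash attention the thread is about (shapes and dtype here are illustrative; the flash backend needs fp16/bf16 on CUDA):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim) with a 32k context.
q = torch.randn(2, 8, 32768, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Restrict kernel selection to FlashAttention only, so the call fails
# loudly instead of silently falling back to another backend.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```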