younesbelkada OP t1_j14eysb wrote
Reply to comment by SergioSV96 in [D] BLIP is now available on transformers, what are the cool apps you can build on top of it? by younesbelkada
No, actually the BLIP demo was on the Hub, but the model architecture and weights were not in the library yet.
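For anyone who wants to try it, here is a minimal captioning sketch using the BLIP classes that now ship with transformers (the checkpoint name and example image URL are just illustrative):

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the captioning checkpoint from the Hub (example checkpoint name)
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Fetch an example image and generate a caption for it
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```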
younesbelkada t1_ixdyvls wrote
Reply to comment by JahrudZ in [P] BetterTransformer: PyTorch-native free-lunch speedups for Transformer-based models by fxmarty
Because BetterTransformer merges all of the TransformerEncoderLayer operations into a single fused operation, which is called with the appropriate weights / biases at runtime.
For int8, each linear layer is replaced by the linear layer from bitsandbytes, which works a bit differently: at runtime it decomposes the matrix multiplication into two stages, and this is done with dedicated CUDA kernels. Since that logic is not embedded in the fused operation from PyTorch, the two options are mutually exclusive. You can read more about int8 models here: https://huggingface.co/blog/hf-bitsandbytes-integration
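To make the contrast concrete, here is a minimal sketch of the two loading paths, assuming optimum and bitsandbytes are installed and a GPU is available (the BERT checkpoint is just an example):

```python
from transformers import AutoModel
from optimum.bettertransformer import BetterTransformer

# Option 1: BetterTransformer — the encoder layers are swapped for PyTorch's
# fused TransformerEncoderLayer fast path; weights and biases are fed to that
# single fused op at runtime.
model = AutoModel.from_pretrained("bert-base-uncased")
model = BetterTransformer.transform(model)

# Option 2: int8 via bitsandbytes — each nn.Linear is replaced by
# bnb.nn.Linear8bitLt, which splits every matmul into an int8 part plus an
# fp16 outlier part with dedicated CUDA kernels, so it cannot be folded into
# the fused op above. Hence the two options are mutually exclusive.
model_8bit = AutoModel.from_pretrained(
    "bert-base-uncased", load_in_8bit=True, device_map="auto"
)
```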
younesbelkada t1_ixdwsh6 wrote
Reply to comment by JahrudZ in [P] BetterTransformer: PyTorch-native free-lunch speedups for Transformer-based models by fxmarty
I know at least that this is mutually exclusive with int8; I haven't tried it with DeepSpeed though.
younesbelkada OP t1_j1adt9t wrote
Reply to comment by matigekunst in [D] BLIP is now available on transformers, what are the cool apps you can build on top of it? by younesbelkada
super cool!!