Submitted by _learn_faster_ t3_1194vcc in MachineLearning
guillaumekln t1_j9nfl9t wrote
You can also check out the CTranslate2 library which supports efficient inference of T5 models, including 8-bit quantization on CPU and GPU. There is a usage example in the documentation.
Disclaimer: I’m the author of CTranslate2.
_learn_faster_ OP t1_j9nuqe3 wrote
For flan-t5, does this only work for translation tasks?
guillaumekln t1_j9nv5n0 wrote
No. Even though the high-level class is named `Translator`, it can be used to run any task that would work with `T5ForConditionalGeneration` in the transformers library.
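A minimal sketch of what that looks like for flan-t5, assuming the checkpoint was first converted with `ct2-transformers-converter` (the local directory name `flan-t5-small-ct2` is a placeholder, not from the thread):

```python
# Sketch: running an arbitrary flan-t5 task through CTranslate2's Translator.
# Assumes the model was converted beforehand, e.g.:
#   ct2-transformers-converter --model google/flan-t5-small \
#       --output_dir flan-t5-small-ct2 --quantization int8

def run_flan_t5(prompt: str, model_dir: str = "flan-t5-small-ct2") -> str:
    # Imports are local so the sketch reads standalone without the deps installed.
    import ctranslate2
    import transformers

    translator = ctranslate2.Translator(model_dir)  # CPU by default
    tokenizer = transformers.AutoTokenizer.from_pretrained("google/flan-t5-small")

    # CTranslate2 consumes token strings rather than token ids.
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
    result = translator.translate_batch([tokens])[0]
    output_ids = tokenizer.convert_tokens_to_ids(result.hypotheses[0])
    return tokenizer.decode(output_ids, skip_special_tokens=True)
```

Despite the class name, the prompt can be any instruction the T5 checkpoint handles, e.g. `run_flan_t5("Summarize: ...")`, not just translation.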