Submitted by _learn_faster_ t3_1194vcc in MachineLearning
guillaumekln t1_j9nfl9t wrote
You can also check out the CTranslate2 library which supports efficient inference of T5 models, including 8-bit quantization on CPU and GPU. There is a usage example in the documentation.
Disclaimer: I’m the author of CTranslate2.
_learn_faster_ OP t1_j9nuqe3 wrote
For flan-t5, does this only work for translation tasks?
guillaumekln t1_j9nv5n0 wrote
No. Even though the high-level class is named `Translator`, it can be used to run any task that would work with `T5ForConditionalGeneration` in the transformers library.
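A minimal sketch of what that looks like for flan-t5, assuming the checkpoint was first converted with `ct2-transformers-converter` (the local directory name `flan-t5-small-ct2` is a placeholder, not from the thread):

```python
# Sketch: running an arbitrary flan-t5 task through CTranslate2's Translator.
# Assumes the model was converted beforehand, e.g.:
#   ct2-transformers-converter --model google/flan-t5-small \
#       --output_dir flan-t5-small-ct2 --quantization int8

def run_flan_t5(prompt: str, model_dir: str = "flan-t5-small-ct2") -> str:
    # Imports are local so the sketch reads standalone without the deps installed.
    import ctranslate2
    import transformers

    translator = ctranslate2.Translator(model_dir)  # CPU by default
    tokenizer = transformers.AutoTokenizer.from_pretrained("google/flan-t5-small")

    # CTranslate2 consumes token strings rather than token ids.
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
    result = translator.translate_batch([tokens])[0]
    output_ids = tokenizer.convert_tokens_to_ids(result.hypotheses[0])
    return tokenizer.decode(output_ids, skip_special_tokens=True)
```

Despite the class name, the prompt can be any instruction the T5 checkpoint handles, e.g. `run_flan_t5("Summarize: ...")`, not just translation.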