Submitted by Chemont t3_109z8om in MachineLearning
I recently came across "Confident Adaptive Language Modeling," which allows Transformers to exit early during inference and skip the remaining model layers if a token is easy to predict. Is there any research on basically doing the opposite and allowing Transformers to spend more compute on tokens that are very hard to predict?
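The early-exit idea can be sketched in a few lines: after each layer, project the hidden state to a token distribution and stop as soon as the model is confident enough. This is a hypothetical simplification, not the actual CALM implementation (which uses trained per-layer exit classifiers and calibrated thresholds); the `layers`, `classifier`, and `threshold` names here are illustrative.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D logit vector
    e = np.exp(x - x.max())
    return e / e.sum()

def early_exit_forward(hidden, layers, classifier, threshold=0.9):
    """Run Transformer layers one at a time; after each layer, project
    the hidden state to a token distribution and exit early once the
    top probability exceeds the confidence threshold.

    Returns the final distribution and the number of layers actually used.
    (Hypothetical sketch of confidence-based early exit, not CALM's
    trained exit classifiers.)"""
    probs = softmax(classifier(hidden))
    for i, layer in enumerate(layers):
        hidden = layer(hidden)
        probs = softmax(classifier(hidden))
        if probs.max() >= threshold:
            return probs, i + 1  # confident: skip the remaining layers
    return probs, len(layers)

# Toy demo: each "layer" sharpens the logits, so confidence grows
# and the loop exits before running all four layers.
layers = [lambda h: 2.0 * h] * 4
classifier = lambda h: h  # identity projection for the toy example
probs, n_used = early_exit_forward(np.array([1.0, 0.0, 0.0]), layers, classifier)
```

An "anti-early-exit" variant in the spirit of the question would invert the check: keep applying extra refinement steps (or loop over a block of layers) while confidence stays *below* the threshold, spending more compute only on hard tokens.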
rehrev t1_j414ibv wrote
What does early stopping during inference mean, though?