PassingTumbleweed t1_j46sco1 wrote
Reply to comment by Raphaelll_ in [R] Is there any research on allowing Transformers to spent more compute on more difficult to predict tokens? by Chemont
That depends on what you mean. I don't think any of the LLMs use it, but it has some citations and follow-up literature.