
xtof54 t1_jd467f3 wrote

There are several options. Either collaboratively (look at together.computer, hivemind, petals), or on a single machine with no GPU using pipeline parallelism, but that requires reimplementing it for every model; see e.g. slowLLM on GitHub for bloom176b.
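To make the single-machine idea concrete, here is a rough toy sketch of streaming a model through memory one layer at a time. It is not taken from slowLLM or petals; the file layout and the names `save_toy_layers`, `apply_layer`, `streamed_forward`, and `demo` are all made up for illustration, and the "model" is just two tiny ReLU layers on plain Python lists.

```python
import json
import os
import tempfile


def save_toy_layers(directory, layers):
    """Write each layer's weights to its own file, as a stand-in for model shards."""
    paths = []
    for i, (weights, bias) in enumerate(layers):
        path = os.path.join(directory, f"layer_{i}.json")
        with open(path, "w") as f:
            json.dump({"weights": weights, "bias": bias}, f)
        paths.append(path)
    return paths


def apply_layer(x, weights, bias):
    """Toy layer (hypothetical): y = relu(W x + b) on plain Python lists."""
    out = []
    for row, b in zip(weights, bias):
        value = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(max(0.0, value))
    return out


def streamed_forward(x, layer_paths):
    """Run the forward pass while holding only one layer's weights in memory."""
    for path in layer_paths:
        with open(path) as f:
            layer = json.load(f)  # load just this layer from disk
        x = apply_layer(x, layer["weights"], layer["bias"])
        del layer  # drop the weights before loading the next layer
    return x


def demo():
    """Build a two-layer toy model on disk and stream an input through it."""
    layers = [
        ([[1.0, 0.0], [0.0, 1.0]], [0.5, -2.0]),
        ([[1.0, 1.0]], [0.0]),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        paths = save_toy_layers(tmp, layers)
        return streamed_forward([1.0, 1.0], paths)


if __name__ == "__main__":
    print(demo())
```

The part that has to be rewritten per model is `apply_layer`: a real architecture has its own layer structure and weight format, which is why this approach means reimplementing things for every model.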
