Submitted by michaelthwan_ai t3_121domd in MachineLearning
michaelthwan_ai OP t1_jdlf8g8 wrote
Because recent LLM releases have been coming so fast, I organized the notable recent models from the news into a chart. Some may find the diagram useful, so please allow me to share it.
Please let me know if there is anything I should change or add so that I can learn. Thank you very much.
If you want to edit or create an issue, please use this repo.
---------EDIT 20230326
Thank you for your responses, I've learnt a lot. I have updated the chart:
- https://github.com/michaelthwan/llm_family_chart/blob/master/LLMfamily2023Mar.drawio.png
- (Looks like I cannot edit the post.)
Changes 20230326:
- Added: OpenChatKit, Dolly and their predecessors
- Higher resolution
Still to study:
- RWKV/ChatRWKV and related work, PaLM-rlhf-pytorch
Models not considered (yet):
- Models from 2022 or earlier (e.g. T5, May 2022; this post is meant to help people quickly catch up on new models)
- Models not yet fully released (e.g. Bard, still in limited preview)
gopher9 t1_jdlq1jy wrote
Add RWKV.
Puzzleheaded_Acadia1 t1_jdlx1g3 wrote
What is RWKV?
fv42622 t1_jdm3vtm wrote
Puzzleheaded_Acadia1 t1_jdn4sly wrote
So from what I understand, it's faster than GPT, uses less VRAM, and can run on GPU. What else did I miss?
DigThatData t1_jdpza0l wrote
it's an RNN
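To illustrate what "it's an RNN" buys you at inference time (the speed and VRAM savings mentioned above): the model carries a fixed-size state that is updated once per token, so each new token costs the same regardless of sequence length, unlike a transformer's growing key/value cache. This is a toy sketch of that recurrence pattern only; the weights, dimensions, and cell are illustrative stand-ins, not RWKV's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size (illustrative)

# Fixed weights for a toy recurrent cell (random, for demonstration only)
W_in = rng.normal(size=(d, d)) * 0.1
W_state = rng.normal(size=(d, d)) * 0.1

def step(state, x):
    """Consume one token embedding, return the updated fixed-size state."""
    return np.tanh(state @ W_state + x @ W_in)

state = np.zeros(d)
tokens = rng.normal(size=(16, d))  # a sequence of 16 token embeddings
for x in tokens:
    state = step(state, x)

# The state stays the same size no matter how long the sequence gets,
# which is why memory use does not grow with context during generation.
print(state.shape)  # (8,)
```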
michaelthwan_ai OP t1_jdpyb80 wrote
Added to the backlog. I need some time to study it. Thanks.
Rejg t1_jdmdspx wrote
I think you are potentially missing Claude 1.0 and Claude 1.2, the Co:Here Suite, and Google Flan models.
ganzzahl t1_jdouip7 wrote
You're definitely missing the entire T5 (encoder-decoder) family of models. From the UL2 paper, it seems encoder-decoder models are more powerful than decoder-only models (such as the GPT family), especially if you're most interested in inference latency.
I do very much wonder if OpenAI has tested equally-sized T5 models, and if there's some secret reason they have found as to why we should stick with GPT models, or if they just are doubling down on "their" idea, even if it is slightly inferior. Or maybe there are newer papers I don't know about.
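The structural difference behind the comment's latency point can be sketched as two generation loops: an encoder-decoder runs the encoder once over the whole input in parallel and then decodes, while a decoder-only model folds the prompt into the same causal stream it generates from. This is a toy, runnable sketch of the control flow only; the "models" are trivial averaging stand-ins, not real T5/GPT components, and all names are illustrative.

```python
import numpy as np

d = 4
rng = np.random.default_rng(1)

def encode(src):
    # Bidirectional encoder stand-in: one parallel pass over the full input.
    return src.mean(axis=0)

def decode_step(prefix, memory):
    # Decoder step stand-in: combines the output prefix with encoder memory
    # (the role cross-attention plays in a real encoder-decoder model).
    return np.tanh(np.mean(prefix, axis=0) + memory)

def encoder_decoder_generate(src, max_new):
    memory = encode(src)               # encoder runs exactly once
    out = [np.zeros(d)]                # BOS stand-in
    for _ in range(max_new):
        out.append(decode_step(np.stack(out), memory))
    return out[1:]

def decoder_only_generate(src, max_new):
    seq = [t for t in src]             # prompt joins the causal stream
    for _ in range(max_new):
        # Each new token attends over the whole concatenated sequence
        # (cached in practice, but the sequence still grows).
        seq.append(np.tanh(np.mean(np.stack(seq), axis=0)))
    return seq[len(src):]

src = rng.normal(size=(6, d))
a = encoder_decoder_generate(src, 3)
b = decoder_only_generate(src, 3)
print(len(a), len(b))  # 3 3
```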
signed7 t1_jdqm8lt wrote
I'm probably wrong, but I think I read somewhere that Google has a patent on the encoder-decoder architecture, which is why everyone else uses decoder-only.
maizeq t1_jdlzhql wrote
It would be useful to distinguish between SFT- and RLHF-tuned models.