Submitted by __Maximum__ t3_11l3as6 in MachineLearning
whata_wonderful_day t1_jbcxdwf wrote
Reply to comment by adt in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Nice! How did you get access to Megatron-11B? I can't find it online anywhere.
Jepacor t1_jbdrovb wrote
The link to the model is in the Google Sheet they linked: https://github.com/facebookresearch/fairseq/blob/main/examples/megatron_11b/README.md
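For reference, a minimal sketch of what loading that checkpoint through fairseq's hub interface might look like. The paths are placeholders, and the README notes the checkpoint is sharded for 8-way model parallelism, so this assumes you've already downloaded the shards and have the hardware/setup to load them:

```python
# Hedged sketch, not the README's exact recipe: fairseq language models
# generally expose a from_pretrained() hub interface like this.
from fairseq.models.transformer_lm import TransformerLanguageModel

model_dir = "/path/to/megatron_11b"  # placeholder: directory with the downloaded checkpoint
lm = TransformerLanguageModel.from_pretrained(
    model_dir,
    checkpoint_file="model.pt",  # assumed checkpoint filename
)
lm.eval()

# GeneratorHubInterface.sample() generates a continuation from a prompt.
print(lm.sample("Scaling laws for language models suggest", beam=1, sampling=True))
```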
whata_wonderful_day t1_jbhp4gb wrote
Thanks! Alas, I'd thought it was an encoder model. I've been on the lookout for a big encoder; the largest I've seen is DeBERTa V2, with 1.5B params.
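For anyone hunting the same thing, here's a minimal sketch of loading that ~1.5B-parameter encoder via Hugging Face transformers (assuming the checkpoint in question is `microsoft/deberta-v2-xxlarge`):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# DeBERTa V2 xxlarge: an encoder-only model with roughly 1.5B parameters.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xxlarge")
model = AutoModel.from_pretrained("microsoft/deberta-v2-xxlarge")
model.eval()

inputs = tokenizer("Megatron-11B is a decoder, not an encoder.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
print(sum(p.numel() for p in model.parameters()))  # sanity check: ~1.5B params
```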