Submitted by __Maximum__ t3_11l3as6 in MachineLearning
whata_wonderful_day t1_jbcxdwf wrote
Reply to comment by adt in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Nice! How did you get access to Megatron-11B? I can't find it online anywhere.
Jepacor t1_jbdrovb wrote
The link to the model is in the Google Sheet they linked: https://github.com/facebookresearch/fairseq/blob/main/examples/megatron_11b/README.md
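For reference, a minimal sketch of what loading that checkpoint through fairseq's hub interface might look like. The paths are placeholders, and the README notes the checkpoint is sharded for 8-way model parallelism, so this assumes you've already downloaded the shards and have the hardware/setup to load them:

```python
# Hedged sketch, not the README's exact recipe: fairseq language models
# generally expose a from_pretrained() hub interface like this.
from fairseq.models.transformer_lm import TransformerLanguageModel

model_dir = "/path/to/megatron_11b"  # placeholder: directory with the downloaded checkpoint
lm = TransformerLanguageModel.from_pretrained(
    model_dir,
    checkpoint_file="model.pt",  # assumed checkpoint filename
)
lm.eval()

# GeneratorHubInterface.sample() generates a continuation from a prompt.
print(lm.sample("Scaling laws for language models suggest", beam=1, sampling=True))
```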
whata_wonderful_day t1_jbhp4gb wrote
Thanks! Alas, I'd thought it was an encoder model. I've been on the lookout for a big encoder; the largest I've seen is DeBERTa V2, with 1.5B params.
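For anyone hunting the same thing, here's a minimal sketch of loading that ~1.5B-parameter encoder via Hugging Face transformers (assuming the checkpoint in question is `microsoft/deberta-v2-xxlarge`):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# DeBERTa V2 xxlarge: an encoder-only model with roughly 1.5B parameters.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xxlarge")
model = AutoModel.from_pretrained("microsoft/deberta-v2-xxlarge")
model.eval()

inputs = tokenizer("Megatron-11B is a decoder, not an encoder.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
print(sum(p.numel() for p in model.parameters()))  # sanity check: ~1.5B params
```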