I don't get it. Why are they comparing their model's performance to regular humans and not experts, like every other paper? Does that mean these tests are "average difficulty"? I read somewhere that GPT-3.5 scored 55.5% on MMLU, while PaLM was at 75% and human experts at 88.8%. How would this CoT model perform on standard benchmarks, then? I feel scammed rn.