Submitted by enryu42 t3_122ppu0 in MachineLearning
liqui_date_me t1_jdrd9dx wrote
Reply to comment by enryu42 in [D] GPT4 and coding problems by enryu42
One could argue that even standardized tests are somewhat boilerplate: practice enough SAT exams and you'll eventually do quite well at them, since the questions are quite similar from exam to exam. Ditto for AP exams.
I think a serious test of GPT4's intelligence would be one of the competitive entrance exams used in some countries, like the IIT-JEE or the Gaokao, or the International Math Olympiad, where the questions are written by domain experts and intentionally designed to be difficult and specialized.
enryu42 OP t1_jdrezba wrote
I don't know about the IIT-JEE or Gaokao, but many of the problems from the International Math Olympiad are freaking hard. If the model aims for human-level intelligence, such a high bar would be unfair; that's more the realm of "best human"-level intelligence.
To be fair, the hardest problems from "AtCoder Grand" contests have the same issue. But "AtCoder Regular" problems should definitely be solvable by an average human with the right knowledge and skill set, and yet GPT4 cannot solve any of them (and it doesn't look like it's lacking knowledge).
blose1 t1_jdskab0 wrote
These models have access to all human knowledge: all scientific papers, books, etc. If I had that kind of knowledge, I could solve any Olympiad task.
visarga t1_jdtxxfd wrote
You're mistaken: Olympiad problems require bespoke tricks that don't generalise from problem to problem. It's not a question of breadth of knowledge; they don't test memorisation.
blose1 t1_jdu4cln wrote
What? Where exactly am I mistaken? Both of my statements are true. There is a 0% chance you can pass an Olympiad task without knowledge; a human with all that knowledge WILL reason and come up with a solution BASED on the knowledge he has AND the experience of others that is part of that knowledge. If that weren't true, no human would solve any Olympiad. Sorry, but what you wrote in the context of my comment is just ridiculous, and looks like a reply to something I didn't write.
currentscurrents t1_jdrt3gv wrote
I think all tests designed for humans are worthless here.
They're all meant to compare humans against each other, so they assume you can't read and remember the entire internet. A model can make up for a lack of reasoning with an abundance of data. We need synthetic tests designed specifically for LLMs.
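One way to build such a test, as a rough sketch: generate problem instances procedurally from a seed, so each instance is novel (it can't have been memorized from training data) and the ground-truth answer is computed rather than looked up. Everything below is hypothetical illustration, not an established benchmark; the task family (nested arithmetic expressions) and all function names are made up for this example.

```python
import random

def make_synthetic_problem(seed):
    """Procedurally generate a fresh reasoning problem with a
    machine-checkable answer, so it cannot appear verbatim in
    any training corpus."""
    rng = random.Random(seed)

    def expr(depth):
        # Build a randomly nested arithmetic expression.
        if depth == 0:
            return str(rng.randint(1, 9))
        op = rng.choice(["+", "-", "*"])
        return f"({expr(depth - 1)} {op} {expr(depth - 1)})"

    problem = expr(rng.randint(2, 4))
    answer = eval(problem)  # ground truth is computed, not memorized
    return problem, answer

def score(model_answer_fn, n=100):
    """Fraction of freshly generated problems the model answers correctly."""
    correct = 0
    for seed in range(n):
        problem, answer = make_synthetic_problem(seed)
        if model_answer_fn(problem) == answer:
            correct += 1
    return correct / n
```

Because instances are seeded, the benchmark is reproducible, yet a fresh seed range produces problems the model has never seen, which is exactly what tests designed for humans don't guarantee.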
Yecuken t1_jdsm4w1 wrote
Tests wouldn't help against optimization; the models will just learn how to pass the test. Optimization will always win against any problem with a known solution.