Cryptizard t1_j9j6j7x wrote

You are wrong. It’s not experts. It’s randos on Mechanical Turk.

48

Bakagami- t1_j9j7a63 wrote

rip, they should've included expert performance as well then

15

Artanthos t1_j9jhm3l wrote

You are setting the bar such that anything less than perfect is failure.

By that standard, most humans would fail. And most experts are only going to be an expert in one field, not every field, so they would also fail by your standards.

14

Bakagami- t1_j9jid4u wrote

Wtf are you talking about? It's a benchmark; it's there to compare performance. I'm not setting any bar, and I'm not expecting it to beat human experts immediately.

9

SgathTriallair t1_j9knp1a wrote

Agreed. Stage one was "cogent", stage two was "as good as a human", stage three is "better than all humans". We have already passed stage 2, which could be called AGI. We will soon hit stage 3, which is ASI.

−1

jeegte12 t1_j9mocmt wrote

We are a million miles away from AGI.

0

Cryptizard t1_j9j80kk wrote

But then they wouldn’t be able to say that the AI beats them, and it wouldn’t be as flashy a publication. Don’t you know how academia works?

3

Bakagami- t1_j9j8djw wrote

No. I haven't seen anyone talking about it because it beat humans; it was always about it beating GPT-3 with fewer than 1B parameters. Beating humans was just the cherry on top. The paper is "flashy" enough, and including experts wouldn't change that. Many papers include expert performance as well, so it's not a stretch to expect it.

17

Cryptizard t1_j9j8qk5 wrote

The human performance number is not from this paper; it is from the original ScienceQA paper. They are the ones who did the benchmarking.

1

IluvBsissa t1_j9j7tmn wrote

Are you joking or serious?

1

Cryptizard t1_j9j7x5v wrote

Serious, read the paper.

8

IluvBsissa t1_j9j81ht wrote

My disappointment is unmeasurable and my day is ruined.

7

coumineol t1_j9jdp9z wrote

Really? So the time has come where a small-scale AI model being smarter than "ordinary" humans is not impressive.

22

olivesforsale t1_j9jpxi4 wrote

Awe is so last December; impatience is the new mode. They teased us with the future, and now we expect it ASAP!

13

Cryptizard t1_j9jxvg2 wrote

It's not ordinary humans; it's people on Mechanical Turk who are paid to do them as fast as possible and for as little money as possible. They are not motivated to actually think that hard.

5

coumineol t1_j9k4pf5 wrote

That's prejudice. You don't know that.

−1

Cryptizard t1_j9ka13l wrote

No, it's economics: they make less money the longer they stop and think about it.

5