Submitted by Balance- t3_124eyso in MachineLearning

GPT-4 and professional benchmarks: the wrong answer to the wrong question

OpenAI may have tested on the training data. Besides, human benchmarks are meaningless for bots.

Problem 1: training data contamination

To benchmark GPT-4’s coding ability, OpenAI evaluated it on problems from Codeforces, a website that hosts coding competitions. Surprisingly, Horace He pointed out that GPT-4 solved 10/10 pre-2021 problems and 0/10 recent problems in the easy category. The training data cutoff for GPT-4 is September 2021. This strongly suggests that the model is able to memorize solutions from its training set — or at least partly memorize them, enough that it can fill in what it can’t recall.

As further evidence for this hypothesis, we tested it on Codeforces problems from different times in 2021. We found that it could regularly solve problems in the easy category before September 5, but none of the problems after September 12.

In fact, we can definitively show that it has memorized problems in its training set: when prompted with the title of a Codeforces problem, GPT-4 includes a link to the exact contest where the problem appears (and the round number is almost correct: it is off by one). Note that GPT-4 cannot access the Internet, so memorization is the only explanation.
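For concreteness, here is a minimal sketch of the date-bucketed check described above. It is only an illustration: the helper functions, the exact cutoff date, and the bucket names are placeholders, not the authors' actual evaluation harness.

```python
from datetime import date

CUTOFF = date(2021, 9, 1)  # approximate GPT-4 training-data cutoff (placeholder)

def ask_model_for_solution(statement: str) -> str:
    """Placeholder: call your LLM API and return the generated source code."""
    raise NotImplementedError

def passes_judge(problem_id: str, source: str) -> bool:
    """Placeholder: submit the code to the judge (or run local tests) and report pass/fail."""
    raise NotImplementedError

def solve_rate_by_cutoff(problems):
    """problems: iterable of (problem_id, statement, publication_date) tuples."""
    tallies = {"pre-cutoff": [0, 0], "post-cutoff": [0, 0]}  # [solved, attempted]
    for pid, statement, published in problems:
        bucket = "pre-cutoff" if published < CUTOFF else "post-cutoff"
        tallies[bucket][1] += 1
        if passes_judge(pid, ask_model_for_solution(statement)):
            tallies[bucket][0] += 1
    return {k: solved / attempted for k, (solved, attempted) in tallies.items() if attempted}
```

A sharp drop in solve rate from the pre-cutoff bucket to the post-cutoff bucket is the signature of memorization rather than problem-solving ability.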

925

Comments


wazis t1_jdz4v8g wrote

If it is true (too lazy to check), it is not surprising. If it is not, then it is also not surprising.

109

Seankala t1_jdz53mn wrote

It'd be nice to see the qualifications of the authors.

0

ghostfaceschiller t1_jdz6vzn wrote

I think this was shown a while ago (like a week ago, which just feels like ten years).

While I do think this is important for several reasons, personally I don't see it as all that impactful for what I consider AI capable of going forward.

That's because pretty much all my assumptions for the next couple of years are based on the idea of systems that can loop and reflect on their own actions, re-edit code based on error messages, etc., which they are very good at.

76

Simcurious t1_jdzatox wrote

That's not correct; the benchmark they used only contained Codeforces problems from after 2021.

From Horace's tweets:

> Considering the Codeforces results in the paper (very poor!), they might have only evaluated it on recent problems.

45

hadaev t1_jdzcowi wrote

Well, we usually expect this kind of trivial mistake from people who aren't really DS folks, like biologists using DS methods.

It doesn't seem hard to search for matches in text, unlike other data types.

13

mrdevlar t1_jdzdi2t wrote

Proof that no matter where you go, it is always going to be possible to make simple mistakes.

7

master3243 t1_jdzec5r wrote

Seeing how they made sure the bar exam and the math olympiad tests were recent ones, explicitly stated not to be in the training dataset to avoid contamination, I trusted that all the other reported tests were picked just as carefully.

14

VertexMachine t1_jdzehvy wrote

Interesting. Potentially something that might also be used in the ongoing lawsuit against Copilot?

2

rfxap t1_jdzfxd1 wrote

There are other benchmarks to look at though. Microsoft Research tried an early version of GPT-4 on LeetCode problems that were published after the training data cutoff date, and they got results similar to human performance in all difficulty categories: https://arxiv.org/abs/2303.12712 (page 21)

What should we make of that?

277

mrpickleby t1_jdzg5e8 wrote

Implies that AI will speed the dissemination of information but not necessarily be helpful in creating new thinking.

23

bjj_starter t1_jdzo3zq wrote

But that's pure speculation. They showed that a problem existed with the training data, and OpenAI had already dealt with that problem and wasn't hiding it at all: GPT-4 wasn't tested on any of that data. Moreover, it's perfectly fine for problems like the ones it will be tested on (i.e. past problems) to be in the training data. What's important is that what it's actually tested on is not in the training data. There is no evidence that it was tested on training data, at this point.

Moreover, the Microsoft Research team was able to repeat some impressive results in a similar domain on tests that didn't exist before the training data cut-off. There isn't any evidence that this is a problem with a widespread effect on performance. It's also worth noting that it seems pretty personal for the guy behind this paper, judging by the way he wrote his tweet.

3

bjj_starter t1_jdzoafq wrote

This title is misleading. The only thing they found was that GPT-4 was trained on code questions it wasn't tested on.

42

sigmatrophic t1_jdzpk9m wrote

Honestly, I paid for GPT-4... It's a bit better, but it feels like GPT-3 before they dumbed it down.

5

plocco-tocco t1_jdzpyf8 wrote

I do not see any evidence of this happening in the article. Also, OpenAI claims to have checked for contamination in every benchmark, so I don't see what the authors are trying to show here.

−2

visarga t1_jdzr4tp wrote

This paper scared me more than any other ML paper. I'd hoped we had 2-3 more years before what they show in there.

2

abc220022 t1_jdzrsbu wrote

Part of the sales pitch behind LeetCode is that you are working on problems that are used in real coding interviews at tech companies. I believe that most LeetCode problems were invented well before they were published on the LeetCode website, so they still could appear in some form in their training data.

350

milktoasttraitor t1_jdzuw0z wrote

If you look at the prompt they show, they clearly gave it hints which tell it the exact approach to use in order to solve the problem. The problem is also a very slight derivative of another existing, very popular problem on the platform (“Unique Paths”).

This is impressive in another way, but not in the way they were trying to show. They didn't show the other questions it got right, so there's no way to tell how good or bad the methodology was overall or what hints they gave it. For that question at least, it's not good, and it makes me skeptical of the results.

20

thelastpizzaslice t1_jdzv7pu wrote

I once asked it for a parody of "American Pie" about Star Wars Episode I and it gave me Weird Al's song verbatim.

9

londons_explorer t1_jdzwcfo wrote

Problems like this are never 100% novel.

There are always elements and concepts of the problem and solution that have been copied from other problems.

The easiest way to see this is to ask a non-programmer to come up with a 'programming puzzle'. They'll probably come up with something like "Make an app to let me know when any of my instagram friends are passing nearby and are up for hanging out".

Compare that to a typical leetcode problem, and you'll soon see how leetcode problems are really only a tiny tiny corner of what is possible to do with computers.

93

jrkirby t1_jdzx1ef wrote

I'm guessing the hard part is that you can't "untrain" a model. They hadn't thought "I want to benchmark on these problems later" when they started. Then they spent $20K+ of compute on training. Then they wanted to test it. You can easily find the stuff you want to test on in your training dataset, sure. But you can't so easily remove it and train everything again from scratch.

7

muskoxnotverydirty t1_je027xh wrote

Yeah it's speculation. I agree.

> There is no evidence that it was tested on training data, at this point.

I think what the author is trying to say is that for some of these tests there's no evidence it was tested on training data, but there's no evidence that it wasn't. And the ability to generalize in the specific domain of those tests depends on that difference. If nothing else, it would be nice for those who publish test results to show how thoroughly they checked whether the test data was in the training data. It seems to me that they could automate a search within the training set to see if exact wordage is used.
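As an illustration, an automated exact-wordage check could be as simple as the sketch below. The whitespace normalization and the 50-character window size are arbitrary choices made here for the example, not any lab's published methodology.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial reformatting doesn't hide a match."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def shares_long_substring(test_item: str, training_doc: str, n: int = 50) -> bool:
    """Flag the pair if any n-character window of the test item appears verbatim in the document."""
    t, d = normalize(test_item), normalize(training_doc)
    if len(t) <= n:
        return t in d
    return any(t[i:i + n] in d for i in range(len(t) - n + 1))

# Toy usage: a benchmark question vs. a (tiny) stand-in for the training corpus.
corpus = ["...given an array of integers nums and an integer target, return indices of the two numbers..."]
question = "Given an array of integers nums and an integer target, return indices of the two numbers that add up to target."
print(any(shares_long_substring(question, doc) for doc in corpus))  # True -> possible contamination
```

For a real training corpus you would index the text (e.g. hash the character n-grams) rather than scan every document per test item, but the idea is the same.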

11

Wtiaw t1_je04aq2 wrote

> Note that GPT-4 cannot access the Internet, so memorization is the only explanation

This is not true; it was shown through jailbreaks that it could access the internet.

−5

truchisoft t1_je05j63 wrote

The funny thing about these posts is that this is clearly propaganda aimed at low-effort people.

Anyone caring about this is either blinded by their own prejudice or just too dumb to even try GPT once themselves.

Everyone else does not need someone telling them that even GPT-3.5 is incredible for coding (and a lot of other stuff). It is not perfect, but it goes a long way; heck, I was even able to make a simple game in less than 3 hours using 99% GPT-3.5 code and DALL-E sprites.

−10

ArnoF7 t1_je0dzqg wrote

Funnily, I actually found GPT-4 far worse than I expected at coding, especially after I looked at its impressive performance on other exams. I guess it's still progress for LLM coding, maybe just a little underwhelming compared to the other standardized tests it aces? GPT-4's performance on Codeforces is borderline abhorrent.

And now you are telling me there is data leakage, so the actual performance would be even worse than what’s on paper???

5

austacious t1_je0g6oi wrote

A healthy skepticism in AIML from those in the field is incredibly important and relatively hard to come by. Having the attitude that 'This is great and everything is wonderful' does not lead to meaningful progress addressing very real issues. It's very productive to point out shortcomings of otherwise highly effective models.

12

cegras t1_je0g90p wrote

How does the AI perform any better than a Google search? I'd say the AI is even more dangerous as it gives a single, authoritative sounding answer that you have to go to Google and secondary sources to verify anyways!

12

MrFlamingQueen t1_je0j29h wrote

It feels like the majority of people in this discussion have no idea what computer science is and what LeetCode tests.

As you mentioned, there are hundreds of websites devoted to teaching the leetcode design patterns and entire books devoted to learning and practicing these problems.

33

ianitic t1_je0mjqx wrote

Oh, I haven't tested this on textbooks, but I have asked ChatGPT to give me pages of a novel and it did, word for word. I suspect it had to have been trained on PDFs? I'm highly surprised I haven't seen any news of authors/publishers suing yet, tbh.

Based on the above test, though, it's obvious whether or not a book is part of its training set.

10

ReasonablyBadass t1_je0rwr3 wrote

Is it possible the older questions cover problems that are by now better known, so more training data existed for them, while the newer ones cover newer concepts not really represented on the net yet?

1

meister2983 t1_je0s90f wrote

GPT-4 is an extremely good pattern matcher - probably one of the best ever made. Most exams seem to be solvable with straightforward pattern matching (with no backtracking). The same thing applies to basic coding questions - it performs roughly at the level of a human gluing Stack Overflow solutions together (with the obvious variable renaming, moving lines around, removing dead code, etc.).

It struggles at logical reasoning (when it can't "pattern match" the logical reasoning to something it's trained on).

Coding example:

  • It had no problem writing a tax calculator for ordinary income with progressive tax brackets.
  • It struggles to write a program to calculate tax on long-term capital gains (US tax code), which is very similar to the above except for an offset (you start bracket indexing at the ordinary income; see the sketch below for the gist of both). I'd think this is actually pretty easy for a CS student, especially if they'd seen the solution above. GPT-4 struggled, though, as it doesn't really "reason" about code the way a human would, and it would generate solutions that are obviously wrong to a human.
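A rough sketch of the two calculations the comment above describes, for readers who haven't stared at tax code. The bracket thresholds and rates are placeholders rather than the real US tables, and this is an illustration written for this thread, not GPT-4's output.

```python
# Illustrative brackets only: (lower bound, marginal rate). Not the real US tax tables.
BRACKETS = [(0, 0.10), (10_000, 0.12), (40_000, 0.22), (90_000, 0.24)]
CG_BRACKETS = [(0, 0.00), (40_000, 0.15), (445_000, 0.20)]

def ordinary_tax(income: float) -> float:
    """Progressive tax: each slice of income is taxed at its own bracket's rate."""
    tax = 0.0
    for i, (lower, rate) in enumerate(BRACKETS):
        upper = BRACKETS[i + 1][0] if i + 1 < len(BRACKETS) else float("inf")
        if income <= lower:
            break
        tax += (min(income, upper) - lower) * rate
    return tax

def capital_gains_tax(ordinary_income: float, gains: float) -> float:
    """The 'offset' case: gains stack on top of ordinary income, so the bracket
    lookup starts at ordinary_income instead of at zero."""
    tax = 0.0
    for lower, rate in CG_BRACKETS:
        idx = CG_BRACKETS.index((lower, rate))
        upper = CG_BRACKETS[idx + 1][0] if idx + 1 < len(CG_BRACKETS) else float("inf")
        lo = max(lower, ordinary_income)
        hi = min(upper, ordinary_income + gains)
        if hi > lo:
            tax += (hi - lo) * rate
    return tax

print(ordinary_tax(50_000))                 # 6800.0 with the placeholder brackets
print(capital_gains_tax(50_000, 20_000))    # 3000.0 with the placeholder brackets
```

The second function is only a few lines different from the first, which is what makes the reported failure interesting: the change is conceptually small for a human but apparently hard to "reason" into existence by pattern matching alone.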

14

nomadiclizard t1_je0u5ex wrote

Haha amateurs. I learned not to make that mistake when I split a pose estimation visual dataset into training and validation, but lots of the frames were almost-duplicates so it got contaminated that way. >.<

1

truchisoft t1_je0un8g wrote

Oh no no, that's not my argument here, but the whole title wording looks like a sleazy attack. This is not criticism but seems like a hit piece, since, as other commenters mention, other independent tests were already run on GPT-4 and people are already using GPT-4 for coding.

0

polygon_primitive t1_je0x04y wrote

For finding answers it's about the same as Google, sometimes better if you then verify the result with external sources, but that's mainly because Google has so badly corrupted their core search product while chasing profit. It's been pretty useful for me for doing the grunt work of writing boilerplate code and refactoring stuff, though.

3

visarga t1_je0zqxm wrote

ML people spend all day thinking about model limitations and errors, so it's only normal that we are not easily swayed by a non-peer-reviewed paper declaring first contact with AGI. Especially from MS, who owns 50% of OpenAI.

6

thorax t1_je107vs wrote

I'm working on an extreme usage model for leveraging GPT4 to generate code, and it's rather good. Not perfect, but impressive is an understatement.

1

Puzzleheaded_Acadia1 t1_je11l0o wrote

So does that mean that GPT-4 can't think critically? And if not, can we make a new kind of ML model, like LLMs and LLaMA, that can think critically and integrate it with GPT-4 so it becomes a multimodal system that can "see" and think critically?

0

DaBobcat t1_je12b4q wrote

Here, OpenAI and Microsoft were evaluating GPT-4 on medical problems. In section 6.2 they specifically said that they found strong evidence that it was trained on "popular datasets like SQuAD 2.0 and the Newsgroup Sentiment Analysis datasets". In appendix section B they explain how they measured whether it saw something in the training data. Point is, I think benchmarks are quite pointless if the training dataset is private and no one can verify that the model wasn't trained on the test set, which in several cases they specifically said it was.

5

currentscurrents t1_je12d3k wrote

Nobody knows exactly what it was trained on, but there exist several datasets of published books.

>I'm highly surprised I haven't seen any news of authors/publishers suing yet tbh.

They still might. But they don't have a strong motivation; it doesn't really directly impact their revenue because nobody's going to sit in the chatgpt window and read a 300-page book one prompt at a time.

6

currentscurrents t1_je13kdr wrote

True! But also, problems in general are never 100% novel. That's why metalearning works.

You can make up for poor reasoning abilities with lots of experience. This isn't bad exactly, but it makes testing their reasoning abilities tricky.

15

TheEdes t1_je149kf wrote

Yeah, but if you were to come up with a problem in your head that didn't exist word for word, then GPT-4 would be doing what they're advertising. However, if the problem appears word for word anywhere in the training data, then the test data is contaminated. If the model can learn the design patterns for leetcode-style questions by looking at examples of them, then it's doing something really good; if it can only solve problems it has seen before, then it's nothing special, and they just overfit a trillion parameters on a comparatively very small dataset.

8

currentscurrents t1_je14pi5 wrote

Clearly, the accuracy is going to have to get better before it can replace Google. It's pretty accurate when it knows what it's talking about, but if you go "out of bounds" the accuracy drops off a cliff without warning.

But the upside is that it can integrate information from multiple sources and you can interactively ask it questions. Google can't do that.

3

regalalgorithm t1_je1eu1e wrote

FYI, the GPT-4 paper has a whole section on contamination in the appendix - I found it to be pretty convincing. Removing contaminating data did make it worse at some benchmarks, but also better at others, and overall the effect wasn't huge.

1

notforrob t1_je1lowh wrote

This inspired me to ask GPT-4:
"Can you generate a leetcode easy problem that has never been seen?"

And then ask it to solve the problem it creates. In the few cases I tried, it failed miserably.

1

HonkyTonkPolicyWonk t1_je1mqdp wrote

Well, yeah, ChatGPT is auto-suggest on steroids. It can't create anything de novo. It reframes and regurgitates what others have done.

No surprises here

0

mlresearchoor t1_je1mvf7 wrote

OpenAI blatantly ignored the norm not to train on the ~200 tasks collaboratively prepared by the community for BIG-bench. GPT-4 knows the BIG-bench canary string, afaik, which invalidates GPT-4 evals on BIG-bench.

OpenAI is cool, but they genuinely don't care about academic research standards or benchmarks carefully created over years by other folks.

92

WarmSignificance1 t1_je1pdz9 wrote

I think that ChatGPT has shown how bad so many people are at Googling. And granted, sometimes ChatGPT is just far superior.

But when people say things like "I can ask it how to use a library and it's made me 10x faster than using Google", it just blows my mind. I can usually find the official docs and figure out how to use a library in about the same time as ChatGPT can tell me, without the risk of errors.

12

pale2hall t1_je2begu wrote

ChatGPT-4 can't remember that it's writing a Firefox add-on, not a Chrome extension.

It's like the most amazing coder ever, but always half-drunk, completely confident, and always apologizing. Here's how almost every single response started after the first:

  • Apologies for the incomplete response.
  • Apologies for the confusion. The Express server I provided earlier ...
  • I apologize for the inconvenience. After reviewing the code, I've noticed some inconsistencies in the code
  • I apologize for the confusion. It appears that the context menu was removed due to a typo in the content.js file.
  • I apologize for the confusion. To make the changes you requested, follow the steps below:
  • Apologies for the confusion, and thank you for providing the additional information. Here's an updated implementation that should resolve the issues:
  • I apologize for the confusion. Here's an updated solution that should display the response in the popup window and clear the input field on submit. Additionally, I added an indicator that shows the addon is thinking.
  • Apologies for the confusion, and thank you for the clarification. Based on your requirement, you can make the following changes:
  • Apologies for the confusion. You are correct that you cannot trigger the reviseMyComment() function in the content script without sending a message from the background script.
  • My apologies for the confusion. The error you are encountering is because the sendToOpenAI() function is not available in the content script content.js
  • Apologies for the confusion. I made an error in my previous response.

3

AquaBadger t1_je2c68z wrote

To be fair, Google has gotten slower at surfacing useful information due to the mass of ads and bought results clogging up searches now. But yes, Google is still faster than ChatGPT, and if cleaned up it would be even better.

9

bjj_starter t1_je2ckb0 wrote

>If nothing else, it would be nice for those who publish test results to show how much they knew whether test data was in the training data.

Yes, we need this and much more information about how it was actually built, what the architecture is, what the training data was, etc. They're not telling us because trade secrets, which sucks. "Open" AI.

1

jrkirby t1_je2f63r wrote

$2 million or $20 million is greater than $20 thousand. And it makes the main thesis more salient: the more money you've spent on training, the less willing you'll be to retrain the entire model from scratch just to run some benchmarks the "proper" way.

1

trajo123 t1_je2gie9 wrote

How much of the code that devs write on a typical day is truly novel and not just a rehash / combination / adaptation of existing stuff?

He who has not copied code from stackoverflow, let him cast the first insult at ChatGPT.

0

cegras t1_je2k9dr wrote

ChatGPT is great at learning the nuances of English, i.e. synonyms and metaphors. But if you feed it a reworded leetcode question and it finds the answer within its neural net, has it learned to conceptualize? No, it just learned that synonym ...

8

pmirallesr t1_je2tf2v wrote

Idk, the procedure to check for contamination described in the release report sounded solid at first glance, and I don't see how this news changes that

1

SWESWESWEh t1_je33t7z wrote

I've had a lot more luck solving novel coding problems with the GPT-4 version of ChatGPT than with Google. If you stick to older tech and libraries like Java and Spring that have been around forever, it's really good at solving fairly difficult problems if you just keep providing context. With Google, it basically comes down to whether someone has done this exact thing on SO and gotten an answer; if not, oh well.

2

Coffee_Crisis t1_je392lv wrote

If you search GitHub for unusual variable names or keywords, you will often find code that looks very similar to the stuff GPT spits out; in some domains it's much more copy-paste than people think.

1

salgat t1_je3eqx5 wrote

GPT4 is the world's best googler. As long as a similar solution existed on the internet in the past, there's a good chance GPT4 can pick it up, even if it's not on leetcode yet.

1

purplebrown_updown t1_je3xwqa wrote

Question. I’m guessing they want to continuously feed more data to gpt so how do they avoid using up all their training. Is this what’s called data leakage?

1

joeiyoma t1_je42q2t wrote

ChatGPT always has the potential for error; version 4 has a reduced potential for error. My biggest worry is what it will do to our creativity. Autopilot all the time!

1

Calamero t1_je4doo0 wrote

It will enable creative people to bring their ideas to reality. It won’t make people less creative. AI technology democratizes the execution part, making it easier for people from all walks of life to transform their visions into reality. It will augment human creativity rather than stifling it.

1

WarmSignificance1 t1_je57s2a wrote

So I actually think that senior devs copy and paste a lot less than everyone imagines.

I can’t remember the last time I’ve copied code from StackOverflow. Actually, I rarely even use StackOverflow at this point. Going directly to the official docs is always best.

1

TheEdes t1_je6tweq wrote

Sure, but what's being advertised isn't sentience per se, at least with the leetcode part of their benchmarks. The issue here is that they claim it can do X% on leetcode, but it seems like it's much less on new data. Even if it had only learned to find previous solutions and adapt them with small changes, it should still be able to perform well, given the nature of the problems.

1

Nhabls t1_je93uvg wrote

The way they defined human performance there is just funny.

Dividing the number of accepted answers by the total number of users... might as well just make up a number.

1

Nhabls t1_je94xwx wrote

Not misleading. The fact that it performs so differently on easy problems it has seen vs. ones it hasn't, especially when it fails so spectacularly on the latter, does raise big doubts about how corrupted and unreliable their benchmarks might be.

6

bjj_starter t1_je98wdx wrote

Okay, but an external team tested it on coding problems that only came into existence after its training finished, and found human-level performance. I don't think your theory explains how that could be the case.

1

Nhabls t1_je9anrq wrote

Which team is that? The one at Microsoft that made up the human performance figures in a completely ridiculous way? Basically: "We didn't like that pass rates were too high for humans on the hard problems that the model fails completely, so we just divided the accepted count by the entire user base." Oh yeah, brilliant.

The "human" pass rates are also composed of people learning to code trying to see if their solution works. Its a completely idiotic metric, why not go test randos on the street and declare that represents the human coding performance metric while we're at it

1

joeiyoma t1_je9fegm wrote

There is a lot of buzz about prompt engineering. Can it make the cut as a skill set going forward, or is it just hype that will fade with time?

1