Submitted by bradenjh t3_z26fui in MachineLearning
bradenjh OP t1_ixeyajh wrote
Reply to comment by learn-deeply in [R] Getting GPT-3 quality with a model 1000x smaller via distillation plus Snorkel by bradenjh
Ha! If I were trying to pretend no affiliation, u/learn-deeply, I probably wouldn't have a username literally matching the author string of the post?
You may also want to give it another read: the GPT-3 models are fine-tuned, that's the point! (The GPT-3 zero-shot baseline I assume you're referencing is mentioned once as a curiosity, but not used for comparison beyond that.) You can even look at the full cross-product of fine-tuning RoBERTa vs. GPT-3 on GT labels vs. weak labels. With the larger training sets (the distilled and combined set of ~60k), they score essentially identically, within 0.1 point. In other words, you simply don't need all that GPT-3 capacity; all you need is the relevant information it has for your problem.
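The recipe being described (fine-tune a large teacher on the small ground-truth set, use it to weak-label a larger unlabeled pool, then fine-tune a small student on the combined set) can be sketched roughly as below. This is a minimal illustration with scikit-learn stand-ins, not the actual setup: the random forest, logistic regression, and synthetic data are hypothetical substitutes for GPT-3, RoBERTa, and the real task.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier   # stand-in "teacher" (GPT-3)
from sklearn.linear_model import LogisticRegression   # stand-in "student" (RoBERTa)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Small labeled (GT) set plus a large unlabeled pool, mimicking the setup.
X, y = make_classification(n_samples=6000, n_features=20,
                           n_informative=10, random_state=0)
X_lab, X_rest, y_lab, y_rest = train_test_split(X, y, train_size=500,
                                                random_state=0)
X_unlab, X_test, _, y_test = train_test_split(X_rest, y_rest, test_size=1000,
                                              random_state=0)

# 1. Fine-tune the teacher on the small GT-labeled set.
teacher = RandomForestClassifier(n_estimators=200, random_state=0)
teacher.fit(X_lab, y_lab)

# 2. Use the teacher to generate weak labels for the unlabeled pool.
weak_labels = teacher.predict(X_unlab)

# 3. Train the small student on the combined GT + weak-labeled set.
X_train = np.vstack([X_lab, X_unlab])
y_train = np.concatenate([y_lab, weak_labels])
student = LogisticRegression(max_iter=1000)
student.fit(X_train, y_train)

# The student's held-out accuracy should land close to the teacher's,
# despite being a far smaller model.
print(f"teacher acc: {accuracy_score(y_test, teacher.predict(X_test)):.3f}")
print(f"student acc: {accuracy_score(y_test, student.predict(X_test)):.3f}")
```

The design point is the same as in the thread: once the teacher's task-relevant knowledge is transferred through labels, the student does not need the teacher's capacity.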
Acceptable-Cress-374 t1_ixg7d43 wrote
TBF, the article is pretty SEO-y, leaning heavily on bolded phrases that repeat throughout.
The research part is top-notch, though, and opens up a lot of avenues for further training based on the (unusable at the amateur level) LLMs available now. Great work and thanks for sharing!