Submitted by bradenjh t3_z26fui in MachineLearning
bradenjh OP t1_ixeyajh wrote
Reply to comment by learn-deeply in [R] Getting GPT-3 quality with a model 1000x smaller via distillation plus Snorkel by bradenjh
Ha! If I were trying to pretend no affiliation, u/learn-deeply, I probably wouldn't have a username literally matching the author string of the post?
You may also want to give it another read: the GPT-3 models are fine-tuned, that's the point! (The GPT-3 zero-shot baseline I assume you're referencing is mentioned once as a curiosity, but not used for comparison beyond that.) You can even look at the full cross-product of fine-tuning RoBERTa vs. GPT-3 on GT labels vs. weak labels. With the larger training sets (the distilled and combined set of ~60k), they score essentially identically, within 0.1 point. In other words, you simply don't need all that GPT-3 capacity; all you need is the relevant information it has for your problem.
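The recipe being described (fine-tune a large teacher on the small ground-truth set, use it to weak-label a larger unlabeled pool, then fine-tune a small student on the combined set) can be sketched roughly as below. This is a minimal illustration with scikit-learn stand-ins, not the actual setup: the random forest, logistic regression, and synthetic data are hypothetical substitutes for GPT-3, RoBERTa, and the real task.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier   # stand-in "teacher" (GPT-3)
from sklearn.linear_model import LogisticRegression   # stand-in "student" (RoBERTa)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Small labeled (GT) set plus a large unlabeled pool, mimicking the setup.
X, y = make_classification(n_samples=6000, n_features=20,
                           n_informative=10, random_state=0)
X_lab, X_rest, y_lab, y_rest = train_test_split(X, y, train_size=500,
                                                random_state=0)
X_unlab, X_test, _, y_test = train_test_split(X_rest, y_rest, test_size=1000,
                                              random_state=0)

# 1. Fine-tune the teacher on the small GT-labeled set.
teacher = RandomForestClassifier(n_estimators=200, random_state=0)
teacher.fit(X_lab, y_lab)

# 2. Use the teacher to generate weak labels for the unlabeled pool.
weak_labels = teacher.predict(X_unlab)

# 3. Train the small student on the combined GT + weak-labeled set.
X_train = np.vstack([X_lab, X_unlab])
y_train = np.concatenate([y_lab, weak_labels])
student = LogisticRegression(max_iter=1000)
student.fit(X_train, y_train)

# The student's held-out accuracy should land close to the teacher's,
# despite being a far smaller model.
print(f"teacher acc: {accuracy_score(y_test, teacher.predict(X_test)):.3f}")
print(f"student acc: {accuracy_score(y_test, student.predict(X_test)):.3f}")
```

The design point is the same as in the thread: once the teacher's task-relevant knowledge is transferred through labels, the student does not need the teacher's capacity.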
Acceptable-Cress-374 t1_ixg7d43 wrote
TBF, the article is pretty SEO-y, leaning heavily on bolded phrases that repeat throughout.
The research part is top-notch, though, and opens up a lot of avenues for further training based on the (unusable at the amateur level) LLMs available now. Great work and thanks for sharing!