Spire_Citron t1_ja0457j wrote on February 25, 2023 at 9:06 PM

The training data is massive and usually not carefully curated because they need so much of it.

starstruckmon t1_ja1102i wrote on February 26, 2023 at 1:09 AM

He's talking about the human preference data used for RLHF fine-tuning (which is what turns GPT-3 into ChatGPT). It's not really that massive.