kawin_e t1_jdxz4bh wrote on March 28, 2023 at 12:29 AM

The Stanford Human Preferences dataset (SHP): https://huggingface.co/datasets/stanfordnlp/SHP

It contains pairwise preferences over responses to posts, i.e. tuples of (post, response_A, response_B), but you can certainly turn it into an instruction dataset by keeping only the responses that meet a certain cut-off. I'm currently aware of one academic/industry group that is already doing this.
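A rough sketch of what that filtering could look like. The field names (`history`, `human_ref_A`, `human_ref_B`, `score_A`, `score_B`, `labels`) are what I'd expect from the dataset card, so check them against the actual schema. Using the response score as the cut-off and `min_score=10` as the threshold are both assumptions for illustration, not a recommended setting. The helper `to_instruction_pairs` is hypothetical.

```python
def to_instruction_pairs(rows, min_score=10):
    """Keep the preferred response of each pair if its score clears min_score."""
    best = {}  # post text -> (score, response)
    for row in rows:
        # Assumption: labels == 1 means response A is preferred, otherwise B.
        if row["labels"] == 1:
            response, score = row["human_ref_A"], row["score_A"]
        else:
            response, score = row["human_ref_B"], row["score_B"]
        if score < min_score:
            continue
        post = row["history"]
        # The same post shows up in many pairs; keep only its top-scoring response.
        if post not in best or score > best[post][0]:
            best[post] = (score, response)
    return [{"instruction": p, "response": r} for p, (_, r) in best.items()]


# Made-up rows in the assumed schema, just to exercise the function.
sample_rows = [
    {"history": "How do I season a cast iron pan?",
     "human_ref_A": "Oil it thinly and bake it.", "human_ref_B": "Just wash it.",
     "score_A": 50, "score_B": 3, "labels": 1},
    {"history": "How do I season a cast iron pan?",
     "human_ref_A": "Use soap.", "human_ref_B": "Bake it with flaxseed oil.",
     "score_A": 2, "score_B": 80, "labels": 0},
    {"history": "Why is the sky blue?",
     "human_ref_A": "Because.", "human_ref_B": "No idea.",
     "score_A": 4, "score_B": 1, "labels": 1},
]
pairs = to_instruction_pairs(sample_rows, min_score=10)
```

On the real data you would presumably pass in the rows from `datasets.load_dataset("stanfordnlp/SHP")` instead of the toy list, and tune the cut-off per subreddit since score distributions differ a lot between them.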

ninjasaid13 t1_jdy2pqq wrote on March 28, 2023 at 12:56 AM

>one academic/industry group

which one?