
tim_ohear t1_ity5tvj wrote

I've often found models that look exciting on paper to be very disappointing when you actually try them. For instance, the recent OPT releases.

I've used GPT-J a lot and it's really nice, but it takes 24 GB of GPU RAM in fp16 if you use the full 2048-token context. EleutherAI also has the smaller GPT-Neo models (1.3B and 2.7B), which would be a better fit for your GPU.
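
If you go through the Hugging Face `transformers` checkpoints, loading one of these in fp16 looks roughly like this. Rough sketch only: the prompt and sampling settings are just placeholders, and the model IDs are the ones on the Hub, so pick whichever fits your card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "EleutherAI/gpt-neo-1.3B" fits comfortably on a smaller GPU;
# swap in "EleutherAI/gpt-j-6B" if you have ~24 GB of VRAM.
model_id = "EleutherAI/gpt-neo-1.3B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16  # load weights in fp16 to halve memory
).to("cuda")

inputs = tokenizer("The strangest thing about the lighthouse was", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.9, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```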

For me the fun in these smaller models is how easily you can completely change their "personality" by finetuning on even tiny amounts of text, like a few hundred KB. I've achieved subjectively better-than-GPT-3 results (for my narrow purpose) by finetuning GPT-J on 3 MB of text.
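
A small-corpus finetune is basically a causal-LM `Trainer` run over your text file. Sketch of what I mean below; the file name, model choice and hyperparameters are placeholders rather than my exact setup:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "EleutherAI/gpt-neo-1.3B"  # placeholder; the same recipe works for GPT-J
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # GPT-style models ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# "my_corpus.txt" stands in for your few hundred KB of text.
raw = load_dataset("text", data_files={"train": "my_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    learning_rate=1e-5,
    fp16=True,
    logging_steps=10,
    save_strategy="no",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    # mlm=False gives plain next-token (causal LM) labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```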

Finetuning requires quite a bit more GPU RAM. DeepSpeed can really help, but if you're working with tiny amounts of data it will only take a couple of hours on a cloud GPU to get something fun.
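
To pull DeepSpeed into that, the `Trainer` takes a `deepspeed` argument (a path to a JSON config or a dict); ZeRO stage 2 with optimizer offload to CPU is a reasonable starting point. The values below are illustrative, and "finetune.py" is just whatever you name the script, which you'd then start with the `deepspeed` launcher:

```python
from transformers import TrainingArguments

# Illustrative ZeRO stage-2 config: shard optimizer state and push it to CPU
# to cut GPU memory; "auto" lets the Trainer fill in matching batch settings.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    fp16=True,
    deepspeed=ds_config,  # then launch with: deepspeed finetune.py
)
```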
