limapedro

limapedro t1_j13qfxr wrote

The cheaper option would be to run on 2 RTX 3060s! Each GPU costing 300 USD you could buy two for 600ish! Also there's a 16 GB A770 from Intel! To run a very large model you could split the weights into so called blocks, I was able to test it to myself in a simple keras implementation, but the code for conversion is hard to write, although I think I've seen somewhere something similar from HuggingFace!

4