Submitted by AylaDoesntLikeYou t3_11c5n1g in singularity
AylaDoesntLikeYou OP t1_ja2crc9 wrote
Reply to comment by Motion-to-Photons in Meta unveils a new large language model that can run on a single GPU by AylaDoesntLikeYou
With Stable Diffusion, they were able to drastically reduce generation time to 5-12 seconds (depending on the GPU) and cut VRAM usage from 16GB to 4GB in less than a month.
These optimizations won't take more than a year; they can happen within months, or even weeks in some cases, especially once the model is running on a single device.
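For context, here is a minimal sketch of the kind of memory-saving switches that enabled those Stable Diffusion numbers, assuming the Hugging Face diffusers library; the checkpoint name and the specific calls shown (fp16 weights, attention slicing) are illustrative examples of such optimizations, not necessarily the exact ones referenced above.

```python
# Illustrative sketch (assumed: Hugging Face diffusers) of common VRAM-saving options.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint, for illustration only
    torch_dtype=torch.float16,         # fp16 weights roughly halve VRAM use vs fp32
)
pipe.enable_attention_slicing()        # compute attention in chunks to lower peak memory
pipe = pipe.to("cuda")

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```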
qrayons t1_ja375cg wrote
I don't know. It seems like the 13B parameter model is already the optimized version. Obviously I hope I'm wrong though.
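For a rough sense of how much headroom might remain even if the 13B model is "already optimized", here is a back-of-the-envelope sketch (my own illustration, not from the thread) of what the weights alone occupy at different numeric precisions; quantizing to lower precision is one obvious further optimization.

```python
# Rough VRAM needed just to hold 13B weights at different precisions.
# Ignores activations, KV cache, and framework overhead.
params = 13e9
for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: ~{params * bytes_per_param / 1e9:.1f} GB")
# fp16: ~26.0 GB   int8: ~13.0 GB   int4: ~6.5 GB
```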
-AlkalineWater- t1_ja6tvr3 wrote
It never ends