learningmoreandmore OP t1_j3kz6zf wrote

So if I was handling something like 2,000–10,000+ requests per day for my business, running it locally isn't going to cut it?

3

Tuggummii t1_j3kzxy6 wrote

Unfortunately, I don't have enough knowledge to answer that question.

3

learningmoreandmore OP t1_j3l17yp wrote

No problem! Thanks for the insight regarding its capabilities and costs.

2

Nmanga90 t1_j3xxwj6 wrote

Locally will not cut it unless you have a high-performance computer with lab-grade GPUs for inference. The reason the AI models are so expensive to use is that they are actually pretty expensive to run. They are probably running 2 parallel versions of the model on a single A100, and have likely duplicated this architecture 10,000 times. An A100 is 10 grand used, 20 grand new. You can also rent them for about $2 per minute.
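A quick back-of-envelope calculation using the figures quoted above (illustrative only; the GPU count, prices, and rental rate are the commenter's estimates, not verified numbers):

```python
# Back-of-envelope cost sketch using the numbers from the comment above.
# All figures are the commenter's rough estimates, not verified pricing.
num_gpus = 10_000        # "duplicated this architecture 10,000 times"
price_used = 10_000      # USD per used A100
price_new = 20_000       # USD per new A100

fleet_cost_used = num_gpus * price_used
fleet_cost_new = num_gpus * price_new

print(f"Used fleet: ${fleet_cost_used:,}")  # Used fleet: $100,000,000
print(f"New fleet:  ${fleet_cost_new:,}")   # New fleet:  $200,000,000
```

Even with the cheaper used pricing, the quoted fleet size implies a nine-figure hardware bill, which is why renting or using a hosted API is usually the practical option.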

1

spiky_sugar t1_j3ley7v wrote

It depends. The speed and quality of the outcome vary dramatically with the generation parameters you set, in particular the choice of decoding strategy and the output text length.

With a GPT-J-6B model, I would say it is possible to serve 10,000 requests in a few hours. Using only a CPU will take much longer, but you could maybe handle 2,000 requests in 24 hours. Again, this depends strongly on the input and output text lengths and on the decoding type.
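A quick sanity check of those throughput figures (a rough sketch; it assumes requests are processed one at a time with no batching, and "a few hours" is taken as 3 hours):

```python
# Convert the quoted request volumes into a per-request latency budget.
# Assumption: sequential processing, no batching; "a few hours" ~= 3 hours.
def seconds_per_request(total_requests, hours):
    """Average time budget per request to finish the batch in time."""
    return hours * 3600 / total_requests

gpu = seconds_per_request(10_000, 3)   # "10,000 requests in a few hours"
cpu = seconds_per_request(2_000, 24)   # "2,000 requests in 24 hours"
print(f"GPU: ~{gpu:.1f} s/request, CPU: ~{cpu:.1f} s/request")
# GPU: ~1.1 s/request, CPU: ~43.2 s/request
```

So the GPU estimate implies roughly one second of generation per request, while the CPU estimate allows about 40 seconds each, which is consistent with short-to-medium output lengths on a 6B-parameter model.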

2