learningmoreandmore OP t1_j3kz6zf wrote

So if I was handling something like 2,000–10,000+ requests per day for my business, running it locally isn't going to cut it?

3

Tuggummii t1_j3kzxy6 wrote

Unfortunately, I don't have enough knowledge to answer that question.

3

learningmoreandmore OP t1_j3l17yp wrote

No problem! Thanks for the insight regarding its capabilities and costs.

2

Nmanga90 t1_j3xxwj6 wrote

Locally will not cut it unless you have a high-performance computer with lab-grade GPUs for inference. The reason the AI models are so expensive to use is that they are actually pretty expensive to run. They are probably running 2 parallel versions of the model on a single A100, and have likely duplicated this architecture 10,000 times. An A100 is 10 grand used, 20 grand new. You can also rent them for about $2 per minute.
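A quick back-of-envelope calculation using the figures quoted above (illustrative only; the GPU count, prices, and rental rate are the commenter's estimates, not verified numbers):

```python
# Back-of-envelope cost sketch using the numbers from the comment above.
# All figures are the commenter's rough estimates, not verified pricing.
num_gpus = 10_000        # "duplicated this architecture 10,000 times"
price_used = 10_000      # USD per used A100
price_new = 20_000       # USD per new A100

fleet_cost_used = num_gpus * price_used
fleet_cost_new = num_gpus * price_new

print(f"Used fleet: ${fleet_cost_used:,}")  # Used fleet: $100,000,000
print(f"New fleet:  ${fleet_cost_new:,}")   # New fleet:  $200,000,000
```

Even with the cheaper used pricing, the quoted fleet size implies a nine-figure hardware bill, which is why renting or using a hosted API is usually the practical option.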

1

spiky_sugar t1_j3ley7v wrote

It depends. The speed and quality of the outcome vary dramatically with the generation parameters you set, in particular the choice of decoding strategy and the output text length.

With a GPT-J-6B model, I would say it is possible to serve 10,000 requests in a few hours. Using only a CPU will take much longer, but you could maybe handle 2,000 requests in 24 hours. Again, this depends strongly on the input and output text lengths and on the decoding type.
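A quick sanity check of those throughput figures (a rough sketch; it assumes requests are processed one at a time with no batching, and "a few hours" is taken as 3 hours):

```python
# Convert the quoted request volumes into a per-request latency budget.
# Assumption: sequential processing, no batching; "a few hours" ~= 3 hours.
def seconds_per_request(total_requests, hours):
    """Average time budget per request to finish the batch in time."""
    return hours * 3600 / total_requests

gpu = seconds_per_request(10_000, 3)   # "10,000 requests in a few hours"
cpu = seconds_per_request(2_000, 24)   # "2,000 requests in 24 hours"
print(f"GPU: ~{gpu:.1f} s/request, CPU: ~{cpu:.1f} s/request")
# GPU: ~1.1 s/request, CPU: ~43.2 s/request
```

So the GPU estimate implies roughly one second of generation per request, while the CPU estimate allows about 40 seconds each, which is consistent with short-to-medium output lengths on a 6B-parameter model.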

2