Tuggummii t1_j3kyf2w wrote

I'm not a professional, but I can answer some of your questions as my personal opinion.

How good is it at writing short stories?

- I don't think GPT-J is dramatically better than the others, especially for text generation. I often see hallucinated, illogical, or incoherent output. If you want results like OpenAI's Davinci-003, you may be disappointed even with fine-tuning.

How resource-expensive is it to use locally?

- You need 40GB+ of RAM if you're running on CPU. A friend of mine failed with 32GB of RAM and had to increase her swap space; she then succeeded, but with an extremely slow loading time (almost 7~8 minutes). If you want GPU power, you need 32GB+ of VRAM in float16 (though I saw someone running it on 24GB). A CPU generates text from a prompt in 30~45 seconds, whereas a GPU generates text from the same prompt in 3 to 5 seconds.
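For a sense of where numbers like these come from, here is a back-of-envelope sketch of the memory needed just for the model weights, assuming GPT-J-6B's published parameter count (~6.05B). Note this covers weights only; activations, the KV cache, framework overhead, and any optimizer state push real-world usage well above these floors, which is why reported requirements are higher.

```python
# Rough lower bound on memory for GPT-J-6B weights alone.
# Real usage is higher: activations, KV cache, and runtime overhead add to this.

GPTJ_PARAMS = 6_053_381_344  # published GPT-J-6B parameter count

def weight_memory_gb(n_params: int, bytes_per_param: int) -> float:
    """Memory in GiB to hold n_params parameters at a given precision."""
    return n_params * bytes_per_param / 1024**3

fp32_gb = weight_memory_gb(GPTJ_PARAMS, 4)  # float32: 4 bytes per parameter
fp16_gb = weight_memory_gb(GPTJ_PARAMS, 2)  # float16: 2 bytes per parameter

print(f"float32 weights: ~{fp32_gb:.1f} GiB")  # roughly 22.5 GiB
print(f"float16 weights: ~{fp16_gb:.1f} GiB")  # roughly 11.3 GiB
```

This is why a 24GB GPU can just barely fit the model in float16, while float32 loading on CPU spills past 32GB of RAM once overhead is included.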

7

learningmoreandmore OP t1_j3kz6zf wrote

So if I were handling something like 2000-10000+ requests per day for my business, running it locally isn't going to cut it?

3

Tuggummii t1_j3kzxy6 wrote

Unfortunately, I don't have enough knowledge to answer that question.

3

learningmoreandmore OP t1_j3l17yp wrote

No problem! Thanks for the insight regarding its capabilities and costs.

2

Nmanga90 t1_j3xxwj6 wrote

Locally will not cut it unless you have a high-performance computer with datacenter-grade GPUs for inference. The reason these AI models are so expensive to use is that they are genuinely expensive to run. The providers are probably running 2 parallel copies of the model on a single A100, and have likely duplicated that setup 10,000 times. An A100 is 10 grand used, 20 grand new. You can also rent them for about $2 per hour.

1

spiky_sugar t1_j3ley7v wrote

It depends. Speed varies a lot with the generation parameters you set: the choice of decoding strategy and the output length can dramatically change both the speed and the quality of the result.

With the GPT-J-6B model, I would say it's possible to serve 10,000 requests in a few hours. Using only a CPU will take much longer, but you could maybe handle 2,000 requests in 24 hours. Again, though, it depends strongly on the input and output text lengths and the decoding type.

2