Recent comments in /f/deeplearning
j-solorzano t1_jd16nfb wrote
Reply to comment by Board_Stock in Alpaca-7B and Dalai, how can I get coherent results? by Haghiri75
Language models don't remember conversations by themselves. You'd have to implement a memory and then add retrieved memories to the prompt.
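A minimal sketch of that pattern: keep prior turns and prepend the most recent ones (or a retrieved subset) to every new prompt. Here `generate()` is a hypothetical stand-in for whatever backend you actually call (alpaca.cpp, an HTTP API, ...), not a real library function.

```python
history = []

def generate(prompt: str) -> str:
    return "placeholder reply"  # replace with the real model call

def chat(user_message: str, max_turns: int = 6) -> str:
    recent = history[-max_turns:]                      # naive "retrieval": last N turns
    context = "\n".join(f"{who}: {text}" for who, text in recent)
    prompt = f"{context}\nUser: {user_message}\nAssistant:"
    reply = generate(prompt)
    history.append(("User", user_message))
    history.append(("Assistant", reply))
    return reply

print(chat("What did I just ask you?"))
```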
j-solorzano t1_jd16g7r wrote
Try adjusting the temperature.
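For context, temperature rescales the next-token distribution before sampling: low values make the output focused but repetitive, high values make it more varied but less coherent. A minimal numpy sketch of the mechanism (the actual flag name in Dalai/alpaca.cpp may differ, so check its help output):

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    """Rescale logits by 1/temperature, then sample from the softmax."""
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

logits = np.array([2.0, 1.0, 0.5, -1.0])
for t in (0.2, 0.8, 1.5):
    rng = np.random.default_rng(0)
    picks = [sample_with_temperature(logits, t, rng) for _ in range(500)]
    print(t, np.bincount(picks, minlength=4) / 500)   # low t -> nearly always token 0
```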
[deleted] t1_jd0wy1z wrote
Reply to comment by Haghiri75 in Alpaca-7B and Dalai, how can I get coherent results? by Haghiri75
[deleted]
Jaffa6 t1_jd03par wrote
Reply to comment by Haghiri75 in Alpaca-7B and Dalai, how can I get coherent results? by Haghiri75
That's odd.
Quantisation should make it go from (e.g.) 32-bit floats to 16-bit floats, but I wouldn't expect it to lose that much coherency at all. Did they say somewhere that that's why?
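For what it's worth, a quick numpy check of a 32-bit to 16-bit round trip on synthetic weights, just to illustrate the scale of the error:

```python
import numpy as np

rng = np.random.default_rng(0)
w32 = rng.normal(0.0, 0.02, size=1_000_000).astype(np.float32)
w16 = w32.astype(np.float16)                       # float32 -> float16 "quantization"
err = np.abs(w32 - w16.astype(np.float32))
print(f"max error {err.max():.2e}, mean error {err.mean():.2e}")
```

The per-weight error is tiny compared with typical weight magnitudes, which is why half precision alone rarely costs much coherence; the more aggressive 4-bit schemes that llama.cpp-style tools typically use lose noticeably more.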
Board_Stock t1_jczmxht wrote
Hello, I've been running alpaca.cpp on my laptop. Have you figured out how to make it remember conversations yet? Sorry if this is a beginner question.
timelyparadox t1_jcy4x02 wrote
Reply to comment by [deleted] in How noticeable is the difference training a model 4080 vs 4090 by Numerous_Talk7940
Yep you can get a 3090 for 700
Mondukai t1_jcy3tu4 wrote
Oceanboi t1_jcy3f45 wrote
It all comes down to VRAM.
chatterbox272 t1_jcy2h7v wrote
The speed is not the main difference you're going to notice; it's the VRAM. VRAM is a hard limit that's difficult to work around, so it simply depends on whether you need the extra.
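As a rough illustration of why VRAM is the hard limit, here's a back-of-the-envelope estimate of the memory the model state alone needs during training (activations, buffers and framework overhead come on top, so real usage is higher):

```python
def training_vram_gb(n_params, bytes_per_param=4, optimizer="adam"):
    """Lower bound on training VRAM from model state alone."""
    copies = {"sgd": 2, "adam": 4}[optimizer]   # adam: weights, grads, two moments
    return n_params * bytes_per_param * copies / 1024**3

# e.g. a 350M-parameter model trained in fp32 with Adam:
print(f"{training_vram_gb(350e6):.1f} GB before activations")
```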
RichardBJ1 t1_jcy0nse wrote
Reply to comment by funderbolt in How noticeable is the difference training a model 4080 vs 4090 by Numerous_Talk7940
Ta!
[deleted] t1_jcxwdzm wrote
Reply to comment by mrcet007 in How noticeable is the difference training a model 4080 vs 4090 by Numerous_Talk7940
[deleted]
Haghiri75 OP t1_jcxt80d wrote
I guess I found the reason. The Dalai system quantizes the models, which makes them incredibly fast, but the cost of that quantization is less coherency.
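For anyone curious what that quantization looks like, here's a rough numpy sketch of 4-bit block quantization, loosely modeled on llama.cpp-style schemes (the exact GGML format differs; this just shows where the rounding error comes from):

```python
import numpy as np

def quantize_block(block):
    """Map a block of floats to 4-bit ints sharing one scale."""
    scale = np.abs(block).max() / 7.0
    scale = scale if scale > 0 else 1.0
    q = np.clip(np.round(block / scale), -8, 7).astype(np.int8)
    return q, np.float32(scale)

def dequantize_block(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=4096).astype(np.float32)
recon = np.concatenate(
    [dequantize_block(*quantize_block(b)) for b in weights.reshape(-1, 32)]
)
print(f"mean abs error: {np.abs(weights - recon).mean():.2e}")  # much larger than fp16
```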
MisterManuscript t1_jcxo0zv wrote
You don't need the 40 series; they're designed to provide ML features for games. You're paying extra just to have a built-in optical-flow accelerator that you're not gonna use for model training.
The optical-flow accelerator is meant for computing dense optical-flow fields as one of the inputs to the DLSS feature that most new games use.
You're better off with the 30 series or lower.
funderbolt t1_jcxkcth wrote
Reply to comment by RichardBJ1 in How noticeable is the difference training a model 4080 vs 4090 by Numerous_Talk7940
> "law of diminishing returns"
FIXED
mrcet007 t1_jcxgyl7 wrote
Reply to comment by wally1002 in How noticeable is the difference training a model 4080 vs 4090 by Numerous_Talk7940
12/16 GB is already hitting the limit of what's available on the market for consumer gaming GPUs. The only GPU for deep learning with more than 16 GB is the 4090, which is already out of range for most individuals at $1,500.
RichardBJ1 t1_jcxcu7e wrote
Probably need an answer from someone who has both and has benchmarked some examples (EDIT: and I do not!). Personally I find a lot of "law of diminishing (edit) returns" with this type of thing. Also, for me, since I spend 100x more time coding and testing with dummy sets… the actual run speed is not as critical as people would expect…
wally1002 t1_jcx6232 wrote
For deep learning, higher VRAM is always preferable. 12/16 GB limits the kind of models you can run/infer. With LLMs getting democratised, it's better to be future-proof.
[deleted] t1_jcuekzc wrote
Reply to comment by trajo123 in Seeking Career Advice to go from general CS background to a career in AI/Machine Learning by brown_ja
[deleted]
rezayazdanfar OP t1_jcu24bm wrote
Reply to comment by DeepLearningStudent in How To Scale Transformers’ Memory up to 262K Tokens With a Minor Change? by rezayazdanfar
:) Happy to hear it, hope you found it practical in your work. I also aim to use it in my future projects. :)
rezayazdanfar OP t1_jcu1yfy wrote
Reply to comment by WallyMetropolis in How To Scale Transformers’ Memory up to 262K Tokens With a Minor Change? by rezayazdanfar
:) Yes, I see, thanks. If you like, we can talk more and you can give me more feedback and/or comments so I can improve. :)
trajo123 t1_jctu6my wrote
Reply to comment by chengstark in Seeking Career Advice to go from general CS background to a career in AI/Machine Learning by brown_ja
Why?
brainhack3r t1_jctctn2 wrote
Reply to comment by thesupernoodle in Best GPUs for pretraining roBERTa-size LLMs with a $50K budget, 4x RTX A6000 v.s. 4x A6000 ADA v.s. 2x A100 80GB by AngrEvv
This is the right answer. Don't guess, test (hey, that rhymed!)
Just make sure your testing mirrors what it would look like to scale up.
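Something like the sketch below, assuming a PyTorch setup; the model, batch size and sequence length here are placeholders you'd swap for your real workload:

```python
import time
import torch

device = "cuda"
layer = torch.nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
model = torch.nn.TransformerEncoder(layer, num_layers=12).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(32, 512, 768, device=device)      # (batch, seq_len, hidden)

for step in range(12):
    if step == 2:                                  # ignore the first warm-up steps
        torch.cuda.synchronize()
        t0 = time.time()
    loss = model(x).pow(2).mean()                  # dummy loss; use real data/loss here
    loss.backward()
    opt.step()
    opt.zero_grad()

torch.cuda.synchronize()
print(f"{(time.time() - t0) / 10 * 1000:.1f} ms per step")
```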
CKtalon t1_jctb1c0 wrote
Reply to Best GPUs for pretraining roBERTa-size LLMs with a $50K budget, 4x RTX A6000 v.s. 4x A6000 ADA v.s. 2x A100 80GB by AngrEvv
Do not be tricked by memory pooling. NVLink might not really improve performance on the A6000s by much (it's a different case for the A100s).
I think it will be a tough choice between 2x A100 and 4x A6000 Ada.
bentheaeg t1_jcsvdy1 wrote
Reply to Best GPUs for pretraining roBERTa-size LLMs with a $50K budget, 4x RTX A6000 v.s. 4x A6000 ADA v.s. 2x A100 80GB by AngrEvv
Not able to reply for sure right now (the A6000 Ada is missing open tests); I don't think that many people can. I work at a scale-up though (PhotoRoom), and we're getting a 4x A6000 Ada server next week. We were planning to publish benchmarks vs. our other platforms (DGXs, custom servers, … from A100 to A6000 and 3090), so stay tuned!
From a distance, a semi-educated guess:
- The A6000 Ada is really, really good in compute. So models which are really compute-bound (think Transformers with very big embeddings) should do well; models which are more IO-bound (convnets, for instance) will not do as well, especially vs. the A100, which has much faster memory.
- The impact of NVLink is not super clear to me; its bandwidth was not really big to begin with anyway. My guess is that it may be more useful for latency-bound inter-GPU communications, like when using SyncBatchNorm.
- There are a lot of training tweaks you can use (model or pipeline parallelism, FSDP, gradient accumulation to cut down on the comms… see the sketch after this list), so the best training setup for each platform may differ; it's also a game of apples to oranges, and this is by design.
- I would take extra care around the cooling system; if you're not a cloud operator, a server going down will be a mess in your lab. This happened to us 3 times in the past 6 months, always because of the cooling. These machines can tap into 2 kW+ around the clock; that heat has to be extracted, and from our limited experience some setups (even from really big names) are not up to the task and go belly up in the middle of a job. 80GB A100s are 400 to 450W, A6000s (Ada or not) are 300W, which is easier to cool if you're not buckled up. Not a point against the A100 per se, but a point against the A100 plus unproven cooling, let's say.
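A minimal PyTorch sketch of the gradient-accumulation tweak mentioned in the list above (the model and data are placeholders): the optimizer steps only every `accum` micro-batches, so gradients are synchronized less often and a larger effective batch fits in limited VRAM.

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
accum = 8                                          # optimizer step every 8 micro-batches

for i, x in enumerate(torch.randn(64, 32, 1024, device="cuda")):
    loss = model(x).pow(2).mean() / accum          # scale so the gradients average out
    loss.backward()                                # gradients accumulate in .grad
    if (i + 1) % accum == 0:
        opt.step()
        opt.zero_grad()
```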
Board_Stock t1_jd1z78z wrote
Reply to comment by j-solorzano in Alpaca-7B and Dalai, how can I get coherent results? by Haghiri75
Yes, that's what I meant. I want to run alpaca.cpp in an API-like way so that it automatically includes the previous conversation along with the new message in the prompt.
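A rough sketch of one way to do that by shelling out to the binary, assuming it takes llama.cpp-style `-m`/`-p` flags (check `./chat --help`); the binary path, model filename and the naive output trimming are all assumptions:

```python
import subprocess

history = []

def ask(message, binary="./chat", model="ggml-alpaca-7b-q4.bin"):
    prompt = "\n".join(history + [f"User: {message}", "Assistant:"])
    out = subprocess.run(
        [binary, "-m", model, "-p", prompt],
        capture_output=True, text=True, check=True,
    ).stdout
    reply = out[len(prompt):].strip()              # naive: drop the echoed prompt
    history.append(f"User: {message}")
    history.append(f"Assistant: {reply}")
    return reply
```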