Recent comments in /f/deeplearning
FirstOrderCat t1_jcsjgws wrote
Reply to comment by thesupernoodle in Best GPUs for pretraining roBERTa-size LLMs with a $50K budget, 4x RTX A6000 v.s. 4x A6000 ADA v.s. 2x A100 80GB by AngrEvv
They don't have the A6000 Ada yet.
FirstOrderCat t1_jcsjdve wrote
Reply to Best GPUs for pretraining roBERTa-size LLMs with a $50K budget, 4x RTX A6000 v.s. 4x A6000 ADA v.s. 2x A100 80GB by AngrEvv
> Based on my study, A6000 ADA has comparable performance to A100 on DL benchmarks. Is this A100 80GB spec a good choice?
It looks like you answered your own question: 4x A6000 Ada will give you the best performance.
thesupernoodle t1_jcsj2iw wrote
Reply to Best GPUs for pretraining roBERTa-size LLMs with a $50K budget, 4x RTX A6000 v.s. 4x A6000 ADA v.s. 2x A100 80GB by AngrEvv
For maybe a few hundred bucks, you can test out the exact configurations you want to buy:
https://lambdalabs.com/service/gpu-cloud
You may even decide that you'd rather just use cloud compute, as opposed to spending all that money upfront. It would only cost you about $19K to run 2x A100 in the cloud around the clock for a solid year. And that also covers electricity costs.
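For reference, the rough arithmetic behind that figure, assuming an on-demand rate of about $1.10 per GPU-hour (an assumed rate for illustration; actual pricing varies by provider):

```python
# Back-of-the-envelope yearly cost for 2x A100 80GB in the cloud.
# The $1.10/GPU-hour rate is an assumption; real pricing varies by provider.
gpus = 2
rate_per_gpu_hour = 1.10          # USD, assumed on-demand rate
hours_per_year = 24 * 365         # running around the clock

yearly_cost = gpus * rate_per_gpu_hour * hours_per_year
print(f"${yearly_cost:,.0f} per year")   # -> $19,272 per year
```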
Immarhinocerous t1_jcsdckk wrote
Reply to comment by brown_ja in Seeking Career Advice to go from general CS background to a career in AI/Machine Learning by brown_ja
As someone who's mostly self-taught (I have a BSc, but it's in health sciences), I followed a similar route to what they recommended. It gives you:
- income,
- experience working in software development (you will hopefully learn a lot from this),
- exposure to co-workers who you may be able to learn from, especially if your backend role is at a place doing ML.
With income, you can also afford to take more courses. Even if it's only the occasional weekend course, or something you work on a few nights a week, it can help you expand your skillset, while the day job gives you other practical skills (backend work with APIs, DBs, cloud infrastructure, etc. are all useful).
After doing that for a while, you may be able to land a more focused ML role, or do a master's program (which, combined with your SWE experience, will give you a leg up on landing the role you want). If you want to go straight into an ML role after SWE, you will definitely need project experience. But you can do that while working, if you're up for it.
One of the best ML people I know has a maths background, works in risk/finance, and is basically entirely self-taught. But the guy is brilliant and insanely passionate about what he does. I just mention him to show that you don't absolutely need to go the master's route. But it could be worthwhile when you can afford it, especially if you're lacking in maths.
SingleTie8914 t1_jcrrv31 wrote
Reply to Seeking Career Advice to go from general CS background to a career in AI/Machine Learning by brown_ja
Start as an SDE first, self-study ML/DL, and work on side projects to demonstrate your skills.
chengstark t1_jcrfptp wrote
Reply to Seeking Career Advice to go from general CS background to a career in AI/Machine Learning by brown_ja
Don't, it's a waste of your time.
sqweeeeeeeeeeeeeeeps t1_jcrf571 wrote
Reply to Seeking Career Advice to go from general CS background to a career in AI/Machine Learning by brown_ja
Go to grad school; get really good at optimization, prob/stats, and linear algebra, and take plenty of ML. A master's is usually the minimum for ML positions, but PhDs will dominate positions for any cutting-edge research.
virgilash t1_jcr4ont wrote
Reply to Seeking Career Advice to go from general CS background to a career in AI/Machine Learning by brown_ja
How solid is your math, OP?
Amazing-Warthog5554 t1_jcqzekr wrote
Reply to comment by Ok-Demand-7347 in 5100+ Chat GPT Prompts Excel Sheet by Ok-Demand-7347
how bout 2 bucks
Amazing-Warthog5554 t1_jcqz62z wrote
Reply to comment by Ok-Demand-7347 in 5100+ Chat GPT Prompts Excel Sheet by Ok-Demand-7347
The last time I bought a list of prompts, they were all things I could easily do myself.
brown_ja OP t1_jcq326i wrote
Reply to comment by alki284 in Seeking Career Advice to go from general CS background to a career in AI/Machine Learning by brown_ja
Thanks for the advice.
brown_ja OP t1_jcq2zz9 wrote
Reply to comment by smackson in Seeking Career Advice to go from general CS background to a career in AI/Machine Learning by brown_ja
Noted
smackson t1_jcpxq8l wrote
Reply to Seeking Career Advice to go from general CS background to a career in AI/Machine Learning by brown_ja
Do further study at a university that does have machine learning expertise.
Sorry, not a 5 to 6 month solution there.
alki284 t1_jcpxhj6 wrote
Reply to Seeking Career Advice to go from general CS background to a career in AI/Machine Learning by brown_ja
Frankly, you'll struggle. A lot of junior ML positions require a master's degree as a minimum. But beyond that, what is your ML background like? Are you comfortable with the maths? Do you have side projects to showcase your skills and knowledge?
If you don't have a background, then going into a backend SWE role and transitioning after a couple of years is also a viable path. You can try to get into 'ML-adjacent' roles and gain experience from there.
FunQuarter3511 OP t1_jcptitk wrote
Reply to comment by hijacked_mojo in Question on Attention by FunQuarter3511
That makes a ton of sense. Thanks for your help! You are a legend!
hijacked_mojo t1_jcpstsu wrote
Reply to comment by FunQuarter3511 in Question on Attention by FunQuarter3511
Yes, you have the right idea but also add this to your mental model: the queries and values are influenced by their *own* set of weights. So it's not only the keys getting modified, but also queries and values.
In other words, the query, key, and value weights all get adjusted via backprop to minimize the error. So it's entirely possible that on one backprop pass the value weights get modified a lot (for example) while the key weights change little.
It's all about giving the network the "freedom" to adjust itself to best minimize the error.
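A minimal NumPy sketch of single-head self-attention (the shapes and random weights here are purely illustrative) makes the three separate weight matrices explicit:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))        # 4 tokens, embedding dim 8 (toy numbers)

# Queries, keys, and values each have their *own* weight matrix.
# In a trained model these are all learned by backprop; here they are random.
W_q = rng.normal(size=(8, 8))
W_k = rng.normal(size=(8, 8))
W_v = rng.normal(size=(8, 8))

Q, K, V = X @ W_q, X @ W_k, X @ W_v

scores = Q @ K.T / np.sqrt(K.shape[-1])   # scaled query-key dot products
weights = softmax(scores, axis=-1)        # attention weights for each query
output = weights @ V                      # weighted sum of values

print(weights.round(2))   # each row sums to 1
print(output.shape)       # (4, 8)
```

Because gradients flow back through all three projections independently, nothing forces the key weights and the value weights to change by the same amount on any given update.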
FunQuarter3511 OP t1_jcpstej wrote
Reply to comment by p0p4ks in Question on Attention by FunQuarter3511
Fully agree!
I think my issue was that, because of the terms query, key, and value, I was trying to relate them to a database or hashtable context. But in reality those terms seem to be misnomers, and backprop will set the key/query pairs to whatever is needed so that the dot product for important context is large and gets weighted appropriately.
I was over complicating it.
p0p4ks t1_jcppzf4 wrote
Reply to Question on Attention by FunQuarter3511
I get these confusions all the time. But then I remember we are backpropagating the errors. Imagine your case happening and the model output being incorrect: backprop will take care of key values that are too big or too small and fix the output.
FunQuarter3511 OP t1_jcpmkyt wrote
Reply to comment by hijacked_mojo in Question on Attention by FunQuarter3511
>I have a video that goes through everything
First off, this video is amazing! You definitely have a new subscriber in me and I will be sharing! Hope you keep making content!!
So I was originally thinking about this like a python dictionary/hash table where you have keys and values, and you retrieve values when the "query" = key.
Rather, what is happening here is that the "loudest" (largest-magnitude) key is expected to get the highest weight. This is okay because the key/query (and value) weight matrices are learned anyway, so during backprop the most important key will just learn to be louder (and the value weight matrix can learn alongside it).
In essence, the Python dictionary is just the wrong analogy to use here. We are not necessarily giving greater weight to key/query pairs that are similar; rather, we want the most important keys to produce large scores, which the network will learn.
Does that sound right?
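For concreteness, a toy contrast between the two mental models (the numbers are made up for illustration): a Python dict needs an exact key match, while attention blends every value according to the softmaxed query-key dot products.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hard lookup: the query either matches a key exactly or retrieval fails.
table = {"cat": 1.0, "dog": 2.0}
print(table["cat"])                     # 1.0, exact match only

# Soft "lookup": every key contributes, weighted by its dot product with the query.
query = np.array([1.0, 0.0])
keys = np.array([[0.9, 0.1],            # points roughly the same way as the query
                 [0.0, 1.0]])           # nearly orthogonal to the query
values = np.array([[1.0],
                   [2.0]])

weights = softmax(keys @ query)         # ~[0.71, 0.29], never a hard 0/1
print(weights @ values)                 # blended value, dominated by the "louder" key
```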
Ok-Demand-7347 OP t1_jcpf5nm wrote
Reply to comment by Amazing-Warthog5554 in 5100+ Chat GPT Prompts Excel Sheet by Ok-Demand-7347
Hi, to be transparent, this list is $9 on Etsy because I consider it a pretty valuable resource. I'm a software engineer by trade, and wanted to share a master list of what I learned/prompts I've created. Cheers.
hijacked_mojo t1_jcpaon7 wrote
Reply to Question on Attention by FunQuarter3511
Keys come from weights, and the query-key dot product determines how much attention each position gets for a particular query vector. The weights are then adjusted during backprop to minimize the error, and that in turn modifies the keys.
I have a video that goes through everything step-by-step:
https://www.youtube.com/watch?v=acxqoltilME
Amazing-Warthog5554 t1_jcp9tuo wrote
Reply to comment by Ok-Demand-7347 in 5100+ Chat GPT Prompts Excel Sheet by Ok-Demand-7347
I can has free?
Ok-Demand-7347 OP t1_jcp96ja wrote
Reply to comment by sEi_ in 5100+ Chat GPT Prompts Excel Sheet by Ok-Demand-7347
Hi, this isn't a scam. I compiled this list as a resource for myself and wanted to share it with others. Thanks.
sEi_ t1_jcox2m9 wrote
Reply to 5100+ Chat GPT Prompts Excel Sheet by Ok-Demand-7347
I think my adblocker is out of order.
thesupernoodle t1_jcsll6u wrote
Reply to comment by FirstOrderCat in Best GPUs for pretraining roBERTa-size LLMs with a $50K budget, 4x RTX A6000 v.s. 4x A6000 ADA v.s. 2x A100 80GB by AngrEvv
Sure; but the broader point is that they can optimize for their needs with some cheap testing: is the model big enough that it wants the extra RAM of an 80GB A100?