ReginaldIII t1_iurx61x wrote
Reply to comment by Ulfgardleo in [R] Is there any work being done on reduction of training weight vector size but not reducing computational overhead (eg pruning)? by Moose_a_Lini
Yes. You stated what OP had meant, and I responded to you to say that what you (both) were describing was just pruning in general.
ReginaldIII t1_iurmviz wrote
Reply to comment by Ulfgardleo in [R] Is there any work being done on reduction of training weight vector size but not reducing computational overhead (eg pruning)? by Moose_a_Lini
This is just describing pruning though; the whole purpose of better pruning methods is to reduce size without compromising performance on the intended task.
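For illustration, a minimal, untested sketch of plain unstructured magnitude pruning (numpy only, names purely illustrative), the thing all the fancier methods refine:

    import numpy as np

    def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
        """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
        flat = np.abs(weights).ravel()
        k = int(sparsity * flat.size)
        if k == 0:
            return weights.copy()
        threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest absolute value
        pruned = weights.copy()
        pruned[np.abs(pruned) <= threshold] = 0.0
        return pruned

    w = np.random.randn(256, 256).astype(np.float32)
    w_pruned = magnitude_prune(w, sparsity=0.9)  # keep roughly the largest 10% of weights
    print(f"nonzero fraction: {np.count_nonzero(w_pruned) / w_pruned.size:.3f}")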
If you are embedding the weights of a model in an FPGA then the size of the FPGA is your bottleneck; it's unlikely to be your bandwidth talking to the ground, because FPGAs just aren't that big, relatively speaking.
Yes, ground comms is a factor, but realistically: A) how often are you going to be flashing new models onto your orbital systems, relative to B) how much inference you are going to be doing with any one given model, and C) how much data you'll then need to beam back down to collect those inferences?
Is the upload of the model weights really the dominant factor here?
By all means, strive to make the model as small as possible. But there's nothing special about the edge device being in orbit compared to it being on earth but hard to access.
ReginaldIII t1_it959lw wrote
Reply to comment by phraisely in [P] Look up words by their description by phraisely
Why do you need email addresses at all? Like by all means have an opt-in mailing list to talk about updates. Have a discord server for support. But why force people to provide an email address other than the paywall?
- Try me out - Free - 3 queries per month
- Hobby - $0.99 per 100 queries. Must use within 1 day.
- Professional - $9 per 5,000 queries. Must use within 30 days.
I'm sorry this just isn't at all viable as a payment scheme. You are never going to gain a consistent userbase.
And realistically, if your compute requirements are high enough that this costing model is necessary, then you've just learned why Netflix has never used the winning entries to its recommender systems challenges in production: they tend to be too computationally expensive to use in anger.
Why not look at a document retrieval system like Marqo https://github.com/marqo-ai/marqo ?
Put words into Marqo as keys, with blobs of text of their usage and/or written definitions as the documents.
It will probably retrieve relevant words from natural language queries well enough to be used for this purpose, and it will be cheap to evaluate.
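Roughly something like this (untested sketch against a local Marqo instance; the exact client API and field names vary by Marqo version, and the documents here are just placeholders):

    import marqo

    # Assumes a local Marqo instance running on the default port
    mq = marqo.Client(url="http://localhost:8882")
    mq.create_index("reverse-dictionary")

    # Index each word with a blob of its definition / usage as the document text
    mq.index("reverse-dictionary").add_documents(
        [
            {"word": "petrichor",
             "text": "the pleasant earthy smell produced when rain falls on dry ground"},
            {"word": "defenestrate",
             "text": "to throw someone or something out of a window"},
        ],
        tensor_fields=["text"],
    )

    # A natural language description should pull back the matching word(s)
    results = mq.index("reverse-dictionary").search("the smell after it rains")
    for hit in results["hits"]:
        print(hit["word"], hit["_score"])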
ReginaldIII t1_isal2mt wrote
Reply to comment by kajladk in [N] First RTX 4090 ML benchmarks by killver
Nvidia and Puget want to report the lucky run. Lots of people do this, and they're being fully transparent that they are reporting lucky runs. It makes sense from their perspective to report their best-case performance.
It honestly just doesn't bother me to see them doing it, because it's very normal and lots of people report this way, even if we think an average with an error bar would be fairer.
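To be concrete about the distinction (the numbers below are made-up placeholders, just to show the two reporting styles):

    import numpy as np

    # Hypothetical per-run throughputs (images/sec) from repeating the same benchmark
    runs = np.array([780.2, 795.6, 788.1, 810.4, 792.9])

    print(f"lucky run:    {runs.max():.1f} images/sec")                    # what Nvidia/Puget report
    print(f"mean +/- std: {runs.mean():.1f} +/- {runs.std(ddof=1):.1f}")   # arguably the fairer number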
ReginaldIII t1_is8j7x8 wrote
Reply to comment by kajladk in [N] First RTX 4090 ML benchmarks by killver
I would say there's precedent for lucky run benchmark scores. Consider 3dmark as an example.
https://benchmarks.ul.com/hall-of-fame-2/timespy+3dmark+score+performance+preset/version+1.0
All of those runs, with their different system configurations, are people's luckiest runs.
ReginaldIII t1_is6e8rx wrote
Reply to comment by mlaprise in [N] First RTX 4090 ML benchmarks by killver
Not really. There are still a lot of models in production that were written for the old TF graph API.
And if you've tested every prior GPU against that standard benchmark model for years, you keep doing it so you can compare like with like.
Edit: And as is this sub's tradition of callous downvoting because your knee-jerk reaction wasn't correct... Here's the relevant part of the article for you:
> TensorFlow 1.15.5 ResNet50
> This is the NVIDIA maintained version 1 of TensorFlow which typically offers somewhat better performance than version 2. The benchmark is training 100 steps of the ResNet 50 layer convolution neural network (CNN). The result is the highest images-per-second value from the run steps. FP32 and FP16 (tensorcore) jobs were run.
It's a standard benchmark model! And it performs better than the versions written for TF2. What more do you want?
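For anyone unsure what that benchmark is actually measuring, here's a rough TF2/Keras sketch of the same idea (not NVIDIA's actual TF1 benchmark script; batch size and details are illustrative): train ResNet50 on synthetic data for 100 steps and report the best images-per-second seen.

    import time
    import tensorflow as tf

    # Synthetic-data ResNet50 training throughput, reporting the single best step
    model = tf.keras.applications.ResNet50(weights=None, classes=1000)
    model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

    batch = 32
    x = tf.random.uniform((batch, 224, 224, 3))
    y = tf.random.uniform((batch,), maxval=1000, dtype=tf.int32)

    best = 0.0
    for step in range(100):
        start = time.time()
        model.train_on_batch(x, y)
        best = max(best, batch / (time.time() - start))

    print(f"peak images/sec: {best:.1f}")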
ReginaldIII t1_irrlphx wrote
Reply to [N] Using machine learning to find an optimal mixture of metals to create a desired alloy by cyphersanthosh
Very similar to an approach my team is working on for drug and compound discovery in computational biochem.
ReginaldIII t1_irn31o6 wrote
Reply to comment by HaohanWang in [D] I'm considering opening my lab to the public and inviting collaborators of interest by [deleted]
> If there is money involved, I will have to scrutinize the CVs and pick the strongest ones.
Yes. Employment law tends to require that.
> I would say I'm doing a fairly generous thing already by sharing my time
Things said by every person who has ever offered an unpaid internship.
> I did meet students whose primary care is the payment (usually because of family factors)
Everyone has a right to be paid for their labour regardless of their circumstances. As an employer you wouldn't even get to ask them why they needed the money you were going to pay them. This is a layer of protection for employees enshrined in employment law. Everyone is entitled to privacy in their personal lives.
> I cannot offer that.
Then don't. But also with respect, don't find a loophole around it either. It is categorically the wrong thing to do.
It is entirely possible for a person to have good intentions and then do the wrong thing.
ReginaldIII t1_iridt3u wrote
Reply to [D] I'm considering opening my lab to the public and inviting collaborators of interest by [deleted]
Or you could apply for some funding and pay them to come work in your lab ... That's also an option.
ReginaldIII t1_irc7nt0 wrote
Reply to comment by master3243 in [R] Discovering Faster Matrix Multiplication Algorithms With Reinforcement Learning by EducationalCicada
I'm done.
You only care about the contribution to matmul. Fine.
There's a much bigger contribution to RL being used to solve these types of problems (wider than just matmul). But fine.
Goodbye.
ReginaldIII t1_irav3ov wrote
Reply to comment by master3243 in [R] Discovering Faster Matrix Multiplication Algorithms With Reinforcement Learning by EducationalCicada
So if the paper is ready to be made public, why not release the code publicly at the same time?
> It is replicable, they literally have the code.
Replicable by the people who have access to the code.
If you are ready to publish the method in Nature you can damn well release the code with it! Good grief, what the fuck are you even advocating for?
ReginaldIII t1_irakn7b wrote
Reply to comment by master3243 in [R] Discovering Faster Matrix Multiplication Algorithms With Reinforcement Learning by EducationalCicada
> How did you already conclude that "no one can [...] actually apply it"
Because I read the paper and their supplementary docs and realized there's no way anyone could actually implement this given its current description.
> ML is currently suffering from the fact that people expect each paper to be a huge leap on its own,
I don't expect every paper to be a huge leap. I expect that when a peer-reviewed publication is publicly released in NATURE, it is replicable!
ReginaldIII t1_ir9w5x1 wrote
Reply to comment by ThatInternetGuy in [R] Discovering Faster Matrix Multiplication Algorithms With Reinforcement Learning by EducationalCicada
> It's apparent that RTX Tensor Cores and CUTLASS have really solved it.
You mean more efficiency was achieved using a novel type of hardware implementing a state of the art algorithm?
So if we develop methods for searching for algorithms with even better op requirements, we can work on developing hardware that directly leverages those algorithms.
> Why do you guys think matrix multiplication is currently slow with GPU, I don't get that.
I don't think that. I think that developing new hardware and implementing new algorithms that leverage that hardware is how it gets even faster.
And it's an absurd statement for you to make because it's entirely relative. Go back literally 4 years and you could say the same thing despite how much has happened since.
> This has never been figured out for ages; however, it's up to the debate if the AI could improve the [...]
> The other guy said it's an unsolved problem. There is nothing unsolved when it comes to matrix multiplication. It has been vastly optimized over the years since RTX first came out.
The "other guy" is YOU!
ReginaldIII t1_ir9uakx wrote
Reply to comment by ThatInternetGuy in [R] Discovering Faster Matrix Multiplication Algorithms With Reinforcement Learning by EducationalCicada
> however, it's up to the debate if the AI could improve the GEMM design to allow an even faster matrix-matrix multiplication.
Nvidia have been applying RL for chip design and optimization: https://developer.nvidia.com/blog/designing-arithmetic-circuits-with-deep-reinforcement-learning/
So I think it's pretty clear that they think it's possible.
ReginaldIII t1_ir9tsc8 wrote
Reply to comment by master3243 in [R] Discovering Faster Matrix Multiplication Algorithms With Reinforcement Learning by EducationalCicada
You're correct, I haven't pointed out anything wrong with the paper conceptually. It appears to work. Their matmul results are legitimate and verifiable. Their JAX benchmarks do produce the expected results.
In exactly the same way AlphaZero and AlphaFold do demonstrably work well. But it's all a bit moot and useless when no one can take this seemingly powerful method and actually apply it.
If they had released the matmul code yesterday, people today would already be applying it to other problems and discussing it like we have done with StableDiffusion in recent weeks. And with a massively simplified pipeline for getting results, because there's no dataset dependency, only compute, which can be remedied with longer training times.
ReginaldIII t1_ir6jxgl wrote
Reply to [R] Discovering Faster Matrix Multiplication Algorithms With Reinforcement Learning by EducationalCicada
Incredibly dense paper. The paper itself doesn't give us much to go on realistically.
The supplementary paper gives a lot of algorithm listings in pseudo-Python code, but significantly less readable than actual Python.
The github repo gives us nothing to go on except for some bare-bones notebook cells for loading their pre-baked results and executing them in JAX.
Honestly the best and most concise way they could possibly explain how they applied this on the matmul problem would be the actual code.
Neat work but science weeps.
ReginaldIII t1_ir6ixyl wrote
Reply to comment by bigfish_in_smallpond in [R] Discovering Faster Matrix Multiplication Algorithms With Reinforcement Learning by EducationalCicada
Faster, higher throughput, less energy usage... Yes it literally pays for itself.
ReginaldIII t1_ir0i174 wrote
Reply to comment by DuLLSoN in [N] Stable Diffusion reaches new record (with explanation + colab link) by Norlax_42
Yeah, fair enough. I just tried it again; maybe they randomly gave me a beefier GPU last night. Still 13 seconds for 3.
ReginaldIII t1_ir067hc wrote
Reply to comment by DuLLSoN in [N] Stable Diffusion reaches new record (with explanation + colab link) by Norlax_42
    import keras_cv
    from tensorflow import keras

    # Mixed precision (fp16 compute) and XLA compilation to make use of the tensor cores
    keras.mixed_precision.set_global_policy("mixed_float16")
    model = keras_cv.models.StableDiffusion(img_width=512, img_height=512, jit_compile=True)

    # Generate a batch of 5 images from a single prompt
    images = model.text_to_image("photograph of an astronaut riding a horse", batch_size=5)
ReginaldIII t1_iqzu0z9 wrote
I've been using the KerasCV implementation on a Colab T4 GPU, with 16-bit floats and XLA jitting, generating a batch of 5 images at 25 steps in 13 seconds. So I don't think it's fair to say you outright beat Keras' performance.
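For reference, roughly the sort of harness that gives that number (sketch, untested as written here; the prompt and warmup details are illustrative):

    import time
    import keras_cv
    from tensorflow import keras

    keras.mixed_precision.set_global_policy("mixed_float16")
    model = keras_cv.models.StableDiffusion(img_width=512, img_height=512, jit_compile=True)

    # The first call triggers the XLA compilation, so warm up before timing
    model.text_to_image("warmup", batch_size=5, num_steps=25)

    start = time.time()
    images = model.text_to_image("photograph of an astronaut riding a horse",
                                 batch_size=5, num_steps=25)
    print(f"{time.time() - start:.1f}s for {len(images)} images")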
Amazing work all the same.
ReginaldIII t1_iv0gqkt wrote
Reply to comment by DeepGamingAI in [D] DALLĀ·E to be made available as API, OpenAI to give users full ownership rights to generated images by TiredOldCrow
ItsFreeRealEstate.jpg