
currentscurrents t1_j2uwlrh wrote

I think interpretability will help us build better models too. For example, in this paper they deeply analyzed a model trained on a toy problem: addition mod 113.

They found that it was actually working by doing a Discrete Fourier Transform to turn the numbers into sine waves. Sine waves are great for gradient descent because they're easily differentiable (unlike modular addition over the integers, which is not), and if you choose the right frequency they repeat every 113 numbers. The network then did a bunch of additions and multiplications on these sine waves that gave the same result as modular addition.

This lets you answer an important question: why didn't the network generalize to bases other than 113? Well, the frequency of the sine waves was hardcoded into the network, so it couldn't work for any other base.

This opens the possibility of doing neural network surgery and changing the frequency so it works with any base.
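
A quick sketch of that point in Python (hypothetical values, not the trained network's actual weights): the only thing tying the circuit to base 113 is the frequency, so swapping in w = 2πk/n retargets it to any base n.

```python
import numpy as np

# Hypothetical illustration, not the trained network's weights: the
# frequency w = 2*pi*k/n is the only thing that hardcodes the base n.
def make_frequency(n, k=1):
    """A frequency whose sine/cosine repeat every n integers."""
    return 2 * np.pi * k / n

w = make_frequency(113)  # the base the network was trained on
x = 5
# cos(w*x) has period 113: shifting x by 113 leaves it unchanged.
print(np.isclose(np.cos(w * x), np.cos(w * (x + 113))))  # True

# "Surgery" would amount to swapping in a frequency for another base:
w2 = make_frequency(97)
print(np.isclose(np.cos(w2 * x), np.cos(w2 * (x + 97))))  # True
```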

34

currentscurrents t1_j2trd40 wrote

If only it could run on a card that doesn't cost as much as a car.

I wonder if we will eventually hit a wall where more compute is required for further improvement, and we can only wait for GPU manufacturers. Similar to how they could never have created these language models in the 80s, no matter how clever their algorithms - they just didn't have enough compute power, memory, or the internet to use as a dataset.

5

currentscurrents OP t1_j2hdsvv wrote

Someone else posted this example, which is kind of what I was interested in. They trained a neural network to do a toy problem, addition mod 113, and then were able to determine the algorithm it used to compute it.

>The algorithm learned to do modular addition can be fully reverse engineered. The algorithm is roughly:

>Map inputs x, y → cos(wx), cos(wy), sin(wx), sin(wy) with a Discrete Fourier Transform, for some frequency w.

>Multiply and rearrange to get cos(w(x+y)) = cos(wx)cos(wy) − sin(wx)sin(wy) and sin(w(x+y)) = cos(wx)sin(wy) + sin(wx)cos(wy)

>By choosing a frequency w = 2πk/n we get period dividing n, so this is a function of x + y (mod n)

>Map to the output logits z with cos(w(x+y))cos(wz) + sin(w(x+y))sin(wz) = cos(w(x+y−z)) - this has the highest logit at z ≡ x + y (mod n), so softmax gives the right answer.

>To emphasise, this algorithm was purely learned by gradient descent! I did not predict or understand this algorithm in advance and did nothing to encourage the model to learn this way of doing modular addition. I only discovered it by reverse engineering the weights.

This is a very different way to do modular addition, but it makes sense for the network. Sine/cosine functions are waves that repeat with a period set by their frequency, so if you choose the right frequency you can implement the non-differentiable modular addition function using only differentiable operations.

Extracting this algorithm is useful for generalization; the original network only worked mod 113, but with the algorithm in hand we can plug in any frequency and hence any base. Of course this is a toy example and there are much faster ways to do modular addition, but maybe the approach could work for more complex problems too.
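
Here's what that looks like as a minimal NumPy sketch (assuming a single clean frequency k = 1; the real network spreads the computation over several learned frequencies rather than implementing it this cleanly):

```python
import numpy as np

def modular_add(x, y, n, k=1):
    """Compute (x + y) mod n using only differentiable operations."""
    w = 2 * np.pi * k / n                    # frequency with period n
    # Map inputs to sine/cosine features (the "DFT" step).
    cx, sx = np.cos(w * x), np.sin(w * x)
    cy, sy = np.cos(w * y), np.sin(w * y)
    # Angle-addition identities give cos/sin of w*(x+y).
    c_sum = cx * cy - sx * sy                # cos(w(x+y))
    s_sum = cx * sy + sx * cy                # sin(w(x+y))
    # Logit for each candidate z is cos(w(x+y-z)), maximal at z = (x+y) mod n.
    z = np.arange(n)
    logits = c_sum * np.cos(w * z) + s_sum * np.sin(w * z)
    return int(np.argmax(logits))

print(modular_add(50, 70, 113))  # 7, i.e. (50 + 70) % 113
print(modular_add(50, 70, 10))   # 0 -- the same code works for any base
```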

6

currentscurrents OP t1_j2h21zn wrote

>For example, a sort function is necessarily general over the entire domain of the entries for which it is valid, whereas a neural network will only approximate a function over the subfield of the domain for which it was trained, all bets are off elsewhere; it doesn't generalize.

But you can teach neural networks to do things like solve arbitrary mazes. Isn't that pretty algorithmic?

1

currentscurrents OP t1_j2gdumb wrote

By classical algorithm, I mean something that doesn't use a neural network. Traditional programming and neural networks are two very different ways to solve problems, but they can solve many of the same problems.

That sounds like a translation problem, which neural networks are good at. Just like in translation, it would have to understand the higher-level idea behind the implementation.

It's like text-to-code, but network-to-code instead.

3

currentscurrents OP t1_j2g9mvy wrote

Thanks, that is the question I'm trying to ask! I know explainability is a bit of a dead-end field right now, so it's a hard problem.

An approximate or incomprehensible algorithm could still be useful if it's faster or uses less memory. But I think to accomplish that you would need to convert it into higher-level ideas; otherwise you're just emulating the network.

Luckily, converting things into higher-level ideas is something neural networks are capable of, so it doesn't seem fundamentally impossible.

3

currentscurrents OP t1_j2g5x92 wrote

Thanks for the link, that's good to know about!

But maybe I should have titled this differently. I'm interested in taking a network that solves a problem networks are good at, and converting it into a code representation as a way to speed it up. Like translating between the two different forms of computation.

1

currentscurrents t1_j2f996k wrote

Attention maps can be a type of explanation.

They tell you what the model was looking at when it generated a word or identified an image, but not why it focused on those bits or why it made the decision it did. You can get some useful information by looking at them, but not everything you need to explain the model.
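
For concreteness, here's where the map comes from in standard scaled dot-product attention (a toy sketch, not any particular model's implementation):

```python
import numpy as np

def attention_map(Q, K):
    """Softmax of scaled dot products: each row is one query's
    distribution over the keys, i.e. what that token 'looked at'."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Q, K = rng.normal(size=(4, 8)), rng.normal(size=(6, 8))
print(attention_map(Q, K).round(2))  # shape (4, 6), rows sum to 1
```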

10

currentscurrents t1_j2csenb wrote

The number of layers is a hyperparameter, and people run hyperparameter optimization to find good values for it.
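
For example, here's a minimal scikit-learn sketch of searching over depth (toy data and a small grid; real searches use smarter methods than exhaustive grids):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Treat the number of hidden layers as a hyperparameter and
# keep whichever depth cross-validates best.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
search = GridSearchCV(
    MLPClassifier(max_iter=2000, random_state=0),
    param_grid={"hidden_layer_sizes": [(32,), (32, 32), (32, 32, 32)]},
    cv=3,
)
search.fit(X, y)
print(search.best_params_)  # the depth/width that scored best
```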

Model size does seem to obey a real scaling law. It's possible that we will come up with better algorithms that work on smaller models, but it's also possible that neural networks need to be big to be useful. With billions of neurons and an even larger number of connections/parameters, the human brain is certainly a very large network.

3

currentscurrents t1_j2cm36p wrote

TL;DR: they want to take another language model (Google’s PaLM) and do Reinforcement Learning from Human Feedback (RLHF) on it, like OpenAI did for ChatGPT.

At this point they haven't actually done it yet, since they need both compute power and human volunteers to do the training:

>Human volunteers will be employed to rank those responses from best to worst, using the rankings to create a reward model that takes the original model’s responses and sorts them in order of preference, filtering for the top answers to a given prompt.

>However, the process of aligning this model with what users want to accomplish with ChatGPT is both costly and time-consuming, as PaLM has a massive 540 billion parameters. Note that the cost of developing a text-generating model with only 1.5 billion parameters can reach up to $1.6 million.

Since it has 540b parameters, you will still need a GPU cluster to run it.
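
For reference, the core of that reward-model step is usually just a pairwise ranking loss over the human rankings. A minimal PyTorch sketch of the InstructGPT-style loss (whether this project uses exactly this is an assumption, and `reward_model` is a hypothetical stand-in for a scalar-reward head on the language model):

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(reward_model, preferred, rejected):
    """Push the reward of the human-preferred response above the rejected one."""
    r_pref = reward_model(preferred)
    r_rej = reward_model(rejected)
    return -F.logsigmoid(r_pref - r_rej).mean()

# Toy stand-ins: "responses" as feature vectors, reward as a linear head.
reward_model = torch.nn.Linear(16, 1)
preferred, rejected = torch.randn(8, 16), torch.randn(8, 16)
print(reward_ranking_loss(reward_model, preferred, rejected))
```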

81

currentscurrents t1_j2by81g wrote

So, if I'm understanding right:

  • Backward chaining is an old classical algorithm for logical inference.

  • They've implemented backward chaining using a bunch of language models, so it works well with natural text.

  • Given a knowledge base (these are available as datasets these days), it can decompose a statement and check whether it's logically consistent with that knowledge.

  • The reason they're interested in this is to use it as a training signal to make language models more accurate.

This is effectively an old "expert system" from the 70s built out of neural networks. I wonder what other classical algorithms you can implement with neural networks.
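
For reference, the purely symbolic version is only a few lines; here's a toy propositional sketch (the paper's version swaps each symbolic step for a language model operating on natural-language statements):

```python
# goal -> list of alternative subgoal lists (Horn rules); [] means a fact
rules = {
    "mortal(socrates)": [["man(socrates)"]],
    "man(socrates)": [[]],
}

def prove(goal, depth=10):
    """Backward chaining: a goal holds if all subgoals of some rule hold."""
    if depth == 0:
        return False
    return any(
        all(prove(g, depth - 1) for g in subgoals)
        for subgoals in rules.get(goal, [])
    )

print(prove("mortal(socrates)"))  # True
print(prove("flies(socrates)"))   # False: nothing concludes it
```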

I also wonder if you could use this to create its own knowledge base from internet data. Since the internet is full of contradicting information, you would have to compare new data against existing data somehow and decide which to keep.

8