
currentscurrents t1_j4ijlqv wrote

You can fine-tune image generator models and some smaller language models.

You can also do tasks that don't require super large models, like image recognition.

>that's beyond just some toy experiment?

Don't knock toy experiments too much! I'm having a lot of fun trying to build a differentiable neural computer or memory-augmented network in pytorch.
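If anyone wants to poke at the same idea, here's a minimal sketch of the content-based memory read that NTM/DNC-style models are built around (the shapes and names are just my toy setup, not from any particular paper):

```python
import torch
import torch.nn.functional as F

# Content-based addressing: compare a query key against every memory slot,
# turn the similarities into soft read weights, and return a weighted sum.
# memory: (N, W) matrix of N slots of width W; key: (W,) query from the controller.
def content_read(memory: torch.Tensor, key: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    scores = F.cosine_similarity(memory, key.expand_as(memory), dim=-1)  # (N,)
    weights = torch.softmax(beta * scores, dim=-1)                       # (N,) read weights
    return weights @ memory                                              # (W,) read vector

memory = torch.randn(128, 32)   # 128 slots, 32 dims each
key = torch.randn(32)           # what the controller wants to look up
read_vector = content_read(memory, key, beta=5.0)
```

Everything is differentiable, so the controller can learn what to store and retrieve with plain backprop.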

3

currentscurrents t1_j4a2las wrote

>Specifically, 1) we design an expert system to generate a melody by developing musical elements from motifs to phrases then to sections with repetitions and variations according to pre-given musical form; 2) considering the generated melody is lack of musical richness, we design a Transformer based refinement model to improve the melody without changing its musical form. MeloForm enjoys the advantages of precise musical form control by expert systems and musical richness learning via neural models.

16

currentscurrents t1_j49u28o wrote

I don't think it's that simple - whether or not generative AI is considered "transformative" has not yet been tested by the courts.

Until somebody actually gets sued over this and it goes to court, we don't know how the legal system is going to handle it. There is currently a lawsuit against Github Copilot, so we will probably know in the next couple years.

5

currentscurrents t1_j490rvn wrote

It's meaningful right now because there's a threshold where LLMs become awesome, but getting there requires expensive specialized GPUs.

I'm hoping in a few years consumer GPUs will have 80GB of VRAM or whatever and we'll be able to run them locally. While datacenters will still have more compute, it won't matter as much since there's a limit where larger models would require more training data than exists.

3

currentscurrents t1_j48csbo wrote

Reply to comment by RandomCandor in [D] Bitter lesson 2.0? by Tea_Pearce

If it is true that performance scales infinitely with compute power - and I kinda hope it is, since that would make superhuman AI achievable - datacenters will always be smarter than PCs.

That said, I'm not sure that it does scale infinitely. You need not just more compute but also more data, and there's only so much data out there. GPT-4 reportedly won't be any bigger than GPT-3 because even terabytes of scraped internet data aren't enough to train a larger model.

6

currentscurrents t1_j4716tp wrote

Reply to comment by ml-research in [D] Bitter lesson 2.0? by Tea_Pearce

Try to figure out systems that can generalize from smaller amounts of data? It's the big problem we all need to solve anyway.

There's a bunch of promising ideas that need more research:

  • Neurosymbolic computing (rough sketch after this list)
  • Expert systems built out of neural networks
  • Memory augmented neural networks
  • Differentiable neural computers
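On the neurosymbolic item, the rough idea is to let neural nets estimate symbolic predicates and then compose them with differentiable logic so gradients still flow end to end. A toy sketch (every name and number here is made up for illustration):

```python
import torch
import torch.nn as nn

# Two small neural "perception" modules output probabilities for symbolic
# predicates; a hand-written differentiable rule composes them
# (product t-norm for AND, 1 - p for NOT).
light_net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
car_net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

def safe_to_cross(scene: torch.Tensor) -> torch.Tensor:
    p_green = light_net(scene)      # P(light is green | scene features)
    p_car = car_net(scene)          # P(car approaching | scene features)
    return p_green * (1.0 - p_car)  # "green AND NOT car", as soft logic

scene = torch.randn(4, 16)          # batch of 4 fake scene feature vectors
prob = safe_to_cross(scene)         # differentiable, so the symbolic rule can supervise the nets
```

The symbolic rule gives you structure and interpretability for free; the nets only have to learn the perception part.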

8

currentscurrents t1_j4702g0 wrote

Reply to comment by mugbrushteeth in [D] Bitter lesson 2.0? by Tea_Pearce

Compute is going to get cheaper over time though. My phone today has the FLOPs of a supercomputer from 1999.

Also if LLMs become the next big thing you can expect GPU manufacturers to include more VRAM and more hardware acceleration directed at them.

9

currentscurrents t1_j44pu0u wrote

Is it though? These days it seems like even a lot of research papers are just "we stuck together a bunch of pytorch components like lego blocks" or "we fed a transformer model a bunch of data".

Math is important if you want to invent new kinds of neural networks, but for end users it doesn't seem very important.

7

currentscurrents OP t1_j44nngb wrote

The paper does talk about this and calls transformers "first generation compositional systems" - but limited ones.

>Transformers, on the other hand, use graphs, which in principle can encode general, abstract structure, including webs of inter-related concepts and facts.

> However, in Transformers, a layer’s graph is defined by its data flow, yet this data flow cannot be accessed by the rest of the network—once a given layer’s data-flow graph has been used by that layer, the graph disappears. For the graph to be a bona fide encoding, carrying information to the rest of the network, it would need to be represented with an activation vector that encodes the graph’s abstract, compositionally-structured internal information.

>The technique we introduce next—NECST computing—provides exactly this type of activation vector.

They then talk about a more advanced variant called NECSTransformers, which they consider a 2nd generation compositional system. But I haven't heard of this system before and I'm not clear if it actually performs better.

10

currentscurrents OP t1_j43s8ki wrote

In the paper they talk about "first generation compositional systems" and I believe they would include differentiable programming in that category. It has some compositional structure, but the structure is created by the programmer.

Ideally the system would be able to create its own arbitrarily complex structures and systems to understand abstract ideas, like humans can.

4

currentscurrents t1_j3eo4uc wrote

There's plenty of work to be done in researching language models that train more efficiently or run on smaller machines.

ChatGPT is great, but it needed around 600GB of training data and megawatt-hours of energy to train. It must be possible to do better; the average human brain runs on about 12W and has seen maybe a few hundred million words, tops.

2

currentscurrents t1_j3emas4 wrote

>I hate to break your bubble, but the task is also achievable even with GPT2

Is it? I would love to know how. I can run GPT2 locally, and that would be a fantastic level of zero-shot learning to be able to play around with.

I have no doubt you can fine-tune GPT2 or T5 to achieve this, but in my experience they aren't nearly as promptable as GPT3/ChatGPT.
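For concreteness, this is all it takes to try zero-shot prompting against a local GPT2 with the Hugging Face transformers library; in my experience the completions tend to ramble rather than follow the instruction, but I'd love to be proven wrong:

```python
from transformers import pipeline

# Plain GPT-2, no fine-tuning, prompted zero-shot the same way you'd prompt GPT-3.
generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Instruction: Rewrite the sentence in a formal tone.\n"
    "Sentence: hey can u send me that report asap\n"
    "Rewritten:"
)
out = generator(prompt, max_new_tokens=40, do_sample=True, num_return_sequences=1)
print(out[0]["generated_text"])
```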

>Specifically the task you gave it is likely implicitly present in the dataset, in the sense that the dataset allowed the model to learn the connections between the words you gave it

I'm not sure what you're getting at here. Of course it has learned the connections and meanings between words; that's what a language model does.

But it still followed my instructions, and it can follow a wide variety of other detailed instructions you give it. These tasks are too specific to have been in the training data; it is successfully generalizing zero-shot to new NLP tasks.

1

currentscurrents t1_j3eiw5w wrote

I think you're missing some of the depth of what it's capable of. You can "program" it to do new tasks just by explaining them in plain English, or by providing examples. For example, many people are using it to generate prompts for image generators:

>I want you to act as a prompt creator for an AI image generator.

>Prompts are descriptions of artistic images that include visual adjectives and art styles or artist names. The image generator can understand complex ideas, so use detailed language and describe emotions or feelings in detail. Use terse words separated by commas, and make short descriptions that are efficient in word use.

>With each image, include detailed descriptions of the art style, using the names of artists known for that style. I may provide a general style with the prompt, which you will expand into detail. For example if I ask for an "abstract style", you would include "style of Picasso, abstract brushstrokes, oil painting, cubism"

>Please create 5 prompts for a mob of grandmas with guns. Use a fantasy digital painting style.

This is a complex and poorly-defined task, and it certainly was not trained on this, since its training data stops in 2021. But the resulting output is exactly what I wanted:

>An army of grandmas charging towards the viewer, their guns glowing with otherworldly energy. Style of Syd Mead, futuristic landscapes, sleek design, fantasy digital painting.

Once I copy-pasted it into an image generator it created a very nice image.

I think we're going to see a lot more use of language models for controlling computers to do complex tasks.

0

currentscurrents OP t1_j34uma6 wrote

>That alone I doubt it, even if it could theoretically reproduce how the brain works with the same power efficiency it doesn't mean you would have the algorithm to efficiently use this hardware.

I meant just in terms of compute efficiency, using the same kind of algorithms we use now. It's clear they won't magically give you AGI, but Innatera claims 10000x lower power usage with their chip.

This makes sense to me; instead of emulating a neural network using math, you're building a physical model of one on silicon. Plus, SNNs are very sparse and an analog one would only use power when firing.

>Usual ANNs are designed for current tasks and current tasks are often designed for usual ANNs. It's easier to use the same datasets but I don't think the point of SNNs is just to try to perform better on these datasets but rather to try more innovative approaches on some specific datasets.

I feel like a lot of SNN research is motivated by understanding the brain rather than being the best possible AI. It also seems harder to get traditional forms of data into and out of the network; for example, you have to convert images into spike timings, and there are several encoding methods, each with its own upsides and downsides.
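For example, the simplest encoding is rate coding: treat each normalized pixel intensity as a per-timestep firing probability. A rough sketch (not tied to any particular SNN library):

```python
import torch

# Rate coding: brighter pixels fire more often across the time window.
# image: float tensor in [0, 1], e.g. (28, 28); returns spikes of shape (T, 28, 28).
def rate_code(image: torch.Tensor, num_steps: int = 100) -> torch.Tensor:
    p = image.clamp(0.0, 1.0)
    return (torch.rand(num_steps, *image.shape) < p).float()

image = torch.rand(28, 28)            # stand-in for a normalized grayscale image
spike_train = rate_code(image, 100)   # binary spikes; the intensity lives in the firing rate
```

You lose precise timing information this way, which is exactly the kind of tradeoff I mean.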

2

currentscurrents t1_j2zidye wrote

I think that actually measures how good it is at getting popular on social media, which is not the same task as making good art.

There's also some backlash against AI art right now, so this might favor models that can't be distinguished from human art rather than models that are better than human art.

8