
currentscurrents t1_j9gp4uq wrote

> From an information theory standpoint, it creates potential information loss due to the lower dimensionality.

Exactly! That's the point.

The bottleneck forces the network to throw away the parts of the data that don't contain much information. It learns to encode the data in an information-dense representation so that the decoder on the other side of the bottleneck can work with high-level ideas instead of pixel values.

If you manually tweak the values in the bottleneck, you'll notice it changes high-level ideas in the data, like the gender or shape of a face, not individual pixel values. This is how autoencoders work; a U-Net is basically an autoencoder with skip connections.
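If you want to play with this, here's a minimal PyTorch sketch; the layer sizes and input are placeholder assumptions, and the latent edits only become meaningful once it's trained on real images:

```python
import torch
import torch.nn as nn

# The encoder squeezes 784 pixel values into an 8-number bottleneck;
# the decoder must reconstruct the image from those 8 numbers alone.
class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, bottleneck_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, bottleneck_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)        # information-dense code
        return self.decoder(z), z

model = Autoencoder()
x = torch.rand(1, 784)             # stand-in for a flattened image

with torch.no_grad():
    recon, z = model(x)
    z[0, 0] += 2.0                 # manually tweak one bottleneck value
    edited = model.decoder(z)      # a trained model decodes this as a
                                   # high-level change, not pixel noise
```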

Interestingly, biological neural networks that handle feedforward perception seem to do the same thing. Take a look at the structure of an insect antenna; thousands of input neurons bottleneck down to only 150 neurons, before expanding again for processing in the rest of the brain.

26

currentscurrents t1_j96vbfj wrote

Microsoft has confirmed the rules are real:

>We asked Microsoft about Sydney and these rules, and the company was happy to explain their origins and confirmed that the secret rules are genuine.

As for the rest, who knows? I never got access before they fixed it, but there are many screenshots from different people showing it acting quite unhinged.

2

currentscurrents t1_j96pkvw wrote

> The models are larger because there's maybe 100x the information in a high res image than a paragraph of text.

That's actually not true. Today's largest LLMs have 175B parameters; Stable Diffusion has 890 million.

Images contain a lot of pixels, but most of those pixels are easy to predict and don't contain much high-level information. A paragraph of text can contain many complex abstract ideas, while an image usually only contains a few objects with simple relationships between them.
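A toy way to see this is to run a smooth synthetic image and a short paragraph through a generic compressor; the gradient image here is a deliberately extreme stand-in for "easy to predict" pixels:

```python
import zlib
import numpy as np

# A 512x512 grayscale gradient: lots of pixels, very little information.
img = np.outer(np.arange(512), np.ones(512)).astype(np.uint8)
text = ("A paragraph of text can pack many abstract ideas "
        "into a few hundred bytes.").encode()

# Predictable pixels compress almost to nothing; the text barely shrinks.
print(len(img.tobytes()), "raw image bytes ->",
      len(zlib.compress(img.tobytes())), "compressed")
print(len(text), "raw text bytes ->", len(zlib.compress(text)), "compressed")
```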

In many image generators (like Imagen), the language model used to understand the prompt is several times bigger than the diffusion model used to generate the image.

7

currentscurrents t1_j8zz4n3 wrote

Look at things like replika.ai that give you a "friend" to chat with. Now imagine someone evil using that to run a romance scam.

Sure, the success rate is low, but it can target millions of potential victims at once. The cost of operation is almost zero compared to human-run scams.

On the other hand, it also gives us better tools to protect against it. We can use LLMs to examine messages and spot scams. And people lonely enough to fall for a romance scam may end up filling that loneliness by chatting with friendly or sexy chatbots instead.
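On the detection side, here's a minimal sketch of the "use LLMs to spot scams" idea using an off-the-shelf zero-shot classifier; the model choice, labels, and message are just illustrative assumptions:

```python
from transformers import pipeline

# Any NLI-style model works for zero-shot classification; this one is a
# common default choice.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

message = ("I love you so much. My wire transfer is stuck at customs, "
           "could you send $2,000 so I can come visit you?")

# Labels are scored by how well the message entails them.
result = classifier(message,
                    candidate_labels=["romance scam", "genuine message"])
print(result["labels"][0], result["scores"][0])
```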

6

currentscurrents t1_j8zy3m4 wrote

In the modern economy, the best way to make a lot of money is to make a product that a lot of people are willing to pay for. You can make some money scamming people, but nothing close to what you'd make by creating the next iPhone-level invention.

Also, that's not a problem of AI alignment; that's a problem of human alignment. The same problem applies just as much to the world today as it did a thousand years ago.

But in a sense I do agree; the biggest threat from AI is not that it will go Ultron, but that humans will use it to fight our own petty struggles. Future armies will be run by AI, and weapons of war will be even more terrifying than they are now.

1

currentscurrents t1_j8zwnht wrote

It depends on whether it's exploiting my psychology to sell me something I don't need, or gathering information to find something that may actually be useful to me. I suspect the latter is the more effective strategy in the long run, because people tend to adjust to counter psychological exploits.

If I'm shown an advertisement for something I actually want... that doesn't sound bad? I certainly don't like ads for irrelevant things like penis enlargement.

0

currentscurrents t1_j8zugnd wrote

The lucky thing is that neural networks aren't evil by default; they're useless and random by default. If you don't give them a goal, they just sit there and emit random garbage.
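You can see this with a freshly initialized network (a tiny sketch; the shapes are arbitrary):

```python
import torch
import torch.nn as nn

# A freshly initialized network: no training, no goal.
net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 5))

with torch.no_grad():
    out = net(torch.randn(3, 10))
print(out)  # arbitrary numbers: not helpful, not harmful, just noise
```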

Lack of controllability is a major obstacle to the usability of language models and image generators, so lots of people are working on it. In the process, they will learn techniques that we can use to control future superintelligent AI.

0

currentscurrents t1_j8zq4tn wrote

>I wouldn’t say it’s common to design networks with information flow in mind

I disagree. The entire point of the attention mechanism in transformers is to have a second neural network to control the flow of information.
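For reference, the core of that mechanism fits in a few lines. A minimal scaled dot-product attention sketch, with arbitrary shapes:

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # The learned attention weights decide how much information flows
    # from each value vector to each output position.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    weights = F.softmax(scores, dim=-1)  # a differentiable "router"
    return weights @ v

q = torch.randn(1, 4, 16)  # 4 query positions, 16-dim
k = torch.randn(1, 6, 16)  # 6 key/value positions
v = torch.randn(1, 6, 16)
out = attention(q, k, v)   # each output is a learned mix of the 6 values
```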

Similarly, the autoencoder structure is ubiquitous these days, and it's based around the idea of forcing information to flow through a bottleneck. Some information must be thrown away, so the neural network learns which parts of the data are most important to keep, and you get a good understanding of the structure of the data.

I'd say many of the recent great ideas in the field have come from manipulating information flow in interesting ways.

11

currentscurrents t1_j8zh4aa wrote

>scraping all kinds of copyrighted materials and then profiting off the models while the people doing all the labor are getting either nothing (for content generation)

Yeah, but these people won't be doing that labor anymore. Now that text-to-image models have learned how to draw, they don't need a constant stream of artists feeding them new art.

Artists can now work at a higher level, creating ideas that they render into images using the AI as a tool. They'll be able to create much larger and more complex projects, like a solo indie artist creating an entire anime.

>LLMs... barely have any legitimate use-cases

Well, one big use case: they make image generators possible. Those rely on embeddings from language models, which are a sort of neural representation of the ideas behind the text. Embeddings grant the other network the ability to work with plain English.

Right now embeddings are mostly used to guide generation (across many fields, not just images) and for semantic search. But they are useful for communicating with a neural network performing any task, and my guess is that the long-term impact of LLMs will be that computers can finally understand plain English.
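As a concrete example of the semantic-search use case, here's a minimal sketch assuming the off-the-shelf sentence-transformers library; the model name and documents are just examples:

```python
from sentence_transformers import SentenceTransformer, util

# Small general-purpose embedding model (the choice is illustrative).
model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "How to fine-tune a diffusion model on your own images",
    "A recipe for sourdough bread",
    "Using text embeddings for semantic search",
]
query = "teaching an image generator new styles"

doc_emb = model.encode(docs, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Cosine similarity in embedding space matches meaning, not keywords.
scores = util.cos_sim(query_emb, doc_emb)[0]
print(docs[int(scores.argmax())])
```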

1

currentscurrents t1_j8op44d wrote

Does it though? A recent reproducibility survey found that many optimizers claiming better performance didn't actually outperform on anything other than the tasks tested in their papers.

Essentially they were doing hyperparameter tuning; the hyperparameter just happened to be the optimizer design itself.

64

currentscurrents t1_j8em94v wrote

Samsung's working on in-memory processing. This is still digital logic and still Von Neumann, but by putting a bunch of tiny processors inside the memory chip, each gets its own memory bus it can access in parallel.

Most research on non-Von-Neumann architectures is focused on SNNs. Both startups and big tech are working on analog SNN chips. So far these are proofs of concept; they work and achieve extremely low power usage, but they're not at a big enough scale to compete with GPUs.

1

currentscurrents t1_j8c51f0 wrote

...and getting radically improved performance across several important tasks by calling those APIs.

Plus, calling APIs is very important for integration into real systems, because those calls can trigger real-world actions. Imagine a Siri that calls a bunch of different APIs based on complex instructions you give it.
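A minimal sketch of what that glue might look like; the tool names, JSON format, and dispatcher here are entirely hypothetical, not any real product's API:

```python
import json

# Hypothetical registry of tools the model is allowed to invoke.
TOOLS = {
    "weather": lambda city: f"Sunny in {city}",
    "calendar": lambda date: f"No events on {date}",
}

def dispatch(model_output: str) -> str:
    """Assumes the model emits JSON like
    {"tool": "weather", "args": {"city": "Boston"}}."""
    call = json.loads(model_output)
    return TOOLS[call["tool"]](**call["args"])

print(dispatch('{"tool": "weather", "args": {"city": "Boston"}}'))
```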

30

currentscurrents t1_j8agutn wrote

GPU manufacturers are aware of the memory bandwidth limitation, so they don't put in more tensor cores than they can feed with the available memory bandwidth.

>Moving away from transistors, the A100 has 6,912 FP32 CUDA cores, 3,456 FP64 CUDA cores and 422 Tensor cores. Compare that to the V100, which has 5,120 CUDA cores and 640 Tensor cores, and you can see just how much of an impact the new process has had in allowing NVIDIA to squeeze more components into a chip that’s only marginally larger than the one it replaces.

Notice that the A100 actually has fewer tensor cores than the V100. The tensor cores got faster, but they're still memory-bottlenecked, so there's no advantage to having more of them.
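A quick back-of-the-envelope shows why, using NVIDIA's published A100 numbers (312 TFLOPS dense FP16 tensor throughput, roughly 1.56 TB/s of HBM2 bandwidth):

```python
# Roofline-style break-even point for the A100.
peak_flops = 312e12   # FP16 tensor core FLOP/s (dense)
bandwidth = 1.555e12  # HBM2 bytes/s

# Arithmetic intensity (FLOPs per byte moved) needed to keep the cores busy.
break_even = peak_flops / bandwidth
print(f"{break_even:.0f} FLOPs per byte")  # ~200: below this, the kernel is
                                           # bandwidth-bound and extra tensor
                                           # cores would just sit idle
```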

3