
currentscurrents t1_j9gp4uq wrote

> From an information theory standpoint, it creates potential information loss due to the lower dimensionality.

Exactly! That's the point.

The bottleneck forces the network to throw away the parts of the data that don't contain much information. It learns to encode the data in an information-dense representation so that the decoder on the other side of the bottleneck can work with high-level ideas instead of pixel values.

If you manually tweak the values in the bottleneck, you'll notice it changes high-level ideas in the data, like the gender or shape of a face, not individual pixel values. This is how autoencoders work; a U-Net is basically an autoencoder with skip connections.
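If you want to play with this, here's a minimal PyTorch sketch; the layer sizes and input are placeholder assumptions, and the latent edits only become meaningful once it's trained on real images:

```python
import torch
import torch.nn as nn

# The encoder squeezes 784 pixel values into an 8-number bottleneck;
# the decoder must reconstruct the image from those 8 numbers alone.
class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, bottleneck_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, bottleneck_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)        # information-dense code
        return self.decoder(z), z

model = Autoencoder()
x = torch.rand(1, 784)             # stand-in for a flattened image

with torch.no_grad():
    recon, z = model(x)
    z[0, 0] += 2.0                 # manually tweak one bottleneck value
    edited = model.decoder(z)      # a trained model decodes this as a
                                   # high-level change, not pixel noise
```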

Interestingly, biological neural networks that handle feedforward perception seem to do the same thing. Take a look at the structure of an insect antenna; thousands of input neurons bottleneck down to only 150 neurons, before expanding again for processing in the rest of the brain.

26

currentscurrents t1_j96vbfj wrote

Microsoft has confirmed the rules are real:

>We asked Microsoft about Sydney and these rules, and the company was happy to explain their origins and confirmed that the secret rules are genuine.

As for the rest, who knows? I never got access before they fixed it, but there are many screenshots from different people showing it acting quite unhinged.

2

currentscurrents t1_j96pkvw wrote

> The models are larger because there's maybe 100x the information in a high res image than a paragraph of text.

That's actually not true. Today's largest LLMs have 175B parameters; Stable Diffusion has 890 million.

Images contain a lot of pixels, but most of those pixels are easy to predict and don't contain much high-level information. A paragraph of text can contain many complex abstract ideas, while an image usually only contains a few objects with simple relationships between them.
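A toy way to see this is to run a smooth synthetic image and a short paragraph through a generic compressor; the gradient image here is a deliberately extreme stand-in for "easy to predict" pixels:

```python
import zlib
import numpy as np

# A 512x512 grayscale gradient: lots of pixels, very little information.
img = np.outer(np.arange(512), np.ones(512)).astype(np.uint8)
text = ("A paragraph of text can pack many abstract ideas "
        "into a few hundred bytes.").encode()

# Predictable pixels compress almost to nothing; the text barely shrinks.
print(len(img.tobytes()), "raw image bytes ->",
      len(zlib.compress(img.tobytes())), "compressed")
print(len(text), "raw text bytes ->", len(zlib.compress(text)), "compressed")
```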

In many image generators (like Imagen), the language model used to understand the prompt is several times bigger than the diffusion model used to generate the image.

7

currentscurrents t1_j8zz4n3 wrote

Look at things like replika.ai that give you a "friend" to chat with. Now imagine someone evil using that to run a romance scam.

Sure, the success rate is low, but it can target millions of potential victims at once. The cost of operation is almost zero compared to human-run scams.

On the other hand, it also gives us better tools to protect against it. We can use LLMs to examine messages and spot scams. And people lonely enough to fall for a romance scam may end up filling that loneliness by chatting with friendly or sexy chatbots instead.
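On the detection side, here's a minimal sketch of the "use LLMs to spot scams" idea using an off-the-shelf zero-shot classifier; the model choice, labels, and message are just illustrative assumptions:

```python
from transformers import pipeline

# Any NLI-style model works for zero-shot classification; this one is a
# common default choice.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

message = ("I love you so much. My wire transfer is stuck at customs, "
           "could you send $2,000 so I can come visit you?")

# Labels are scored by how well the message entails them.
result = classifier(message,
                    candidate_labels=["romance scam", "genuine message"])
print(result["labels"][0], result["scores"][0])
```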

6

currentscurrents t1_j8zy3m4 wrote

In the modern economy, the best way to make a lot of money is to make a product that a lot of people are willing to pay for. You can make some money scamming people, but nothing close to what you'd make by creating the next iPhone-level invention.

Also, that's not a problem of AI alignment; that's a problem of human alignment. The same problem applies just as much to the world today as it did a thousand years ago.

But in a sense I do agree; the biggest threat from AI is not that it will go Ultron, but that humans will use it to fight our own petty struggles. Future armies will be run by AI, and weapons of war will be even more terrifying than they are now.

1

currentscurrents t1_j8zwnht wrote

It depends on whether it's exploiting my psychology to sell me something I don't need, or gathering information to find something that may actually be useful to me. I suspect the latter is the more effective strategy in the long run, because people tend to adjust to counter psychological exploits.

If I'm shown an advertisement for something I actually want... that doesn't sound bad? I certainly don't like ads for irrelevant things like penis enlargement.

0

currentscurrents t1_j8zugnd wrote

The lucky thing is that neural networks aren't evil by default; they're useless and random by default. If you don't give them a goal, they just sit there and emit random garbage.
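You can see this with a freshly initialized network (a tiny sketch; the shapes are arbitrary):

```python
import torch
import torch.nn as nn

# A freshly initialized network: no training, no goal.
net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 5))

with torch.no_grad():
    out = net(torch.randn(3, 10))
print(out)  # arbitrary numbers: not helpful, not harmful, just noise
```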

Lack of controllability is a major obstacle to the usability of language models and image generators, so lots of people are working on it. In the process, they will learn techniques that we can use to control future superintelligent AI.

0

currentscurrents t1_j8zq4tn wrote

>I wouldn’t say it’s common to design networks with information flow in mind

I disagree. The entire point of the attention mechanism in transformers is to have a second neural network to control the flow of information.
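For reference, the core of that mechanism fits in a few lines. A minimal scaled dot-product attention sketch, with arbitrary shapes:

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # The learned attention weights decide how much information flows
    # from each value vector to each output position.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    weights = F.softmax(scores, dim=-1)  # a differentiable "router"
    return weights @ v

q = torch.randn(1, 4, 16)  # 4 query positions, 16-dim
k = torch.randn(1, 6, 16)  # 6 key/value positions
v = torch.randn(1, 6, 16)
out = attention(q, k, v)   # each output is a learned mix of the 6 values
```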

Similarly, the autoencoder structure is ubiquitous these days, and it's based around the idea of forcing information to flow through a bottleneck. Some information must be thrown away, so the neural network learns which parts of the data are most important to keep, and you get a good understanding of the structure of the data.

I'd say many of the recent great ideas in the field have come from manipulating information flow in interesting ways.

11

currentscurrents t1_j8zh4aa wrote

>scraping all kinds of copyrighted materials and then profiting off the models while the people doing all the labor are getting either nothing (for content generation)

Yeah, but these people won't be doing that labor anymore. Now that text-to-image models have learned how to draw, they don't need a constant stream of artists feeding them new art.

Artists can now work at a higher level, creating ideas that they render into images using the AI as a tool. They'll be able to create much larger and more complex projects, like a solo indie artist creating an entire anime.

>LLMs... barely have any legitimate use-cases

Well, one big use case: they make image generators possible. Those rely on embeddings from language models, which are a sort of neural representation of the ideas behind the text. Embeddings grant the other network the ability to work with plain English.

Right now embeddings are mostly used to guide generation (across many fields, not just images) and for semantic search. But they are useful for communicating with a neural network performing any task, and my guess is that the long-term impact of LLMs will be that computers can finally understand plain English.
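As a concrete example of the semantic-search use case, here's a minimal sketch assuming the off-the-shelf sentence-transformers library; the model name and documents are just examples:

```python
from sentence_transformers import SentenceTransformer, util

# Small general-purpose embedding model (the choice is illustrative).
model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "How to fine-tune a diffusion model on your own images",
    "A recipe for sourdough bread",
    "Using text embeddings for semantic search",
]
query = "teaching an image generator new styles"

doc_emb = model.encode(docs, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Cosine similarity in embedding space matches meaning, not keywords.
scores = util.cos_sim(query_emb, doc_emb)[0]
print(docs[int(scores.argmax())])
```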

1

currentscurrents t1_j8op44d wrote

Does it though? A recent reproducibility survey found that many optimizers claiming better performance didn't actually outperform on anything other than the tasks tested in their papers.

Essentially they were doing hyperparameter tuning; the hyperparameter just happened to be the optimizer design itself.

64

currentscurrents t1_j8em94v wrote

Samsung's working on in-memory processing. This is still digital logic and still Von Neumann, but by putting a bunch of tiny processors inside the memory chip, each gets its own memory bus it can access in parallel.

Most research on non-Von-Neumann architectures is focused on SNNs. Both startups and big tech are working on analog SNN chips. So far these are proofs of concept; they work and achieve extremely low power usage, but they're not at a big enough scale to compete with GPUs.

1

currentscurrents t1_j8c51f0 wrote

...and getting radically improved performance across several important tasks by calling those APIs.

Plus, calling APIs is very important for integration into real systems, because those calls can trigger real-world actions. Imagine a Siri that calls a bunch of different APIs based on complex instructions you give it.
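A minimal sketch of what that glue might look like; the tool names, JSON format, and dispatcher here are entirely hypothetical, not any real product's API:

```python
import json

# Hypothetical registry of tools the model is allowed to invoke.
TOOLS = {
    "weather": lambda city: f"Sunny in {city}",
    "calendar": lambda date: f"No events on {date}",
}

def dispatch(model_output: str) -> str:
    """Assumes the model emits JSON like
    {"tool": "weather", "args": {"city": "Boston"}}."""
    call = json.loads(model_output)
    return TOOLS[call["tool"]](**call["args"])

print(dispatch('{"tool": "weather", "args": {"city": "Boston"}}'))
```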

30

currentscurrents t1_j8agutn wrote

GPU manufacturers are aware of the memory bandwidth limitation, so they don't put in more tensor cores than they can feed with the available memory bandwidth.

>Moving away from transistors, the A100 has 6,912 FP32 CUDA cores, 3,456 FP64 CUDA cores and 422 Tensor cores. Compare that to the V100, which has 5,120 CUDA cores and 640 Tensor cores, and you can see just how much of an impact the new process has had in allowing NVIDIA to squeeze more components into a chip that’s only marginally larger than the one it replaces.

Notice that the A100 actually has fewer tensor cores than the V100. The tensor cores got faster, but they're still memory-bottlenecked, so there's no advantage to having more of them.
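A quick back-of-the-envelope shows why, using NVIDIA's published A100 numbers (312 TFLOPS dense FP16 tensor throughput, roughly 1.56 TB/s of HBM2 bandwidth):

```python
# Roofline-style break-even point for the A100.
peak_flops = 312e12   # FP16 tensor core FLOP/s (dense)
bandwidth = 1.555e12  # HBM2 bytes/s

# Arithmetic intensity (FLOPs per byte moved) needed to keep the cores busy.
break_even = peak_flops / bandwidth
print(f"{break_even:.0f} FLOPs per byte")  # ~200: below this, the kernel is
                                           # bandwidth-bound and extra tensor
                                           # cores would just sit idle
```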

3