nul9090 t1_j9sqmaf wrote

In my view, the biggest flaw of transformers is their quadratic complexity in sequence length. This basically means they will not become significantly faster anytime soon, and context window sizes will grow slowly too.
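
To make that concrete, here is a minimal NumPy sketch of vanilla scaled dot-product attention (names and shapes are illustrative, not taken from any particular model); the (n, n) score matrix is exactly where the quadratic time and memory cost comes from:

```python
import numpy as np

def attention(Q, K, V):
    """Vanilla scaled dot-product attention. Q, K, V: (n, d) arrays, n = sequence length."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                           # (n, n) matrix: O(n^2) time and memory
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # row-wise softmax
    return weights @ V                                      # (n, d) output

n, d = 1024, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = attention(Q, K, V)   # doubling n quadruples the cost of the score matrix
```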

Linear transformers and Structured State Space Sequence (S4) models are promising approaches to solving that, though.
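
For contrast, here is a rough sketch of the linear-attention idea (in the spirit of kernelized/linear transformers; the elu+1 feature map is just an illustrative choice): by regrouping the matrix products you never form an (n, n) matrix, so cost grows linearly with sequence length:

```python
import numpy as np

def linear_attention(Q, K, V):
    """Kernelized (linear) attention sketch: softmax(QK^T)V is replaced by
    phi(Q) (phi(K)^T V), so no (n, n) matrix is ever formed."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))    # elu(x) + 1, a positive feature map
    Qp, Kp = phi(Q), phi(K)                                # (n, d)
    KV = Kp.T @ V                                          # (d, d): size independent of n
    Z = Kp.sum(axis=0)                                     # (d,) normalizer
    return (Qp @ KV) / (Qp @ Z)[:, None]                   # (n, d), O(n * d^2) overall

n, d = 1024, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = linear_attention(Q, K, V)   # non-causal version, kept simple on purpose
```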

My hunch is that LLMs will be very useful in the near term but of little value to future AGI architectures, though I can't convincingly explain why.

27

Tavrin t1_j9tns0j wrote

If this is true, the context window of GPT is about to take a big leap forward (a 32k-token context window instead of the usual 4k, or now 8k). Still, I agree with you that current transformers don't feel like they will be the ones taking us all the way to AGI (though there is a lot of progress that can still be made with them even without more computing power, and I'm sure we'll see them used for more and more crazy and useful things).

9

fangfried OP t1_j9ss98n wrote

What do you think would be an upper bound on complexity for AGI? Do we need to get it to linear, or would n log n suffice?

2

nul9090 t1_j9th1xg wrote

Well, at the moment, we can't really know. Quadratic complexity is definitely bad: it limits how far we can push the architecture and makes it hard to run on consumer hardware. But if we are as close to a breakthrough as some people believe, maybe it isn't a problem.

6

dwarfarchist9001 t1_j9w8701 wrote

Going from O(n^2) to O(n log n) for the context window lets you have a context window of 1.3 million tokens using the same space needed for GPT-3's 8000 tokens.
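
As a sanity check on that kind of back-of-the-envelope comparison, here is a rough sketch that solves m * log(m) = n^2 numerically. It assumes base-2 logs and ignores all constant factors, both of which change the answer a lot, so treat the output as an order-of-magnitude estimate only:

```python
import math

def equivalent_context(n_quadratic, log_base=2.0):
    """Find m such that m * log(m) roughly matches the n^2 cost of attention at
    length n_quadratic. Bisection; constants are ignored, so this is only a rough estimate."""
    budget = n_quadratic ** 2
    lo, hi = 2, budget
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * math.log(mid, log_base) < budget:
            lo = mid
        else:
            hi = mid
    return lo

print(equivalent_context(8000))   # order-of-magnitude only; depends heavily on the assumptions above
```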

3

purepersistence t1_j9viz1k wrote

The post is about LLMs. They will never be AGI. AGI will take AT LEAST another level of abstraction; it might in theory be fed candidate responses from an LLM, but it's way too soon to say that would be the right approach versus a whole new kind of model based on more than just parsing text and finding relationships. There's a lot more to the world than text, and you can't get it by parsing text alone.

2

tatleoat t1_j9tf9a9 wrote

RWKV has an unlimited context window
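
The reason it can do that, at least in principle, is that RWKV replaces attention with a recurrence over a fixed-size state, so cost and memory per token don't grow with context length. A much-simplified sketch of that pattern (not the exact RWKV formulation, just the fixed-state recurrence idea it relies on):

```python
import numpy as np

def decayed_recurrence(ks, vs, w):
    """Fixed-size-state recurrence: a decayed weighted average over everything seen so far.
    ks, vs: (n, d) key/value-like vectors; w: (d,) positive decay rates.
    Memory is two (d,) accumulators, so it never grows with context length."""
    num = np.zeros(ks.shape[1])                 # running weighted sum of values
    den = np.zeros(ks.shape[1])                 # running sum of weights
    outputs = []
    for k, v in zip(ks, vs):
        num = np.exp(-w) * num + np.exp(k) * v  # old tokens decay but are never truncated
        den = np.exp(-w) * den + np.exp(k)
        outputs.append(num / den)
    return np.stack(outputs)                    # (n, d); per-token cost is O(d), independent of n

n, d = 16, 8
out = decayed_recurrence(np.random.randn(n, d), np.random.randn(n, d), 0.5 * np.ones(d))
```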

2

turnip_burrito t1_j9sr58c wrote

That's a really good point. The Hungry Hungry Hippos (H3) and RWKV papers are two good examples of the approaches you mentioned. Transformers are starting to feel "cumbersome" by comparison.

1
