Submitted by IluvBsissa t3_11e7csf in singularity
We "know" GPT-4 may have a context window of 32 000 tokens at least. That's enough for writing a 50 pages short story or a small program.
How many tokens do we need for it to generate huge programs like a new social media platform, a web browser, specialized corporate software, or even a whole operating system? Those run to around 20-50 million lines of code at most.
Of course, there might be a huge loss of consistency and a large number of bugs, but this is just a thought experiment.
While we're at it, how many tokens does the average line of code take up in an LLM's context?
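For what it's worth, you can measure this directly with OpenAI's tiktoken library and the cl100k_base encoding GPT-4 uses. A minimal sketch (the sample lines are made up for illustration; typical code lands somewhere around 10-30 tokens per line):

```python
# Rough sketch: count tokens per line of code with tiktoken.
# The sample lines below are hypothetical, just for illustration.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the encoding GPT-4 uses

sample_lines = [
    "def fetch_user(user_id: int) -> dict:",
    "    response = session.get(f'{API_URL}/users/{user_id}')",
    "    return response.json()",
]

for line in sample_lines:
    print(f"{len(enc.encode(line)):3d} tokens | {line}")

avg = sum(len(enc.encode(line)) for line in sample_lines) / len(sample_lines)
print(f"average: {avg:.1f} tokens per line")
```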
The reason I ask is that I really want to emulate some non-free browser extensions that help me in my studies, and they probably total around a million lines of code.
Thank you.
EDIT: Assuming 32 000 tokens = 50 pages = 1 250 lines of code, we would need a context window 40 000× bigger to generate 50 million lines of code, so around 1 280 000 000 tokens. That's quite a lot.
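For anyone who wants to check the arithmetic, here is the same back-of-the-envelope calculation as code, using the assumptions stated in the edit (the 50 pages ≈ 1 250 lines equivalence is an assumption, not a measured figure):

```python
# Back-of-the-envelope math from the EDIT above; the 50 pages ≈ 1 250
# lines-of-code equivalence is an assumption, not a measured figure.
tokens_per_window = 32_000
lines_per_window = 1_250
target_lines = 50_000_000

tokens_per_line = tokens_per_window / lines_per_window  # 25.6 tokens/line
scale_factor = target_lines / lines_per_window          # 40 000x
total_tokens = tokens_per_window * scale_factor         # 1 280 000 000

print(f"{tokens_per_line} tokens/line, {scale_factor:,.0f}x scale, "
      f"{total_tokens:,.0f} tokens total")
```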
basilgello t1_jacon0o wrote
Software is architecture expressed in code. Minimal common-sense reasoning is definitely not enough to write and maintain huge software codebases, and LLMs pass even those reasoning tests with only an "n-th order of understanding". Writing snippets is one thing, but forward- and reverse-engineering complex problems is another: the number of possible ways to achieve the same result grows exponentially, and evaluating the optimality of each solution is a different task from what an LLM does.