farmingvillein t1_j6nxa0i wrote
Reply to comment by ezelikman in [R] Parsel: A (De-)compositional Framework for Algorithmic Reasoning with Language Models - Stanford University Eric Zelikman et al - Beats prior code generation sota by over 75%! by Singularian2501
> If you generate 10 complete implementations, you have 10 programs. If you generate 10 implementations of four subfunctions, you have 10,000 programs. By decomposing problems combinatorially, you call the language model less
Yup, agreed--this was my positive reference to "the big idea". Decomposition is almost certainly very key to any path forward in scaling up automated program generation in complexity, and the paper is a good example of that.
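To make the arithmetic in that quote concrete, here is a quick Python sketch. The candidate names are placeholders I made up, not anything from Parsel, and it assumes each sampled implementation costs one model call and that any implementation of one subfunction can be paired with any implementation of the others.

```python
from itertools import product

# Hypothetical setup taken from the quote: 4 subfunctions,
# 10 sampled implementations of each.
NUM_SUBFUNCTIONS = 4
SAMPLES_PER_SUBFUNCTION = 10

# Placeholder labels standing in for generated implementations.
candidates = [
    [f"f{i}_impl{j}" for j in range(SAMPLES_PER_SUBFUNCTION)]
    for i in range(NUM_SUBFUNCTIONS)
]

# A full program picks one implementation per subfunction.
programs = list(product(*candidates))

# Assumption: one model call per sampled implementation.
llm_samples = NUM_SUBFUNCTIONS * SAMPLES_PER_SUBFUNCTION

print(llm_samples, len(programs))
```

So 40 samples cover 10,000 candidate programs, versus 10 samples for 10 programs when generating whole implementations.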
> Parsel is intentionally basically indented natural language w/ unit tests. There's minimal extra syntax for efficiency and generality.
I question whether the extra formal syntax is needed at all. My guess is that, were this properly ablated, it would not be. LLMs are--in my personal experience, and this is obviously borne out thematically--quite flexible about different ways of representing, say, unit-test inputs and outputs. Permitting users to specify these in a more arbitrary manner--whether in natural language, pseudocode, or extant programming languages--seems highly likely to work equally well, with some light coercion (i.e., training/prompting). Further, natural language allows test cases to be specified in a more general way ("unit tests: each day returns the next day in the week, Sunday=>Monday, ..., Saturday=>Sunday") that LLMs are well-suited to work with. Given LLMs' ability to pick up on context and apply it, there is also a good chance that freer-form descriptions of test cases would drive improved performance.
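For a sense of what that one-line description stands in for, here is a rough sketch of the explicit test cases it implies. `next_day` is a hypothetical function under test, and the expanded table is my own reading of the natural-language description, not something from the paper.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]

def next_day(day: str) -> str:
    """Hypothetical function under test: return the following weekday."""
    return DAYS[(DAYS.index(day) + 1) % len(DAYS)]

# The one-line natural-language spec, expanded into explicit
# input/output pairs (this expansion is an assumption on my part).
expected = {
    "Sunday": "Monday",
    "Monday": "Tuesday",
    "Tuesday": "Wednesday",
    "Wednesday": "Thursday",
    "Thursday": "Friday",
    "Friday": "Saturday",
    "Saturday": "Sunday",
}

for day, following in expected.items():
    assert next_day(day) == following, (day, following)
```

One sentence of natural language replaces seven explicit cases here, and the wraparound from Saturday to Sunday is the edge case that the description makes explicit.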
If you want to call that further research--"it was easier to demonstrate the value of hierarchical decomposition with a DSL"--that's fine and understood, but I would call it out as a(n understandable) limitation of the paper and an opportunity for future research.