Comments

You must log in or register to comment.

belacscole t1_j8bf6ol wrote

I wonder if this is the ultimate path to reaching general intelligence. After all, humans evolved by learning to master tools.

108

big_gondola t1_j8biqv4 wrote

I might say we gain general intelligence by creating different models for different tasks and gain experience on when to call which. This has the when to call which, but not the creation of new models.

48

diviludicrum t1_j8bxeji wrote

I still think u/belacscole is right - this is analogical to the rudimentary use of tools, which can be done by some higher primates and a small handful of other animals. Tool use requires a sufficient degree of critical thinking to recognise a problem exists and select the appropriate tool for solving it. If done with recursive feedback, this would lead to increasingly skilful tool selection and use over time, resulting in better detection and solution of problems over time. Of course, if a problem cannot possibly be solved with the tools available, no matter how refined their usage is, that problem would never be overcome this way - humans have faced these sorts of technocultural chokepoints repeatedly throughout our history. These problems require the development of new tools.

So the next step in furthering the process is abstraction, which takes intelligence from critical thinking to creative thinking. If a tool-capable AI can be trained on a dataset that links diverse problems with the models that solve those problems and the process that developed those models, such that it can attempt to create and then implement new tools to solve novel problems, then assess its own success (likely via supervised learning, at least at first), we may be able to equip it with the “tool for making tools”, such that it can solve the set of all AI-solvable problems (given enough time and resources).

41

currentscurrents t1_j8c51f0 wrote

...and getting radically improved performance across several important tasks because of calling those APIs.

Plus, calling APIs is very important for integration into real systems because they can trigger real-world actions. Imagine a Siri that calls a bunch of different APIs based on complex instructions you give it.

30

sloganking t1_j8cculc wrote

It's not just calling APIs. This model is independently teaching itself how to use new APIs and when to use them. The process is pretty much the same for any API, and doesn't require much extra effort by the programmer to add a new one.

This paper also states it is one of the first to have models learn to use APIs in an unsupervised way, meaning they teach themselves instead of relying on a ton of human annotated data.

22

robotix_dev t1_j8cekxc wrote

I’ve long thought this is the next stepping stone in the path the path to AGI. The next big step IMO is dynamic, online model augmentation to enable learning new concepts.

Both of those combined seem like a basic approximation of what goes on in our brain.

4

Varpie t1_j8cftrx wrote

I'm surprised this hasn't been done before. This paper mostly cites works from the last 2-3 years, but surely, something similar was done previously (maybe not using the same kind of model)? In fact, isn't it pretty close to what search engines do to provide instant results when given an equation or an address for instance? Does anyone know of such work?

4

Taenk t1_j8ckvh2 wrote

Now what if the tool the LLM uses is the training API for itself …

39

drcopus t1_j8cn7av wrote

It would be interesting if it learned which API to use from a description of the API so as to allow it to generalise to new ones!

8

leepenkman t1_j8co3gr wrote

Also checkout https://text-generator.io its a multi modal model so visits any input links, downloads web pages and images are analyzed with NNs to make better text.

Also does speech to text/text to speech so can talk

As many have said lots of these things will likely/hopefully come together into something big, needs a few things like the when to train new tools/model zoo thing, but internally Text Generator is based on multiple models too and has some internal decision making for which model is best on every request (so you dont need to pick a code/text model it does it automatically) which is similar but it's not training new nets.

−3

clex55 t1_j8d2oe3 wrote

The next step must be creating and programming those tools and incorporating them on the fly.

4

swegmesterflex t1_j8d3t4r wrote

Had this idea and was planning to play around with it when I had more free time. Good to see some evidence it’s a promising direction. I speculate you can actually get a LOT out of this if you’re clever with it. A tool for long term memory could be done by having a lookup table with text embeddings as keys. A tool for vision could be made with an image captioning model + maybe some segmentation to get a richer text description of the image. Many more things you could come up with, that I think could work well if you find some clever way of turning them into text.

14

yashdes t1_j8d8lf9 wrote

I've definitely wondered about this exact thing myself, especially when talking to chatgpt when it responds with insert x here, why couldn't that just be taken out and replaced with the appropriate API call

4

Despacereal t1_j8d971u wrote

In a way yes. I think general intelligence (consciousness in most animals) developed evolutionarily to manage a wide variety of sensory inputs and tasks, and to bridge the gaps between them.

As we develop more individual areas of AI, we will naturally start to combine them to create more powerful programs, such as Toolformer combining the strengths of LLMs and other models. Once we have these connections between capabilities, it should be easier to develop new models that learn these connections more deeply and can do more things.

Some of the things that set us apart from other animals are our incredible language and reasoning capabilities which allow us to understand and interact with an increasingly complex world and augment our capabilities with tools. The perceived understanding that LLMs display using only patterns in text is insane. Combine that with the pace of developments in Chain of Thought reasoning, use of Tools, other areas handling visuals, sound, and motion, and multimodal AI, and the path to AGI is becoming clearer than the vision of a MrBeast™ cataracts patient.

1

uristmcderp t1_j8db0gw wrote

The whole assessing its own success is the bottleneck for most interesting problems. You can't have a feedback loop unless it can accurately evaluate if it's doing better or worse. This isn't a trivial problem either, since humans aren't all that great at using absolute metrics to describe quality, once past a minimum threshold.

9

Varpie t1_j8ducbj wrote

Interesting, though it is from October 2022, still very recent. I'm guessing using transformers for it is a recent approach, but I'm curious about the previous approaches, which this paper doesn't talk about.

1

extracensorypower t1_j8e1azu wrote

Every tool except Jira, of course. Nothing sentient could figure that out.

19

bballerkt7 t1_j8e6l5f wrote

Because AI being able to use APIs is a big step towards it being able to interact with the real world effectively, specifically the digital world. Imagine chatgpt being able to now do things for you in the digital world like go online shopping for you or trade stocks etc.

2

pyepyepie t1_j8e7gjp wrote

Thanks :) I agree it's useful but I don't see how it's related to AGI. Additionally, it was already done a long time ago, many "AI" agents used the internet before. I feel that the real challenge is to control language models using structured data, perform planning, etc., not to use language models to interact with the world (which seems trivial to me, sorry), but of course, it's just my opinion - which is probably not even that smart.

6

ksatriamelayu t1_j8ebhn4 wrote

Keep in mind that our current theories in Neuroscience broadly agrees something similar is going on with mammalian, even reptilian brains. Hell, maybe even worm brains.

There's autonomous systems everywhere that calls each other for updates and in some certain brains, enough complexity that something that can called thinking occurs.

Practically, offloading calculations to a python REPL, machine translation to GTranslate API call, and knowledge search to Wikipedia corpus is going to let LLMs do what they do best - mask users intent and generate believable enough corpus. Let the facts stay factual and the hallucination stay hallucination.

5

urbanfoh t1_j8elywk wrote

Isn't it almost certainly possible due to the universal approximation theorem?

Assuming consciousness is a function of external variables a large enough network with access to these variables should be able to approximate consciousness.

1

UnderstandingDry1256 t1_j8ev9bx wrote

An obvious idea is to connect gpt to browser api and let it go and learn 😄

2

cd_1999 t1_j8fmlej wrote

Have you heard of Searle's Chinese Room?

Some people (sorry I can't give you references off the top of my head) argue there's something special about the biological nervous system, so the material substrate is not irrelevant. (Sure you could reverse engineer the whole biological system, but that would probably take much longer).

4

farmingvillein t1_j8frv87 wrote

> not to use language models to interact with the world (which seems trivial to me, sorry),

The best argument here is that "true" intelligent requires "embedded" agents, i.e., agents that can interact with our (or, at least, "a") world (to learn).

Obviously, no one actually knows what will make AGI work, if anything...but it isn't a unique/fringe view OP is suggesting.

1

Soundwave_47 t1_j8fu3r6 wrote

Somewhat, and no.

We generally define AGI as an intelligence (which, in the current paradigm, would be a set of algorithms) that has decision making and inference capabilities in a broad set of areas, and is able to improve its understanding of that which it does not know. Think of it like school subjects, it might not be an expert in all of {math, science, history, language, economics}, but it has some notion of how to do basic work in all of those areas.

This is extremely vague and not universally agreed upon (for example, some say it should exceed peak human capabilities in all tasks).

1

VelveteenAmbush t1_j8fusa5 wrote

> I feel that the real challenge is to control language models using structured data, perform planning, etc.

I think the promise of tool-equipped LLMs is that these tools may be able to serve that sort of purpose (as well as, like, being calculators and running wikipedia queries). Could imagine an LLM using a database module as a long-term memory, to keep a list of instrumental goals, etc.. You could even give it access to a module that lets it fine-tune itself or create successor LLMs in some manner. All very speculative of course.

3

SleekEagle t1_j8ix4fz wrote

Authors publish papers on research, experiments, findings, etc. They do not always release the code for the models they are studying.

The lucidrains' repos implement the models, creating an open-source implementation for the research

The next step would then be to train the model, which requires a lot more than just the code (most notably, money). I assume you're referring to these trained weights when you say "the needed AI model". Training would require a huge amount of time and money for a team, never mind a single person, to train even one of these models let alone a whole portfolio of them

For this reason, it's not very reasonable to expect lucidrains or any other person to train these models - the open-source implementations are a great contribution on their own!

9

Ok-Variety-8135 t1_j8l9g5j wrote

If we treat the output of transformer as inner monolog and only perform real output when it calls <action> say: something </action>.

It can speak proactively, and hiding their inner thought, just like human does.

2

SnooStories4137 t1_j8lrsug wrote

Some reinforcement learning like algorithm seems like really interesting next step here. Observation = task (like qa or mask filling), actions = api call where the output updates the observation via concatenation as in the paper, environment is apis and database and python installation etc, state is network weights, reward is loss function before and after update to observation.

I feel like even if the only api is just generating text using itself to update the observation ('to help itself think') intuitively seems like it could help for some things. Rather than try to fill in the mask right away, it might recognize better to first 'think a little' to update its working memory (which is of course the observation here).

1

dgrsmith t1_j8n4kwe wrote

From a cognitive point of view, humans and animals have modules that they rely on for certain tasks. For Human Neuropsych assessment, the combination of the function of these modules gives you a score for general intelligence, with each module contributing toward the whole. Having a removed or changed “module” for one reason or another will sometimes cause localized task failures (e.g., neurodegenerative disease or brain injury) or approach to tasks that is atypical (e.g., atypical brain development). Maybe we can think of specific cognitive functions as being API calls to a modules in this “tool use” paradigm? This is likely not an original thought, and if anyone has references or has heard of this idea, please let me know!

1

MysteryInc152 t1_j8ppoiq wrote

I'd rather the basic senses at least (vision as well as audio) be pretrained as well. We know from Multimodal chain of thought as well as scaling laws for generative mixed modal language models that multimodal models far outperform single modal models on the same data and scale. You won't get that kind of performance gain leveraging those basic senses to outside tools.

https://arxiv.org/abs/2302.00923

https://arxiv.org/abs/2301.03728

2