A simple mental model of how a consequentialist agent works, which I find to be a useful tool for thinking about general AI systems, is to break it down into three modules that each do a separate task: search, prediction, and evaluation. I find this decomposition makes it easier to think about future systems, even though I expect that, in practice, AGI is more likely to be built as an amalgamation of these modules than as cleanly separated components.
The way these modules combine to form an agent is fairly simple. For a given input, the search module generates possible actions the agent could take, the prediction module estimates the outcome of each action, and the evaluation module determines how good those outcomes are. From there, either the evaluations are used to select an action to implement, or the process restarts, with the search module using the information from the other two modules to generate new actions.
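To make the loop concrete, here is a minimal Python sketch of the process described above. The toy problem, the function names (`search`, `predict`, `evaluate`, `act`), and the fixed number of rounds are my own illustrative choices, not a claim about how a real system would be built; the point is only the division of labor between the modules.

```python
import random

# Toy instantiation of the search / prediction / evaluation loop.
# The agent's "world" is a single number, and it prefers states close
# to a target. The module boundaries, not the toy problem, are the point.

TARGET = 10

def search(state, feedback=None, n=5):
    """Search: propose candidate actions (here, step sizes).
    If feedback from a previous round exists, propose actions near the
    best one found so far."""
    if feedback:
        best_action, _ = max(feedback, key=lambda pair: pair[1])
        return [best_action + random.uniform(-1, 1) for _ in range(n)]
    return [random.uniform(-5, 5) for _ in range(n)]

def predict(state, action):
    """Prediction: a (here, exact) causal model of how an action changes the world."""
    return state + action

def evaluate(outcome):
    """Evaluation: preferences over outcomes; closer to TARGET is better."""
    return -abs(outcome - TARGET)

def act(state, rounds=3):
    """Run the loop: search proposes actions, prediction and evaluation score
    them, and the scores either select an action or seed another search round."""
    feedback = None
    best_action, best_score = None, float("-inf")
    for _ in range(rounds):
        candidates = search(state, feedback)
        feedback = [(a, evaluate(predict(state, a))) for a in candidates]
        for action, score in feedback:
            if score > best_score:
                best_action, best_score = action, score
    return best_action

print(act(state=3.0))  # prints a step size that moves the state toward TARGET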
The first module, search, is the critical component of agency that necessitates the other two modules. We could easily imagine a powerful non-consequentialist agent that avoids doing search, functioning like a giant pile of heuristics or a lookup table that takes an input and instinctively reacts with an output. This would be analogous to Daniel Kahneman’s System 1 thinking or Abram Demski’s Control (as opposed to System 2 thinking and Selection, respectively). Such an agent has no need to know the consequences of the action it selects; rather, the process that creates the agent determines whether its instincts are good. Many people, myself included, believe that current state-of-the-art systems like GPT-4 and Claude 3 are close to this kind of agent; whether future systems will be consequentialist is substantially more controversial.
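For contrast with the loop above, a non-consequentialist agent of this kind can be sketched as nothing but a mapping from inputs to outputs, with no internal prediction or evaluation. The rules below are made up purely for illustration and stand in for whatever instincts the training process happened to instill.

```python
# A sketch of a heuristic / lookup-table agent: an input goes in, an
# instinctive output comes out, and nothing inside the agent represents
# the consequences of that output. Whether the instincts are any good is
# determined entirely by the process that produced the table.

REFLEXES = {
    "opponent played rock": "play paper",
    "opponent played paper": "play scissors",
    "opponent played scissors": "play rock",
}

def react(observation):
    return REFLEXES.get(observation, "do nothing")

print(react("opponent played rock"))  # -> "play paper"
```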
If an agent is doing a search over actions, then it has to be actively considering and comparing multiple options. This requires predicting and evaluating their outcomes, which is where the latter two modules come in. However, even if those other modules are flawless and the agent knows exactly how much it will like any action it can think of, there’s no guarantee it can identify the globally best option: it can only choose among the actions it has actually generated. That’s why I think of the search module as the agent’s creativity: if it’s not doing search then there’s no creativity involved, and the more effectively it searches, the more creative the actions it can come up with.
The second module, prediction, is what I think of as the core of intelligence. It requires a causal understanding of the world to predict how hypothetical actions the agent could take would change that world. When people talk about an agent's world model, that is equivalent to its predictive ability: any understanding that helps it make better predictions must be reflected in the world model, and all information in the world model influences at least some predictions. In practice, an agent’s predictions likely involve some uncertainty, yielding a distribution over outcomes rather than complete confidence in any one of them.
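When prediction is uncertain in this way, evaluation naturally becomes an expectation over the predicted distribution. Here is a hedged sketch, reusing the illustrative module names from the earlier toy example; the particular outcomes and probabilities are invented for demonstration only.

```python
# Sketch of prediction under uncertainty: `predict` returns a distribution
# over outcomes (a list of (outcome, probability) pairs), and evaluation
# becomes an expected value over that distribution. All numbers here are
# made up for illustration.

def predict(state, action):
    """Return a distribution over outcomes rather than a single outcome."""
    return [
        (state + action, 0.80),   # the action works as intended
        (state, 0.15),            # the action has no effect
        (state - action, 0.05),   # the action backfires
    ]

def evaluate(outcome, target=10):
    """Preferences over individual outcomes; closer to the target is better."""
    return -abs(outcome - target)

def expected_value(distribution, target=10):
    """Score a distribution over outcomes by its expected evaluation."""
    return sum(p * evaluate(o, target) for o, p in distribution)

print(expected_value(predict(state=3.0, action=7.0)))  # -> -1.75
```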
The final module, evaluation, is the most straightforward. It contains preferences, which are used to evaluate the outcomes or distributions over outcomes generated by the prediction module. These preferences can also be thought of as the agent’s goals.
So, why is this mental model useful? What insights does the modular breakdown suggest? I would frame it in contrast to the standard view, which breaks an optimizer down into a world model plus a goal. The first takeaway is that search is a distinct component, and if search is not occurring then there may well be no world model or goal either. The second takeaway is that doing prediction is equivalent to having a world model, which motivates the study of predictive models. Neither point is completely novel, but both are important to how I think about AI safety, which gives me hope that thinking in terms of the three modules may be a useful tool for other questions as well.
May I ask why you wrote this? Guesses:
- To find others with a similar view to collaborate with
- Because you think this model should propagate
If the first is true, I have a similar view. I've been thinking about this a lot ever since an experience with gpt-4-base left me astounded at its apparent ability to infer traits of authors and forced me to decouple intelligence from 'goals over the world' or 'agency'. I've also been theorizing about training processes which could select for purely-'intelligence/creativity'-type systems without goals. I'm a fan of the predictive models agenda and evhub's posts.
I haven't published anything yet since it's hard for me to write + exfohazard concerns, but I've been meaning to find some collaborators to make progress with. If this interests you, here's my contact info: {discord: `quilalove`, matrix: `@quilauwu:matrix.org`, email: `quila1@protonmail.com`}. (I prefer text / can't do calls)