The "long-term goal" of AI is defined by the author to be "the
creation and understanding of intelligence" [p1]. But this raises
the question: how is "intelligence" defined? Can we find a
general way to characterize intelligent behaviour? If this cannot
be done, we risk the balkanization of AI into a set of related
specializations rather than an integrated field of study [p2].
The author proposes four candidate definitions of
intelligence: perfect rationality, calculative
rationality, metalevel rationality, and bounded
optimality.
Each definition will be compared in terms of the class of
systems that satisfies it, and the research directions that it
encourages.
The author advocates the "agent-based" approach to AI. That is, AI
is considered to be the study of systems that "do the right
thing", for some definition of "right" [p3].
An agent is defined by the mapping from sequences of percepts to
actions.
This approach to AI focuses on the agent's behaviour, without
making assumptions about the process used by the agent to
decide on what actions to take: "What counts in the first
instance is what the agent does, not necessarily what it
thinks, or even whether it thinks at all" [p3].
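As a minimal sketch of this view (the types and names here are hypothetical, not from the paper), an agent can be modelled as nothing more than a function from percept histories to actions:

```python
from typing import Callable, Sequence

# Hypothetical illustration: percepts and actions are plain strings.
Percept = str
Action = str

# In this view an agent is fully characterized by one mapping:
# the percept sequence observed so far determines the next action.
AgentFunction = Callable[[Sequence[Percept]], Action]

def reflex_vacuum_agent(percepts: Sequence[Percept]) -> Action:
    """A trivial agent function that reacts only to the latest percept."""
    return "suck" if percepts and percepts[-1] == "dirty" else "move"
```

Nothing in this interface says how the mapping is computed, which is exactly the point: two agents that implement the same mapping are behaviourally identical.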
Perfect rationality
A perfectly rational agent always acts in such a way as to have
the maximum expectation of success given the information
available [p4].
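In standard decision-theoretic notation (a reconstruction, not the paper's exact formulation), a perfectly rational agent selects

```latex
a^* = \operatorname*{argmax}_{a}\;
      \mathbb{E}\left[\, U(\text{outcome}) \mid a,\ \text{percept history} \,\right]
```

where U is the agent's utility (success) measure and the expectation is taken over the agent's uncertainty about the environment.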
Constructing a perfectly rational AI is impossible, regardless of
the speed of computational devices: "physical mechanisms take time
to process information and select actions, hence the behaviour of
real agents cannot immediately reflect changes in the environment
and will generally be sub-optimal" [p6].
Perfect rationality "can be used to establish an upper bound on
the performance of any possible system," by comparing the
performance of the system with the theoretical performance of a
perfectly rational agent in the same environment [p5].
Calculative rationality
"Calculative rationality is displayed by programs that, if
executed infinitely fast, would result in perfectly rational
behaviour" [p6]. This definition of rationality was used for
much of the early work in AI.
Problem: many calculatively rational agents need to solve
intractable decision problems (NP-hard problems and similar)
[p7].
Correctly solving such problems can require enormous
computational effort; barring an unexpected breakthrough such as
a proof that P=NP, this is unlikely to change.
This undermines the idealization that the program implementing
this behaviour could effectively be executed infinitely fast,
even given enormous improvements in machine resources.
One workaround is to ignore reasoning tasks that require
exponential time (Levesque 1986, Levesque and Brachman 1987). (The
assumption that we can solve polynomial-time problems infinitely
quickly is at least somewhat more plausible.)
The most common solution is to employ various approximations and
domain-specific performance improvements to attempt to achieve
reasonable behaviour [p8]. Much AI research could be described
in this way.
This basically involves accepting that "intelligent agents will
be unable to act rationally in all circumstances" under this
definition of rational behaviour — most of these
techniques do not guarantee optimality [p8].
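As a concrete illustration of this pattern (my example, not one from the paper): set cover is NP-hard, but a simple greedy heuristic runs in polynomial time at the price of only approximate optimality:

```python
def greedy_set_cover(universe: set, subsets: list[set]) -> list[set]:
    """Polynomial-time approximation for the NP-hard set-cover problem.

    The greedy cover is at most ~ln(|universe|) times larger than the
    optimal cover; finding the true optimum is believed to require
    exponential time in the worst case.
    """
    uncovered = set(universe)
    cover: list[set] = []
    while uncovered:
        # Greedy choice: the subset covering the most still-uncovered elements.
        best = max(subsets, key=lambda s: len(s & uncovered))
        if not (best & uncovered):
            raise ValueError("the subsets cannot cover the universe")
        cover.append(best)
        uncovered -= best
    return cover
```

The greedy answer can be a logarithmic factor worse than optimal, which is precisely the "unable to act rationally in all circumstances" tradeoff.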
The "standard methodology" is to "first build calculatively
rational agents and then speed them up" [p22].
The author is skeptical of the value of this technique: the
algorithm we would choose if it could be executed infinitely
fast may bear little or no relation to the optimal algorithm
given finite time and resources [p22].
Metalevel rationality
The preceding definitions of intelligence do not take into
account the time required by the agent to select a rational
action: they assume that the agent will choose the "right" action
infinitely fast.
This assumption fails in many realistic environments: the time an
agent takes to choose an action may well be non-trivial, and the
time taken often influences the outcome. Metalevel rationality is
defined by Good 1971 as "the maximization of expected utility
taking into account deliberation costs".
Metalevel rationality is typically implemented via a metalevel
architecture [p8], where the agent is divided into two
components:
The "object level carries out computations concerned
with the application domain" [p8]. These are the computations
that allow the agent to perform whatever actions it is intended
to perform.
The "metalevel is a second decision-making process whose
application domain consists of the object-level computations
themselves and the computational objects and states that they
affect" [p8].
In the metalevel, object-level computations are viewed as
actions: their cost is the time taken to perform the action, and
the benefit is whatever improvement in decision quality the
action yields. Reasoning at the metalevel includes reasoning
about how long to spend making the decision, how to allocate the
available time among the available computational actions, and how
to order those actions.
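A highly simplified sketch of such a metalevel control loop (the interfaces are hypothetical; a real system must estimate these quantities rather than being handed them):

```python
from typing import Callable, Iterable, TypeVar

C = TypeVar("C")  # an object-level computation, e.g. one node expansion

def metalevel_control(
    computations: Iterable[C],
    value_of_computation: Callable[[C], float],  # estimated gain in decision quality
    time_cost: Callable[[C], float],             # utility lost to the time it takes
    run: Callable[[C], None],
) -> None:
    """Treat object-level computations as actions: perform each one only
    while its estimated benefit exceeds its time cost, then stop
    deliberating and act on the current best decision."""
    for c in computations:
        if value_of_computation(c) <= time_cost(c):
            break  # further deliberation no longer pays for itself
        run(c)
```

Note that this loop is myopic: it values one computation at a time, a limitation the author returns to below.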
One area of debate is whether metareasoning should incorporate
domain-specific knowledge and reasoning techniques.
In systems with domain-specific metareasoning, "metaknowledge
is a sort of 'extra' domain knowledge ... that one has to add
to an AI system to get it to work well" [p10].
However, we can reason about the cost and effect of
computations in a domain-independent fashion: "in principle, no
additional domain knowledge is needed to assess the benefits of
a computation" [p10].
Performing metareasoning from first principles is extremely
expensive, so most practical systems incorporate
domain-specific knowledge in some form. It might also be
possible to learn this knowledge from experience with
computations in a particular application domain.
There is much room for future work in this area. For example,
current metalevel systems are myopic: they perform
metareasoning for a single computational action at a time. This
is not always ideal: "the value of a computation may not be
apparent as an improvement in decision quality until further
computations have been done" [p10].
The problem of selecting the optimal sequence of computational
actions is usually intractable — the author suggests that
machine-learning techniques might be employed to work around
this.
Another problem with metalevel reasoning is that it does not
account for the time taken to perform metareasoning itself. "Since
metareasoning is expensive, it cannot be carried out optimally";
therefore, a tradeoff between time and optimality of metareasoning
must be made [p10].
This tradeoff cannot be made by introducing a "metametalevel",
as that raises the question of how much metametareasoning ought
to be done. Introducing more levels of reasoning just causes
"conceptual regress" [p11].
Bounded optimality
Bounded optimality was informally defined by Horvitz 1989 as "the
optimization of computational utility given a set of assumptions
about expected problems and constraints on resources" [p11].
The constraints on resources are determined by the machine on
which the program runs; the expected problems are characterized
by the environment in which the agent operates.
This definition builds on the complementary notion of
feasibility, as defined by Russell and Subramanian
1995. The set of all agent functions that can be implemented
by some program running on a specified machine is the set of
feasible agents for that machine. The best agent from the set
of feasible agents behaves with "bounded optimality" for the
specified machine [p11].
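Reconstructed in rough notation (so details may differ from Russell and Subramanian's formulation): if L_M is the set of programs that can run on machine M, Agent(l, M) the agent function implemented by program l on M, and V(f, E, U) the value (expected utility) of agent function f in environment class E under utility function U, then the bounded-optimal program is

```latex
l_{\mathrm{opt}} \;=\; \operatorname*{argmax}_{l \,\in\, \mathcal{L}_M}
    V\!\left(\mathrm{Agent}(l, M),\, \mathbf{E},\, U\right)
```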
One problem with bounded optimality results is that they are
specific to a single formal machine [p13].
This problem is analogous to that faced in the analysis of
algorithm performance: we typically want to prove performance
results about algorithms that are independent of the details of
a particular machine configuration. This is typically achieved
using asymptotic (big-O) notation.
Russell and Subramanian 1995 introduce a similar idea to allow
the construction of robust bounded optimality analyses.
Worst-case asymptotic bounded optimality (ABO)
means roughly that the agent program "is basically along the
right lines if it just needs a faster (larger) machine to have
worst-case behavior as good as that of any other program in all
environments" [p13].
This is the definition of intelligence preferred by the author.
Future work
The remainder of the paper sketches some areas for future research
related to bounded optimality and ABO agents.
Calculative rationality improvements
Much of the prior work on calculative rationality has assumed that
the environment is fully observable, deterministic, static, and
discrete.
While there has been some work on defining rational behaviour in
more uncertain domains (such as POMDPs for stochastic,
partially-observable environments), the author suggests that more
fundamental work is needed here. Defining calculatively rational
behaviour for a more general class of environments would be the
"AI equivalent of bricks, beams, and mortar with which AI
architects can build the equivalent of cathedrals" [p14].
ABO proofs
There is a crude methodology for proving that agent programs are
ABO: [p16]
Define a virtual machine that can execute a set of agent
programs.
The programs are divided into subsets — in current
research, agents are often divided into "architectural classes"
based on their structural properties [p20].
Comparisons can be made between the best programs in each
subset. For example, one class of agents might be shown to be
asymptotically dominated by another class [p20].
For practical problems, this is difficult to do, and yields
results that are too specific to the details of the proof
methodology. A research goal is to enable bounded optimality
results that are more general [p17].
Optimization techniques
Earlier work has identified some techniques for optimizing the
temporal behaviour of an agent's components (e.g. Zilberstein and
Russell 1996). While these techniques are currently somewhat
primitive, the important point is that they are domain-independent:
they are unrelated to the content of the components
they are optimizing [p17].
Therefore, these techniques "enable bounded optimality results
that do not depend on the specific temporal aspects of the
environment class" — that is, they are domain-independent tools
for building ABO agents [p17].
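A toy sketch of the flavour of such techniques (the profiles and the grid search are my illustration, not Zilberstein and Russell's actual construction): given performance profiles for two anytime components composed in sequence, allocate a fixed time budget between them so as to maximize final output quality:

```python
def best_time_allocation(profile_a, profile_b, budget, steps=100):
    """Divide `budget` seconds between two sequential anytime components.

    profile_a(t) -> quality of the first component's output given t seconds;
    profile_b(q, t) -> quality of the second's output given input quality q
    and t seconds. Only these performance profiles are consulted; the
    components' internals never appear, which is what makes the technique
    domain-independent.
    """
    best_split, best_quality = 0.0, float("-inf")
    for i in range(steps + 1):
        t_a = budget * i / steps
        quality = profile_b(profile_a(t_a), budget - t_a)
        if quality > best_quality:
            best_split, best_quality = t_a, quality
    return best_split, best_quality

# Example with diminishing-returns profiles (illustrative only):
#   split, q = best_time_allocation(lambda t: 1 - 2.0 ** -t,
#                                   lambda q, t: q * (1 - 2.0 ** -t),
#                                   budget=4.0)
```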
Another way to optimize an agent's temporal behaviour is via
machine learning techniques such as inductive learning,
reinforcement learning, and knowledge compilation (Russell and
Subramanian 1995, Laird et al. 1986).
Offline vs. online
An offline mechanism for creating bounded-optimal agents is
independent of the agent itself (and is therefore not subject to
bounded optimality constraints). A typical offline agent
constructor would create agents that are ABO for a specific
environment [p19].
In online agent construction, the construction mechanism is
a part of the agent itself. This allows the agent to be more
flexible: an agent can be constructed that is bounded optimal for
whatever environment the agent finds itself in [p19].
Variable agent architecture
Agents that learn can adapt their behaviour, but prior research on
adaptation has usually applied learning techniques only to selected
components of the agent. Given finite memory, there are problems
where "almost complete revision of the agent's mental structure is
needed to achieve good performance" [p20] — that is, learning
techniques would need to be applied to the entirety of the agent's
architecture.