Matarić, M. J. (1997). Behaviour-based control: examples from navigation, learning, and group behaviour. Journal of Experimental & Theoretical Artificial Intelligence, 9(2-3), 323-336.
@Article{Mataric1997,
  author  = {Matarić, Maja J.},
  title   = {Behavior-Based Control: Examples from Navigation, Learning, and Group Behavior},
  journal = {Journal of Experimental \& Theoretical Artificial Intelligence},
  year    = {1997},
  volume  = {9},
  number  = {2-3},
  pages   = {323--336}
}
Author of the summary: Stephen Jones, 2011, steve@boutiquepsychology.net
Cite this paper for:
- Demonstrations of a selection of behaviour-based agent control architectures applied to navigation, path finding, multiple agent behaviour, and behaviour learning.
- Comparative summaries of deliberative, reactive, hybrid, and behaviour-based approaches to control.
- ‘Behaviour’ in behaviour-based systems must be defined according to the implementation required by each problem environment. [325]
- The behaviour-based approach is fundamentally different from the reactive approach. [325]
- Constraints on the definition of behaviour-based design. [325]
- Whether any clear advantage exists between behaviour-based and other architectures in a given comparison depends on how the behaviours are designed and applied. [326]
- SYSTEM: Toto is a behaviour-based, non-hybrid system featuring both real-time reaction and higher-level reasoning. [326]
- SYSTEM: Nerd Herd demonstrates that local, behaviour-based control of multi-agent systems scales better to large groups and outperforms centralized control systems. [330]
- Current learning in autonomous agents is prohibitively slow due to the common use of large numbers of simple ‘reactions’ as the basic representational unit. [332]
- SYSTEM: Don Group decreases learning time compared to reactive systems by using comparatively fewer, more complex behaviours as the basic representational unit [332], and by deriving feedback from internal reinforcement systems. [333]
- The results of the Toto and Nerd Herd experiments suggest that centralized behaviour control is unnecessary in behaviour co-ordination. [334]
- Methods for the automation of basis behaviour set selection through genetic learning are currently being explored. [334]
The full article can be found at
http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.41.5055&rep=rep1&type=pdf
Introduction
System control architectures fall into deliberative, reactive,
deliberative-reactive hybrid, or behaviour-based categories.
Each architecture provides unique principles for control system
organization, and constraints on how problems may be approached. [323]
Deliberative, Reactive, and Hybrid Approaches
Deliberative (plan-based) approaches are ‘top down’ systems, performing
centralized operations on centralized representations of world
knowledge, such as environment maps.
If the problem space is known at design-time, tasks can be explicitly
sequenced, and performance easily evaluated.
However, deliberative systems are criticized as performing poorly in unpredictable, dynamic environments, since replanning, where it is possible at all, carries a prohibitive processing cost. [323]
Reactive systems have been favored where task situations require
real-time response to environments.
The systems use minimal internal representations (often tables
containing simple mappings of environmental condition to physical
response), and do not use internal operations (e.g., ‘search’).
Accordingly, real-time replanning is not possible, and appropriate condition-response mappings must be predictable at design time. [324]
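A purely reactive controller of this kind can be pictured as a fixed lookup from sensed conditions to motor responses. Below is a minimal sketch; the condition names and responses are illustrative assumptions, not taken from the paper.

# A minimal sketch of a reactive controller: a fixed table mapping sensed
# conditions to motor responses, with no internal state and no search.
# Condition and response names here are illustrative only.

REACTIONS = {
    "obstacle_ahead": "turn_left",
    "obstacle_left":  "turn_right",
    "obstacle_right": "turn_left",
    "clear":          "move_forward",
}

def reactive_step(sensed_condition: str) -> str:
    """Map the current sensed condition directly to a motor response."""
    # Every condition-response pairing must be fixed at design time;
    # there is no replanning and no stored state between steps.
    return REACTIONS.get(sensed_condition, "stop")

# Example: reactive_step("obstacle_ahead") -> "turn_left"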
Hybrid approaches usually comprise both a reactive system for the real-time safety of the agent and a centralized, deliberative system that selects appropriate problem-solving sequences. [324]
Behaviour-Based Approaches
‘Behaviour’ in behaviour-based systems must be defined according to the implementation required by each problem environment.
The fundamental difference between reactive and behaviour-based systems is that the latter allow storage and modification of internal states, performed by distributed internal behaviours.
Reasoning is not centralized, but distributed throughout the behavioural repertoire, making the behaviour selection strategy a key design challenge. [325]
Some attempted design strategies for the behaviour selection challenge:
Brooks 1986: Subsumption Architecture imposes a control hierarchy on the behaviours.
Payton et al. 1982: Voting scheme selects behaviours.
Maes 1991: Spreading activation selects behaviours.
[325]
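The first of these strategies can be sketched abstractly as priority-based arbitration, in which a higher-priority behaviour suppresses the output of lower ones. The behaviour names and the priority ordering below are illustrative assumptions, not taken from the cited architectures.

# Minimal sketch of subsumption-style arbitration: behaviours run each
# cycle, each proposing an action (or None), and a fixed priority
# ordering lets higher layers suppress lower ones.

from typing import Callable, Optional

Behaviour = Callable[[dict], Optional[str]]  # percepts -> proposed action

def avoid_obstacles(percepts: dict) -> Optional[str]:
    return "turn_away" if percepts.get("obstacle_near") else None

def follow_boundary(percepts: dict) -> Optional[str]:
    return "track_wall" if percepts.get("wall_detected") else None

def wander(percepts: dict) -> Optional[str]:
    return "move_forward"  # default lowest-priority activity

# Highest priority first: its output subsumes (suppresses) the rest.
PRIORITY: list[Behaviour] = [avoid_obstacles, follow_boundary, wander]

def arbitrate(percepts: dict) -> str:
    for behaviour in PRIORITY:
        action = behaviour(percepts)
        if action is not None:      # first active behaviour wins
            return action
    return "stop"

# Example: arbitrate({"wall_detected": True}) -> "track_wall"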
Behaviour-based systems are often programmed with internally stored behaviours typical of the emergent behaviours seen in reactive systems, allowing higher-order emergent behaviours, such as group foraging.
Constraints on Behaviour-Based Design Definition:
1. Behaviours are simple.
2. Behaviours can be incrementally added to the system.
3. Execution of behaviours is not serialized.
4. The scope of a behaviour is not as time-limited as that of reactions in reactive systems.
5. Behaviours interact with the behaviours of other agents in the world.
This relaxed definition allows a variety of innovative implementations, but creates difficulty in comparative system analysis. [325]
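One way to read these constraints is as an interface contract: each behaviour is a small, independently added unit that is stepped concurrently with the others and may keep its own state. The sketch below is hypothetical, not the paper's implementation; all class and condition names are assumptions.

# Hypothetical sketch of a behaviour interface matching the constraints
# above: behaviours are simple, added incrementally, not serialized (each
# is stepped every cycle), and may keep internal state, unlike reactions.

class Behaviour:
    def step(self, percepts: dict, outputs: list) -> None:
        """Run one concurrent update; may read and update internal state."""
        raise NotImplementedError

class BoundaryTracker(Behaviour):
    def __init__(self):
        self.confidence = 0.0          # internal state, unlike a pure reaction
    def step(self, percepts, outputs):
        if percepts.get("wall_detected"):
            self.confidence = min(1.0, self.confidence + 0.1)
            outputs.append(("track_wall", self.confidence))

class BehaviourSystem:
    def __init__(self):
        self.behaviours: list[Behaviour] = []
    def add(self, behaviour: Behaviour) -> None:
        self.behaviours.append(behaviour)   # incremental addition
    def cycle(self, percepts: dict) -> list:
        outputs: list = []
        for b in self.behaviours:           # all behaviours stepped each cycle
            b.step(percepts, outputs)
        return outputs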
Whether any clear advantage exists between behaviour-based and other architectures in a given comparison depends on how the behaviours are designed and applied. [326]
Experimental Examples
Toto
Toto is a robot controlled by a non-hybrid, behaviour-based system, featuring both real-time reaction and higher-level reasoning capabilities.
Some reactive rules are used for motion; however, navigation, boundary following, landmark detection, map learning, and path finding are realized by interactive behaviours acting as world-monitoring perceptual filters. [327]
Figure 2 presents a schematic for Toto’s control architecture. [327]
The corridor-finding behaviour monitors Toto’s movement, updating straight-movement confidence and sensing surrounding boundaries. [328]
Landmark-detecting behaviours send objects identified as known landmarks (exceeding a threshold) to the map behaviours; the most similar of these are returned, orientating Toto in the environment. (New maps are created by adding the landmarks to an empty map shell if no maps are sufficiently similar.) [328]
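The landmark-to-map matching described above might look roughly like the following sketch; the landmark descriptor format, the similarity measure, and the threshold value are assumptions for illustration, not Toto's actual representation.

# Rough sketch of matching a detected landmark against stored map landmarks.
# Descriptor fields (type, approximate length) and the similarity measure
# are illustrative assumptions only.

MATCH_THRESHOLD = 0.8  # assumed threshold for declaring a known landmark

def similarity(candidate: dict, stored: dict) -> float:
    """Crude similarity: same landmark type, penalized by length difference."""
    if candidate["type"] != stored["type"]:
        return 0.0
    return 1.0 / (1.0 + abs(candidate["length"] - stored["length"]))

def match_landmark(candidate: dict, map_landmarks: list[dict]):
    """Return the most similar stored landmark, or None if below threshold."""
    best, best_score = None, 0.0
    for stored in map_landmarks:
        score = similarity(candidate, stored)
        if score > best_score:
            best, best_score = stored, score
    if best_score >= MATCH_THRESHOLD:
        return best            # orients the robot at this known landmark
    return None                # no match: the landmark goes into a new map shell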
The path-planning behaviour spreads activation through all landmarks (landmark behaviours, which include measurements of distance) in the map (a map behaviour), selects the path marked by the least distance, and generates reactive motion commands moving Toto to the next landmark in the planned route. [328]
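The effect of this distributed activation spreading can be approximated with a centralized shortest-path search over the landmark graph; in the sketch below, Dijkstra's algorithm stands in for the spreading activation, and the example graph and distances are invented for illustration.

# Sketch approximating Toto's distributed path planning with a centralized
# shortest-path search over an invented landmark graph.

import heapq

def plan_path(graph: dict, start: str, goal: str) -> list[str]:
    """graph: landmark -> {neighbour: distance}. Returns the least-distance path."""
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        dist, node, path = heapq.heappop(frontier)
        if node == goal:
            return path                      # landmarks to visit, in order
        if node in visited:
            continue
        visited.add(node)
        for neighbour, d in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(frontier, (dist + d, neighbour, path + [neighbour]))
    return []

# Invented example landmark graph (walls/corridors with rough lengths):
landmarks = {
    "left_wall_1":  {"corridor_a": 2.0},
    "corridor_a":   {"left_wall_1": 2.0, "right_wall_3": 3.5},
    "right_wall_3": {"corridor_a": 3.5},
}
# plan_path(landmarks, "left_wall_1", "right_wall_3")
# -> ["left_wall_1", "corridor_a", "right_wall_3"]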
Figure 3 clarifies the creation and use of a map behaviour. [328]
Maps can be updated, and routes modified or discarded, in response to dynamic path blockages.
Toto’s behaviour selection strategy is accomplished by a selection of mutually inhibitory or non-interfering behaviours running in parallel, and a memory-capable spreading activation network.
Representations are active, meaning procedural and distributed. [329]
Nerd Herd (Multi-Agent Control)
The compounding bandwidth demands of online, centralized control of
multiple agents in dynamic, noisy environments greatly restrict
possible group size.
Local, reactive group control allows large group sizes, but yields poor
outcome prediction.
The Nerd Herd mobile robot control system integrates reactive rules and behaviours (the basis set) designed and combined to optimally interact with the behaviour of other agents in order to produce higher-level group behaviour. [329]
Nerd Herd demonstrates that local, behaviour-based control of
multi-agent systems scales better to large groups and outperforms
centralized control systems. [330]
Flocking behaviour resulted from summing (real-time combination of behavioural outputs) of the safe-wandering, aggregation, dispersion, and homing basis behaviours. [330]
Foraging behaviour resulted from switching (conditional alternation between behavioural outputs by the lateral inhibition of all but one basis behaviour at a time) between the same behaviours used in flocking. [331]
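The two combination operators (summing for flocking, switching for foraging) can be sketched over abstract basis behaviours; the velocity-vector outputs and the switching conditions below are assumptions for illustration, not the Nerd Herd's actual implementation.

# Sketch of the two combination operators over basis behaviours that each
# output a 2-D velocity vector. Vector values and switching conditions are
# illustrative assumptions only.

def safe_wander(state):  return (1.0, 0.1)    # drift forward, avoid collisions
def aggregate(state):    return (0.2, -0.3)   # move toward the group centroid
def disperse(state):     return (-0.2, 0.3)   # move away from nearby robots
def home(state):         return (0.5, 0.0)    # head toward the home region

BASIS = [safe_wander, aggregate, disperse, home]

def flock(state):
    """Summing: combine all basis behaviour outputs in real time."""
    vx = sum(b(state)[0] for b in BASIS)
    vy = sum(b(state)[1] for b in BASIS)
    return (vx, vy)

def forage(state):
    """Switching: lateral inhibition leaves exactly one basis behaviour active."""
    if state.get("carrying_puck"):
        return home(state)
    if state.get("too_crowded"):
        return disperse(state)
    if state.get("puck_sensed_nearby"):
        return aggregate(state)
    return safe_wander(state)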
Don Group (Behavior Selection Learning)
Basis behaviours also allow agents to learn to improve behaviour selection (e.g., which basis behaviours to inhibit) using feedback produced when interacting with the environment and other agents. [331]
Standard learning approaches in autonomous agents are prohibitively slow due to the necessary honing of large numbers of simple ‘reactions’ as the basic representational unit.
Don Group decreased learning time compared to reactive systems by using
comparatively fewer, complex behaviours as the basic representational
unit.
It was necessary to increase the amount of feedback as the
demonstration progressed due to a noisy and dynamic environment. [332]
Sensory feedback after behaviour completion aids correlation between
condition and appropriate behaviour.
Internal progress estimation feedback allows the agent to learn when to continue or abandon certain behaviours.
No centralized reinforcement was necessary, as all reward and punishment were generated by internal reinforcement systems.
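This learning scheme can be sketched as keeping a value for each (condition, behaviour) pair, nudged both by sensory feedback at behaviour completion and by internal progress-estimator feedback during execution. The table structure, learning rate, and condition/behaviour names below are assumptions for illustration, not the paper's algorithm.

# Sketch of learning behaviour selection over a small set of basis
# behaviours, using internally generated reward and punishment.

import random
from collections import defaultdict

BEHAVIOURS = ["safe_wander", "disperse", "home", "grasp"]
ALPHA = 0.2                        # assumed learning rate

values = defaultdict(float)        # (condition, behaviour) -> learned value

def select(condition: str, epsilon: float = 0.1) -> str:
    """Pick the highest-valued behaviour for the condition (with some exploration)."""
    if random.random() < epsilon:
        return random.choice(BEHAVIOURS)
    return max(BEHAVIOURS, key=lambda b: values[(condition, b)])

def reinforce(condition: str, behaviour: str, feedback: float) -> None:
    """Apply internally generated reward or punishment (sensory or progress-based)."""
    key = (condition, behaviour)
    values[key] += ALPHA * (feedback - values[key])

# During execution, an internal progress estimator can punish a stalled
# behaviour, e.g. reinforce(cond, beh, -0.5); at completion, sensory
# feedback correlates the condition with the behaviour's outcome,
# e.g. reinforce("puck_in_gripper", "home", +1.0).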
Using groups of 3-4 agents, groups using both internal and sensory reinforcement learned efficient foraging behaviour in 95% of trials (in under 15 minutes per trial), vs. 60% of trials using sensory reinforcement alone and 30% of trials using a standard method found most effective in static domains (Q-learning). [333]
Discussion
The results of the Toto and Nerd Herd experiments suggest that
centralized behaviour control is unnecessary in behaviour co-ordination.
Matarić 1994: Outlines formal criteria for selecting and designing sets of basis behaviours.
Methods for the automation of basis behaviour set selection through
genetic learning are currently being explored.
Mahadevan and Connell 1991, Brooks 1990, Simsarian & Matarić 1995, Matarić 1996: Further support for the use of behaviour as a basis for robotic learning. [334]