Matarić, M. J. (1997). Behaviour-based control: examples from navigation, learning, and group behaviour. Journal of Experimental & Theoretical Artificial Intelligence, 9(2-3), 323-336.
@Article{Mataric1997,
  author  = {Matarić, Maja J.},
  title   = {Behavior-Based Control: Examples from Navigation, Learning, and Group Behavior},
  journal = {Journal of Experimental \& Theoretical Artificial Intelligence},
  year    = {1997},
  volume  = {9},
  number  = {2-3},
  pages   = {323--336}
}
Author of the summary: Stephen Jones, 2011, steve@boutiquepsychology.net
Cite this paper for:
- Demonstrations of a selection of behaviour-based agent control architectures applied to navigation, path finding, multiple agent behaviour, and behaviour learning.
- Comparative summaries of deliberative, reactive, hybrid, and behaviour-based approaches to control.
- ‘Behaviour’ in behaviour-based systems must be defined according to the implementation required by each problem environment. [325]
- The behaviour-based approach is fundamentally different from the reactive approach. [325]
- Constraints on the definition of behaviour-based design. [325]
- Whether any clear advantage exists between behaviour-based and other architectures in a given comparison depends on how the behaviours are designed and applied. [326]
- SYSTEM: Toto is a behaviour-based, non-hybrid system featuring both real-time reaction and higher-level reasoning. [326]
- SYSTEM: Nerd Herd demonstrates that local, behaviour-based control of multi-agent systems scales better to large groups and outperforms centralized control systems. [330]
- Current learning in autonomous agents is prohibitively slow due to the common use of large numbers of simple ‘reactions’ as the basic representational unit. [332]
- SYSTEM: Don Group decreases learning time compared to reactive systems by using comparatively fewer, more complex behaviours as the basic representational unit [332], and by deriving feedback from internal reinforcement systems. [333]
- The results of the Toto and Nerd Herd experiments suggest that centralized behaviour control is unnecessary in behaviour co-ordination. [334]
- Methods for the automation of basis behaviour set selection through genetic learning are currently being explored. [334]
The full article can be found at
http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.41.5055&rep=rep1&type=pdf
Introduction
System control architectures fall into deliberative, reactive,
deliberative-reactive hybrid, or behaviour-based categories.
Each architecture provides unique principles for control system
organization, and constraints on how problems may be approached. [323]
Deliberative, Reactive, and Hybrid Approaches
Deliberative (plan-based) approaches are ‘top down’ systems, performing
centralized operations on centralized representations of world
knowledge, such as environment maps.
If the problem space is known at design-time, tasks can be explicitly
sequenced, and performance easily evaluated.
However, deliberative systems are criticized as performing poorly in unpredictable, dynamic environments, since replanning, where it is possible at all, carries a prohibitive processing cost. [323]
Reactive systems have been favored where task situations require
real-time response to environments.
The systems use minimal internal representations (often tables
containing simple mappings of environmental condition to physical
response), and do not use internal operations (e.g., ‘search’).
Accordingly, real-time replanning is not possible, and appropriate condition-response mappings must be predictable at design time. [324]
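A purely reactive controller of this kind can be pictured as a fixed lookup from sensed conditions to motor responses. Below is a minimal sketch; the condition names and responses are illustrative assumptions, not taken from the paper.

# A minimal sketch of a reactive controller: a fixed table mapping sensed
# conditions to motor responses, with no internal state and no search.
# Condition and response names here are illustrative only.

REACTIONS = {
    "obstacle_ahead": "turn_left",
    "obstacle_left":  "turn_right",
    "obstacle_right": "turn_left",
    "clear":          "move_forward",
}

def reactive_step(sensed_condition: str) -> str:
    """Map the current sensed condition directly to a motor response."""
    # Every condition-response pairing must be fixed at design time;
    # there is no replanning and no stored state between steps.
    return REACTIONS.get(sensed_condition, "stop")

# Example: reactive_step("obstacle_ahead") -> "turn_left"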
Hybrid approaches usually comprise both a reactive system for the real-time safety of the agent and a centralized, deliberative system that selects appropriate problem-solving sequences. [324]
Behaviour-Based Approaches
‘Behaviour’ in behaviour-based systems must be defined according to the implementation required by each problem environment.
The fundamental difference between reactive and behaviour-based systems is that the latter allow storage and modification of internal states, performed by distributed internal behaviours.
Reasoning is not centralized, but distributed throughout the behavioural repertoire, making the behaviour selection strategy a key design challenge. [325]
Some attempted design strategies for the behaviour selection challenge:
Brooks 1986: Subsumption Architecture imposes a control hierarchy on the behaviours.
Payton et al. 1982: Voting scheme selects behaviours.
Maes 1991: Spreading activation selects behaviours.
[325]
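The first of these strategies can be sketched abstractly as priority-based arbitration, in which a higher-priority behaviour suppresses the output of lower ones. The behaviour names and the priority ordering below are illustrative assumptions, not taken from the cited architectures.

# Minimal sketch of subsumption-style arbitration: behaviours run each
# cycle, each proposing an action (or None), and a fixed priority
# ordering lets higher layers suppress lower ones.

from typing import Callable, Optional

Behaviour = Callable[[dict], Optional[str]]  # percepts -> proposed action

def avoid_obstacles(percepts: dict) -> Optional[str]:
    return "turn_away" if percepts.get("obstacle_near") else None

def follow_boundary(percepts: dict) -> Optional[str]:
    return "track_wall" if percepts.get("wall_detected") else None

def wander(percepts: dict) -> Optional[str]:
    return "move_forward"  # default lowest-priority activity

# Highest priority first: its output subsumes (suppresses) the rest.
PRIORITY: list[Behaviour] = [avoid_obstacles, follow_boundary, wander]

def arbitrate(percepts: dict) -> str:
    for behaviour in PRIORITY:
        action = behaviour(percepts)
        if action is not None:      # first active behaviour wins
            return action
    return "stop"

# Example: arbitrate({"wall_detected": True}) -> "track_wall"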
Behaviour-based systems are often programmed with internally stored behaviours typical of the emergent behaviours seen in reactive systems, allowing higher-order emergent behaviours, such as group foraging.
Constraints on Behaviour-Based Design Definition:
1. Behaviours are simple.
2. Behaviours can be incrementally added to the system.
3. Execution of behaviours is not serialized.
4. The scope of a behaviour is not as time-limited as that of reactions in reactive systems.
5. Behaviours interact with the behaviours of other agents in the world.
This relaxed definition allows a variety of innovative implementations, but creates difficulty in comparative system analysis. [325]
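One way to read these constraints is as an interface contract: each behaviour is a small, independently added unit that is stepped concurrently with the others and may keep its own state. The sketch below is hypothetical, not the paper's implementation; all class and condition names are assumptions.

# Hypothetical sketch of a behaviour interface matching the constraints
# above: behaviours are simple, added incrementally, not serialized (each
# is stepped every cycle), and may keep internal state, unlike reactions.

class Behaviour:
    def step(self, percepts: dict, outputs: list) -> None:
        """Run one concurrent update; may read and update internal state."""
        raise NotImplementedError

class BoundaryTracker(Behaviour):
    def __init__(self):
        self.confidence = 0.0          # internal state, unlike a pure reaction
    def step(self, percepts, outputs):
        if percepts.get("wall_detected"):
            self.confidence = min(1.0, self.confidence + 0.1)
            outputs.append(("track_wall", self.confidence))

class BehaviourSystem:
    def __init__(self):
        self.behaviours: list[Behaviour] = []
    def add(self, behaviour: Behaviour) -> None:
        self.behaviours.append(behaviour)   # incremental addition
    def cycle(self, percepts: dict) -> list:
        outputs: list = []
        for b in self.behaviours:           # all behaviours stepped each cycle
            b.step(percepts, outputs)
        return outputs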
Whether any clear advantage exists between behaviour-based and other architectures in a given comparison depends on how the behaviours are designed and applied. [326]
Experimental Examples
Toto
Toto is a robot controlled by a non-hybrid, behaviour-based system, featuring both real-time reaction and higher-level reasoning capabilities.
Some reactive rules are used for motion; however, navigation, boundary following, landmark detection, map learning, and path finding are realized by interactive behaviours acting as world-monitoring perceptual filters. [327]
Figure 2 presents a schematic for Toto’s control architecture. [327]
The corridor-finding behaviour monitors Toto’s movement, updating straight-movement confidence and sensing surrounding boundaries. [328]
Landmark-detecting behaviours send objects identified as known landmarks (exceeding a threshold) to the map behaviours; the most similar of these are returned, orientating Toto in the environment. (New maps are created by adding the landmarks to an empty map shell if no maps are sufficiently similar.) [328]
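The landmark-to-map matching described above might look roughly like the following sketch; the landmark descriptor format, the similarity measure, and the threshold value are assumptions for illustration, not Toto's actual representation.

# Rough sketch of matching a detected landmark against stored map landmarks.
# Descriptor fields (type, approximate length) and the similarity measure
# are illustrative assumptions only.

MATCH_THRESHOLD = 0.8  # assumed threshold for declaring a known landmark

def similarity(candidate: dict, stored: dict) -> float:
    """Crude similarity: same landmark type, penalized by length difference."""
    if candidate["type"] != stored["type"]:
        return 0.0
    return 1.0 / (1.0 + abs(candidate["length"] - stored["length"]))

def match_landmark(candidate: dict, map_landmarks: list[dict]):
    """Return the most similar stored landmark, or None if below threshold."""
    best, best_score = None, 0.0
    for stored in map_landmarks:
        score = similarity(candidate, stored)
        if score > best_score:
            best, best_score = stored, score
    if best_score >= MATCH_THRESHOLD:
        return best            # orients the robot at this known landmark
    return None                # no match: the landmark goes into a new map shell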
The path-planning behaviour spreads activation through all landmarks (landmark behaviours, which include measurements of distance) in the map (a map behaviour), selects the path marked by the least distance, and generates reactive motion commands moving Toto to the next landmark in the planned route. [328]
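The effect of this distributed activation spreading can be approximated with a centralized shortest-path search over the landmark graph; in the sketch below, Dijkstra's algorithm stands in for the spreading activation, and the example graph and distances are invented for illustration.

# Sketch approximating Toto's distributed path planning with a centralized
# shortest-path search over an invented landmark graph.

import heapq

def plan_path(graph: dict, start: str, goal: str) -> list[str]:
    """graph: landmark -> {neighbour: distance}. Returns the least-distance path."""
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        dist, node, path = heapq.heappop(frontier)
        if node == goal:
            return path                      # landmarks to visit, in order
        if node in visited:
            continue
        visited.add(node)
        for neighbour, d in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(frontier, (dist + d, neighbour, path + [neighbour]))
    return []

# Invented example landmark graph (walls/corridors with rough lengths):
landmarks = {
    "left_wall_1":  {"corridor_a": 2.0},
    "corridor_a":   {"left_wall_1": 2.0, "right_wall_3": 3.5},
    "right_wall_3": {"corridor_a": 3.5},
}
# plan_path(landmarks, "left_wall_1", "right_wall_3")
# -> ["left_wall_1", "corridor_a", "right_wall_3"]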
Figure 3 clarifies the creation and use of a map behaviour. [328]
Maps can be updated, and routes modified or discarded, in response to dynamic path blockages.
Toto’s behaviour selection strategy is accomplished by a selection of mutually inhibitory or non-interfering behaviours running in parallel, and a memory-capable spreading activation network.
Representations are active, meaning procedural and distributed. [329]
Nerd Herd (Multi-Agent Control)
The compounding bandwidth demands of online, centralized control of
multiple agents in dynamic, noisy environments greatly restrict
possible group size.
Local, reactive group control allows large group sizes, but yields poor
outcome prediction.
The Nerd Herd mobile robot control system integrates reactive rules and behaviours (the basis set) designed and combined to optimally interact with the behaviour of other agents in order to produce higher-level group behaviour. [329]
Nerd Herd demonstrates that local, behaviour-based control of
multi-agent systems scales better to large groups and outperforms
centralized control systems. [330]
Flocking behaviour resulted from summing (real-time combination of behavioural outputs) of the safe-wandering, aggregation, dispersion, and homing basis behaviours. [330]
Foraging behaviour resulted from switching (conditional alternation between behavioural outputs by the lateral inhibition of all but one basis behaviour at a time) between the same behaviours used in flocking. [331]
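The two combination operators (summing for flocking, switching for foraging) can be sketched over abstract basis behaviours; the velocity-vector outputs and the switching conditions below are assumptions for illustration, not the Nerd Herd's actual implementation.

# Sketch of the two combination operators over basis behaviours that each
# output a 2-D velocity vector. Vector values and switching conditions are
# illustrative assumptions only.

def safe_wander(state):  return (1.0, 0.1)    # drift forward, avoid collisions
def aggregate(state):    return (0.2, -0.3)   # move toward the group centroid
def disperse(state):     return (-0.2, 0.3)   # move away from nearby robots
def home(state):         return (0.5, 0.0)    # head toward the home region

BASIS = [safe_wander, aggregate, disperse, home]

def flock(state):
    """Summing: combine all basis behaviour outputs in real time."""
    vx = sum(b(state)[0] for b in BASIS)
    vy = sum(b(state)[1] for b in BASIS)
    return (vx, vy)

def forage(state):
    """Switching: lateral inhibition leaves exactly one basis behaviour active."""
    if state.get("carrying_puck"):
        return home(state)
    if state.get("too_crowded"):
        return disperse(state)
    if state.get("puck_sensed_nearby"):
        return aggregate(state)
    return safe_wander(state)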
Don Group (Behavior Selection Learning)
Basis behaviours also allow agents to learn to improve behaviour selection (e.g., which basis behaviours to inhibit) using feedback produced when interacting with the environment and other agents. [331]
Standard learning approaches in autonomous agents are prohibitively slow due to the necessary honing of large numbers of simple ‘reactions’ as the basic representational unit.
Don Group decreased learning time compared to reactive systems by using
comparatively fewer, complex behaviours as the basic representational
unit.
It was necessary to increase the amount of feedback as the
demonstration progressed due to a noisy and dynamic environment. [332]
Sensory feedback after behaviour completion aids correlation between
condition and appropriate behaviour.
Internal progress estimation feedback allows the agent to learn when to continue or abandon certain behaviours.
No centralized reinforcement was necessary, as all reward and punishment were generated by internal reinforcement systems.
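This learning scheme can be sketched as keeping a value for each (condition, behaviour) pair, nudged both by sensory feedback at behaviour completion and by internal progress-estimator feedback during execution. The table structure, learning rate, and condition/behaviour names below are assumptions for illustration, not the paper's algorithm.

# Sketch of learning behaviour selection over a small set of basis
# behaviours, using internally generated reward and punishment.

import random
from collections import defaultdict

BEHAVIOURS = ["safe_wander", "disperse", "home", "grasp"]
ALPHA = 0.2                        # assumed learning rate

values = defaultdict(float)        # (condition, behaviour) -> learned value

def select(condition: str, epsilon: float = 0.1) -> str:
    """Pick the highest-valued behaviour for the condition (with some exploration)."""
    if random.random() < epsilon:
        return random.choice(BEHAVIOURS)
    return max(BEHAVIOURS, key=lambda b: values[(condition, b)])

def reinforce(condition: str, behaviour: str, feedback: float) -> None:
    """Apply internally generated reward or punishment (sensory or progress-based)."""
    key = (condition, behaviour)
    values[key] += ALPHA * (feedback - values[key])

# During execution, an internal progress estimator can punish a stalled
# behaviour, e.g. reinforce(cond, beh, -0.5); at completion, sensory
# feedback correlates the condition with the behaviour's outcome,
# e.g. reinforce("puck_in_gripper", "home", +1.0).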
Using groups of 3-4 agents, groups using both internal and sensory reinforcement learned efficient foraging behaviour in 95% of trials (in under 15 minutes per trial), vs. 60% of trials using sensory reinforcement alone and 30% of trials using a standard method found most effective in static domains (Q-learning). [333]
Discussion
The results of the Toto and Nerd Herd experiments suggest that
centralized behaviour control is unnecessary in behaviour co-ordination.
Matarić 1994: Outlines formal criteria for selecting and designing sets of basis behaviours.
Methods for the automation of basis behaviour set selection through
genetic learning are currently being explored.
Mahadevan and Connell 1991, Brooks 1990, Simsarian & Matarić 1995, Matarić 1996: Further support for the use of behaviour as a basis for robotic learning. [334]