Russell & Norvig (1991): Do the right thing

[ CogSci Summaries home | UP | email ]
http://www.jimdavies.org/summaries/

Rational Metareasoning, Chapter 3: Do the Right Thing, S. Russell and E. Weld.

Author of the summary: Yaxin Liu, 1999, yxliu@cc

Cite this paper for:

Rational metareasoning

Decision theoretic meta-control

Meta-greedy strategy

Single-step assumption

Main Ideas:

It discusses how decision theory principles can be applied to meta-reasoning, what assumption about reasoning is needed, and what implications we can have. More importantly, what information we can make use of and what simplifications are necessary to do meta-reasoning.

Basic principles

Computations as internal actions, to be selected among based on their expected utilities
Utility is from expected effects: time (and environmental changes) and revision of agent's intended external actions

What computation is NOT for

+ Time cost outweigh the gain
+ Computation cannot make differences

Assumptions

+ Outcome of external actions are known (deterministic)
+ Utility of each outcome unknown

As dynamic probabilities and utilities (Good) (?)

Models of deliberation

Three models of different levels of detail, and more assumptions added

Notation

    * A: external actions
    * S: internal actions, computation steps
    * W: world state, both external and internal
    * [X]: the world state resulted from action X and current world state
    * [X, W]: the world state resulted from action X and state W
    * U(W): utility of state W, only depends on the external part
    * S: sequence of computation, T for future computation
    * S.S_j: sequence of S followed by S_j
    * e: evidence made available by S, e_j is by S_j
    * \hat{Q}^S: estimate of Q, based on S
    * \alpha: current default action
    * \alpha_T: external action supported by computation T
    * \beta_1, \beta_2, ...: current second-, third- best external actions

External model:

Consider utility/probability of external actions and internal states; a default (current best) external action \alpha; computation is to refine \alpha; random choice can be the baseline

* Basic formula: E[U([A])] = \sum_k P(W_k) U([A, W_k])
* Value of computation: V(S) = U([S]) - U(\alpha)
* For complete computation, S is followed by an external action:
U([S]) = U([\alpha_S, [S]])
* For partial computation:
U([S]) = \sum_T P(T) U([\alpha_T, [S.T])

* Ideal control algorithm:
+ Keep doing the available computation S with the highest V(S) until all are negative
+ Do action \alpha

* We definitely need approximation, because the utilities are not known in advance, so we have to guess! In this model, one simplification (approximation) usually holds, which is time cost.

* Time and its cost, derivation as follows
    + Intrinsic utility: U([A, [S]]) = U_I(A) - C(A, S)
    + Ideally, C(A, S) = C(S), independent of A
    + Further, assume only the duration of S matters, C(S) = TC(|S|)
    + Benefit of computation: \Delta(S) = U(\alpha_S) - U(\alpha)
    + The value of computation S then is: U(S) = \Delta(S) - TC(|S|)

Estimated utility model:

\alpha maximize utility; computation is to refine utility estimation; posterior estimation of utility based on computation results (or evidences)

* Estimates and partial information
    + \hat{U}^S([A]) = E[U([A]) | e] for sequence S (up to now)
    + \hat{U}^{S.S_j}([A]) = E[U(A) | e and e_j]for further computation
    + Discussion: for non-axiometic probabilities, [more like possibilities]

* Analysis for complete computations
    + From the previous model:
\hat{V}^S(S_j) = E[(U(\alpha_{S_j}) - U(\alpha)) | e and e_j] - TC(|S_j|)
    + E[\hat{V}(S_j)] =
\int_{u} max(u) p_j(u) d u - \int_{-\inf}^{\inf} u p_{\alpha j}(u) d u
[This we speculate is the \Delta part elaborated]
    + p are to be collected either by statistics, or by using current distribution if calculation is exact; require further simplification assumptions

* Simplifying assumptions
* Meta-greedy algorithms: expand one step, then estimate the ultimate effects; no commitment to which external action to take
* Single-step assumption: any computation is complete, assume commit to action after one step of computation

* Partial computations
* Problem without partial computation formulation: credit assignment, seperability of benefit \Delta

* Qualitative behavior: Only compute when it helps

Concrete model:

use object-level features, as what computation changes what value

* Principles:
+ A computation only affects a certain components of the internal structure (seperability of j)
+ Changes to these components affect the agent's choice of external action in known ways (possibility for f)

* The formulas are confusing, seems the utility of external action is independent of what these actions are. It is used in chapter 4 and 5 as the starting point.

Summary author's notes:

none

Back to the Cognitive Science Summaries homepage
Cognitive Science Summaries Webmaster:

JimDavies (jim@jimdavies.org)

Last modified: Thu Apr 15 11:07:19 EDT 1999