[ CogSci Summaries home | UP | email ]
http://www.jimdavies.org/summaries/

Rational Metareasoning, Chapter 3: Do the Right Thing, S. Russell and E. Weld.

Author of the summary: Yaxin Liu, 1999, yxliu@cc

Cite this paper for:

Main Ideas:

It discusses how decision theory principles can be applied to meta-reasoning, what assumption about reasoning is needed, and what implications we can have.  More importantly, what information we can make use of and what simplifications are necessary to do meta-reasoning.

Basic principles

What computation is NOT for

    + Time cost outweigh the gain
    + Computation cannot make differences

Assumptions

    + Outcome of external actions are known (deterministic)
    + Utility of each outcome unknown

As dynamic probabilities and utilities (Good) (?)

Models of deliberation

Three models of different levels of detail, and more assumptions added

Notation

    * A: external actions
    * S: internal actions, computation steps
    * W: world state, both external and internal
    * [X]: the world state resulted from action X and current world state
    * [X, W]: the world state resulted from action X and state W
    * U(W): utility of state W, only depends on the external part
    * S: sequence of computation, T for future computation
    * S.S_j: sequence of S followed by S_j
    * e: evidence made available by S, e_j is by S_j
    * \hat{Q}^S: estimate of Q, based on S
    * \alpha: current default action
    * \alpha_T: external action supported by computation T
    * \beta_1, \beta_2, ...: current second-, third- best external actions

 External model:

  Consider utility/probability of external actions and internal states; a default (current best) external action \alpha; computation is to refine \alpha; random choice can be the baseline

  * Basic formula: E[U([A])] = \sum_k P(W_k) U([A, W_k])
  * Value of computation: V(S) = U([S]) - U(\alpha)
  * For complete computation, S is followed by an external action:
      U([S]) = U([\alpha_S, [S]])
  * For partial computation:
     U([S]) = \sum_T P(T) U([\alpha_T, [S.T])

  * Ideal control algorithm:
      + Keep doing the available computation S with the highest V(S) until all are negative
      + Do action \alpha

  * We definitely need approximation, because the utilities are not known in advance, so we have to guess!  In this model, one simplification (approximation) usually holds, which is time cost.

  * Time and its cost, derivation as follows
    + Intrinsic utility: U([A, [S]]) = U_I(A) - C(A, S)
    + Ideally, C(A, S) = C(S), independent of A
    + Further, assume only the duration of S matters, C(S) = TC(|S|)
    + Benefit of computation: \Delta(S) = U(\alpha_S) - U(\alpha)
    + The value of computation S then is: U(S) = \Delta(S) - TC(|S|)
 

Estimated utility model:

   \alpha maximize utility; computation is to refine utility estimation; posterior estimation of utility based on computation results (or evidences)

* Estimates and partial information
    + \hat{U}^S([A]) = E[U([A]) | e] for sequence S (up to now)
    + \hat{U}^{S.S_j}([A]) = E[U(A) | e and e_j] for further computation
    + Discussion: for non-axiometic probabilities, [more like possibilities]

* Analysis for complete computations
    + From the previous model:
  \hat{V}^S(S_j) = E[(U(\alpha_{S_j}) - U(\alpha)) | e and e_j] - TC(|S_j|)
    + E[\hat{V}(S_j)] =
 \int_{u} max(u) p_j(u) d u -  \int_{-\inf}^{\inf} u p_{\alpha j}(u) d u
  [This we speculate is the \Delta part elaborated]
    + p are to be collected either by statistics, or by using current distribution if calculation is exact; require further simplification assumptions

* Simplifying assumptions
    * Meta-greedy algorithms: expand one step, then estimate the ultimate effects; no commitment to which external action to take
    * Single-step assumption: any computation is complete, assume commit to action after one step of computation

* Partial computations
    * Problem without partial computation formulation: credit assignment, seperability of benefit \Delta

* Qualitative behavior: Only compute when it helps
 

Concrete model:

use object-level features, as what computation changes what value

  * Principles:
    + A computation only affects a certain components of the internal structure (seperability of j)
    + Changes to these components affect the agent's choice of external action in known ways (possibility for f)

  * The formulas are confusing, seems the utility of external action is independent of what these actions are.  It is used in chapter 4 and 5 as the starting point.
 

Summary author's notes:


Back to the Cognitive Science Summaries homepage
Cognitive Science Summaries Webmaster:
JimDavies (jim@jimdavies.org)

Last modified: Thu Apr 15 11:07:19 EDT 1999