It discusses how decision-theoretic principles can be applied to meta-reasoning, what assumptions about reasoning are needed, and what implications follow. More importantly, it covers what information can be used and what simplifications are necessary to make meta-reasoning feasible.
* Basic formula: E[U([A])] = \sum_k P(W_k) U([A, W_k])
* Value of computation: V(S) = U([S]) - U(\alpha)
* For complete computation, S is followed by an external action:
U([S]) = U([\alpha_S, [S]])
* For partial computation:
U([S]) = \sum_T P(T) U([\alpha_T, [S.T]])
* Ideal control algorithm:
+ Keep performing the available computation S with the highest V(S) until no available computation has positive value
+ Then do the currently preferred action \alpha
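The control loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the callables `available`, `value`, `perform` and `best_action`, the `max_steps` guard, and the toy numbers are all hypothetical and not from the source. A computation whose value is exactly zero is treated here as not worth doing.

```python
from typing import Callable, Iterable, List, TypeVar

C = TypeVar("C")  # type of a computation step (hypothetical)


def ideal_control(
    available: Callable[[], Iterable[C]],
    value: Callable[[C], float],
    perform: Callable[[C], None],
    best_action: Callable[[], str],
    max_steps: int = 1000,  # safety guard, an assumption of this sketch
) -> str:
    """Compute while some available computation has positive V(S), then act."""
    for _ in range(max_steps):
        candidates = list(available())
        if not candidates:
            break
        best = max(candidates, key=value)
        if value(best) <= 0:
            break  # no computation is worth its cost any more
        perform(best)
    return best_action()


# Toy usage: each computation loses value once it has been performed.
values = {"expand_a": 0.5, "expand_b": 0.2}
log: List[str] = []


def perform(step: str) -> None:
    log.append(step)
    values[step] -= 0.4  # hypothetical diminishing returns


chosen = ideal_control(
    available=lambda: values.keys(),
    value=lambda s: values[s],
    perform=perform,
    best_action=lambda: "alpha",
)
print(chosen, log)
```

In practice `value` cannot be evaluated exactly, which is why the rest of these notes are about estimating it.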
* Approximation is unavoidable: the utilities are not known in advance, so they have to be estimated. One simplification that usually holds in this model is to treat the cost of computation separately, as a cost of time.
* Time and its cost, derivation as follows
+ Intrinsic utility: U([A, [S]]) = U_I(A) - C(A, S)
+ Ideally, C(A, S) = C(S), independent of A
+ Further, assume only the duration of S matters, C(S) = TC(|S|)
+ Benefit of computation: \Delta(S) = U(\alpha_S) - U(\alpha)
+ The value of computation S is then: V(S) = \Delta(S) - TC(|S|)
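As a small worked example of the last step, the value of a computation can be computed as the benefit \Delta(S) minus the time cost TC(|S|). The linear form of the time cost, its rate, and the utility numbers below are assumptions made only for illustration; the source does not specify TC.

```python
def time_cost(duration: float, rate: float = 0.1) -> float:
    """TC(|S|): hypothetical linear cost of spending `duration` time units."""
    return rate * duration


def net_value(u_after: float, u_before: float, duration: float) -> float:
    """Delta(S) - TC(|S|), with Delta(S) = U(alpha_S) - U(alpha)."""
    delta = u_after - u_before
    return delta - time_cost(duration)


# The same benefit is worth computing for 5 time units but not for 20.
short = net_value(u_after=8.0, u_before=7.0, duration=5.0)
long = net_value(u_after=8.0, u_before=7.0, duration=20.0)
print(short > 0, long > 0)
```

The point of the separation is that the benefit depends only on which action ends up being chosen, while the cost depends only on how long the computation takes.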
* Estimates and partial information
+ \hat{U}^S([A]) = E[U([A]) | e] for the computation sequence S (up to now)
+ \hat{U}^{S.S_j}([A]) = E[U([A]) | e and e_j] for a further computation S_j
+ Discussion: for non-axiomatic probabilities [more like possibilities]
* Analysis for complete computations
+ From the previous model:
\hat{V}^S(S_j) = E[(U(\alpha_{S_j}) - U(\alpha)) | e and e_j] - TC(|S_j|)
+ E[\hat{V}(S_j)] = \int_{u} \max(u) p_j(u) du - \int_{-\infty}^{\infty} u p_{\alpha j}(u) du [We speculate that this is the \Delta part, elaborated]
+ The distributions p have to be obtained either from statistics or, if the calculation is exact, from the current distribution; this requires further simplifying assumptions
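The expectation above rarely has a closed form, so a Monte Carlo estimate is one way to get a feel for it. The sketch below assumes, purely for illustration, that the utility estimates of the actions after the computation are independent Gaussians; the function name, the Gaussian choice and the sample count are not from the source. It estimates E[max(u)] - E[u_\alpha] and leaves out the time cost.

```python
import random
from typing import Sequence


def expected_benefit(
    means: Sequence[float],
    stds: Sequence[float],
    alpha: int,
    n_samples: int = 10_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of E[max(u)] - E[u_alpha].

    `means` and `stds` describe the assumed (hypothetical) Gaussian
    distribution of each action's utility estimate after the computation;
    `alpha` is the index of the currently preferred action.
    """
    rng = random.Random(seed)
    total_max = 0.0
    total_alpha = 0.0
    for _ in range(n_samples):
        u = [rng.gauss(m, s) for m, s in zip(means, stds)]
        total_max += max(u)      # utility of the action chosen afterwards
        total_alpha += u[alpha]  # utility of the action preferred beforehand
    return (total_max - total_alpha) / n_samples


# With no uncertainty the benefit is the gap to the best action.
certain = expected_benefit([1.0, 3.0], [0.0, 0.0], alpha=0, n_samples=100)
# With uncertainty, a computation can pay off even when the means are tied.
uncertain = expected_benefit([1.0, 1.0], [1.0, 1.0], alpha=0)
print(certain, uncertain > 0)
```

Because max(u) is never smaller than u_\alpha for any sample, the estimate is never negative: under this model a computation can only hurt through its time cost.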
* Simplifying assumptions
* Meta-greedy algorithms: expand one step, then estimate the ultimate effects; no commitment to which external action to take
* Single-step assumption: treat any computation as complete, i.e. assume the agent commits to an action after one step of computation
* Partial computations
* Problem without a partial-computation formulation: credit assignment, separability of the benefit \Delta
* Qualitative behavior: Only compute when it helps
* Principles:
+ A computation affects only certain components of the internal structure (separability of j)
+ Changes to these components affect the agent's choice of external action in known ways (possibility for f)
* The formulas are confusing: it seems that the utility of an external action is independent of what the action is. This material is used in chapters 4 and 5 as the starting point.