decide which informational arc to re-insert we apply the notion of
domination as described in the previous section. For this purpose we
need Theorem 1 below. Here, for a node x ∈ nd(d) and a set of
nodes A, we say that x is (not) dominated by A at d, if x i
policies initially are uniform, and they are updated using the order
dk, . . . , d1. In particular, this leads to more efficient algorithms for
solving IDs (Nilsson and Lauritzen, 2000; Madsen and Nilsson, 2001).

2.5 Redundant information in LIMIDs
shows the expected utility EU(qi). Figure (b) shows the time usage.
5 Lower bounds for general LIMIDs

In the next two sections we
present methods for computing bounds for general LIMIDs. The
methods are similar to, though more complex than, those presented
d2. Since C ≠ ∅, we define A, L′, and R by (letting Si = {s1, . . . , si}):

A = V \ fa_{(L′)^{C→d3}}(d3) = S4,
L′ = ((L′)^{C→d3})^{A→d3} = L^{S3→d3} (because s4 is a descendant of d3),
R = {n | n is requisite for d3 in L′} = {s3}.

Because R ∩ C = ∅, we redefine L′ and
bounds

An upper bound for a LIMID L is a number that is larger than (or
equal to) EU(L). We compute an upper bound for L by adding fill-in
arcs into the decision nodes of L, and subsequently evaluating the
obtained LIMID. We thus define:

Definition 1 A LIMID L′
adding arcs into dk. Then we convert dk into a chance node, and
make dk−1 extremal by adding arcs into dk−1. Proceeding in this
way, we eventually construct a soluble covering of L. At each stage of
this procedure, a subset of the decision nodes is made extremal
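The construction just described can be sketched as a loop over the decisions in reverse order. Here `make_extremal` and `to_chance` are hypothetical graph operations standing in for the fill-in and conversion steps; they are assumptions, not part of the text:

```python
def soluble_covering(L, decisions, make_extremal, to_chance):
    """Sketch of the stage-wise construction of a soluble covering.

    decisions lists d1, ..., dk; we process dk first, as in the text.
    make_extremal(L, d): add fill-in arcs into d so that d is extremal.
    to_chance(L, d):     convert the (now extremal) decision d into a
                         chance node before moving to the previous one.
    """
    for d in reversed(decisions):
        L = make_extremal(L, d)  # add arcs into d
        L = to_chance(L, d)      # freeze d as a chance node
    return L
```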
be seen that fa(dj) ⊥_{L′} de(dj) | (B ∪ {dj}). Now, (10) follows
because fa_{L′}(dj) = fa_L(dj), de_{L′}(dj) = de_L(dj), and the fact that
d-separation is preserved under arc removal (since L ⊆ L′).

Appendix C

For two undirected graphs, we shall say that G′ is
fa_{L′}(Δi−1) and fa_L(di) = fa_{L′}(di), i.e. fa_{L′}(Δi−1) ⊥_L de_{L′}(di) |
fa_{L′}(di). From Lemma 3, it now follows that fa_{L′}(Δi−1) ⊥_{L′}
de_{L′}(di) | fa_{L′}(di), i.e. L′ has partial solution ordering dj+1, . . . , dk.
Conversely, suppose B ⊈ Bj. By (9), we have that B ⊥̸_L
we generate, a local maximum strategy is computed using Single
Policy Updating. We proceed in this manner until we reach a LIMID
Ln where evaluation is impossible. At this stage, the process is
stopped, and the best strategy computed so far is returned.
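A compact sketch of this stopping rule, with hypothetical helpers `next_limid` (the next LIMID in the sequence), `evaluable` (whether exact evaluation is still feasible) and `spu` (Single Policy Updating, returning a strategy and its expected utility); all three names are our assumptions:

```python
def best_strategy_so_far(L0, next_limid, evaluable, spu):
    """Run SPU on L0, L1, ... until evaluation becomes infeasible,
    returning the best (strategy, expected utility) found so far."""
    best = (None, float("-inf"))
    L = L0
    while evaluable(L):
        strategy, eu = spu(L)   # local-maximum strategy for this LIMID
        if eu > best[1]:
            best = (strategy, eu)
        L = next_limid(L)       # move on to the next LIMID
    return best
```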
EU(Lj+1).
2. (Soluble) Lj is soluble with exact solution ordering d1, . . . , dk.
3. (Covering) Lj is a covering of L.

Proof: The first property follows trivially since Lj ⊆ Lj+1. To prove the
second property, we note fa_{Lj}(Δi−1) ⊆ fa_{Lj}(di), and the result
production economy.

CHAPTER 8

Summary

Infectious diseases are a permanent feature of modern pig
production. An understanding of the transmission mechanisms is
necessary in order to combat the problems. Larger farms and
efficient production systems have increased produc
decisions because they dominate earlier observations and decisions.
However, intuitively, this approach also makes sense: More recent
observations tend to be more informative than distant observations.
Figure: state nodes s1, . . . , s4, observation nodes o1, o2, o3, and
decision nodes d1, d2, d3.

algorithms is illustrated on the Robot example.

4 Bounds for POMDPs

4.1 POMDPs

Consider a finite-state, finite-action POMDP. Let
t ∈ {1, 2, . . . , k} represent stage t of the decision problem. The finite
parameter k represents the planning horizon of
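The ingredients of such a finite-horizon POMDP can be collected in a small container type; this is a sketch, and the field names below are our own, not the text's notation:

```python
from dataclasses import dataclass

@dataclass
class POMDP:
    """Finite-state, finite-action POMDP with planning horizon k."""
    states: tuple        # finite state space
    actions: tuple       # finite action space
    observations: tuple  # finite observation space
    k: int               # planning horizon; stages are t = 1, ..., k
    transition: dict     # (state, action) -> {next_state: probability}
    observe: dict        # state -> {observation: probability}
    reward: dict         # (state, action) -> immediate reward

    def stages(self):
        """Stages t in {1, 2, ..., k} of the decision problem."""
        return range(1, self.k + 1)
```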
approximations and assumptions. The approach we take in this paper
is based on LIMIDs, and the evaluation of limited memory strategies.
As described in the present paper and in Lauritzen and Nilsson
(2001), there are good arguments for using limited memor
the (hypothetical) case where the decision maker has additional
knowledge of the true state of each variable in F. We now define:

Definition 2 Suppose x and y are parents of decision node d in a
LIMID L. Then, x is dominated by y at d if FL(x, d) ⊆ FL(y, d)
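If the sets FL(x, d) are available as plain Python sets, Definition 2 reduces to a one-line inclusion test. This is a minimal sketch, and the direction of the inclusion is an assumption to be checked against the formal definition:

```python
def is_dominated(F_x, F_y):
    """x is dominated by y at d when FL(x, d) is contained in FL(y, d).

    The inclusion direction is an assumption here; verify it against
    the definition before relying on this sketch.
    """
    return set(F_x) <= set(F_y)
```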
cycles. There are three types of nodes in the graph: Chance nodes,
shown as circles, represent random variables. Decision nodes, shown
as boxes, represent decision variables. Finally, value nodes, shown as
diamonds, represent (local) utility functions. Ar
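A minimal data structure for such a diagram, distinguishing the three node types; the class name and representation are ours, not the text's:

```python
class InfluenceDiagram:
    """DAG with chance (circle), decision (box) and value (diamond) nodes."""

    def __init__(self):
        self.kind = {}     # node name -> "chance" | "decision" | "value"
        self.parents = {}  # node name -> set of parents (incoming arcs)

    def add_node(self, name, kind):
        assert kind in ("chance", "decision", "value")
        self.kind[name] = kind
        self.parents[name] = set()

    def add_arc(self, tail, head):
        # value nodes have no children in an influence diagram
        assert self.kind[tail] != "value"
        self.parents[head].add(tail)

    def nodes_of_kind(self, kind):
        return sorted(n for n, k in self.kind.items() if k == kind)
```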
Organization
GPS - Global Positioning System
ICAR - International Committee for Animal Recording
IFPRI - The International Food Policy Research Institute
LN2 - Liquid Nitrogen
MAAIF - Ministry of Agriculture, Animal Industry and Fisheries
MFPED - Ministry of Finance
situation where the Robot does not keep a record of previous
observations is modelled by the LIMID shown in Figure 3.

2 Limited Memory Influence Diagrams

Figure 4: Junction tree (cliques {s3, o3, d3}, {s3, s4, d3}, {s2, s3, d2},
{s2, o2, d2}, {s1, s2, d1}, {s1, o1, d1}) for the LIMID represen
are performed:

Retract: Retract the current policy for di from q to obtain q−di := q \ {δdi}.
Optimize: Compute a new policy for di: δ*di = arg maxδdi EU(q−di ∪ {δdi}).
Replace: Redefine q := q−di ∪ {δ*di}.
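One full pass of these three steps over the decisions can be sketched as follows, treating a strategy q as a dict from decisions to policies. Here `candidate_policies` and `eu` are assumptions standing in for the LIMID's policy space and expected-utility evaluation:

```python
def spu_pass(q, decisions, candidate_policies, eu):
    """Retract, optimize and replace the policy of each decision in turn."""
    for d in decisions:
        q_minus = {k: v for k, v in q.items() if k != d}   # Retract
        best = max(candidate_policies[d],                  # Optimize
                   key=lambda pol: eu({**q_minus, d: pol}))
        q = {**q_minus, d: best}                           # Replace
    return q
```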
The policies are updated until they converge to a st
considered. In Figure 8(a), the lower and upper bounds obtained by
the above procedures in the case of 10 decisions are given. Here, the
x indicates the modification of Procedure 1 in which the strategy
a wall always fails. Otherwise, the desired move always succeeds.
Noisy actions, on the other hand, have a 0.089 probability of not
moving, a 0.001 probability of moving in the opposite direction, a
probability of 0.01 for moving in the +90 degree direction
full algorithm for computing lower bound strategies for a POMDP is
presented below in algorithmic terms. Here, and throughout, we
abbreviate the procedure Single Policy Updating as SPU.

Input: LIMID L corresponding to POMDP P with decisions d1, . . . , dk.
Science.

References

Littman, M. L., Cassandra, A. R., and Kaelbling, L. P. (1995). Learning
policies for partially observable environments: Scaling up. In
Proceedings of the Twelfth International Conference on Machine
Learning (ed. A. Prieditis and S
nodes Δ = {d1, . . . , dk}. If

de(di+1) ∩ Δi = ∅, (7)

then L′ = (· · · (L^{fa(Δ1)→d1}) · · ·)^{fa(Δk−1)→dk} is a soluble covering
of L with exact solution ordering d1, . . . , dk.

6.1 A low complexity upper bound

Proof: By construction fa_{L′}(di) ⊇ fa_{L′}(Δi−1) for all i, and the re
problem, the computed limited memory strategy reaches the goal
with probability 99.987%. The Maze3 problem is the most
challenging problem. In this case, the obtained limited memory
strategy reaches the goal state with probability 89.71%. Again, the
optim
solution ordering d1, . . . , dk, which proves 1. To see the second
statement, it may be helpful to consult Figures 6 and 7 while
reading the arguments below. For i ∈ {m, m + 1}, we note that Li can
be written as: Li = (· · · (L^{s1→d1+i}) · · ·)^{sk−i→dk}.

4.4 Bounds
are non-requisite for dj in Lj. The proof is by induction. Note that

A1 ∪ · · · ∪ Aj ⊆ fa_L(Δj−1), (19)

and by (7), de_L(dj) = de_{Lj}(dj). (20)

and consider the cases: j = k: From (18), (19), (20) and k − 1 successive
uses of Lemma 3, it follows that, in Lk, Ak is d-s
of the game, the agent is randomly placed in one of the non-goal
states. Each time the agent is present in the goal

2.1 LIMID representation

Figure 1: Agent environment of the Robot example; empty squares
are indistinguishable to the agent, whereas the
therefore, L2 = (L3)^{{x2,y2}→d2}. Then, non-requisite arcs in L2 are
removed. In L2, all parents of d1 are requisite, and all parents of d2,
except {x2, y2}, are non-requisite. Abbreviating Oi = {nsi, ssi, wsi, esi} we
thus obtain: L2 → (L2)min = (L
Two of the thesis's papers therefore deal with making sequential
decisions in domains characterized by incomplete observations. A
graphical and object-oriented notation is presented for compact
specification of the decision problems mentioned above. Hereby