「両的計画」の版間の差分

2008年3月23日 (日) 17:43時点における最新版

【りょうてきけいかく (bynamic programming)】

概要

いわゆる動的計画法を2元連立的に考えた逐次最適化法. 単調性「非減少性」に代わって両調性「非減少性または非増加性のいずれか」の下では, 部分最大化問題群と部分最小化問題群の両群を考えて, 両群の相隣る問題間の関係を両帰式としてを導く. これを逐次解いて, 最後に与問題の最適解を求める方法である.負値乗法型, 負値乗加法型などの評価系が両的計画で解ける. 確率系ではマルコフ両決定過程ともいう.

詳説

　動的計画法は単調性 monotonicity と再帰性 recursiveness (可分性 separability) の下で適用される. この単調性は目的関数の「非減少性」を意味しているが, これを両調性 bitonicity「非減少性または非増加性のいずれか」まで拡大解釈すると, より広い逐次決定過程が考えられる. これを両的計画と呼ぶ. 特に, 確率システム上ではマルコフ両決定過程[3]という. これは確率的動的計画法を単にマルコフ決定過程ということに準じている.

　両的計画法によって最大化問題を解く場合, 与問題に対する部分最大化問題群ばかりでなく部分最小化問題群をも考える必要がある. このとき, 最大値関数と最小値関数の間に成り立つ連立再帰式を両帰式という. 負値乗法型評価系[1], 負値乗加法型評価系 [2] の最適化問題や最短最長ルート問題などは両帰式で解ける.

　さて, 逐次決定過程が次の要素で与えられるとしよう：

構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle S\, } は状態空間, 構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle s_{n} \in S\, } は第構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle n\, } 状態, 構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle A\, } は決定空間

構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle A(s_{n})\, } は構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle s_{n}\, } での可能決定空間, 構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle a_{n} \in A(s_{n})\, } は第構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle n\, } 決定

r:S\times A\to \mathbf {R} ^{1}

は利得関数, 構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle \beta : S \times A \to (-1,\,1)\, } は割引き関数

構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle k : S \to \mathbf{R}^{1}\, } は終端関数, 構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle T : S \times A \to S\, } は状態変換

構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle p = \{p(t|s,a)\}\, } はマルコフ推移法則, 構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle \sum_{t \in S}p(t|s,a) = 1, p(t|s,a) \ge 0\, }

このとき, 確定系上の負値乗加法評価系の最大化または最小化は

構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle \begin{array}{lll} {\rm max.~and~min.} & r_{1} + \beta_{1}r_{2} + \beta_{1}\beta_{2}r_{3} + \cdots \\ & + \beta_{1}\beta_{2} \cdots \beta_{N-1}r_{N} + \beta_{1}\beta_{2} \cdots \beta_{N}k \\ \mbox{s. t.} \; T(s_n,a_n) = & s_{n+1}, ~ a_{n} \in A(s_{n}) \quad (n = 1, 2, \ldots, N), \end{array}\, }

で表わされる. ただし構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle r_{n} = r(s_{n},a_{n}),~~\beta_{n}= \beta(s_{n},a_{n})\, } . このとき, 第構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle n\, } 段の状態構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle s_{n}\, } から始まる部分問題

構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle \begin{array}{lll} {\rm max.~and~min.} & r_{n} + \beta_{n}r_{n+1} + \beta_{n}\beta_{n+1}r_{n+2} + \cdots \\ & + \beta_{n}\beta_{n+1} \cdots \beta_{N-1}r_{N} + \beta_{n}\beta_{n+1} \cdots \beta_{N}k \\ \mbox{s. t.} \; T(s_m,a_m) = & s_{m+1}, ~ a_{m} \in A(s_{m}) \quad (m = n, n+1, \ldots, N ), \end{array}\, }

の最大値を構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle U_{n}(s_{n})\, } , 最小値を構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle u_{n}(s_{n})\, } とすると, 両最適値関数は両帰式

構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle U_{n}(s) = \max_{a:-}T(s,a; u_{n+1}) \vee \max_{a:+}T(s,a; U_{n+1})\, }

構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle u_{n}(s) = \min_{a:-}T(s,a; U_{n+1}) \wedge \min_{a:+}T(s,a; u_{n+1}) \quad \quad \mbox{(1)}\, }

構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle U_{N+1}(s) = u_{N+1}(s) = k(s)\, }

を満たす. ここに~ 構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle T(s,a; w) := r(s,a) + \beta(s,a)w(T(s,a)), ~ a:-(+)\, } は構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle \beta(s,a) <(\ge)\, 0\, } なる構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle a\, } である.

　また, マルコフ推移法則構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle p = \{p(t|s,a)\}\, } 上での期待値最適化問題

構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle \begin{array}{ll} {\rm max.~and~min.} & E[\,r_{1} + \beta_{1}r_{2} + \beta_{1}\beta_{2}r_{3} + \cdots \\ & + \beta_{1}\beta_{2} \cdots \beta_{N-1}r_{N} + \beta_{1}\beta_{2} \cdots \beta_{N}k\,] \end{array}\, }

構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle \mbox{s. t.} \; \; p(\cdot|s_n,a_n) ~ \sim s_{n+1}, ~ a_{n} \in A(s_{n}) \; (n = 1,2, \ldots ), \, }

の両帰式は一次変換

構文解析に失敗 (MathML、ただし動作しない場合はSVGかPNGで代替（最新ブラウザーや補助ツールに推奨）: サーバー「https://en.wikipedia.org/api/rest_v1/」から無効な応答 ("Math extension cannot connect to Restbase."):): {\displaystyle T(s,a; w) := r(s,a) + \beta(s,a) {\displaystyle \sum_{t \in S}w(t)p(t|s,a)}\, }

を用いて(1)と同じ型で与えられる[4].

参考文献

[1] R. Bellman, Dynamic Programming, Princeton Univ. Press, 1957.

[2] S. Iwamoto, "From Dynamic Programming to Bynamic programming," Journal of Mathematical Analysis and Applications, 177 (1993), 56-74.

[3] S. Iwamoto, "On Bidecision Processes," Journal of Mathematical Analysis and Applications, 187 (1994), 676-699.

[4] T. Fujita and K. Tsurusaki, "Stochastic Optimization of Multiplicative Functions with Negative Value," Journal of the Operations Research Society of Japan, 41 (1998), 351-373.

@@ 1行目: / 1行目: @@
 '''【りょうてきけいかく (bynamic programming)】'''
+=== 概要 ===
 いわゆる動的計画法を2元連立的に考えた逐次最適化法. 単調性「非減少性」に代わって両調性「非減少性または非増加性のいずれか」の下では, 部分最大化問題群と部分最小化問題群の両群を考えて, 両群の相隣る問題間の関係を両帰式としてを導く. これを逐次解いて, 最後に与問題の最適解を求める方法である.負値乗法型, 負値乗加法型などの評価系が両的計画で解ける. 確率系ではマルコフ両決定過程ともいう.
-詳しくは[[《両的計画》|基礎編：両的計画]]を参照.
+=== 詳説 ===
+　動的計画法は単調性 monotonicity と再帰性 recursiveness (可分性 separability) の下で適用される. この単調性は目的関数の「非減少性」を意味しているが, これを両調性 bitonicity「非減少性または非増加性のいずれか」まで拡大解釈すると, より広い逐次決定過程が考えられる. これを[[両的計画]]と呼ぶ. 特に, 確率システム上では[[マルコフ両決定過程]][3]という. これは確率的動的計画法を単にマルコフ決定過程ということに準じている.
+　両的計画法によって最大化問題を解く場合, 与問題に対する部分最大化問題群ばかりでなく部分最小化問題群をも考える必要がある. このとき, 最大値関数と最小値関数の間に成り立つ連立再帰式を[[両帰式 (動的計画法における)|両帰式]]という. [[負値乗法型評価系]][1], [[負値乗加法型評価系]] [2] の最適化問題や[[最短最長ルート問題]]などは両帰式で解ける.
+　さて, 逐次決定過程が次の要素で与えられるとしよう：
+:<math>S\, </math> は状態空間, <math>s_{n} \in S\, </math> は第<math>n\, </math>状態, <math>A\, </math>は決定空間
+:<math>A(s_{n})\, </math>は<math>s_{n}\, </math>での可能決定空間, <math>a_{n} \in A(s_{n})\, </math>は第<math>n\, </math>決定
+:<math>r : S \times A \to \mathbf{R}^{1}</math>は利得関数, <math>\beta : S \times A \to (-1,\,1)\, </math>は割引き関数
+:<math>k : S \to \mathbf{R}^{1}\, </math>は終端関数, <math>T : S \times A \to S\, </math>は状態変換
+:<math>p = \{p(t|s,a)\}\, </math>はマルコフ推移法則, <math>\sum_{t \in S}p(t|s,a) = 1, p(t|s,a) \ge 0\, </math>
+このとき, 確定系上の負値乗加法評価系の最大化または最小化は
+<center>
+<math>\begin{array}{lll}
+{\rm max.~and~min.} & r_{1} + \beta_{1}r_{2} +
+ \beta_{1}\beta_{2}r_{3} + \cdots   \\
+& + \beta_{1}\beta_{2} \cdots \beta_{N-1}r_{N} +
+ \beta_{1}\beta_{2} \cdots \beta_{N}k \\
+\mbox{s. t.} \; T(s_n,a_n) = &  s_{n+1}, ~ a_{n} \in A(s_{n}) \quad (n = 1, 2, \ldots, N),
+\end{array}\, </math>
+</center>
+で表わされる. ただし <math>r_{n} = r(s_{n},a_{n}),~~\beta_{n}= \beta(s_{n},a_{n})\, </math>.  このとき, 第<math>n\, </math>段の状態  <math>s_{n}\, </math> から始まる部分問題
+<center>
+<math>\begin{array}{lll}
+{\rm max.~and~min.} & r_{n} + \beta_{n}r_{n+1} +
+ \beta_{n}\beta_{n+1}r_{n+2} + \cdots   \\
+& + \beta_{n}\beta_{n+1} \cdots \beta_{N-1}r_{N} +
+ \beta_{n}\beta_{n+1} \cdots \beta_{N}k   \\
+\mbox{s. t.} \; T(s_m,a_m) = & s_{m+1}, ~ a_{m} \in A(s_{m}) \quad (m = n, n+1, \ldots, N ),
+\end{array}\, </math>
+</center>
+の最大値を <math>U_{n}(s_{n})\, </math>, 最小値を <math>u_{n}(s_{n})\, </math> とすると, 両最適値関数は両帰式
+<table align="center">
+<tr>
+<td>
+<math>U_{n}(s) = \max_{a:-}T(s,a; u_{n+1}) \vee \max_{a:+}T(s,a; U_{n+1})\, </math>
+</td>
+</tr>
+<tr>
+<td><math>u_{n}(s) = \min_{a:-}T(s,a; U_{n+1}) \wedge \min_{a:+}T(s,a; u_{n+1})
+\quad  \quad  \mbox{(1)}\, </math>
+</td>
+</tr>
+<tr>
+<td><math>U_{N+1}(s) = u_{N+1}(s) = k(s)\, </math> </td>
+</tr>
+</table>
+を満たす. ここに~  <math>T(s,a; w) := r(s,a) + \beta(s,a)w(T(s,a)), ~ a:-(+)\, </math>は <math>\beta(s,a) <(\ge)\, 0\, </math> なる <math>a\, </math> である.
+　また, マルコフ推移法則 <math>p = \{p(t|s,a)\}\, </math> 上での期待値最適化問題
+<table align="center">
+<tr>
+<td><math>\begin{array}{ll}
+{\rm max.~and~min.} & E[\,r_{1} + \beta_{1}r_{2} +
+ \beta_{1}\beta_{2}r_{3} + \cdots   \\
+& + \beta_{1}\beta_{2} \cdots \beta_{N-1}r_{N} +
+ \beta_{1}\beta_{2} \cdots \beta_{N}k\,]
+\end{array}\, </math></td>
+</tr>
+<tr>
+<td><math>\mbox{s. t.} \; \; p(\cdot|s_n,a_n) ~ \sim s_{n+1}, ~ a_{n} \in A(s_{n}) \;
+  (n = 1,2, \ldots ), \, </math>
+</td>
+</tr>
+</table>
+の両帰式は一次変換
+<center>
+<math>T(s,a; w) := r(s,a) + \beta(s,a) {\displaystyle \sum_{t \in S}w(t)p(t|s,a)}\, </math>
+</center>
+を用いて(1)と同じ型で与えられる[4].
+----
+'''参考文献'''
+[1] R. Bellman, ''Dynamic Programming'', Princeton Univ. Press, 1957.
+[2] S. Iwamoto, "From Dynamic Programming to Bynamic programming," ''Journal of Mathematical Analysis and Applications'', '''177''' (1993), 56-74.
+[3] S. Iwamoto, "On Bidecision Processes," ''Journal of Mathematical Analysis and Applications'', '''187''' (1994), 676-699.
+[4] T. Fujita and K. Tsurusaki, "Stochastic Optimization of Multiplicative Functions with Negative Value," ''Journal of the Operations Research Society of Japan'', '''41''' (1998), 351-373.
+[[Category:動的・確率・多目的計画|りょうてきけいかく]]

「両的計画」の版間の差分

2008年3月23日 (日) 17:43時点における最新版

概要

詳説

案内メニュー

検索