【Iterative method (くりかえしほう)】
In essence, a method that solves a simultaneous multi-variable problem by repeatedly solving single-variable problems. For example, instead of solving the two-variable simultaneous optimization problem over a primitive policy $\mu = \{\mu_1, \mu_2\} \in \Pi_p$,

$$\max_{\mu} \sum\sum_{(x_2, x_3)} g(x_1, u_1, x_2, u_2, x_3) \cdot p(x_2 \mid x_1, u_1)\, p(x_3 \mid x_2, u_2)$$

one solves the problem in which the optimization over $\mu_2$ is carried out first, followed by the optimization over $\mu_1$:

$$\max_{\mu_1} \max_{\mu_2} \sum\sum_{(x_2, x_3)} g(x_1, u_1, x_2, u_2, x_3) \cdot p(x_2 \mid x_1, u_1)\, p(x_3 \mid x_2, u_2)$$

Here $u_1 = \mu_1(x_1)$ and $u_2 = \mu_2(x_1, u_1, x_2)$.
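The equivalence of the two problems can be checked numerically on a toy instance. The following is a minimal sketch, not a definitive implementation: the state set `X`, the action set `U`, the fixed initial state `x1`, and the numbers inside `p` and `g` are all hypothetical and chosen only for illustration. The simultaneous version enumerates every pair $(\mu_1, \mu_2)$. The iterative version relies on the fact that $\mu_2$ may choose $u_2$ separately for each $(x_1, u_1, x_2)$ and that the probabilities are nonnegative, so the inner maximization reduces to one single-variable maximization over $u_2$ per $x_2$, followed by a single-variable maximization over $u_1$.

```python
import itertools

# Hypothetical toy instance: two states, two actions, fixed initial state.
X = (0, 1)
U = (0, 1)
x1 = 0


def p(x_next, x, u):
    """Transition probability p(x_next | x, u); the numbers are illustrative."""
    stay = 0.7 if u == 0 else 0.4
    return stay if x_next == x else 1.0 - stay


def g(x1, u1, x2, u2, x3):
    """Illustrative non-additive reward over the whole history."""
    return (x1 + 2 * x2 + 3 * x3) * (1 + u1) - u2 * (x2 + 1) + u1 * u2


def objective(u1, mu2):
    """Expected reward when u1 = mu_1(x1) and u2 = mu_2(x1, u1, x2)."""
    total = 0.0
    for x2 in X:
        u2 = mu2[(x1, u1, x2)]
        for x3 in X:
            total += g(x1, u1, x2, u2, x3) * p(x2, x1, u1) * p(x3, x2, u2)
    return total


def solve_jointly():
    """Simultaneous problem: enumerate every pair (mu_1, mu_2)."""
    keys = [(x1, u1, x2) for u1 in U for x2 in X]
    best = float("-inf")
    for u1 in U:  # mu_1 restricted to the fixed x1 is just a choice of u1
        for choice in itertools.product(U, repeat=len(keys)):
            mu2 = dict(zip(keys, choice))
            best = max(best, objective(u1, mu2))
    return best


def solve_iteratively():
    """Nested problem: optimize u2 for each x2 first, then optimize u1."""
    best = float("-inf")
    for u1 in U:
        inner = 0.0
        for x2 in X:
            # Single-variable maximization over u2 for this (x1, u1, x2).
            inner += p(x2, x1, u1) * max(
                sum(g(x1, u1, x2, u2, x3) * p(x3, x2, u2) for x3 in X)
                for u2 in U
            )
        best = max(best, inner)
    return best


if __name__ == "__main__":
    print(solve_jointly(), solve_iteratively())
```

In this toy instance the simultaneous search evaluates 32 policy pairs, while the iterative version only solves a handful of one-variable maximizations; the two returned values should agree up to floating-point rounding.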