In an earlier post I defined the

Is $u < 1$? Worth €100.

Determine $u$. Worth another €100, even if $u=1$. I'm happy to announce that there is now a solution:

The price of memory, $u$, is 1. I don't want to repeat the precise definition of $u$, but here is a short recap: Suppose you have an acyclic finite MDP and a strategy $\sigma$ that guarantees to visit a red transition with probability 1. This strategy $\sigma$ might use memory, i.e., might base its behaviour in a state on the history of the run, as indicated in yellow:

Any such strategy $\sigma$ induces a randomized strategy $\overline\sigma$ that acts like $\sigma$ in average, but is memoryless. In each state, when averaged over all runs, the strategies $\sigma$ and $\overline\sigma$ take the same actions with the same probabilities:

One can show that $\sigma$ and $\overline\sigma$ visit red transitions equally often in average; in particular, $\overline\sigma$ visits at least…

*price of memory*, $u$, and asked:Is $u < 1$? Worth €100.

Determine $u$. Worth another €100, even if $u=1$. I'm happy to announce that there is now a solution:

The price of memory, $u$, is 1. I don't want to repeat the precise definition of $u$, but here is a short recap: Suppose you have an acyclic finite MDP and a strategy $\sigma$ that guarantees to visit a red transition with probability 1. This strategy $\sigma$ might use memory, i.e., might base its behaviour in a state on the history of the run, as indicated in yellow:

Any such strategy $\sigma$ induces a randomized strategy $\overline\sigma$ that acts like $\sigma$ in average, but is memoryless. In each state, when averaged over all runs, the strategies $\sigma$ and $\overline\sigma$ take the same actions with the same probabilities:

One can show that $\sigma$ and $\overline\sigma$ visit red transitions equally often in average; in particular, $\overline\sigma$ visits at least…