One Idea

Tell me the price of memory and I give you €100

Markov Decision Processes (MDPs) are Markov chains plus nondeterminism: some states are random, the others are controlled (nondeterministic). In the pictures, the random states are round, and the controlled states are squares: The random states (except the brown sink state) come with a probability distribution over the successor states. In the controlled states, however, a controller chooses a successor state. What does the controller want to achieve? That depends. In this blog post, the objective is very simple: take a red transition . The only special thing about red transitions is that the controller wants to take them. We consider only MDPs with the following properties: There are finitely many states and transitions. The MDP is acyclic, that is, it has no cycles. There are a unique start state from which any run starts (in the pictures: blue, at the top) and a unique sink state where any run ends (in the pictures: brown, at the bottom). No matter what the controller does,

One Idea

Search This Blog

Posts

Tell me the price of memory and I give you €100