Markov Decision Processes (MDPs) are Markov chains plus nondeterminism:
some states are random, the others are controlled (nondeterministic).
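This two-kinded state space can be sketched as a small data structure. A minimal sketch — the state names, successor sets, and the 1/2–1/2 probabilities below are illustrative, not read off any particular figure:

```python
import random

# Hypothetical MDP: random states carry a distribution over successors,
# controlled states carry a set of successors the controller picks from.
random_states = {
    "coin": {"init": 0.5, "goal": 0.5},
}
controlled_states = {
    "init": ["down", "coin"],
    "down": ["init"],
    "goal": ["goal"],
}

def step(state, choose):
    """Advance one step: sample in a random state, otherwise defer to the
    controller (a function from state and allowed successors to a successor)."""
    if state in random_states:
        dist = random_states[state]
        return random.choices(list(dist), weights=list(dist.values()))[0]
    return choose(state, controlled_states[state])

# A controller that always picks the first allowed successor: from "init"
# it loops init -> down -> init forever, never entering the random state.
always_first = lambda s, succs: succs[0]
state = "init"
trace = [state]
for _ in range(6):
    state = step(state, always_first)
    trace.append(state)
print(trace)  # ['init', 'down', 'init', 'down', 'init', 'down', 'init']
```

Swapping in a controller that sometimes picks `"coin"` would hand control to the random state, which then samples a successor from its distribution.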
In the pictures, the random states are round, and the controlled states are squares:


The random states come with a distribution over successor states, but in the controlled states a *controller* chooses a successor state (or a probability distribution over the successor states). For instance, the controller could stay on the leftmost column forever, by always choosing to go one state down. Or the controller could go right at some point; in the random state a successor is picked randomly, either the initial state or the state on the right, according to the blue probabilities.

What objective should the controller aim at? In this post, the objective will be the following:

**visit green states infinitely often, and red states only finitely often**. Here is the previous MDP with colours:

Since the controller wants to visit (infinitely many) green states…
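One way to make this objective concrete: on an eventually periodic run (a finite prefix followed by a cycle repeated forever), green states are visited infinitely often exactly when the cycle contains a green state, and red states are visited only finitely often exactly when the cycle contains no red state. A sketch of this check, with a hypothetical colouring:

```python
def satisfies(prefix, cycle, colour):
    """Check the objective on a lasso-shaped run prefix . cycle^omega:
    states in the prefix occur finitely often, states in the cycle
    occur infinitely often."""
    green_infinitely_often = any(colour.get(s) == "green" for s in cycle)
    red_finitely_often = all(colour.get(s) != "red" for s in cycle)
    return green_infinitely_often and red_finitely_often

colour = {"a": "green", "b": "red"}          # hypothetical colouring
print(satisfies(["b"], ["a", "c"], colour))  # True: the red state is left behind
print(satisfies([], ["a", "b"], colour))     # False: a red state recurs forever
```

Not every run of an MDP is a lasso, but this finite check conveys what the controller is after: eventually escape the red states for good while returning to a green state again and again.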