## Coding the GridWorld Example from DeepMind’s Reinforcement Learning Course in Python

1. State B: any move out of it (up, down, left or right) yields a reward of +5 and takes us to cell B’.
2. States from which a move can take us off the grid. Take the cell (state) in the first row, first column: going UP from it leaves the grid. Such a move yields a reward of -1, and we stay in the state we started from.
3. Every other move, out of any other cell (state), yields a reward of 0 and takes us to the new cell.
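The three rules above can be sketched as a small transition function. This is only an illustration, not the course's code: the 5x5 grid size, the coordinates chosen for B and B’, and the names `step`, `SPECIAL` and `ACTIONS` are all assumptions made for the example.

```python
# Assumed for illustration: a 5x5 grid indexed as (row, col) from the top-left.
N_ROWS, N_COLS = 5, 5

# Special states map to (destination, reward). The coordinates of B and B'
# are assumed; other special states could be added the same way.
SPECIAL = {(0, 3): ((2, 3), 5.0)}  # B -> B', reward +5

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Return (next_state, reward) for taking `action` in `state`."""
    # Rule 1: any move out of a special state jumps to its destination.
    if state in SPECIAL:
        return SPECIAL[state]
    d_row, d_col = ACTIONS[action]
    row, col = state[0] + d_row, state[1] + d_col
    # Rule 2: a move off the grid costs -1 and leaves us where we were.
    if not (0 <= row < N_ROWS and 0 <= col < N_COLS):
        return state, -1.0
    # Rule 3: any other move gives reward 0 and lands in the new cell.
    return (row, col), 0.0
```

For example, `step((0, 0), "up")` stays in `(0, 0)` with a reward of -1, while `step((0, 0), "right")` moves to `(0, 1)` with a reward of 0.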
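
Each action is taken with probability 0.25, the discount factor is 0.9, and every cell's value starts at 0. For the cell in the first row, first column, going up leaves the grid, so it contributes 0.25 * (-1 + 0.9*0) = -0.25. The other three actions work the same way:

1. Going left: 0.25 * (-1 + 0.9*0) = -0.25. This move also leaves the grid, so the reward is -1 and we stay in the same cell.
2. Going right: 0.25 * (0 + 0.9*0) = 0. There are only two differences here. First, the immediate reward is 0, because we are still inside the grid after going right. Second, we are now in the cell to the right, which has a value of 0.
3. Going down: 0.25 * (0 + 0.9*0) = 0.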
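The per-action terms can also be checked in a few lines of Python. This is a minimal sketch of the arithmetic for the first-row, first-column cell only; the names `GAMMA`, `ACTION_PROB` and `outcomes` are made up for the example, and the numbers are the ones used in the calculation above.

```python
GAMMA = 0.9         # discount factor used in the calculation
ACTION_PROB = 0.25  # each of the four actions is equally likely

# (immediate reward, current value of the state we land in) per action,
# for the cell in the first row, first column with all values still at 0.
outcomes = {
    "up": (-1.0, 0.0),    # leaves the grid, stays in place
    "left": (-1.0, 0.0),  # leaves the grid, stays in place
    "right": (0.0, 0.0),  # moves to the cell on the right
    "down": (0.0, 0.0),   # moves to the cell below
}

# Contribution of each action: probability * (reward + discounted next value).
contributions = {
    action: ACTION_PROB * (reward + GAMMA * next_value)
    for action, (reward, next_value) in outcomes.items()
}
value = sum(contributions.values())
print(value)  # -0.5
```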

## So the value function of the current state, i.e. the first row, first column, is -0.50

Similarly, we can calculate value functions of all other states.
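One way to repeat this calculation for every cell is a single sweep over the grid, starting from values of 0 everywhere. The sketch below is written under assumptions rather than taken from the course: a 5x5 grid, assumed coordinates for B and B’, equal probability for the four actions, a discount factor of 0.9, and hypothetical names such as `step` and `sweep`.

```python
GAMMA = 0.9
N_ROWS, N_COLS = 5, 5                      # assumed grid size
SPECIAL = {(0, 3): ((2, 3), 5.0)}          # B -> B', +5 (assumed coordinates)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, move):
    """Return (next_state, reward) following the three reward rules."""
    if state in SPECIAL:
        return SPECIAL[state]
    row, col = state[0] + move[0], state[1] + move[1]
    if not (0 <= row < N_ROWS and 0 <= col < N_COLS):
        return state, -1.0  # off the grid: reward -1, stay in place
    return (row, col), 0.0


def sweep(values):
    """Update every state once, using the values from the previous sweep."""
    action_prob = 1 / len(ACTIONS)
    new_values = {}
    for state in values:
        total = 0.0
        for move in ACTIONS:
            next_state, reward = step(state, move)
            total += action_prob * (reward + GAMMA * values[next_state])
        new_values[state] = total
    return new_values


values = {(r, c): 0.0 for r in range(N_ROWS) for c in range(N_COLS)}
values = sweep(values)
print(values[(0, 0)])  # -0.5
```

Calling `sweep` repeatedly feeds each result back in as the starting values for the next pass, so the estimates for neighbouring cells start to influence each other.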
