Coding the GridWorld Example from DeepMind’s Reinforcement Learning Course in Python

Fig 3.2 [1]
Here is a description of the GridWorld example [1]
Fig 3.3 [1]
Formula 3.14 [1]
  1. State B: any move out of it (up, down, left or right) yields a reward of +5 and takes us to cell B’.
  2. States on the edge of the grid, where some moves would take us off the grid. Take the cell (state) in the first row, first column. Going UP from this state would leave the grid, so we receive a reward of -1 and stay in the state we started from.
  3. Any other move, from any cell (state), yields a reward of 0 and takes us to the new cell.
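The rules above can be sketched as a small transition function. This is a minimal sketch, not a definitive implementation: the names `step`, `ACTIONS` and `GRID_SIZE` are our own, the grid is assumed to be 5x5 with zero-indexed (row, column) cells, and the positions of B and B’ as well as the whole of state A (+10 reward, jump to A’) are assumptions taken from Example 3.5 in [1], since the list above does not spell them out.

```python
GRID_SIZE = 5  # assumed 5x5 grid, as in Example 3.5 of [1]

# Special cells as (row, column), zero-indexed; positions assumed from [1]
A, A_PRIME = (0, 1), (4, 1)
B, B_PRIME = (0, 3), (2, 3)

# Each action is a (row change, column change) pair
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Return (next_state, reward) for taking `action` in `state`."""
    # Every action out of A or B jumps to the primed cell with a fixed reward
    if state == A:
        return A_PRIME, 10
    if state == B:
        return B_PRIME, 5

    d_row, d_col = ACTIONS[action]
    row, col = state[0] + d_row, state[1] + d_col

    # Leaving the grid: stay where we are and pay a reward of -1
    if not (0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE):
        return state, -1

    # Ordinary move: reward 0 and we land in the new cell
    return (row, col), 0


print(step((0, 0), "up"))     # going off the grid from the first cell
print(step((0, 3), "down"))   # any move out of B
print(step((1, 1), "right"))  # ordinary move
```

Keeping the rules in one function means the value calculation never needs to know about the special cells itself.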
  1. Similarly, if we go left -> 0.25 * (-1 + 0.9 * 0) = -0.25
  2. If we go right -> 0.25 * (0 + 0.9 * 0) = 0. There are only two differences here. First, the immediate reward is 0, because we are still inside the grid after going right. Second, we are now in the cell to the right, whose value function is 0.
  3. Similarly, if we go down -> 0.25 * (0 + 0.9 * 0) = 0

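So the value function of the current state, i.e. first row, first column, is now -0.50. It is the sum of the contributions of all four moves: -0.25 (up) - 0.25 (left) + 0 (right) + 0 (down).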
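The same arithmetic can be checked in a few lines of Python. This is only a sketch of the single update for the first cell: the names `GAMMA`, `PROB` and `outcomes` are ours, and the up move is assumed to contribute the same as the left move, since both would take us off the grid.

```python
GAMMA = 0.9  # discount factor used in the calculation above
PROB = 0.25  # each of the four moves is equally likely

# For every move: (immediate reward, current value of the state we land in).
# All values are still 0 because this is the first update.
outcomes = {
    "up": (-1, 0.0),     # off the grid (assumed same as left)
    "left": (-1, 0.0),   # off the grid
    "right": (0, 0.0),   # stays inside the grid
    "down": (0, 0.0),    # stays inside the grid
}

value = sum(PROB * (reward + GAMMA * next_value)
            for reward, next_value in outcomes.values())
print(value)  # -0.5
```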

Similarly, we can calculate value functions of all other states.
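Repeating this update for every cell, over and over until the values stop changing, is iterative policy evaluation. Here is a minimal pure-Python sketch under the same assumptions as before: a 5x5 grid, a discount of 0.9, a random policy with probability 0.25 per move, and the special cells A and B placed as in Example 3.5 of [1]. The names `sweep`, `evaluate` and `tolerance` are our own choices, not something taken from the course.

```python
GAMMA = 0.9
N = 5  # assumed grid size

# Special cells: state -> (next state, reward), positions assumed from [1]
SPECIAL = {(0, 1): ((4, 1), 10),  # A -> A'
           (0, 3): ((2, 3), 5)}   # B -> B'
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, move):
    """Return (next_state, reward) for one move."""
    if state in SPECIAL:
        return SPECIAL[state]
    row, col = state[0] + move[0], state[1] + move[1]
    if 0 <= row < N and 0 <= col < N:
        return (row, col), 0
    return state, -1  # off the grid: stay put, reward -1


def sweep(values):
    """Update every cell once, using only the old values."""
    new_values = [[0.0] * N for _ in range(N)]
    for row in range(N):
        for col in range(N):
            for move in MOVES:
                (next_row, next_col), reward = step((row, col), move)
                new_values[row][col] += 0.25 * (
                    reward + GAMMA * values[next_row][next_col])
    return new_values


def evaluate(tolerance=1e-6):
    """Sweep until the largest change in any cell drops below `tolerance`."""
    values = [[0.0] * N for _ in range(N)]
    while True:
        new_values = sweep(values)
        delta = max(abs(new_values[r][c] - values[r][c])
                    for r in range(N) for c in range(N))
        values = new_values
        if delta < tolerance:
            return values


first = sweep([[0.0] * N for _ in range(N)])
print(first[0][0])  # -0.5, the hand calculation for the first cell

final = evaluate()
for row in final:
    print([round(v, 1) for v in row])
```

After the first sweep the top-left cell holds the -0.50 we computed by hand. Once the loop stops, the values rounded to one decimal should agree with the grid in Fig 3.2 [1], with the top-left cell at about 3.3.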


  1. Reinforcement Learning: An Introduction, Second Edition, by Richard S. Sutton and Andrew G. Barto
  2. DeepMind Reinforcement Learning Course by David Silver


