Bellman optimal equation for Q

By | September 20, 2020

[Image: q_function]

Q(s, a): the expected return from starting in state s and taking action a at time t.
r(s, a): the reward received for taking action a in state s.
max Q(s’, a’): the maximum expected return over the actions a’ available in the next state s’. Evaluating it means finding the a’ that maximizes Q(s’, a’).
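
To see how the three terms fit together, here is a minimal sketch of the usual Bellman optimality backup, Q(s, a) ← r(s, a) + γ · max over a’ of Q(s’, a’), applied repeatedly until Q stops changing. Several things here are assumptions and not from the post: the discount factor γ (`GAMMA`), the tiny 3-state, 2-action MDP with deterministic transitions, and the names `next_state`, `reward`, `bellman_backup`, and `q_value_iteration`.

```python
import numpy as np

# Hypothetical toy MDP (an assumption for illustration): 3 states, 2 actions,
# deterministic transitions. State 2 is absorbing and gives no reward.
GAMMA = 0.9  # assumed discount factor

# next_state[s, a] is the state s' reached by taking action a in state s.
next_state = np.array([[0, 1],
                       [0, 2],
                       [2, 2]])

# reward[s, a] is r(s, a), the reward for taking action a in state s.
reward = np.array([[0.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.0]])


def bellman_backup(Q):
    """One sweep of Q(s, a) <- r(s, a) + GAMMA * max_a' Q(s', a')."""
    best_next = Q.max(axis=1)  # max over a' of Q(s', a'), one value per s'
    return reward + GAMMA * best_next[next_state]


def q_value_iteration(tol=1e-10, max_iters=1000):
    """Repeat the backup until Q changes by less than tol."""
    Q = np.zeros_like(reward)
    for _ in range(max_iters):
        Q_new = bellman_backup(Q)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q


Q_star = q_value_iteration()
# The a' that maximizes Q(s, a') in each state.
greedy_actions = Q_star.argmax(axis=1)
print(Q_star)
print(greedy_actions)
```

In this toy problem the only reward comes from taking action 1 in state 1, so the resulting Q values in state 0 are just that reward discounted by the number of steps needed to reach it. With stochastic transitions the max term would sit inside an expectation over s’, which this sketch leaves out.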