Daily Archives: September 20, 2020

Bellman optimality equation for Q

 

[Image: q_function]

Q(s, a): the expected return from starting in state s and taking action a at time t.
r(s, a): the reward received for taking action a in state s.
max Q(s’, a’): the maximum expected return over the next state-action pair (s’, a’). We need to find the action a’ that maximizes it.
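
To see how these three terms fit together, here is a minimal sketch that repeatedly applies the backup Q(s, a) = r(s, a) + gamma * max over a’ of Q(s’, a’) until the table stops changing. The discount factor gamma, the deterministic transitions, and the tiny 3-state, 2-action problem are all assumptions made up for illustration; none of them come from the definitions above.

```python
# Hypothetical toy problem: 3 states, 2 actions, deterministic transitions.
# State 2 is absorbing and gives no reward.
NEXT_STATE = [[0, 1], [0, 2], [2, 2]]          # NEXT_STATE[s][a] -> s'
REWARD = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # REWARD[s][a]     -> r(s, a)
GAMMA = 0.9                                    # assumed discount factor


def q_value_iteration(next_state, reward, gamma, sweeps=100):
    """Apply Q(s, a) <- r(s, a) + gamma * max_a' Q(s', a') for a fixed number of sweeps."""
    n_states, n_actions = len(reward), len(reward[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        # The right-hand side reads the old table; the new table replaces it
        # only after the whole sweep is computed (synchronous backup).
        q = [
            [
                reward[s][a] + gamma * max(q[next_state[s][a]])
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
    return q


q = q_value_iteration(NEXT_STATE, REWARD, GAMMA)

# The maximizing action in each state (ties resolve to the lowest index).
greedy = [row.index(max(row)) for row in q]
```

In this toy problem the only reward comes from taking action 1 in state 1, so the greedy action in states 0 and 1 is action 1: state 0 moves toward state 1, and state 1 collects the reward. A stochastic environment would need an expectation over s’ in place of the single lookup used here.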