Bellman optimality equation for Q

By lipeng | September 20, 2020


Q(s, a): the expected return from starting in state s and taking action a at time t.
r(s, a): the reward received for taking action a in state s.
max Q(s', a'): the maximum expected return over the actions a' available in the next state s'. Evaluating it means finding the a' that maximizes Q(s', a').
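To see how these three terms fit together, here is a minimal Python sketch. It assumes the usual form of the backup, Q(s, a) = r(s, a) + gamma * max Q(s', a'), where the discount factor gamma is an assumption, since it is not defined in the list above. The tiny three-state chain, its rewards, and every name in the code are hypothetical and chosen only for illustration. Transitions are taken to be deterministic, so no expectation over next states is needed.

```python
# Hypothetical deterministic chain: states 0 and 1 are non-terminal, state 2 is terminal.
# Action 0 means "stay", action 1 means "advance". All values here are illustrative.
GAMMA = 0.9  # discount factor (assumed; not defined in the text above)

NEXT_STATE = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 2}}          # s' for each (s, a)
REWARD = {0: {0: 0.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}      # r(s, a)
TERMINAL = {2}


def bellman_backup(q, s, a):
    """Return r(s, a) + GAMMA * max over a' of Q(s', a') for one state-action pair."""
    s_next = NEXT_STATE[s][a]
    # A terminal state has no further actions, so its future return is 0.
    best_next = 0.0 if s_next in TERMINAL else max(q[s_next].values())
    return REWARD[s][a] + GAMMA * best_next


def q_value_iteration(sweeps=100):
    """Apply the backup to every (s, a) repeatedly, starting from Q = 0."""
    q = {s: {a: 0.0 for a in NEXT_STATE[s]} for s in NEXT_STATE}
    for _ in range(sweeps):
        # Synchronous sweep: every new value is computed from the previous table.
        q = {s: {a: bellman_backup(q, s, a) for a in q[s]} for s in q}
    return q


q_star = q_value_iteration()
```

In this toy chain, the action a' that maximizes Q(s', a') in state 1 is "advance", because it collects the only reward. The values of the other state-action pairs are discounted copies of that return.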
