Approximate Gradient Methods in Policy-Space Optimization of Markov Reward Processes
Peter Marbach, John Tsitsiklis
Discrete Event Dynamic Systems, vol. 13, no. 1-2, pp. 111–148, Kluwer Academic Publishers, January 2003