Simulation-based optimization of Markov reward processes: implementation issues

Peter Marbach, John Tsitsiklis

Proceedings of the 38th IEEE Conference on Decision and Control, December 1999

 

Abstract

We consider discrete-time, finite-state-space Markov reward processes that depend on a set of parameters. Previously, we proposed a simulation-based methodology to tune the parameters so as to optimize the average reward. The resulting algorithms converge with probability 1, but may have high variance. Here, we propose two approaches to reduce the variance, which, however, introduce a new bias into the update direction. We report numerical results indicating that the resulting algorithms are robust with respect to a small bias.
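The kind of simulation-based tuning described in the abstract can be illustrated with a minimal sketch: a two-state Markov reward process whose transition probabilities depend on a scalar parameter, updated online with a likelihood-ratio (score-function) estimate of the average-reward gradient. This is an assumption-laden toy example in the spirit of the methodology, not the paper's actual algorithm; all names and constants here are illustrative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def run_gradient_tuning(theta=0.0, steps=20000, alpha=0.01, seed=0):
    """Toy sketch (not the paper's method): simulate a 2-state Markov
    reward process whose transition probability depends on theta, and
    adjust theta online using a score-function gradient estimate of
    the average reward, with the running average as a baseline."""
    rng = random.Random(seed)
    avg_reward = 0.0  # running estimate of the average reward
    z = 0.0           # eligibility trace of score functions
    for t in range(1, steps + 1):
        p = sigmoid(theta)          # probability of moving to state 1
        to_state_1 = rng.random() < p
        # score = d/dtheta log P(observed transition)
        score = (1.0 - p) if to_state_1 else -p
        r = 1.0 if to_state_1 else 0.0   # state 1 pays reward 1
        avg_reward += (r - avg_reward) / t
        z = 0.9 * z + score              # discounted eligibility trace
        theta += alpha * (r - avg_reward) * z
    return theta, avg_reward
```

Because state 1 pays the only reward, the gradient updates should push `theta` upward, increasing the probability of visiting state 1 and hence the average reward above its initial value of about 0.5. The high variance of such single-trajectory score-function estimates is exactly the issue the paper's two variance-reduction approaches target.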

 
