Title: The Simplex and Policy-Iteration Methods are Strongly Polynomial for the Markov Decision Problem with a Constant Discount Factor
Speaker: Professor Yinyu Ye (Stanford University)
Date: Friday, March 30
Time: 2:30 pm – 3:30 pm
Place: Room 451 Mohler Lab, 200 W. Packer Avenue
Abstract: We prove that the classic policy-iteration method (Howard 1960), including the Simplex method (Dantzig 1947) with the most-negative-reduced-cost pivoting rule,
is a strongly polynomial-time algorithm for solving the Markov decision problem (MDP) with a constant discount factor. Furthermore, the computational complexity of the
policy-iteration method (including the Simplex method) is superior to that of the only known strongly polynomial-time interior-point algorithm for solving this problem.
The result is surprising since the Simplex method with the same pivoting rule was shown to be exponential for solving a general linear programming (LP) problem,
the Simplex (or simple policy-iteration) method with the smallest-index pivoting rule was shown to be exponential for solving an MDP regardless of discount rates,
and the policy-iteration method was recently shown to be exponential for solving a undiscounted MDP. We also extend the result to solving MDPs with sub-stochastic
and transient state transition probability matrices.
is a strongly polynomial-time algorithm for solving the Markov decision problem (MDP) with a constant discount factor. Furthermore, the computational complexity of the
policy-iteration method (including the Simplex method) is superior to that of the only known strongly polynomial-time interior-point algorithm for solving this problem.
The result is surprising since the Simplex method with the same pivoting rule was shown to be exponential for solving a general linear programming (LP) problem,
the Simplex (or simple policy-iteration) method with the smallest-index pivoting rule was shown to be exponential for solving an MDP regardless of discount rates,
and the policy-iteration method was recently shown to be exponential for solving a undiscounted MDP. We also extend the result to solving MDPs with sub-stochastic
and transient state transition probability matrices.
Bio: Yinyu Ye received the B.S. degree in System Engineering from the Huazhong University of Science and Technology, Wuhan, China, and the M.S. and Ph.D. degrees in
Management Science & Engineering from Stanford University, Stanford. Currently, he is a full Professor of Management Science and Engineering and Institute of
Computational and Mathematical Engineering and the Director of the MS&E Industrial Affiliates Program, Stanford University. His current research interests include
Continuous and Discrete Optimization, Mathematical Programming, Algorithm Design and Analysis, Computational Game/Market Equilibrium, Metric Distance
Geometry, Graph Realization, Dynamic Resource Allocation, and Stochastic and Robust Decision Making, etc.
Management Science & Engineering from Stanford University, Stanford. Currently, he is a full Professor of Management Science and Engineering and Institute of
Computational and Mathematical Engineering and the Director of the MS&E Industrial Affiliates Program, Stanford University. His current research interests include
Continuous and Discrete Optimization, Mathematical Programming, Algorithm Design and Analysis, Computational Game/Market Equilibrium, Metric Distance
Geometry, Graph Realization, Dynamic Resource Allocation, and Stochastic and Robust Decision Making, etc.
He was the recipient of John von Neumann Theory Prize by INFORMS (2009), and recipient of the 2009 IBM Faculty Award. He was elected vice chair of the SIAM
Activity Group on Optimization (SIAG/OPT) in 2008. He was elected to theINFORMS fellow in 2006, and awarded the Farkas prize by INFORMS Optimization Society
in the same year. He was the area editor of Math of Operation Research (2009-now), Operation Research (2006-2009), and the Co-Chief Editor of Pacific J. of Optimization (2005-2009).
Activity Group on Optimization (SIAG/OPT) in 2008. He was elected to theINFORMS fellow in 2006, and awarded the Farkas prize by INFORMS Optimization Society
in the same year. He was the area editor of Math of Operation Research (2009-now), Operation Research (2006-2009), and the Co-Chief Editor of Pacific J. of Optimization (2005-2009).
ALL FULL-TIME ISE PHD STUDENTS ARE REQUIRED TO ATTEND AND
ALL ISE GRADUATE STUDENTS ARE ENCOURAGED TO ATTEND