A maximum principle for optimal control of discrete-time stochastic systems with Markov jump

LIN Xiang-yun; WANG Xin-rui; ZHANG Wei-hai

Cite this article: LIN Xiang-yun, WANG Xin-rui, ZHANG Wei-hai. A maximum principle for optimal control of discrete-time stochastic systems with Markov jump[J]. Control Theory & Applications, 2024, 41(5): 895-904.

Received: 2021-08-27; Revised: 2023-10-22

DOI: 10.7641/CTA.2022.10807

2024,41(5):895-904

Keywords: maximum principle; optimal control; Markov jump; backward stochastic difference equations; Hamilton-Jacobi-Bellman equations

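Funding: National Natural Science Foundation of China (62273212, 61973198); Research Fund for the Taishan Scholar Project of Shandong Province; Natural Science Foundation of Shandong Province (ZR2020MF062)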

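Author	Affiliation	E-mail
LIN Xiang-yun^*	Shandong University of Science and Technology	lxy9393@sina.com
WANG Xin-rui	Shandong University of Science and Technology
ZHANG Wei-hai	Shandong University of Science and Technology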

Abstract

This paper studies the optimal control problem for a class of discrete-time nonlinear stochastic systems driven by both Markov jumps and multiplicative noise, and states and proves the corresponding maximum principle (MP). First, using the smoothing property of conditional expectation and introducing a backward stochastic difference equation with adapted solutions, a representation is given for linear functionals subject to a linear difference equation constraint; the Riesz theorem is used to prove that this representation is unique. Second, the spike variation method is extended to nonlinear stochastic control systems with Markov jumps: taking the first-order variation of the state equation yields the linear difference equation that the variation satisfies. Third, by introducing a Hamiltonian function, a necessary condition for the discrete-time nonlinear stochastic optimal control problem with Markov jumps is obtained, in which the adjoint equations are a pair of backward stochastic difference equations. A sufficient condition for the optimal control problem is also given, and the corresponding Hamilton-Jacobi-Bellman equation is derived. Finally, a practical example illustrates the practicability and feasibility of the proposed theory.
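
The abstract does not state the system equations, so the following is only an illustrative sketch of the kind of system it describes. It assumes a scalar state with dynamics of the form x_{k+1} = f(k, x_k, u_k, r_k) + g(k, x_k, u_k, r_k) w_k, where r_k is a finite-state Markov chain with transition matrix P and w_k is standard Gaussian noise. The function name `simulate`, the two-mode coefficients, and the feedback policy are all hypothetical and are not taken from the paper.

```python
import random

def simulate(f, g, policy, P, x0, r0, steps, rng):
    """Simulate x_{k+1} = f(k,x,u,r) + g(k,x,u,r) * w_k with Markov mode r_k.

    f, g   : drift and noise-coefficient functions (assumed form, see lead-in)
    policy : feedback law u_k = policy(k, x_k, r_k)
    P      : row-stochastic transition matrix of the mode chain r_k
    """
    x, r = float(x0), int(r0)
    xs, rs = [x], [r]
    for k in range(steps):
        u = policy(k, x, r)
        w = rng.gauss(0.0, 1.0)                  # noise sample w_k
        x = f(k, x, u, r) + g(k, x, u, r) * w    # noise enters multiplied by g
        r = rng.choices(range(len(P)), weights=P[r])[0]  # draw next mode
        xs.append(x)
        rs.append(r)
    return xs, rs

# Two-mode example; every number below is hypothetical.
P = [[0.9, 0.1],
     [0.3, 0.7]]
a = [0.8, 1.1]   # mode-dependent drift coefficients
c = [0.1, 0.3]   # mode-dependent noise intensities

def f(k, x, u, r):
    return a[r] * x + u

def g(k, x, u, r):
    return c[r] * x   # state-dependent, hence multiplicative, noise

def policy(k, x, r):
    return -0.5 * x   # simple linear feedback, not an optimal control

xs, rs = simulate(f, g, policy, P, 1.0, 0, 20, random.Random(0))
```

The sketch only generates sample paths of the state and mode; it does not compute the adjoint equations or the optimal control discussed in the abstract.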