Policy iteration-based non-zero sum differential feedback Nash control for continuous-time Markov jump linear systems

Zhu Guo-zheng; Zhang Mao-guang; He Shu-ping

Cite this article:	Zhu Guo-zheng, Zhang Mao-guang, He Shu-ping. Policy iteration-based non-zero sum differential feedback Nash control for continuous-time Markov jump linear systems[J]. Control Theory & Applications, 2020, 37(8): 1749-1756.

Received: 2019-07-23  Revised: 2020-01-20

DOI: 10.7641/CTA.2020.90603

2020,37(8):1749-1756

Keywords: policy iteration; Markov jump linear systems; non-zero sum differential feedback Nash strategy

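Funding: Supported by the National Natural Science Foundation of China (61673001), the Anhui Provincial Fund for Distinguished Young Scholars (1608085J05), and the Key Support Program for Outstanding Young Talents in Universities of Anhui Province (gxydZD2017001).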

Author	Affiliation	E-mail
Zhu Guo-zheng	Anhui University	2572165091@qq.com
Zhang Mao-guang	Anhui University
He Shu-ping^*	Anhui University	shuping.he@ahu.edu.cn

Abstract

This paper proposes a new policy iteration algorithm for the non-zero-sum differential feedback Nash control problem of a class of continuous-time Markov jump linear systems. The Nash equilibrium of a two-layer non-zero-sum differential game with linear dynamics and an infinite-horizon quadratic cost is obtained by solving the coupled equations through numerical iteration. At each policy layer, a policy iteration algorithm computes the minimal infinite-horizon value function associated with each given set of feedback control strategies. The Markov jump linear system is then decomposed into N parallel subsystems by a subsystem transformation, and the algorithm is applied to the jump system. The proposed policy iteration algorithm readily solves the coupled algebraic Riccati equations associated with the non-zero-sum differential game and remains effective for high-dimensional systems. Finally, a simulation example demonstrates the effectiveness and feasibility of the proposed design method.
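
The iteration described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes two players, no cross-weighting of the opponent's input in either cost, zero initial gains, and a Jacobi-style treatment of the mode coupling, in which the diagonal transition rate is absorbed into each subsystem and the off-diagonal terms use the previous iterate. The names `lyap`, `nash_policy_iteration`, `care_residual` and all numerical values are hypothetical.

```python
import numpy as np


def lyap(a, q):
    """Solve a.T @ p + p @ a + q = 0 for p via the vectorised (Kronecker) form."""
    n = a.shape[0]
    eye = np.eye(n)
    # Row-major vec identities: vec(a.T p) = kron(a.T, I) vec(p),
    # vec(p a) = kron(I, a.T) vec(p).
    lhs = np.kron(a.T, eye) + np.kron(eye, a.T)
    return np.linalg.solve(lhs, -q.reshape(-1)).reshape(n, n)


def nash_policy_iteration(A, B, Q, R, Pi, tol=1e-10, max_iter=500):
    """Two-player policy iteration for a Markov jump LQ game (illustrative).

    A[i]: dynamics in mode i; B[l][i], Q[l][i], R[l][i]: input and weight
    matrices of player l in mode i; Pi: transition-rate matrix.
    Assumes A[i] + Pi[i, i] / 2 * I is Hurwitz, since the gains start at zero.
    """
    modes, players = range(len(A)), range(2)
    n = A[0].shape[0]
    P = [[np.zeros((n, n)) for _ in modes] for _ in players]
    K = [[np.zeros((B[l][i].shape[1], n)) for i in modes] for l in players]
    for _ in range(max_iter):
        P_new = [[None] * len(A) for _ in players]
        for i in modes:
            # Subsystem i: the diagonal rate Pi[i, i] is absorbed into the
            # closed-loop matrix, so each mode is solved on its own.
            Acl = A[i] + 0.5 * Pi[i, i] * np.eye(n)
            Acl = Acl - sum(B[l][i] @ K[l][i] for l in players)
            for l in players:
                # Coupling to the other modes uses the previous iterate.
                jump = sum(Pi[i, j] * P[l][j] for j in modes if j != i)
                cost = Q[l][i] + K[l][i].T @ R[l][i] @ K[l][i] + jump
                P_new[l][i] = lyap(Acl, cost)  # policy evaluation
        # Policy improvement: u_l = -K[l][i] x in mode i.
        K_new = [[np.linalg.solve(R[l][i], B[l][i].T @ P_new[l][i])
                  for i in modes] for l in players]
        step = max(np.abs(P_new[l][i] - P[l][i]).max()
                   for l in players for i in modes)
        P, K = P_new, K_new
        if step < tol:
            break
    return P, K


def care_residual(A, B, Q, R, Pi, P, l, i):
    """Residual of player l's coupled algebraic Riccati equation in mode i."""
    K = [np.linalg.solve(R[m][i], B[m][i].T @ P[m][i]) for m in range(2)]
    Acl = A[i] - B[0][i] @ K[0] - B[1][i] @ K[1]
    jump = sum(Pi[i, j] * P[l][j] for j in range(len(A)))
    return (Acl.T @ P[l][i] + P[l][i] @ Acl + Q[l][i]
            + K[l].T @ R[l][i] @ K[l] + jump)


# Hypothetical two-mode, two-state example (values chosen for illustration).
I2 = np.eye(2)
A = [np.array([[-1.0, 0.5], [0.0, -2.0]]), np.array([[-2.0, 0.0], [0.3, -1.0]])]
B = [[np.array([[1.0], [0.0]])] * 2, [np.array([[0.0], [1.0]])] * 2]
Q = [[I2] * 2, [2 * I2] * 2]
R = [[np.eye(1)] * 2] * 2
Pi = np.array([[-0.5, 0.5], [0.8, -0.8]])
P, K = nash_policy_iteration(A, B, Q, R, Pi)
```

Here `P[l][i]` and `K[l][i]` are the value matrix and feedback gain of player `l` in mode `i`. Passing the returned `P` to `care_residual` gives a direct check of how closely the iterate satisfies the coupled algebraic Riccati equations. Convergence of such a scheme is not guaranteed for every game, so the sketch should be read as an aid to understanding rather than as a general-purpose solver.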