Cite this article: Zhu Guo-zheng, Zhang Mao-guang, He Shu-ping. Policy iteration-based non-zero sum differential feedback Nash control for continuous-time Markov jump linear systems[J]. Control Theory and Technology, 2020, 37(8): 1749-1756.
Policy iteration-based non-zero sum differential feedback Nash control for continuous-time Markov jump linear systems
Abstract views: 1465  Full-text views: 589  Received: 2019-07-23  Revised: 2020-01-20
DOI  10.7641/CTA.2020.90603
  2020,37(8):1749-1756
Keywords  policy iteration  Markov jump linear systems  non-zero sum  differential feedback Nash strategy
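Funding  Supported by the National Natural Science Foundation of China (61673001), the Anhui Provincial Fund for Distinguished Young Scholars (1608085J05), and the Key Support Program for Outstanding Young Talents in Anhui Provincial Universities (gxydZD2017001)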
Author  Affiliation  E-mail
Zhu Guo-zheng  Anhui University  2572165091@qq.com
Zhang Mao-guang  Anhui University
He Shu-ping*  Anhui University  shuping.he@ahu.edu.cn
Abstract
      This paper proposes a new policy iteration algorithm for solving the non-zero-sum differential feedback Nash control problem for a class of continuous-time Markov jump linear systems. The Nash equilibrium of a two-layer non-zero-sum differential strategy with linear dynamics and an infinite-horizon quadratic cost is obtained by computing coupled numerical iterative solutions. At each policy layer, policy iteration is used to compute the minimal infinite-horizon value function associated with each given set of feedback control strategies. The Markov jump linear system is then decomposed into N parallel subsystems by subsystem decomposition, and the algorithm is applied to the jump system. The proposed algorithm readily solves the coupled algebraic Riccati equations corresponding to the non-zero-sum differential strategy and remains effective for high-dimensional systems. Finally, a simulation example demonstrates the effectiveness and feasibility of the proposed design method.
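The abstract does not give the algorithm itself, so the following is only a minimal sketch of the kind of iteration it describes, under strong assumptions: two players, two jump modes, scalar states (so each Lyapunov equation has a closed-form solution), no cross-weighting between players, and zero initial gains that are assumed stabilizing. All names and numbers (`a`, `b`, `q`, `r`, `PI`, `policy_iteration`, `care_residual`) are hypothetical and are not taken from the paper. The "subsystem decomposition" is modelled here by absorbing half of the diagonal transition rate into each mode's drift and using the other modes' value functions from the previous iteration, which may differ from the authors' exact scheme.

```python
# Hypothetical scalar example: 2 jump modes, 2 players (not from the paper).
a = [-1.0, -2.0]                    # open-loop drift a[i] in mode i
b = [[1.0, 0.5], [0.5, 1.0]]        # b[l][i]: input gain of player l in mode i
q = [1.0, 2.0]                      # state weight of player l
r = [1.0, 1.0]                      # control weight of player l
PI = [[-1.0, 1.0], [2.0, -2.0]]     # transition-rate matrix (rows sum to zero)


def policy_iteration(a, b, q, r, PI, tol=1e-12, max_iter=1000):
    """Jacobi-style policy iteration for the coupled scalar Riccati equations."""
    n_modes, n_players = len(a), len(q)
    P = [[0.0] * n_modes for _ in range(n_players)]   # value coefficients p[l][i]
    K = [[0.0] * n_modes for _ in range(n_players)]   # gains, u_l = -K[l][i] * x
    for _ in range(max_iter):
        P_new = [[0.0] * n_modes for _ in range(n_players)]
        for i in range(n_modes):
            # Decoupled subsystem i: closed-loop drift with PI[i][i] / 2 absorbed.
            a_cl = a[i] + 0.5 * PI[i][i] - sum(
                b[l][i] * K[l][i] for l in range(n_players))
            for l in range(n_players):
                # Coupling to the other modes uses the previous iterate.
                coupling = sum(PI[i][j] * P[l][j]
                               for j in range(n_modes) if j != i)
                # Policy evaluation: scalar Lyapunov equation
                # 2 * a_cl * p + q + r * k**2 + coupling = 0.
                P_new[l][i] = (q[l] + r[l] * K[l][i] ** 2 + coupling) / (-2.0 * a_cl)
        # Policy improvement for every player and mode.
        K = [[b[l][i] * P_new[l][i] / r[l] for i in range(n_modes)]
             for l in range(n_players)]
        delta = max(abs(P_new[l][i] - P[l][i])
                    for l in range(n_players) for i in range(n_modes))
        P = P_new
        if delta < tol:
            break
    return P, K


def care_residual(P, a, b, q, r, PI):
    """Largest absolute residual of the coupled algebraic Riccati equations."""
    n_modes, n_players = len(a), len(q)
    worst = 0.0
    for i in range(n_modes):
        k = [b[m][i] * P[m][i] / r[m] for m in range(n_players)]
        a_cl = a[i] - sum(b[m][i] * k[m] for m in range(n_players))
        for l in range(n_players):
            res = (2.0 * a_cl * P[l][i] + q[l] + r[l] * k[l] ** 2
                   + sum(PI[i][j] * P[l][j] for j in range(n_modes)))
            worst = max(worst, abs(res))
    return worst


P, K = policy_iteration(a, b, q, r, PI)
print("value coefficients:", P)
print("feedback gains:", K)
print("max Riccati residual:", care_residual(P, a, b, q, r, PI))
```

The residual check substitutes the converged values back into the full coupled equations, including the diagonal rate term, so a small residual indicates that the decoupled iteration reached a solution of the original coupled problem for this toy data. A matrix-valued version would replace the closed-form division with a Lyapunov solver, and convergence would have to be checked case by case.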