Trajectory planning for hypersonic vehicle combined with reinforcement learning and evolutionary algorithms

CHI Hai-hong; ZHOU Ming-xin

Cite this article:	CHI Hai-hong, ZHOU Ming-xin. Trajectory planning for hypersonic vehicle combined with reinforcement learning and evolutionary algorithms[J]. Control Theory & Applications, 2022, 39(5): 847~856.

Received: 2021-06-03; Revised: 2022-03-17

DOI: 10.7641/CTA.2021.10478

2022,39(5):847-856

Keywords: reinforcement learning; deep reinforcement learning; hypersonic vehicles; trajectory planning

Funding: Supported by the National Key Research and Development Program of China (2018YFC0310102).

Author	Affiliation	E-mail
CHI Hai-hong	Harbin Engineering University	chi_hon@hrbeu.edu.cn
ZHOU Ming-xin^*	Harbin Engineering University	1147596768@qq.com

Abstract

Trajectory planning for hypersonic vehicles is difficult because of their complex characteristics. This paper proposes an online trajectory planning algorithm for the cruise phase of a hypersonic vehicle that combines model-free reinforcement learning with the cross-entropy method. The trajectory planning problem is modeled as Markov decision processes with different degrees of missing environmental information. An agent is trained offline in a flight environment simulator using the proximal policy optimization (PPO) algorithm, and the curvature smoothness of the trajectory is ensured by increasing the temporal correlation of the agent's actions. The cross-entropy method then takes the action that the trained agent outputs for the observed state as prior knowledge and further optimizes the planning policy online. Simulation results show that the proposed method generates curvature-smooth trajectories, achieves a high success rate in complex flight environments, and generalizes to different flight environments.
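The idea of seeding the cross-entropy method with a policy's actions can be illustrated with a minimal sketch. This is not the authors' implementation: the planar kinematic model, the circular no-fly zone, the cost function, and every name and constant below (`prior_policy`, `cem_plan`, `MAX_TURN`, `ZONE`, and so on) are assumptions made for illustration, and `prior_policy` is a hand-written stand-in for a trained PPO agent.

```python
import math
import random

MAX_TURN = 0.2            # assumed bound on heading change per step (rad)
STEP = 1.0                # assumed distance flown per step
GOAL = (20.0, 0.0)        # assumed goal position
ZONE = (10.0, 0.0, 3.0)   # assumed circular no-fly zone: x, y, radius

def step(state, turn):
    """Advance a toy planar kinematic model by one step."""
    x, y, heading = state
    heading += max(-MAX_TURN, min(MAX_TURN, turn))
    return (x + STEP * math.cos(heading), y + STEP * math.sin(heading), heading)

def prior_policy(state):
    """Hypothetical stand-in for the trained agent: steer toward the goal."""
    x, y, heading = state
    bearing = math.atan2(GOAL[1] - y, GOAL[0] - x)
    error = math.atan2(math.sin(bearing - heading), math.cos(bearing - heading))
    return max(-MAX_TURN, min(MAX_TURN, error))

def rollout_cost(state, actions):
    """Final distance to the goal plus a penalty for each step inside the zone."""
    cost = 0.0
    for a in actions:
        state = step(state, a)
        if math.hypot(state[0] - ZONE[0], state[1] - ZONE[1]) < ZONE[2]:
            cost += 10.0
    return cost + math.hypot(state[0] - GOAL[0], state[1] - GOAL[1])

def prior_rollout(state, horizon):
    """Action sequence the prior policy would produce starting from `state`."""
    actions = []
    for _ in range(horizon):
        a = prior_policy(state)
        actions.append(a)
        state = step(state, a)
    return actions

def cem_plan(state, horizon=20, samples=200, elites=20, iters=10, seed=0):
    """Cross-entropy search over action sequences, seeded with the prior."""
    rng = random.Random(seed)
    mean = prior_rollout(state, horizon)   # prior knowledge as the initial mean
    std = [MAX_TURN / 2] * horizon
    best, best_cost = list(mean), rollout_cost(state, mean)
    for _ in range(iters):
        population = [
            [max(-MAX_TURN, min(MAX_TURN, rng.gauss(m, s)))
             for m, s in zip(mean, std)]
            for _ in range(samples)
        ]
        population.sort(key=lambda seq: rollout_cost(state, seq))
        elite = population[:elites]
        cost = rollout_cost(state, elite[0])
        if cost < best_cost:
            best, best_cost = elite[0], cost
        # Refit the Gaussian to the elite set, one dimension per time step.
        for t in range(horizon):
            column = [seq[t] for seq in elite]
            mean[t] = sum(column) / elites
            var = sum((v - mean[t]) ** 2 for v in column) / elites
            std[t] = math.sqrt(var) + 1e-3
    return best, best_cost
```

Calling `cem_plan((0.0, 0.0, 0.0))` returns an action sequence and its cost. Because the prior rollout is kept as the initial incumbent, the returned cost is never worse than that of the prior policy alone. In an online setting, one would typically execute only the first action and replan from the next observed state.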