Supervised SAC reinforcement learning method for heavy haul train optimization control

YANG Hui; WANG Yu; LI Zhong-qi; FU Ya-ting; TAN Chang

Cite this article:	YANG Hui, WANG Yu, LI Zhong-qi, FU Ya-ting, TAN Chang. Supervised SAC reinforcement learning method for heavy haul train optimization control[J]. Control Theory & Applications, 2022, 39(5): 799-808.

Received: 2021-02-10  Revised: 2022-01-10

DOI: 10.7641/CTA.2021.10132

2022,39(5):799-808

Keywords: heavy haul train; reinforcement learning; behavior cloning; expert strategy

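Funding: Supported by the National Natural Science Foundation of China (U2034211, 62003138, 61803155), the Natural Science Foundation of Jiangxi Province (20202BAB202005), the Jiangxi Provincial Science and Technology Special Project (20203AEI009), and the Key Program of the Jiangxi Provincial Youth Science Foundation (20192ACBL21005).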

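Author	Affiliation	E-mail
YANG Hui^*	East China Jiaotong University	yhshuo@263.net
WANG Yu	East China Jiaotong University
LI Zhong-qi	East China Jiaotong University
FU Ya-ting	East China Jiaotong University
TAN Chang	East China Jiaotong University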

Abstract

Heavy haul trains are an important means of transporting bulk commodities in China, but their heavy loads, long train formations, and complex line conditions make them difficult to control. This paper divides train operation into three stages: startup traction, cruise control, and stopping braking. Based on a multi-mass-point longitudinal dynamics model of the heavy haul train that accounts for service air braking, the soft actor-critic (SAC) reinforcement learning method is used to train a new intelligent driving control strategy. A recurrent neural network is first fitted to expert driving data by behavior cloning, and the cloned expert strategy then supervises the reinforcement learning training. The proposed strategy learns efficiently from driving experience data and continuously improves the target reward during learning to obtain an optimal control strategy. Simulation results show that, compared with a reinforcement learning algorithm that is not supervised by the expert model, the proposed control strategy improves its reward in fewer training cycles, reaches a higher reward, and yields a controller that operates more efficiently and stably.
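
The abstract does not give the loss function, so the following is only a minimal sketch of one common way a cloned expert policy can supervise SAC actor training: the usual SAC actor objective, alpha * log pi(a|s) - Q(s, a), plus a penalty on the squared distance between the agent's action and the expert's action. The function name `supervised_actor_loss`, the additive form, and the `alpha` and `bc_weight` parameters are assumptions made for illustration and are not taken from the paper.

```python
def supervised_actor_loss(log_probs, q_values, actions, expert_actions,
                          alpha=0.2, bc_weight=1.0):
    """Batch-mean SAC actor loss plus a behavior-cloning penalty.

    Hypothetical illustration: the additive form and bc_weight are
    assumptions, not the formula used in the paper.

    log_probs      -- log pi(a|s) for each sampled action
    q_values       -- critic estimate Q(s, a) for each sampled action
    actions        -- scalar actions sampled from the agent's policy
    expert_actions -- actions the cloned expert policy proposes in the same states
    """
    n = len(actions)
    if n == 0 or not (len(log_probs) == len(q_values) == len(expert_actions) == n):
        raise ValueError("all inputs must be non-empty and of equal length")

    # Standard SAC actor term: entropy-regularized negative Q-value.
    sac_term = sum(alpha * lp - q for lp, q in zip(log_probs, q_values)) / n

    # Supervision term: mean squared deviation from the expert's action.
    bc_term = sum((a - e) ** 2 for a, e in zip(actions, expert_actions)) / n

    return sac_term + bc_weight * bc_term
```

Setting `bc_weight=0.0` recovers the unsupervised SAC actor loss, which corresponds to the baseline the abstract compares against; a larger weight pulls the agent's actions closer to the expert's.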