Meta-reinforcement learning based velocity regulation for automatic train operation

YAN Gang; ZHAO Fei-ran; YE Feng; WU Jun-bo; YOU Ke-you

Cite this article:	YAN Gang, ZHAO Fei-ran, YE Feng, WU Jun-bo, YOU Ke-you. Meta-reinforcement learning based velocity regulation for automatic train operation[J]. 控制理论与应用 (Control Theory & Applications), 2022, 39(10): 1807-1814.

Meta-reinforcement learning based velocity regulation for automatic train operation

Received: 2021-07-06 Revised: 2022-05-06

DOI: 10.7641/CTA.2022.10595

2022,39(10):1807-1814

Keywords: velocity regulation, Markov decision process, reinforcement learning, meta-learning, neural network

Funding: Key Program of the National Natural Science Foundation of China

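Author	Affiliation	E-mail
YAN Gang	CRRC Zhuzhou Locomotive Co., Ltd. and the State Key Laboratory of Heavy Duty AC Drive Electric Locomotive Systems Integration	yan.x.gang@163.com
ZHAO Fei-ran	Department of Automation, Tsinghua University
YE Feng	CRRC Zhuzhou Locomotive Co., Ltd.
WU Jun-bo	CRRC Zhuzhou Locomotive Co., Ltd.
YOU Ke-you^*	Department of Automation, Tsinghua University	youky@tsinghua.edu.cn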

Abstract

This paper considers the velocity regulation problem for automatic train operation under time-varying railway conditions. Because railway conditions are complex and the train dynamics are uncertain, model-based controllers struggle to regulate the velocity stably, quickly, and accurately. To this end, we propose a model-free controller that requires only a small amount of operating data to adapt to a new railway condition. First, we formulate the velocity regulation problem as a sequence of stationary, continuous Markov decision processes (MDPs) with unknown transition probabilities. Then, we adopt the meta-reinforcement learning framework to solve these MDPs and train an initial neural-network controller that adapts quickly to a new environment using observed data. Finally, simulations illustrate that the model-free controller regulates the train to the desired velocity and adapts well to time-varying railway conditions while satisfying the constraints of the dynamical system. The experiments also show that the controller is robust to uncertain dynamics.
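The abstract does not name a specific meta-learning algorithm or train model, so the following is only a minimal sketch of the general idea under loudly stated assumptions. It uses a toy one-dimensional train model, a single learned bias `theta` in place of a neural network, finite-difference policy search on rollouts in place of a full reinforcement learning algorithm, and a Reptile-style first-order meta-update. All names and constants (`step`, `rollout_cost`, `adapt`, `meta_train`, `V_REF`, `U_MAX`, `KP`, and the disturbance `c`) are hypothetical and not taken from the paper.

```python
import random

V_REF = 20.0   # target velocity (m/s); illustrative value, not from the paper
DT = 1.0       # time step (s)
U_MAX = 2.0    # actuator limit (m/s^2); stands in for the system constraints
KP = 0.5       # fixed proportional gain; only the bias theta is learned


def step(v, u, c):
    """One transition of a toy train model (an assumption, not the paper's).

    c lumps slope and resistance for one railway condition. The learner never
    reads c directly; it only observes the resulting velocities and rewards.
    """
    u = max(-U_MAX, min(U_MAX, u))       # enforce the input constraint
    v_next = v + DT * (u - c)
    reward = -(v_next - V_REF) ** 2      # penalize deviation from the target
    return v_next, reward


def rollout_cost(theta, c, horizon=20):
    """Episode cost (negative return) of the policy u = KP * (V_REF - v) + theta."""
    v, cost = V_REF, 0.0
    for _ in range(horizon):
        u = KP * (V_REF - v) + theta
        v, reward = step(v, u, c)
        cost -= reward
    return cost


def adapt(theta, c, steps=3, lr=0.005, h=1e-3):
    """A few model-free gradient steps using central finite differences on rollouts."""
    for _ in range(steps):
        grad = (rollout_cost(theta + h, c) - rollout_cost(theta - h, c)) / (2 * h)
        theta -= lr * grad
    return theta


def meta_train(iterations=200, beta=0.1, seed=0):
    """Reptile-style meta-update: nudge the initial parameter toward each adapted one."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(iterations):
        c = rng.uniform(0.1, 0.5)        # sample a railway condition (one MDP)
        theta += beta * (adapt(theta, c) - theta)
    return theta


if __name__ == "__main__":
    theta0 = meta_train()
    c_new = 0.6                          # an unseen railway condition
    print("cost before adaptation:", rollout_cost(theta0, c_new))
    print("cost after adaptation: ", rollout_cost(adapt(theta0, c_new), c_new))
```

In this toy setting, each sampled `c` plays the role of one MDP in the sequence, and `meta_train` returns an initial parameter from which three adaptation steps on a new condition already reduce the tracking cost substantially. A neural-network policy and a policy-gradient inner loop would follow the same structure.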