Cite this article: GUO Fang-hong, HE Tong, WU Xiang, DONG Hui, LIU Bing. Real-time optimal scheduling for microgrid systems based on distributed deep reinforcement learning[J]. Control Theory & Applications, 2022, 39(10): 1881-1889.
Real-time optimal scheduling for microgrid systems based on distributed deep reinforcement learning
Received: 2021-09-30  Revised: 2022-09-16
DOI: 10.7641/CTA.2022.10932
Keywords: deep reinforcement learning; distributed optimization; microgrid; optimal scheduling; optimization algorithm
Funding: National Natural Science Foundation of China Youth Project (61903333); Zhejiang Province "Qianjiang Talent" Special Urgent-Need Project (QJD1902010)
Authors (* corresponding author):
GUO Fang-hong, Zhejiang University of Technology, fhguo@zjut.edu.cn
HE Tong, Zhejiang University of Technology
WU Xiang, Zhejiang University of Technology
DONG Hui*, Zhejiang University of Technology, hdong@zjut.edu.cn
LIU Bing, Zhejiang University of Technology
Abstract
      As massive renewable energy resources penetrate into microgrids, the parameter space of the microgrid system model grows multiplicatively, and the computational difficulty of its energy-optimal scheduling keeps rising. At the same time, the uncertainty of renewable power output poses great challenges to the optimal scheduling of microgrids. To tackle these problems, this paper proposes a real-time optimal scheduling strategy for microgrids based on distributed deep reinforcement learning. First, under a distributed architecture, the main grid and each distributed generator are treated as independent agents. Second, each agent holds a local learning model and builds its state and action spaces from local data; a multi-objective reward function and its constraints are designed, covering generation cost, transaction price, power-supply lifetime, and other objectives. Finally, each agent seeks a locally optimal policy by interacting with the environment, while agents learn value-network parameters from one another to improve local action selection, ultimately minimizing the operating cost of the whole microgrid system. Simulation results show that, compared with the deep deterministic policy gradient (DDPG) algorithm, the proposed method improves training speed by 17.6% and reduces the cost function value by 67% while preserving system stability and solution accuracy, achieving real-time optimal scheduling of the microgrid.
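The core mechanism described above — independent agents with local multi-objective rewards that periodically share value-function parameters — can be illustrated with a minimal sketch. This is not the authors' implementation: the linear value function, TD(0) update, cost coefficients, and parameter-averaging rule are all simplifying assumptions standing in for the paper's deep value networks and learning scheme.

```python
# Hypothetical sketch (not the paper's code): each agent keeps a local
# value-function parameter vector, computes a multi-objective reward
# (generation cost + trading cost + lifetime-degradation penalty),
# performs a local TD-style update, and agents periodically average
# their value parameters -- a toy stand-in for inter-agent
# value-network parameter sharing.
import random

class Agent:
    def __init__(self, n_features, cost_coef, price, wear_coef, seed):
        self.rng = random.Random(seed)
        self.w = [0.0] * n_features   # local value-function weights (assumed linear)
        self.cost_coef = cost_coef    # generation cost coefficient (assumed)
        self.price = price            # grid transaction price (assumed)
        self.wear_coef = wear_coef    # lifetime-degradation penalty (assumed)

    def reward(self, power, traded):
        # negative operating cost: generation + trading + equipment wear
        return -(self.cost_coef * power ** 2
                 + self.price * traded
                 + self.wear_coef * abs(power))

    def value(self, state):
        return sum(wi * si for wi, si in zip(self.w, state))

    def td_update(self, state, r, next_state, alpha=0.05, gamma=0.95):
        # one-step TD(0) update of the local value estimate
        delta = r + gamma * self.value(next_state) - self.value(state)
        self.w = [wi + alpha * delta * si for wi, si in zip(self.w, state)]

def share_parameters(agents):
    # agents learn from each other by averaging value parameters
    n = len(agents)
    avg = [sum(a.w[i] for a in agents) / n for i in range(len(agents[0].w))]
    for a in agents:
        a.w = list(avg)

agents = [Agent(2, 0.04, 0.6, 0.01, seed=k) for k in range(3)]
for step in range(200):
    for a in agents:
        s = [a.rng.random(), a.rng.random()]          # toy local state
        power, traded = a.rng.random(), a.rng.random() - 0.5
        s_next = [a.rng.random(), a.rng.random()]
        a.td_update(s, a.reward(power, traded), s_next)
    if step % 10 == 0:
        share_parameters(agents)

share_parameters(agents)
print(all(agents[0].w == a.w for a in agents))  # parameters agree after sharing
```

In the paper the shared quantities are deep value-network parameters rather than linear weights, but the averaging step above captures the same idea: local experience drives local updates, while periodic parameter exchange lets each agent benefit from the others' learning.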