基于Bandit反馈的自适应量化分布式在线镜像下降算法

谢俊如; 高文华; 谢奕彬

引用本文:	谢俊如,高文华,谢奕彬.基于Bandit反馈的自适应量化分布式在线镜像下降算法[J].控制理论与应用,2023,40(10):1774~1782.[点击复制]
	XIE Jun-ru,GAO Wen-hua,XIE Yi-bin.Adaptive quantized online distributed mirror descent algorithm with Bandit feedback[J].Control Theory and Technology,2023,40(10):1774~1782.[点击复制]

基于Bandit反馈的自适应量化分布式在线镜像下降算法

Adaptive quantized online distributed mirror descent algorithm with Bandit feedback

摘要点击 600 全文点击 234 投稿时间：2022-03-03 修订日期：2023-09-11

查看全文查看/发表评论下载PDF阅读器

DOI编号 10.7641/CTA.2022.20152

2023,40(10):1774-1782

中文关键词镜像下降算法多智能体系统优化量化 Bandit反馈

英文关键词 mirror descent algorithm multi-agent systems optimization quantization Bandit feedback

基金项目国家自然科学基金项目(62273157), 广州市科技计划项目(202002030158)

作者	单位	E-mail
谢俊如	华南理工大学	junru_xie@163.com
高文华^*	华南理工大学	whgao@scut.edu.cn
谢奕彬	华南理工大学

中文摘要

多智能体系统的在线分布式优化常用于处理动态环境下的优化问题, 节点间需要实时传输数据流. 在很多情况下, 各节点无法获取个体目标函数的全部信息(包括梯度信息), 并且节点间信息传输存在一定的通信约束. 考虑到非欧投影意义下的镜像下降算法在处理高维数据和大规模在线学习上的优势, 本文使用个体目标函数在两点处的函数值信息对缺失的梯度信息进行估计, 并且根据镜像下降算法的性质设计自适应量化器, 提出基于Bandit反馈的自适应量化分布式在线镜像下降算法. 然后分析了量化误差界和Regret界的关系, 适当选择参数可得所提算法的Regret界为O(√T). 最后, 通过数值仿真验证了算法和理论结果的有效性

英文摘要

Online distributed optimization of multi-agent systems is often used to deal with the optimization problems in dynamic environments, and real-time data streams need to be transmitted between nodes. In many cases, each node cannot obtain all the information of the individual objective function (including gradient information), and there are communication constraints in the transmission of information between nodes. In this paper, considering the advantages of the mirror descent algorithm in the sense of non-Euclidean projection in processing high-dimensional data and large-scale online learning, the function value information of the individual objective function at two points is used to estimate the missing gradient information, and an adaptive quantizer is designed according to the property of mirror descent algorithm, and an adaptive quantized distributed online mirror descent algorithm based on the bandit feedback is proposed. Then the relationship between the quantization error bound and the regret bound is analyzed. The regret bound of the proposed algorithm can be obtained as O(√T) when the parameters are chosen appropriately. Finally, the effectiveness of the algorithm and theoretical results is verified by numerical simulations.