Abstract
This paper studies an online iterative algorithm for solving discrete-time multi-agent dynamic graphical games with input constraints. Obtaining the optimal strategy of each agent requires solving a set of coupled Hamilton-Jacobi-Bellman (HJB) equations, which is very difficult by traditional methods, and the game becomes more complex still when each agent's control input is constrained. An online iterative algorithm is proposed that finds the solution to the dynamic graphical game without requiring knowledge of the agents' drift dynamics; in effect, the algorithm solves the Bellman equations online. The solution employs a distributed policy-iteration process in which each agent uses only locally available information. It is proved that, under certain conditions, when all agents update their strategies simultaneously, the multi-agent system reaches a Nash equilibrium. In the implementation, each agent uses two neural networks to approximate its value function and its control strategy, respectively. Finally, a simulation example demonstrates the effectiveness of the method.
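The distributed policy iteration described above can be illustrated, in miniature, by its single-agent ancestor: alternating policy evaluation (the critic step) and policy improvement (the actor step) until the gain stops changing. The sketch below is not the paper's method — it uses a scalar linear system with quadratic cost so both steps have closed forms, and all constants (a, b, q, r, gamma) are illustrative assumptions. The paper's algorithm additionally couples each agent's update to its graph neighbors, handles input constraints via a nonquadratic cost term, and replaces the closed-form steps below with neural-network approximators.

```python
# Scalar discrete-time system x_{k+1} = a*x + b*u with discounted quadratic
# cost sum_k gamma^k (q*x_k^2 + r*u_k^2). All constants are illustrative.
a, b = 1.2, 0.5
q, r = 1.0, 0.2
gamma = 0.95

def evaluate_policy(K):
    """Critic step: solve the Bellman equation for V(x) = P*x^2 under u = -K*x.

    P = q + r*K^2 + gamma*(a - b*K)^2 * P, solved for P in closed form.
    (The paper fits this step with a critic neural network instead.)
    """
    closed = a - b * K
    return (q + r * K ** 2) / (1.0 - gamma * closed ** 2)

def improve_policy(P):
    """Actor step: gain that minimizes r*u^2 + gamma*P*(a*x + b*u)^2 over u.

    (The paper fits this step with an actor neural network, saturated to
    respect the input constraint.)
    """
    return gamma * b * P * a / (r + gamma * b ** 2 * P)

K = 0.5  # initial gain; gamma*(a - b*K)^2 < 1, so policy evaluation is valid
for _ in range(50):
    P = evaluate_policy(K)   # policy evaluation
    K = improve_policy(P)    # policy improvement

print(f"converged gain K = {K:.4f}, value coefficient P = {P:.4f}")
```

At convergence, K is a fixed point of the evaluation/improvement cycle, which for this scalar problem coincides with the solution of the discounted algebraic Riccati equation; the multi-agent version of the paper proves the analogous fixed point is a Nash equilibrium.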
Acknowledgement
The authors would like to thank Prof. Xun LI for the insightful discussion.
Additional information
This work was supported by the National Natural Science Foundation of China (Nos. 61773241, 61973183) and the Shandong Provincial Natural Science Foundation (No. ZR2019MF041).
Tianxiang WANG received the B.E. degree in Building Electricity and Intelligence from Qingdao University of Technology, Qingdao, China, in 2017. He is currently pursuing the M.Sc. degree in Control Engineering at Shandong University, Jinan, China. His current research interests include reinforcement learning and multi-agent systems.
Bingchang WANG received the M.Sc. degree in Mathematics from Central South University, Changsha, China, in 2008, and the Ph.D. degree in System Theory from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China, in 2011. From September 2011 to August 2012, he was a Postdoctoral Fellow with the Department of Electrical and Computer Engineering, University of Alberta, Canada. From September 2012 to September 2013, he was a Research Academic with the School of Electrical Engineering and Computer Science, University of Newcastle, Australia. Since October 2013, he has been with the School of Control Science and Engineering, Shandong University, China, as an Associate Professor. He held visiting appointments as a Research Associate with Carleton University, Canada, from November 2014 to May 2015, and with the Hong Kong Polytechnic University from November 2016 to January 2017. His current research interests include mean field games, stochastic control, and multi-agent systems. He received the IEEE CSS Beijing Chapter Young Author Prize in 2018.
Yong LIANG received the B.E. degree in Building Electricity and Intelligence from Qingdao University of Technology, Qingdao, China, in 2017. He is currently pursuing the Ph.D. degree in Control Engineering with Shandong University, Jinan, China. His current research interests include networked control systems and multi-agent systems.
Cite this article
Wang, T., Wang, B. & Liang, Y. Multi-agent graphical games with input constraints: an online learning solution. Control Theory Technol. 18, 148–159 (2020). https://doi.org/10.1007/s11768-020-0013-6