
Multi-agent graphical games with input constraints: an online learning solution

Published in Control Theory and Technology

Abstract

This paper studies an online iterative algorithm for solving discrete-time multi-agent dynamic graphical games with input constraints. Obtaining the optimal strategy of each agent requires solving a set of coupled Hamilton-Jacobi-Bellman (HJB) equations, which is very difficult to do by traditional methods, and the game becomes more complex still when the control input of each agent is constrained. This paper proposes an online iterative algorithm that solves the dynamic graphical game without requiring knowledge of the agents' drift dynamics; in essence, the algorithm finds the optimal solution of the Bellman equations online. The solution employs a distributed policy iteration process that uses only the local information available to each agent. It is proved that, under certain conditions, when all agents update their strategies simultaneously, the multi-agent system reaches a Nash equilibrium. In the implementation, each agent uses two neural networks to approximate its value function and its control strategy, respectively. Finally, a simulation example demonstrates the effectiveness of the method.
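To make the two-network idea concrete: in the usual discrete-time graphical-game formulation, each agent i seeks a value function satisfying a coupled Bellman equation of the form V_i(e_k) = r_i(e_k, u_i, u_{-i}) + V_i(e_{k+1}), where e_k is the agent's local neighborhood tracking error; the exact costs and neighbor coupling are given in the paper. The sketch below illustrates one plausible reading of the scheme, with a critic fitted to the Bellman equation by least squares (policy evaluation) and a tanh-saturated actor improved by a gradient step (policy improvement), the saturation enforcing the input constraint. All dynamics, features, costs, step sizes, and the abstraction of neighbor coupling into a local error state are illustrative assumptions, not the authors' exact design.

```python
# A minimal sketch of distributed actor-critic policy iteration under
# input constraints, in the spirit of the abstract. Every numeric choice
# here (dynamics A, B, costs, features, step sizes) is an assumption for
# illustration; the neighbor coupling is abstracted into each agent's
# local error state e.
import numpy as np

np.random.seed(0)
N = 3                                     # number of agents (assumed)
A = np.array([[0.9, 0.1], [0.0, 0.8]])    # local error dynamics (assumed)
B = np.array([0.0, 0.1])
u_max, gamma = 1.0, 0.95                  # input bound and discount (assumed)

def phi(e):
    """Quadratic critic features of the local neighborhood error."""
    return np.array([e[0] ** 2, e[0] * e[1], e[1] ** 2])

Wc = [np.zeros(3) for _ in range(N)]      # one critic per agent: V ~ Wc . phi
Wa = [np.zeros(2) for _ in range(N)]      # one actor per agent (pre-saturation)

def policy(wa, e):
    # tanh saturation keeps |u| <= u_max, a standard constrained-input device.
    return u_max * np.tanh(wa @ e / u_max)

def bellman_target(wc, wa, e):
    # One-step stage cost plus bootstrapped value-to-go under the current policy.
    u = policy(wa, e)
    e_next = A @ e + B * u
    return e @ e + u ** 2 + gamma * wc @ phi(e_next)

for sweep in range(30):                   # policy iteration sweeps
    samples = [np.random.randn(2) for _ in range(60)]
    for i in range(N):                    # agents update simultaneously, locally
        # Policy evaluation: one least-squares TD pass fitting the critic
        # to the Bellman equation along sampled local errors.
        Phi = np.array([phi(e) for e in samples])
        y = np.array([bellman_target(Wc[i], Wa[i], e) for e in samples])
        Wc[i] = np.linalg.lstsq(Phi, y, rcond=None)[0]
        # Policy improvement: numerical-gradient step that lowers the
        # one-step cost plus critic value under the saturated policy.
        grad = np.zeros(2)
        for j in range(2):
            d = np.zeros(2); d[j] = 1e-4
            up = np.mean([bellman_target(Wc[i], Wa[i] + d, e) for e in samples])
            dn = np.mean([bellman_target(Wc[i], Wa[i] - d, e) for e in samples])
            grad[j] = (up - dn) / 2e-4
        Wa[i] -= 0.05 * grad

print("agent 0 critic:", np.round(Wc[0], 3), "actor:", np.round(Wa[0], 3))
```

Over a few sweeps each agent's actor drifts toward the constrained greedy policy with respect to its own critic; the simultaneous local updates are what the paper shows, under suitable conditions, to drive the whole multi-agent system to a Nash equilibrium.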



Acknowledgement

The authors would like to thank Prof. Xun LI for insightful discussions.

Author information


Corresponding author

Correspondence to Bingchang Wang.

Additional information

This work was supported by the National Natural Science Foundation of China (Nos. 61773241, 61973183) and the Shandong Provincial Natural Science Foundation (No. ZR2019MF041).

Tianxiang WANG received the B.E. degree in Building Electricity and Intelligence from Qingdao University of Technology, Qingdao, China, in 2017. He is currently pursuing the M.Sc. degree in Control Engineering with Shandong University, Jinan, China. His current research interests include reinforcement learning and multi-agent systems.

Bingchang WANG received the M.Sc. degree in Mathematics from Central South University, Changsha, China, in 2008, and the Ph.D. degree in System Theory from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China, in 2011. From September 2011 to August 2012, he was a Postdoctoral Fellow with the Department of Electrical and Computer Engineering, University of Alberta, Canada. From September 2012 to September 2013, he was a Research Academic with the School of Electrical Engineering and Computer Science, University of Newcastle, Australia. Since October 2013, he has been an Associate Professor with the School of Control Science and Engineering, Shandong University, China. He held visiting appointments as a Research Associate with Carleton University, Canada, from November 2014 to May 2015, and with The Hong Kong Polytechnic University from November 2016 to January 2017. His current research interests include mean field games, stochastic control, and multi-agent systems. He received the IEEE CSS Beijing Chapter Young Author Prize in 2018.

Yong LIANG received the B.E. degree in Building Electricity and Intelligence from Qingdao University of Technology, Qingdao, China, in 2017. He is currently pursuing the Ph.D. degree in Control Engineering with Shandong University, Jinan, China. His current research interests include networked control systems and multi-agent systems.


About this article


Cite this article

Wang, T., Wang, B. & Liang, Y. Multi-agent graphical games with input constraints: an online learning solution. Control Theory Technol. 18, 148–159 (2020). https://doi.org/10.1007/s11768-020-0013-6

