Citation: Xueqing SUN, Tao MAO, Laura RAY, Dongqing SHI, Jerald KRALIK. Hierarchical state-abstracted and socially augmented Q-Learning for reducing complexity in agent-based learning [J]. Control Theory and Technology, 2011, 9(3): 440–450.
Received: March 09, 2011; Revised: March 09, 2011
Funding: This work was supported by the Office of Naval Research under the Multi-University Research Initiative (MURI) (No. N00014-08-1-0693).
Hierarchical state-abstracted and socially augmented Q-Learning for reducing complexity in agent-based learning
Xueqing SUN, Tao MAO, Laura RAY, Dongqing SHI, Jerald KRALIK
(Thayer School of Engineering, Dartmouth College; Department of Psychological and Brain Sciences, Dartmouth College)
Abstract:
A primary challenge of agent-based policy learning in complex and uncertain environments is computational complexity that escalates with the size of the task space (action choices and world states) and the number of agents. Nonetheless, there is ample evidence in the natural world that high-functioning social mammals learn to solve complex problems with ease, both individually and cooperatively. This ability to solve computationally intractable problems stems from brain circuits for hierarchical representation of state and action spaces and learned policies, as well as from constraints imposed by social cognition. Using biologically derived mechanisms for state representation and mammalian social intelligence, we constrain state-action choices in reinforcement learning in order to improve learning efficiency. Analytical results bound the reduction in computational complexity due to state abstraction, hierarchical representation, and socially constrained action selection in agent-based learning problems that can be described as variants of Markov decision processes. Investigation of two task domains, single-robot herding and multirobot foraging, shows that the theoretical bounds hold and that acceptable policies emerge, reducing task completion time, computational cost, and/or memory resources compared to learning without hierarchical representations and social knowledge.
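
The abstract's core mechanism, Q-learning over an abstracted state space with a socially constrained action set, can be illustrated at toy scale. The Python sketch below is not the authors' implementation: the grid-world task, the abstraction function phi, and the teammate-avoidance rule are all assumptions made for illustration.

import random
from collections import defaultdict

# Minimal sketch of Q-learning with state abstraction and a socially
# constrained action set. The 10x10 grid task, the abstraction phi
# (2x2 coarsening), and the "never step onto a teammate's cell" rule
# are illustrative assumptions, not the paper's task domains.

GRID = 10
GOAL = (9, 9)
TEAMMATE = (5, 5)                      # assumed fixed teammate position
ACTIONS = ("up", "down", "left", "right")
DELTA = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def next_cell(state, action):
    """Deterministic grid motion, clamped at the walls."""
    x, y = state
    dx, dy = DELTA[action]
    return (min(max(x + dx, 0), GRID - 1), min(max(y + dy, 0), GRID - 1))

def phi(state):
    """State abstraction: collapse the 10x10 grid into 5x5 macro-cells,
    shrinking the Q-table the learner must fill."""
    return (state[0] // 2, state[1] // 2)

def allowed_actions(state):
    """Social constraint: prune any action that would enter the
    teammate's cell, so those state-action pairs are never explored."""
    return [a for a in ACTIONS if next_cell(state, a) != TEAMMATE]

Q = defaultdict(float)                 # keyed by (abstract state, action)

for episode in range(500):
    state = (0, 0)
    for t in range(200):               # cap episode length
        acts = allowed_actions(state)
        s = phi(state)
        if random.random() < EPSILON:  # epsilon-greedy over allowed actions
            action = random.choice(acts)
        else:
            action = max(acts, key=lambda a: Q[(s, a)])
        nxt = next_cell(state, action)
        reward, done = (1.0, True) if nxt == GOAL else (-0.01, False)
        target = 0.0 if done else GAMMA * max(
            Q[(phi(nxt), a)] for a in allowed_actions(nxt))
        Q[(s, action)] += ALPHA * (reward + target - Q[(s, action)])
        state = nxt
        if done:
            break

Because Q is indexed by phi(state) rather than the raw state, the table holds at most 25 x 4 entries instead of 100 x 4, and the social mask removes the four state-action pairs that would enter the teammate's cell: a toy-scale instance of the complexity reductions the abstract bounds.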
Key words: decentralized Markov decision process; reinforcement learning; multiagent systems