Citation: Xueqing SUN, Tao MAO, Laura RAY, Dongqing SHI, Jerald KRALIK. Hierarchical state-abstracted and socially augmented Q-Learning for reducing complexity in agent-based learning [J]. Control Theory and Technology, 2011, 9(3): 440–450.
Received: March 09, 2011; Revised: March 09, 2011
Funding: This work was supported by the Office of Naval Research under the Multi-University Research Initiative (MURI) (No. N00014-08-1-0693).
Hierarchical state-abstracted and socially augmented Q-Learning for reducing complexity in agent-based learning
Xueqing SUN, Tao MAO, Laura RAY, Dongqing SHI, Jerald KRALIK
(Thayer School of Engineering, Dartmouth College; Department of Psychological and Brain Sciences, Dartmouth College)
Abstract:
A primary challenge of agent-based policy learning in complex and uncertain environments is computational complexity that escalates with the size of the task space (action choices and world states) and the number of agents. Nonetheless, there is ample evidence in the natural world that high-functioning social mammals learn to solve complex problems with ease, both individually and cooperatively. This ability to solve computationally intractable problems stems from brain circuits for hierarchical representation of state and action spaces and learned policies, as well as from constraints imposed by social cognition. Using biologically derived mechanisms for state representation and mammalian social intelligence, we constrain state-action choices in reinforcement learning in order to improve learning efficiency. Analytical results bound the reduction in computational complexity due to state abstraction, hierarchical representation, and socially constrained action selection in agent-based learning problems that can be described as variants of Markov decision processes. Investigation of two task domains, single-robot herding and multirobot foraging, shows that the theoretical bounds hold and that acceptable policies emerge, reducing task completion time, computational cost, and/or memory resources compared to learning without hierarchical representations and social knowledge.
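
The abstract's core mechanism, Q-learning over an abstracted state space with a socially constrained action set, can be illustrated at toy scale. The Python sketch below is not the authors' implementation: the grid-world task, the abstraction function phi, and the teammate-avoidance rule are all assumptions made for illustration.

import random
from collections import defaultdict

# Minimal sketch of Q-learning with state abstraction and a socially
# constrained action set. The 10x10 grid task, the abstraction phi
# (2x2 coarsening), and the "never step onto a teammate's cell" rule
# are illustrative assumptions, not the paper's task domains.

GRID = 10
GOAL = (9, 9)
TEAMMATE = (5, 5)                      # assumed fixed teammate position
ACTIONS = ("up", "down", "left", "right")
DELTA = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def next_cell(state, action):
    """Deterministic grid motion, clamped at the walls."""
    x, y = state
    dx, dy = DELTA[action]
    return (min(max(x + dx, 0), GRID - 1), min(max(y + dy, 0), GRID - 1))

def phi(state):
    """State abstraction: collapse the 10x10 grid into 5x5 macro-cells,
    shrinking the Q-table the learner must fill."""
    return (state[0] // 2, state[1] // 2)

def allowed_actions(state):
    """Social constraint: prune any action that would enter the
    teammate's cell, so those state-action pairs are never explored."""
    return [a for a in ACTIONS if next_cell(state, a) != TEAMMATE]

Q = defaultdict(float)                 # keyed by (abstract state, action)

for episode in range(500):
    state = (0, 0)
    for t in range(200):               # cap episode length
        acts = allowed_actions(state)
        s = phi(state)
        if random.random() < EPSILON:  # epsilon-greedy over allowed actions
            action = random.choice(acts)
        else:
            action = max(acts, key=lambda a: Q[(s, a)])
        nxt = next_cell(state, action)
        reward, done = (1.0, True) if nxt == GOAL else (-0.01, False)
        target = 0.0 if done else GAMMA * max(
            Q[(phi(nxt), a)] for a in allowed_actions(nxt))
        Q[(s, action)] += ALPHA * (reward + target - Q[(s, action)])
        state = nxt
        if done:
            break

Because Q is indexed by phi(state) rather than the raw state, the table holds at most 25 x 4 entries instead of 100 x 4, and the social mask removes the four state-action pairs that would enter the teammate's cell: a toy-scale instance of the complexity reductions the abstract bounds.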
Key words: decentralized Markov decision process; reinforcement learning; multiagent systems