The cerebellar-model-articulation-controller Q-learning for the task assignment of a handling system

Tang Hao; Ding Lijie; Cheng Wenjuan; Zhou Lei

Cite this article:	TANG Hao, DING Lijie, CHENG Wenjuan, ZHOU Lei. The cerebellar-model-articulation-controller Q-learning for the task assignment of a handling system[J]. Control Theory & Applications, 2009, 26(8): 884-888.

Received: 2008-05-25 Revised: 2008-11-16

DOI: 10.7641/j.issn.1000-8152.2009.8.CCTA080522

2009,26(8):884-888

Keywords: task assignment; Markov decision process (MDP); Q-learning; CMAC

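Funding: Supported by the National Natural Science Foundation of China (60404009); the Natural Science Foundation of Anhui Province (090412046, 070416242); the Key Natural Science Research Project of Anhui Provincial Universities (KJ2007A063, KJ2008A058); and the Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education.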

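Author	Affiliation	E-mail
Tang Hao*	School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009; Engineering Research Center of Safety Critical Industrial Measurement and Control Technology, Ministry of Education, Hefei, Anhui 230009	htang@hfut.edu.cn
Ding Lijie	School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009
Cheng Wenjuan	School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009
Zhou Lei	School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009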

Abstract

This paper studies the task assignment problem of a high-speed handling system with two robots. In the underlying Markov decision process (MDP) model, the state variables mix continuous and discrete values, so the state space is complex and suffers from the curse of dimensionality, which makes traditional numerical optimization difficult to apply. Because the cerebellar model articulation controller (CMAC) converges quickly and adapts well, it is used as the approximator of the Q-value function. Combining this structure with Q-learning and the concept of performance potential yields a CMAC-Q learning optimization algorithm that applies to both the average and the discounted performance criteria. Simulation results show that, compared with conventional Q-learning, this neuro-dynamic programming approach requires less memory and achieves higher optimization accuracy and faster optimization.
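To make the idea of a CMAC as a Q-value approximator concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a state made of one continuous component `x` and one discrete component `d`, a reward-maximizing discounted Q-learning update, and hypothetical names (`CMACQ`, `active_cells`, `update`) and parameter values chosen only for illustration. The paper's performance-potential formulation and its average-criterion variant are not reproduced here.

```python
class CMACQ:
    """Illustrative CMAC (overlapping tilings) approximator for Q(x, d, a).

    x is a continuous state component in [low, high], d is a discrete state
    component in range(n_discrete), a is an action in range(n_actions).
    All names and sizes are assumptions made for this sketch.
    """

    def __init__(self, n_tilings, n_bins, low, high, n_discrete, n_actions):
        self.n_tilings = n_tilings
        self.n_bins = n_bins
        self.low = low
        self.width = (high - low) / n_bins
        self.n_discrete = n_discrete
        self.n_actions = n_actions
        # Each shifted tiling needs one extra tile to cover the whole range.
        self.tiles_per_tiling = n_bins + 1
        size = n_tilings * self.tiles_per_tiling * n_discrete * n_actions
        self.weights = [0.0] * size

    def active_cells(self, x, d, a):
        """Return one weight index per tiling for the input (x, d, a)."""
        cells = []
        for t in range(self.n_tilings):
            # Every tiling is shifted by a different fraction of a tile width.
            offset = t * self.width / self.n_tilings
            tile = int((x - self.low + offset) / self.width)
            tile = min(max(tile, 0), self.n_bins)  # clamp out-of-range inputs
            index = ((t * self.tiles_per_tiling + tile) * self.n_discrete + d)
            cells.append(index * self.n_actions + a)
        return cells

    def value(self, x, d, a):
        """Q(x, d, a) is the sum of the weights of the active cells."""
        return sum(self.weights[i] for i in self.active_cells(x, d, a))

    def greedy_value(self, x, d):
        """Largest Q-value over all actions in state (x, d)."""
        return max(self.value(x, d, a) for a in range(self.n_actions))

    def update(self, x, d, a, reward, x_next, d_next, alpha=0.5, gamma=0.9):
        """One discounted Q-learning step; returns the temporal-difference error."""
        target = reward + gamma * self.greedy_value(x_next, d_next)
        delta = target - self.value(x, d, a)
        # Spread the correction evenly over the active cells.
        step = alpha * delta / self.n_tilings
        for i in self.active_cells(x, d, a):
            self.weights[i] += step
        return delta


if __name__ == "__main__":
    q = CMACQ(n_tilings=4, n_bins=5, low=0.0, high=1.0, n_discrete=2, n_actions=2)
    for _ in range(60):
        q.update(0.33, 1, 0, reward=1.0, x_next=0.9, d_next=0, gamma=0.0)
    print(q.value(0.33, 1, 0), q.value(0.34, 1, 0), q.value(0.9, 1, 0))
```

In this sketch the weight table holds `n_tilings * (n_bins + 1) * n_discrete * n_actions` entries regardless of how finely `x` varies, and nearby values of `x` share tiles, so an update at one point also changes the estimate for its neighbours. This is the kind of memory saving and generalization the abstract attributes to the CMAC structure, shown here only in a toy setting.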