Application of density distribution function in clustering algorithms

TAN Jian-hao; ZHANG Jing; LI Wei-xiong

Cite this article:	TAN Jian-hao, ZHANG Jing, LI Wei-xiong. Application of density distribution function in clustering algorithms[J]. Control Theory and Technology (控制理论与应用), 2011, 28(12): 1791-1796.

Received: 2010-02-04; Revised: 2011-02-27

DOI: 10.7641/j.issn.1000-8152.2011.12.CCTA100138

2011,28(12):1791-1796

Keywords: clustering algorithms; KNN; GNN; density distribution function; OPTICS (ordering points to identify the clustering structure); DENCLUE (density-based clustering); local scale; radius scale factor

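Funding: National Natural Science Foundation of China (60634020); Natural Science Foundation of Hunan Province (08JJ3132); Fundamental Research Funds for the Central Universities.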

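Author	Affiliation	E-mail
TAN Jian-hao^*	College of Electrical and Information Engineering, Hunan University	tanjianhao96@sina.com.cn
ZHANG Jing	College of Electrical and Information Engineering, Hunan University
LI Wei-xiong	College of Electrical and Information Engineering, Hunan University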

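Abstract (translated from the Chinese)

The characteristics and shortcomings of traditional density-based clustering methods are analyzed in depth, the current state of research on density-based clustering algorithms is discussed, and an improved clustering algorithm based on a density distribution function is proposed. The idea of K nearest neighbors (KNN) is used to measure density and thereby find the point of current maximum density, i.e., the center point. Using a local scale, the cluster is expanded from the center point, and at each expansion a radius scale factor is introduced to identify core points. The cluster is then expanded from the KNN of each such core point until the density falls to a given ratio of the center point's density. Several examples of the algorithm are given, and it is compared experimentally with the grid-based shared nearest neighbor (GNN) clustering algorithm in terms of clustering accuracy and efficiency. The experiments show that the algorithm greatly reduces the sensitivity of density-based clustering to parameters, improves clustering of high-dimensional data sets with uneven density distribution, and raises clustering accuracy and efficiency.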

Abstract (English)

The characteristics and drawbacks of traditional density-based clustering algorithms are investigated in depth, the current state of research on density-based clustering is discussed, and an improved clustering algorithm based on a density distribution function is proposed. K nearest neighbors (KNN) are used to measure the density of each point, and a local maximum-density point is defined as the center point. Using a local scale, the cluster is extended from the center point. At each extension, a radius scale factor is used to determine whether a point is a core point. The cluster is then extended again from the core point until the density falls to a given ratio of the center point's density. Several examples of the algorithm are given, and the algorithm is compared experimentally with the grid-shared nearest neighbor (GNN) clustering algorithm in terms of clustering accuracy and efficiency. The tests show that the improved algorithm greatly reduces the sensitivity of density-based clustering algorithms to parameters, improves the clustering of high-dimensional data sets with uneven density distribution, and enhances clustering accuracy and efficiency.
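
The general idea in the abstract (measure density by KNN, start from the densest point, and grow the cluster through KNN links until density falls to a given ratio of the center density) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The density formula (inverse mean distance to the k nearest neighbors), the function names `knn_density` and `density_cluster`, and the parameters `k` and `stop_ratio` are all assumptions made for illustration. The abstract does not define the local scale or the radius scale factor, so the sketch leaves both out and treats every point that passes the density threshold as expandable.

```python
import numpy as np

def knn_density(points, k):
    """Assumed density measure: inverse of the mean distance to the k nearest neighbors."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # full pairwise distance matrix
    order = np.argsort(dist, axis=1)
    neighbors = order[:, 1:k + 1]                 # column 0 is the point itself (distinct points assumed)
    mean_knn_dist = np.take_along_axis(dist, neighbors, axis=1).mean(axis=1)
    density = 1.0 / (mean_knn_dist + 1e-12)       # small constant avoids division by zero
    return density, neighbors

def density_cluster(points, k=5, stop_ratio=0.5):
    """Grow clusters from density peaks; stop_ratio is a hypothetical parameter name."""
    points = np.asarray(points, dtype=float)
    density, neighbors = knn_density(points, k)
    labels = np.full(len(points), -1)             # -1 means "not yet assigned"
    cluster_id = 0
    for center in np.argsort(-density):           # visit points from densest to sparsest
        if labels[center] != -1:
            continue
        # The densest unassigned point acts as the center of a new cluster.
        threshold = stop_ratio * density[center]
        labels[center] = cluster_id
        frontier = [center]
        while frontier:
            p = frontier.pop()
            for q in neighbors[p]:                # expand through the KNN of p
                if labels[q] == -1 and density[q] >= threshold:
                    labels[q] = cluster_id
                    frontier.append(q)
        cluster_id += 1
    return labels
```

In this simplified form every point ends up in some cluster, so an isolated outlier becomes a singleton cluster. A fuller version would need the paper's definitions of the local scale and radius scale factor to decide which points count as core points.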