New initialization method for cluster center

LI Chun-sheng; WANG Yao-nan

Cite this article: LI Chun-sheng, WANG Yao-nan. New initialization method for cluster center[J]. Control Theory & Applications, 2010, 27(10): 1435-1440.


Received: 2008-09-02; Revised: 2010-01-03


DOI: 10.7641/j.issn.1000-8152.2010.10.CCTA080927

2010,27(10):1435-1440

Keywords: cluster center initialization; minimum spanning tree; k-means algorithm

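Funding: Key project of the National High Technology Research and Development Program of China ("863" Program) (2007AA04Z224); key project of the National Natural Science Foundation of China (60835004).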

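Author	Affiliation	E-mail
LI Chun-sheng^*	School of Mathematics and Computational Science, Guangdong University of Business Studies; College of Electrical and Information Engineering, Hunan University	lcs0200731@yahoo.com.cn
WANG Yao-nan	College of Electrical and Information Engineering, Hunan University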

Abstract

The k-means clustering algorithm is sensitive to the initial cluster centers and is easily trapped in local optima. No existing initialization method for cluster centers has yet been widely accepted. We assume that each cluster contains at least one dense region of data, and that dense regions in different clusters are farther apart than dense regions in the same cluster. Under this assumption, we build a minimum spanning tree on the data set, apply rooted-tree principles to search the tree for dense regions and estimate their densities, and select the dense regions that are both high in density and sufficiently separated from each other. Points within the selected regions serve as the initial cluster centers, which yields a new initialization method for cluster centers. Simulation experiments comparing the proposed method with existing methods show that the proposed method performs better.
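
The pipeline outlined in the abstract (build a minimum spanning tree, estimate density, pick dense and well-separated points) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify the rooted-tree search or the density estimator, so the sketch substitutes a simple stand-in, namely the inverse of the mean length of the MST edges incident to a point. The function names `mst_edges` and `initial_centers` and the separation threshold `min_sep` are hypothetical and introduced here only for illustration.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j, length) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known link from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the vertex outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def initial_centers(points, k, min_sep):
    """Greedily pick up to k dense, mutually separated points as initial centers.

    Density proxy (an assumption, NOT the paper's estimator): the inverse
    of the mean length of the MST edges incident to a point.
    """
    incident = [[] for _ in points]
    for i, j, w in mst_edges(points):
        incident[i].append(w)
        incident[j].append(w)
    density = [len(ws) / sum(ws) if sum(ws) > 0 else math.inf for ws in incident]
    # Visit points from densest to sparsest, keeping those far from earlier picks.
    order = sorted(range(len(points)), key=lambda i: density[i], reverse=True)
    centers = []
    for i in order:
        if all(math.dist(points[i], c) >= min_sep for c in centers):
            centers.append(points[i])
            if len(centers) == k:
                break
    return centers
```

On a toy data set with two tight groups and one isolated point between them, `initial_centers(points, 2, 5.0)` should return one point from each tight group, since the isolated point has only long MST edges and therefore a low density under this proxy. The returned points could then be passed to any k-means implementation that accepts explicit initial centers.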