Reliability-based robust fuzzy clustering

PAN Jin-yan; GAO Peng; GAO Yun-long; XIE You-wei; XIONG Yu-hui

Cite this article:	PAN Jin-yan, GAO Peng, GAO Yun-long, XIE You-wei, XIONG Yu-hui. Reliability-based robust fuzzy clustering[J]. Control Theory & Applications, 2021, 38(4): 516-528.

Received: 2020-07-23 Revised: 2020-11-12

DOI: 10.7641/CTA.2020.00480

2021,38(4):516-528

Keywords: fuzzy C-means (FCM); size imbalance; ensemble learning; k-nearest neighbor constraint; local information

Funding: Supported by the National Natural Science Foundation of China (61203176) and the Natural Science Foundation of Fujian Province (2013J05098, 2016J01756).

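Author	Affiliation	E-mail
PAN Jin-yan	School of Information Engineering, Jimei University	jypan@jmu.edu.cn
GAO Peng	Navigation College, Jimei University
GAO Yun-long^*	School of Aerospace Engineering, Xiamen University	gaoyl@xmu.edu.cn
XIE You-wei	Jimei University, Jimei District, Xiamen, Fujian Province
XIONG Yu-hui	School of Aerospace Engineering, Xiamen University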

Abstract

Compared with the k-means algorithm, fuzzy C-means (FCM) introduces fuzzy membership degrees to account for the interaction between different data clusters, which avoids the problem of overlapping cluster centers. However, the fuzzy membership degree has trailing and upturned-tail structural characteristics, which make the FCM algorithm very sensitive to noise points and outliers. In addition, because FCM tends to partition the data into clusters of equal size, it is also sensitive to cluster size and performs poorly on imbalanced data clusters. To address these problems, this paper proposes a reliability-based robust fuzzy clustering algorithm (RRFCM). Based on the current clustering result, the algorithm analyzes the reliability of each sample point and uses this reliability together with local neighbor information to highlight the separability between different data clusters. This improves the robustness of the algorithm to noise, reduces its sensitivity to imbalanced cluster sizes, and yields clustering results with better generalization performance. Compared with related algorithms, RRFCM achieves the best results in experiments on artificial data sets, UCI real data sets, and image segmentation.
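
The noise sensitivity described in the abstract can be illustrated with the textbook FCM update rules. The sketch below is standard FCM only, not RRFCM (the abstract does not give the RRFCM formulas). The function names `fcm_memberships` and `fcm_centers`, the fuzzifier value `m=2.0`, and the toy data are assumptions made for illustration. Because each point's memberships are normalised to sum to 1, an outlier that is equally far from both centers still receives a membership of 0.5 in each cluster, which is one way to read the "upturned tail" of the membership function.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update: u[i, j] is the degree of point i in cluster j."""
    # Squared Euclidean distance from every point to every center, shape (n, c).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, eps)  # guard against division by zero at a center
    # u_ij is proportional to d_ij^(-2/(m-1)), normalised so each row sums to 1.
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm_centers(X, U, m=2.0):
    """Standard FCM center update: mean of the points weighted by u_ij^m."""
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]

# Toy data (assumed for illustration): two well-placed points and one outlier.
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
X = np.array([[0.1, 0.0],     # close to the first center
              [3.9, 0.0],     # close to the second center
              [2.0, 100.0]])  # an outlier equally far from both centers

U = fcm_memberships(X, centers)
new_centers = fcm_centers(X, U)
print(U)            # the outlier's row is [0.5, 0.5]
print(new_centers)  # both centers are dragged toward the outlier
```

In this toy case the outlier keeps a weight of 0.5^2 = 0.25 in each center update, so a single distant point shifts both centers noticeably. A reliability weight on each sample, as the abstract proposes, is one way to damp that effect.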