Missing data imputation based on maximal variance weight information coefficient for gas flow in steel industry

LV Zheng; ZHAO Jun; LIU Ying; WANG Wei

Cite this article: LV Zheng, ZHAO Jun, LIU Ying, WANG Wei. Missing data imputation based on maximal variance weight information coefficient for gas flow in steel industry[J]. Control Theory and Technology, 2015, 32(5): 646-654.


Received: 2014-09-06; revised: 2014-12-31


DOI: 10.7641/CTA.2015.40828


Keywords: energy system of steel industry; data imputation; sample selection; maximal variance weight information coefficient

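Funding: Supported by the National High Technology Research and Development Program of China ("863" Program) (2013AA040703), the National Natural Science Foundation of China (61034003, 61304213, 61104157, 61273037, 61473056), and the Fundamental Research Funds for the Central Universities (DUT13RC203).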

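Author	Affiliation	E-mail
LV Zheng	School of Control Science and Engineering, Dalian University of Technology	lvzheng@mail.dlut.edu.cn
ZHAO Jun*	School of Control Science and Engineering, Dalian University of Technology	zhaoj@dlut.edu.cn
LIU Ying	School of Control Science and Engineering, Dalian University of Technology
WANG Wei	School of Control Science and Engineering, Dalian University of Technology
* Corresponding author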

Abstract (Chinese version)

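In data-driven mining, modeling, and optimization, complete and accurate data are the foundation of the research. Because the energy system of the metallurgical industry is complex and on-site data acquisition is vulnerable to interference, data are frequently lost during acquisition, which can prevent models from being built and hidden information from being mined accurately. Addressing missing data in the generation and consumption flows of by-product gas in steel enterprises, this paper analyzes the correlation characteristics of energy flow data under similar operating conditions and proposes a flow-data imputation method for the by-product gas system of metallurgical enterprises based on the maximal variance weight information coefficient. For the two kinds of data loss that occur frequently on site, namely intermittent loss of individual data points and long-duration continuous loss, the method uses the maximal variance weight information coefficient as the sample selection criterion and imputes the missing data with a kernel-learning-based method. To verify the effectiveness of the proposed method, experiments were run on actual production data from the blast furnaces, coke ovens, and cold- and hot-rolling users of Shanghai Baosteel; the results show that the method achieves substantially higher imputation accuracy than the other methods compared.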
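The two missing-data modes named above (intermittent loss of single points versus long continuous loss) can be told apart mechanically by the length of each run of missing samples. The sketch below is a minimal illustration only, assuming the flow series is a Python list with NaN marking missing samples; the function names and the threshold `max_intermittent` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

def find_missing_runs(series: List[float]) -> List[Tuple[int, int]]:
    """Return (start_index, run_length) for every run of consecutive NaN samples."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(series):
        if math.isnan(value):
            if start is None:       # a new missing run begins here
                start = i
        elif start is not None:     # the current missing run ended at i - 1
            runs.append((start, i - start))
            start = None
    if start is not None:           # the series ends inside a missing run
        runs.append((start, len(series) - start))
    return runs

def classify_missing_runs(series: List[float],
                          max_intermittent: int = 5) -> List[Tuple[int, int, str]]:
    """Label each missing run 'intermittent' (short) or 'continuous' (long).

    max_intermittent is a hypothetical threshold in samples, not a value from the paper.
    """
    return [
        (start, length, "intermittent" if length <= max_intermittent else "continuous")
        for start, length in find_missing_runs(series)
    ]

nan = float("nan")
flow = [10.2, nan, 10.5, 10.4, nan, nan, nan, nan, nan, nan, nan, 9.8]
print(classify_missing_runs(flow))
# [(1, 1, 'intermittent'), (4, 7, 'continuous')]
```

In practice the threshold would depend on the sampling interval of the gas-flow meters; a run-length split is just one simple way to route each gap to a suitable imputation strategy.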

Abstract (English version)

In data-driven modeling and optimization, the completeness and accuracy of data are the foundation for further research. Because the energy system of the steel industry is complex and its data-acquisition process is frequently affected by malfunctions in data transmission, storage, and transformation, data are often missing, which can prevent models from being built or information from being accurately discovered. In this study, by analyzing the correlation of energy data under the corresponding operating conditions in manufacturing, a data imputation method is proposed for missing by-product gas flow data. The method adopts the proposed maximal variance weight information coefficient (MVWIC) as the sample selection criterion and imputes the missing data with a kernel-learning-based method. To validate the method, two missing modes that frequently occur in the steel industry are considered: intermittent missing and long-term continuous missing. A series of experiments on practical energy data shows that the proposed method achieves higher imputation accuracy than the other methods compared.
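The two-stage workflow the abstract describes (select similar historical samples, then impute with a kernel-learning model) can be sketched as follows. This is not the authors' method: MVWIC is not defined in the abstract, so the sketch substitutes plain Pearson correlation as the sample-selection score, and it uses kernel ridge regression with a Gaussian (RBF) kernel as one example of a kernel-learning regressor. The window layout, the function names, and the parameters `k`, `gamma`, and `lam` are all hypothetical.

```python
import numpy as np

def pearson_score(a: np.ndarray, b: np.ndarray) -> float:
    """Stand-in selection score: Pearson correlation (NOT the paper's MVWIC)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(a @ b) / denom

def rbf_kernel(X: np.ndarray, Y: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

def impute_window(history, target, k=20, gamma=1.0, lam=1e-3):
    """Fill NaNs in `target` from the k most similar complete historical windows.

    history: (n_windows, window_len) array of complete past windows (hypothetical layout)
    target:  (window_len,) window containing NaNs at the missing positions
    """
    history = np.asarray(history, dtype=float)
    target = np.asarray(target, dtype=float)
    miss = np.isnan(target)
    if not miss.any():
        return target.copy()
    obs = ~miss
    # 1) Sample selection: rank historical windows by similarity on the observed positions.
    scores = np.array([pearson_score(w[obs], target[obs]) for w in history])
    idx = np.argsort(scores)[::-1][:k]
    X = history[idx][:, obs]   # inputs: values at observed positions
    Y = history[idx][:, miss]  # outputs: values at missing positions
    # 2) Kernel ridge regression on the selected samples, with centered outputs.
    y_mean = Y.mean(axis=0)
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(idx)), Y - y_mean)
    y_hat = rbf_kernel(target[obs][None, :], X, gamma) @ alpha + y_mean
    filled = target.copy()
    filled[miss] = y_hat[0]
    return filled

# Toy check: 24 phase-shifted sine windows; hide 4 samples of one of them.
t = np.arange(24)
history = np.array([np.sin(2 * np.pi * (t + s) / 24) for s in range(24)])
truth = history[5]
target = truth.copy()
target[6:10] = np.nan
filled = impute_window(history, target, k=20, gamma=1.0, lam=1e-6)
print(np.max(np.abs(filled - truth)))
```

The toy check only confirms the mechanics, since the hidden window is itself in the history; it says nothing about accuracy on real gas-flow data. Separating selection from regression keeps the training set small and restricted to similar operating conditions, which is the role the abstract assigns to the MVWIC criterion.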