A ship fuel consumption prediction model based on clustering analysis and feature weight factor optimization

    Abstract:
    Objective With international emission regulations becoming increasingly stringent, ship fuel consumption prediction has become a key part of energy conservation and emission reduction in the shipping industry. To address the biased feature matching caused by inconsistent sampling frequencies across multi-source heterogeneous data, this paper proposes a black-box fuel consumption prediction model that fuses Automatic Identification System (AIS) data, ship noon report (NR) data, and European Centre for Medium-Range Weather Forecasts (ECMWF) meteorological data.
    Method To overcome the shortcomings of the mean-value downsampling used in existing studies, the model adopts a frequency-matching method based on the integral principle and tailored to the characteristics of the data. The model first introduces feature weight factors and measures distances in the feature space with a weighted Euclidean distance. It then applies the K-means algorithm to cluster the high-frequency AIS data and computes weighted cluster centers. Next, following the distance-weighting idea of KNN, it builds inverse-distance weights over the neighboring cluster centers, which maps each feature data point onto the cluster centers. Using numerical integration, a regression equation linking the AIS data to the NR data is established, and its parameters are solved under constraints with the L-BFGS-B algorithm. Finally, the NSGA-II multi-objective algorithm is used to optimize the feature weight factors globally and improve the model fit. In addition, five-fold cross-validation is used to check the robustness and generalization ability of the model.
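The mapping and integration steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact formulas, so the function names (`weighted_distance`, `idw_membership`, `integrated_row`), the choice of `k` nearest centers, and the trapezoidal rule are all assumptions made for the example. The cluster centers are taken as given here; in the paper they come from K-means run with the same weighted distance.

```python
import math

def weighted_distance(x, c, w):
    """Weighted Euclidean distance between point x and center c (feature weights w)."""
    return math.sqrt(sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w)))

def idw_membership(x, centers, w, k=2, eps=1e-12):
    """Inverse-distance weights of x over its k nearest centers; weights sum to 1."""
    dists = [weighted_distance(x, c, w) for c in centers]
    nearest = sorted(range(len(centers)), key=lambda j: dists[j])[:k]
    # A point that coincides with a center is mapped entirely to that center.
    for j in nearest:
        if dists[j] < eps:
            return {j: 1.0}
    inv = {j: 1.0 / dists[j] for j in nearest}
    total = sum(inv.values())
    return {j: v / total for j, v in inv.items()}

def integrated_row(samples, times, centers, w, k=2):
    """Integrate the memberships over one reporting period (trapezoidal rule).

    Returns row[j] = amount of time attributed to cluster center j, so that
    reported fuel for the period is modeled as sum_j row[j] * theta[j],
    where theta[j] is the unknown fuel rate at center j.
    """
    memberships = [idw_membership(x, centers, w, k) for x in samples]
    row = [0.0] * len(centers)
    for i in range(len(samples) - 1):
        dt = times[i + 1] - times[i]
        for m in (memberships[i], memberships[i + 1]):
            for j, v in m.items():
                row[j] += 0.5 * dt * v
    return row

# Illustrative values only (not from the paper): two features, two centers.
centers = [(10.0, 1.0), (14.0, 2.0)]
weights = (1.0, 0.5)
samples = [(10.0, 1.0), (12.0, 1.5), (14.0, 2.0)]
times = [0.0, 1.0, 2.0]
row = integrated_row(samples, times, centers, weights)
```

Stacking one such row per noon report gives a linear system in the per-center fuel rates. One plausible way to fit it under bounds (for example, non-negative rates) is `scipy.optimize.minimize` with `method="L-BFGS-B"` and a `bounds` argument; whether the paper's constraints take exactly this form is not stated in the abstract.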
    Results The experiments show that, with 160 cluster centers, the model's test-set MAPE, MAE, and R² average 5.3721%, 5.6049 t/d, and 0.9767, respectively, all of which improve on conventional mean-value downsampling.
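For reference, the three reported metrics can be computed as in the sketch below. The definitions are the standard ones; the sample values are purely illustrative and are not data from the paper.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error, in the units of y (here t/d)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical daily fuel consumption values in t/d, for illustration only.
observed = [20.0, 30.0, 40.0]
predicted = [22.0, 27.0, 40.0]
scores = (mape(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

Under five-fold cross-validation, these would be evaluated on each held-out fold and then averaged, which is how the abstract reports them.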
    Conclusion The model's fitting accuracy can be improved by increasing the number of cluster centers; conversely, once the accuracy requirement is met, the number of cluster centers can be reduced to speed up computation. The model trades some extrapolation ability for stronger robustness to interference, which gives it a degree of practical engineering value.

     
