Abstract:
Objective: As international emission regulations grow increasingly stringent, predicting ship fuel consumption has become a key part of energy conservation and emission reduction in the shipping industry. Multi-source heterogeneous data sampled at inconsistent frequencies give rise to feature bias and matching problems. To address them, this paper proposes a black-box fuel consumption prediction model that integrates Automatic Identification System (AIS) data, ship noon report (NR) data, and European Centre for Medium-Range Weather Forecasts (ECMWF) meteorological data.
Method: To overcome the shortcomings of the mean-value downsampling used in existing studies, this study proposes a frequency-matching approach based on the integral principle and tailored to the characteristics of the data. The model first introduces feature weight factors and uses a weighted Euclidean distance to measure distances in the feature space. It then applies the K-means algorithm to cluster the high-frequency AIS data and computes weighted cluster centers. Next, following the distance-weighting principle of k-nearest neighbors (KNN), it assigns inverse-distance weights to the adjacent cluster centers, establishing a mapping between feature data points and cluster centers. Based on numerical integration, a regression equation linking the AIS data to the NR data is then formulated, and the L-BFGS-B algorithm is used to estimate its parameters under bound constraints. Finally, the NSGA-II multi-objective algorithm optimizes the feature weight factors globally to improve fitting accuracy. The robustness and generalization ability of the model are verified through five-fold cross-validation.
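The frequency-matching idea can be sketched as follows, under our own reading of the method rather than the authors' code. Each AIS sample's hourly fuel rate is interpolated from unknown per-center rates θ_k through inverse-distance weights W_tk on its nearest cluster centers, and the rates are fitted so that the time integral of the interpolated rate over each day matches the noon-report figure: y_d ≈ Σ_{t in day d} Δt_t Σ_k W_tk θ_k. The helper names (`weighted_kmeans`, `idw_weights`, `fit_center_rates`), the 3 neighbors with inverse-square weighting, the non-negativity bounds θ_k ≥ 0, the fixed `feat_w`, and the synthetic data are all assumptions; the NSGA-II tuning of the feature weights and the five-fold cross-validation are omitted.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch under assumptions: one unknown hourly fuel rate per cluster center,
# 3-NN inverse-square weights, and theta >= 0. Not the authors' implementation.


def weighted_kmeans(X, feat_w, k, n_iter=50, seed=0):
    """Lloyd's K-means under the weighted Euclidean distance
    d(a, b) = sqrt(sum_j w_j * (a_j - b_j)**2).
    Scaling each feature by sqrt(w_j) turns this into plain Euclidean K-means."""
    rng = np.random.default_rng(seed)
    Z = X * np.sqrt(feat_w)
    C = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            members = Z[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                C[j] = members.mean(axis=0)
    return C  # centers live in the scaled (weighted) feature space


def idw_weights(X, feat_w, C, n_neighbors=3, power=2.0, eps=1e-9):
    """Inverse-distance weights of each sample on its n_neighbors nearest
    cluster centers (KNN-style); every row sums to 1."""
    Z = X * np.sqrt(feat_w)
    D = np.sqrt(((Z[:, None, :] - C[None, :, :]) ** 2).sum(-1))
    idx = np.argsort(D, axis=1)[:, :n_neighbors]
    w = 1.0 / (np.take_along_axis(D, idx, axis=1) + eps) ** power
    W = np.zeros_like(D)
    np.put_along_axis(W, idx, w / w.sum(axis=1, keepdims=True), axis=1)
    return W


def fit_center_rates(W, dt_h, day, y_daily):
    """Integral regression: daily fuel ~= sum over the day's samples of
    dt * (interpolated hourly rate). Solved with L-BFGS-B under theta >= 0."""
    A = np.zeros((len(y_daily), W.shape[1]))
    np.add.at(A, day, W * dt_h[:, None])  # A[d, k] = sum_{t in day d} dt_t * W[t, k]

    def loss_and_grad(theta):
        r = A @ theta - y_daily
        return 0.5 * r @ r, A.T @ r

    x0 = np.full(W.shape[1], y_daily.mean() / 24.0)  # start at the mean hourly rate
    res = minimize(loss_and_grad, x0, jac=True, method="L-BFGS-B",
                   bounds=[(0.0, None)] * W.shape[1])
    return res.x, A


# --- Synthetic demo (made-up data): 40 days of 10-minute AIS-like samples ---
rng = np.random.default_rng(1)
n_days, per_day = 40, 144
day = np.repeat(np.arange(n_days), per_day)
speed = np.repeat(rng.uniform(10, 16, n_days), per_day) + rng.normal(0, 0.3, day.size)
wind = rng.uniform(0, 15, day.size)
X = np.column_stack([speed, wind])
dt_h = np.full(day.size, 10 / 60)
true_rate = 0.0005 * speed**3 + 0.01 * wind  # hypothetical hourly fuel rate (t/h)
y_daily = np.bincount(day, weights=true_rate * dt_h) + rng.normal(0, 0.3, n_days)

feat_w = np.array([1.0, 0.2])  # hypothetical fixed weights; the paper tunes them with NSGA-II
C = weighted_kmeans(X, feat_w, k=20)
W = idw_weights(X, feat_w, C)
theta, A = fit_center_rates(W, dt_h, day, y_daily)
mape = np.mean(np.abs(A @ theta - y_daily) / y_daily)
print(f"in-sample MAPE: {mape:.2%}")
```

Because each daily total is linear in θ, the fit reduces to a bound-constrained least-squares problem, which suits a quasi-Newton solver such as L-BFGS-B. An outer NSGA-II loop would re-run this fit for each candidate `feat_w` and keep the non-dominated weightings.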
Results: The experimental results show that, with 160 cluster centers, the model achieves average test-set values of 5.3721% for MAPE, 5.6049 t/day for MAE, and 0.9767 for R², improving on the traditional mean-value downsampling method.
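For reference, the three reported metrics and their averaging over held-out folds can be computed as below. This is a generic sketch: `regression_metrics`, `five_fold_scores`, the stand-in linear model, and the data are hypothetical, and the abstract does not state how the paper builds its folds (for example, whether days are shuffled).

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """MAPE (%), MAE (same unit as y, e.g. t/day) and coefficient of determination R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mape = 100.0 * np.mean(np.abs(err) / np.abs(y_true))
    mae = np.mean(np.abs(err))
    r2 = 1.0 - np.sum(err**2) / np.sum((y_true - y_true.mean()) ** 2)
    return mape, mae, r2


def five_fold_scores(X, y, fit_predict, n_splits=5, seed=0):
    """Average (MAPE, MAE, R^2) over the held-out folds of a shuffled K-fold split.
    fit_predict(X_train, y_train, X_test) -> y_pred is a hypothetical callback."""
    idx = np.random.default_rng(seed).permutation(len(y))
    scores = []
    for test_idx in np.array_split(idx, n_splits):
        train_idx = np.setdiff1d(idx, test_idx)
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(regression_metrics(y[test_idx], y_pred))
    return np.mean(scores, axis=0)


# Example with a stand-in linear model (not the paper's model) on made-up data.
rng = np.random.default_rng(0)
speed = rng.uniform(10, 16, 100)
fuel = 3.0 * speed - 5.0 + rng.normal(0, 0.5, 100)  # made-up daily fuel (t/day)


def linear_fit_predict(x_tr, y_tr, x_te):
    slope, intercept = np.polyfit(x_tr, y_tr, 1)
    return slope * x_te + intercept


mape, mae, r2 = five_fold_scores(speed, fuel, linear_fit_predict)
print(f"MAPE {mape:.2f}%  MAE {mae:.3f} t/day  R2 {r2:.4f}")
```

Averaging per-fold metrics, as done here, can differ slightly from computing the metrics once on all pooled out-of-fold predictions.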
Conclusion: The model's fitting accuracy increases with the number of cluster centers. Nevertheless, when fewer cluster centers still meet the accuracy requirements, choosing fewer improves computational efficiency and speeds up model processing. Because it relies less on extrapolation, the model is more resistant to interference, which gives it practical engineering value.