Abstract:
Objectives To enhance the safety and efficiency of maritime traffic, this paper proposes an autonomous collision avoidance decision-making method for unmanned ships based on an enhanced Deep Deterministic Policy Gradient (DDPG) algorithm.
Methods  To address the low data utilization and poor convergence of the traditional DDPG algorithm, we employ Prioritized Experience Replay (PER), which dynamically adjusts experience priorities and reduces sample correlation, and we incorporate a Long Short-Term Memory (LSTM) network to improve convergence. Drawing on ship domain knowledge and the International Regulations for Preventing Collisions at Sea (COLREGs), we introduce a model for determining encounter situations and a novel set of reward functions that account for urgent scenarios in which other ships fail to comply with the COLREGs. Generalization experiments involving two-ship and multi-ship encounters are conducted to validate the effectiveness of the proposed method.
Results  The experimental results show that, compared with the traditional DDPG algorithm, the improved approach increases convergence speed by approximately 28.8%.
Conclusions  The trained model enables autonomous decision-making and navigation while ensuring compliance with the COLREGs, thereby providing valuable insights for intelligent decision-making in the field of maritime transportation.
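
As an illustration of the PER component named under Methods, the following is a minimal sketch of proportional prioritized replay in its commonly described form, not the paper's implementation. The class name `PrioritizedReplayBuffer`, the hyperparameters `alpha`, `beta`, and `eps`, and the use of the absolute TD error as the priority are all assumptions made for this example; the abstract does not specify them.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (hypothetical minimal sketch)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-5):
        self.capacity = capacity
        self.alpha = alpha  # assumed: 0 gives uniform sampling, 1 fully proportional
        self.eps = eps      # assumed: keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions take the current maximum priority so that
        # each one is likely to be replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights offset the bias of non-uniform sampling;
        # they are normalized by the largest weight in the batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.data[i] for i in idxs]
        return idxs, batch, weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, priorities are refreshed from the new TD errors.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a DDPG-style training loop, the critic loss for each sampled transition would typically be multiplied by its importance-sampling weight, and `update_priorities` would be called with the resulting TD errors. This sketch uses a linear scan for clarity; larger buffers usually rely on a sum-tree for efficient sampling.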