基于改进DDPG算法的无人船自主避碰决策方法

Autonomous decision-making method of unmanned ship based on improved DDPG algorithm

  • 摘要:
    目的 针对传统深度确定性策略梯度(DDPG)算法数据利用率低、收敛性差的特点,改进并提出一种新的无人船自主避碰决策方法。
    方法 利用优先经验回放(PER)自适应调节经验优先级,降低样本的相关性,并利用长短期记忆(LSTM)网络提高算法的收敛性。基于船舶领域和《国际海上避碰规则》(COLREGs),设置会遇情况判定模型和一组新定义的奖励函数,并考虑了紧迫危险以应对他船不遵守规则的情况。为验证所提方法的有效性,在两船和多船会遇局面下进行仿真实验。
    结果 结果表明,改进的DDPG算法相比于传统DDPG算法在收敛速度上提升约28.8%,
    结论 训练好的自主避碰模型可以使无人船在遵守COLREGs的同时实现自主决策和导航,为实现更加安全、高效的海上交通智能化决策提供参考。

     

    Abstract:
    Objectives To enhance the safety and efficiency of maritime traffic, this paper proposes an autonomous collision avoidance decision-making method for unmanned ships based on an enhanced Deep Deterministic Policy Gradient (DDPG) algorithm.
    Methods  In order to address the issues of low data utilization and poor convergence in traditional DDPG algorithms, we employ Priority Experience Replay (PER) to dynamically adjust experience priority, reduce sample correlation, and utilize the Long Short-Term Memory (LSTM) network to improve the algorithm convergence. Based on the domain knowledge of ships and adhering to the International Regulations for Preventing Collisions at Sea (COLREGs), a model for determining meeting situations and a novel set of reward functions that consider urgent scenarios when other ships fail to comply with the COLREGs are introduced. Generalization experiments are conducted involving two-ship and multi-ship encounters to validate the effectiveness of the proposed method.
    Results  As the experimental results demonstrate, compared to traditional DDPG algorithms, our improved approach enhances the convergence speed by approximately 28.8%.
    Conclusions  The trained model enables autonomous decision-making and navigation while ensuring compliance with the COLREGs, thereby providing valuable insights for intelligent decision-making in the field of maritime transportation.

     

/

返回文章
返回