XU Desheng, XU Chunhui. Design of AUV controller based on improved PPO algorithm[J]. Chinese Journal of Ship Research. DOI: 10.19693/j.issn.1673-3185.04031

Design of AUV controller based on improved PPO algorithm

Abstract: Objectives In order to improve the robustness of AUV controllers to environment modeling errors, a reinforcement learning control strategy that introduces contextual information and a curriculum learning training mechanism is proposed. Methods First, contextual information is embedded into the policy network by using interaction history data as part of the policy network input; second, a curriculum learning training mechanism is designed that gradually increases the disturbance strength during training, avoiding the training instability and premature early stopping caused by excessive disturbance. Fixed-depth control experiments were conducted in a simulation environment, and the effectiveness of the algorithm was further verified using a physical AUV in a water tank. Results The experimental results show that the proposed algorithm improves the convergence speed by 25% and the steady-state reward value by 10.8%, effectively improving the training process. The proposed algorithm achieves zero steady-state error tracking in the simulation environment; in the water tank experiments, the mean tracking errors are reduced by 62% and 45% respectively compared with the two baseline algorithms, and the standard deviations are reduced by 52% and 36%, effectively improving tracking accuracy and stability. Conclusions The research results can provide a reference for the application of deep reinforcement learning methods in the field of AUV control.
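This page carries no code, so as a hedged illustration of the first idea, the sketch below shows one common way to feed interaction history into a PPO policy network: a fixed-length window of past (observation, action) pairs is flattened and concatenated with the current observation, letting the policy implicitly infer unmodeled dynamics. The class name, dimensions, and PyTorch architecture are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ContextPolicy(nn.Module):
    """Policy network whose input includes a window of recent
    (observation, action) pairs in addition to the current observation,
    so the policy can adapt to unmodeled dynamics and disturbances."""

    def __init__(self, obs_dim: int, act_dim: int, history_len: int, hidden: int = 64):
        super().__init__()
        # Current observation plus a flattened history of (obs, act) pairs.
        in_dim = obs_dim + history_len * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),  # mean of a Gaussian action distribution
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs: torch.Tensor, history: torch.Tensor) -> torch.distributions.Normal:
        # history: (batch, history_len * (obs_dim + act_dim)), already flattened
        x = torch.cat([obs, history], dim=-1)
        mean = self.net(x)
        return torch.distributions.Normal(mean, self.log_std.exp())

# Hypothetical dimensions: 8-D observation, 2-D action, 5-step history window.
policy = ContextPolicy(obs_dim=8, act_dim=2, history_len=5)
dist = policy(torch.zeros(1, 8), torch.zeros(1, 5 * (8 + 2)))
action = dist.sample()
```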

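For the second idea, here is a minimal sketch of a curriculum over disturbance strength, assuming a linear ramp from zero to full magnitude over an early fraction of training. The paper's actual schedule and disturbance model are not given on this page, so the function names, constants, and the Gaussian disturbance below are hypothetical.

```python
import numpy as np

def disturbance_scale(step: int, total_steps: int,
                      start: float = 0.0, final: float = 1.0,
                      warmup_frac: float = 0.6) -> float:
    """Curriculum schedule: ramp the disturbance magnitude linearly from
    `start` to `final` over the first `warmup_frac` of training, then hold.
    Starting with weak disturbances avoids the training instability and
    early stopping that strong disturbances cause early in training."""
    frac = min(step / (warmup_frac * total_steps), 1.0)
    return start + frac * (final - start)

def disturbed_force(rng: np.random.Generator, step: int, total_steps: int) -> np.ndarray:
    """Example environment-side use: scale a random disturbance force
    (hypothetical 3-DOF) by the current curriculum level."""
    return disturbance_scale(step, total_steps) * rng.normal(size=3)
```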