Probabilistic Perception-based TS-DQN Decision-making for Autonomous USV Submarine Search

Liu Ting (刘婷); Wang Hui (王辉); Zheng Kai (郑凯); Gao Qiqiang (高其强)

doi:10.19693/j.issn.1673-3185.04440



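Abstract: [Objectives] To address the problem of unmanned surface vehicles (USVs) searching for submarines, an intelligent USV submarine-search algorithm based on deep reinforcement learning is proposed. [Methods] Taking a submarine infiltrating our key sea area as the scenario, the study builds a search environment and a kinematic model, establishes a sonar detection probability model that accounts for both distance and angle, and clearly defines the conditions under which the search task counts as successful. On this basis, a Markov decision process framework is built with the deep Q-network (DQN) algorithm. Guided by the requirements of the search task, the state space includes the detection probability, and a multi-component reward function couples detection probability, distance, and angle. A temporal difference Q-network (TS-DQN) algorithm is then proposed that combines a double dueling network architecture with prioritized experience replay, and a detection-probability-aware ε-greedy strategy is introduced so that the system automatically adjusts its tendency to explore according to the current detection state, which markedly improves the agent's policy-learning efficiency. [Results] Extensive simulation experiments show that the proposed method reaches a detection success rate of 38.85%, 18 times higher than that of the second-ranked Dueling DQN, and an average path length of 334.36 steps, more than 9.5% shorter than that of the other algorithms. [Conclusions] The combined advantages of the method in detection efficiency and effectiveness provide an innovative solution for the development of autonomous USV search technology.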
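The abstract says the sonar detection probability combines distance and angle and that success is judged by explicit conditions, but it does not give the formulas. The Python sketch below shows one way such a model could be wired together. It assumes, hypothetically, a Gaussian-like range decay multiplied by a cosine beam pattern, a success rule based on cumulative detection probability over independent steps, and a weighted reward over detection probability, closing distance, and heading alignment. All function names, constants, and weights (`MAX_RANGE`, `DECAY`, `BEAM_HALF_ANGLE`, `SUCCESS_THRESHOLD`, `w_p`, `w_d`, `w_a`) are illustrative assumptions, not the authors' model.

```python
import math

# Hypothetical parameters: the abstract does not give the paper's sonar model,
# so these values are illustrative assumptions only.
MAX_RANGE = 3000.0                    # assumed maximum effective sonar range (m)
DECAY = 3.0                           # assumed range-decay coefficient
BEAM_HALF_ANGLE = math.radians(60.0)  # assumed half-width of the sonar beam
SUCCESS_THRESHOLD = 0.9               # assumed cumulative-probability threshold


def relative_bearing(usv_xy, usv_heading, sub_xy):
    """Angle between the USV heading and the line of sight to the target, wrapped to [-pi, pi)."""
    los = math.atan2(sub_xy[1] - usv_xy[1], sub_xy[0] - usv_xy[0])
    return (los - usv_heading + math.pi) % (2 * math.pi) - math.pi


def detection_probability(distance, bearing):
    """Assumed model: a range term multiplied by a bearing (beam pattern) term."""
    if distance >= MAX_RANGE or abs(bearing) > BEAM_HALF_ANGLE:
        return 0.0  # target outside the assumed range or beam
    range_term = math.exp(-DECAY * (distance / MAX_RANGE) ** 2)
    angle_term = math.cos(0.5 * math.pi * abs(bearing) / BEAM_HALF_ANGLE)
    return range_term * angle_term


def cumulative_probability(step_probs):
    """Probability of at least one detection over independent steps: 1 - prod(1 - p)."""
    miss = 1.0
    for p in step_probs:
        miss *= 1.0 - p
    return 1.0 - miss


def is_detected(step_probs, threshold=SUCCESS_THRESHOLD):
    """Hypothetical success rule: cumulative probability reaches the threshold."""
    return cumulative_probability(step_probs) >= threshold


def reward(p, prev_distance, distance, bearing, w_p=1.0, w_d=0.5, w_a=0.2):
    """Hypothetical multi-component reward coupling probability, distance and angle."""
    closing = (prev_distance - distance) / MAX_RANGE  # positive when the USV closes in
    alignment = math.cos(bearing)                     # 1 when pointing straight at the target
    return w_p * p + w_d * closing + w_a * alignment
```

For example, with a USV at the origin heading along the x-axis and a target at (1000, 0), `relative_bearing` is 0 and `detection_probability(1000.0, 0.0)` equals exp(-1/3), about 0.72, under these assumed constants. Multiplying the two terms means the probability drops to zero as soon as the target leaves either the assumed range or the assumed beam.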

Abstract: [Objectives] To develop a deep reinforcement learning-based search algorithm for unmanned surface vehicles (USVs) in submarine detection tasks. [Methods] The study is set in a scenario where submarines infiltrate key maritime areas, for which a search environment and a kinematic model are constructed. A sonar detection probability model incorporating the effects of distance and angle is established, together with well-defined criteria for detection success. On this basis, a Markov decision process (MDP) is formulated using the deep Q-network (DQN) algorithm. The state space is designed to include detection probability, and a multi-component reward function is constructed by coupling detection probability, distance, and angle. To enhance learning efficiency, a temporal difference Q-network (TS-DQN) algorithm is introduced, integrating a double-dueling network architecture with prioritized experience replay. In addition, a probabilistic perception-based ε-greedy strategy is employed, allowing the system to adjust its exploration behavior dynamically according to the real-time detection state, which significantly improves policy learning efficiency. [Results] Extensive simulation experiments demonstrate that the proposed method achieves a detection success rate of 38.85%, 18 times higher than that of the second-best algorithm, Dueling DQN. It also reduces the average path length to 334.36 steps, more than 9.5% shorter than that of the other algorithms. [Conclusions] The proposed algorithm exhibits significant advantages in detection efficiency and effectiveness, providing an innovative solution for advancing autonomous USV-based search and detection technologies.
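The TS-DQN components named in the abstract correspond to well-known building blocks: dueling aggregation, Q(s,a) = V(s) + A(s,a) - mean_a A(s,a); the Double DQN target, where the online network selects the next action and the target network evaluates it; and proportional prioritized experience replay with importance-sampling weights. The sketch below writes these standard formulas in plain Python on lists of Q-values instead of neural networks, so it illustrates the update rules rather than the authors' implementation. The abstract does not specify how the probability-aware ε-greedy rule works, so `probability_aware_epsilon` is a hypothetical rule that shrinks exploration as detection probability rises; all parameter values are assumptions.

```python
import random


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN: the online net picks a', the target net evaluates it."""
    if done:
        return reward  # no bootstrapping from a terminal state
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritized replay: P(i) = p_i^alpha / sum_k p_k^alpha, p_i = |delta_i| + eps."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def importance_weights(probs, beta=0.4):
    """Importance-sampling weights w_i = (N * P(i))^-beta, normalised by the maximum weight."""
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    w_max = max(weights)
    return [w / w_max for w in weights]


def sample_indices(probs, k, rng=random):
    """Draw k transition indices (with replacement) according to their priorities."""
    return rng.choices(range(len(probs)), weights=probs, k=k)


def probability_aware_epsilon(step, detect_prob, eps_start=1.0, eps_min=0.05, decay=0.995):
    """Hypothetical rule: a decaying base epsilon, scaled down as detection probability rises."""
    base = max(eps_min, eps_start * decay ** step)
    return eps_min + (base - eps_min) * (1.0 - detect_prob)


def select_action(q_values, epsilon, rng=random):
    """Standard epsilon-greedy choice over a list of Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit: greedy action
```

In the call `double_dqn_target(1.0, False, 0.9, [0.1, 0.5, 0.2], [10.0, 2.0, 30.0])`, the online values choose action 1, so the target is 1 + 0.9 × 2 = 2.8 even though the target network rates action 2 higher; this decoupling of selection from evaluation is what reduces the overestimation bias of plain DQN.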
