Citation: YANG Y P, SONG L F, MAO J Q, et al. Unmanned surface vehicle escape strategy based on hybrid sampling deep Q-network[J]. Chinese Journal of Ship Research, 2024, 19(1): 256–263 (in Chinese). DOI: 10.19693/j.issn.1673-3185.03105
This study addresses the problem of planning an escape strategy for an unmanned surface vehicle (USV) that is surrounded by enemy ships employing encirclement tactics.
A hybrid sampling deep Q-network (HS-DQN) reinforcement learning algorithm is proposed, which gradually increases the replay frequency of important samples while retaining a degree of exploration to prevent the algorithm from converging to a local optimum. The state space, action space and reward function are designed to obtain the USV's optimal escape strategy, and its performance is compared with that of the standard deep Q-network (DQN) algorithm in terms of reward and escape success rate.
The simulation results show that training with the HS-DQN algorithm increases the escape success rate by 2% and the convergence speed by 20%.
The HS-DQN algorithm reduces the amount of useless exploration and accelerates convergence, and the simulation results verify the effectiveness of the USV escape strategy.
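The core idea described above, a replay buffer that mixes uniform draws with priority-based draws and gradually shifts toward the important samples, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the class name, the linear annealing schedule, and all parameter values (`beta_start`, `beta_end`, `anneal_steps`) are assumptions made for the example.

```python
import random
from collections import deque


class HybridReplayBuffer:
    """Illustrative hybrid-sampling buffer (assumed design, not the
    paper's exact scheme): each batch mixes uniform and priority-based
    draws, and the priority share is annealed upward over training."""

    def __init__(self, capacity, beta_start=0.1, beta_end=0.8,
                 anneal_steps=10_000):
        self.buffer = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.beta_start = beta_start      # initial fraction of priority draws
        self.beta_end = beta_end          # final fraction of priority draws
        self.anneal_steps = anneal_steps
        self.step = 0

    def push(self, transition, td_error):
        # Use |TD error| as the importance weight; the small constant
        # keeps every transition reachable by weighted sampling.
        self.buffer.append(transition)
        self.priorities.append(abs(td_error) + 1e-6)

    def beta(self):
        # Linear schedule: the priority share grows with training steps,
        # so important samples are replayed more often later on while
        # early training keeps more uniform (exploratory) replay.
        frac = min(1.0, self.step / self.anneal_steps)
        return self.beta_start + frac * (self.beta_end - self.beta_start)

    def sample(self, batch_size):
        self.step += 1
        n_priority = int(round(batch_size * self.beta()))
        n_uniform = batch_size - n_priority  # uniform part preserves exploration
        batch = random.choices(self.buffer, k=n_uniform) if n_uniform else []
        if n_priority:
            batch += random.choices(self.buffer,
                                    weights=self.priorities,
                                    k=n_priority)
        return batch
```

Under this sketch, early batches are dominated by uniform replay (broad exploration of stored experience), while later batches are drawn mostly in proportion to TD error, which matches the abstract's description of gradually increasing the replay frequency of important samples without abandoning exploration entirely.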
[1] SONNENBURG C R, WOOLSEY C A. Modeling, identification, and control of an unmanned surface vehicle[J]. Journal of Field Robotics, 2013, 30(3): 371–398. doi: 10.1002/rob.21452
[2] SARDA E I, QU H, BERTASKA I R, et al. Station keeping control of an unmanned surface vehicle exposed to current and wind disturbances[J]. Ocean Engineering, 2016, 127: 305–324. doi: 10.1016/j.oceaneng.2016.09.037
[3] LARRAZABAL J M, PENAS M S. Intelligent rudder control of an unmanned surface vessel[J]. Expert Systems with Applications, 2016, 55: 106–117. doi: 10.1016/j.eswa.2016.01.057
[4] HAYNES T, SEN S. Evolving behavioral strategies in predator and prey[C]//International Joint Conference on Artificial Intelligence, 2005: 113–126.
[5] 张彬. 基于粒子群算法的群体机器人围捕行为的研究[D]. 兰州: 兰州理工大学, 2013.
ZHANG B. The research on groups of robot hunting behavior based on particle swarm optimization algorithm[D]. Lanzhou: Lanzhou University of Technology, 2013 (in Chinese).
[6] 宋利飞, 徐凯凯, 史晓骞, 等. 多无人艇协同围捕智能逃跑目标方法研究[J]. 中国舰船研究, 2023, 18(1): 52–59. doi: 10.19693/j.issn.1673-3185.02974
SONG L F, XU K K, SHI X Q, et al. Multiple USV cooperative algorithm method for hunting intelligent escaped targets[J]. Chinese Journal of Ship Research, 2023, 18(1): 52–59 (in Chinese). doi: 10.19693/j.issn.1673-3185.02974
[7] 狄小娟. 基于强化学习的移动多智能体自组织协同目标搜索[D]. 南京: 南京邮电大学, 2020.
DI X J. Self-organizing collaborative target search of mobile multi-agent based on reinforcement learning[D]. Nanjing: Nanjing University of Posts and Telecommunications, 2020 (in Chinese).
[8] 吴子沉, 胡斌. 基于态势认知的无人机集群围捕方法[J]. 北京航空航天大学学报, 2021, 42(2): 424–429. doi: 10.13700/j.bh.1001-5965.2020.0274
WU Z C, HU B. Swarm rounding up method of UAV based on situation cognition[J]. Journal of Beijing University of Aeronautics and Astronautics, 2021, 42(2): 424–429 (in Chinese). doi: 10.13700/j.bh.1001-5965.2020.0274
[9] 史帅科. 基于博弈论的多卫星围捕策略研究[D]. 北京: 北京邮电大学, 2017.
SHI S K. Research on multi-satellites pursuit-evasion strategy based on game theory[D]. Beijing: Beijing University of Posts and Telecommunications, 2017 (in Chinese).
[10] RUBINSKY S, GUTMAN S. Three-player pursuit and evasion conflict[J]. Journal of Guidance, Control, and Dynamics, 2014, 37(1): 98–110. doi: 10.2514/1.61832
[11] WANG Y D, DONG L, SUN C Y. Cooperative control for multi-player pursuit-evasion games with reinforcement learning[J]. Neurocomputing, 2019.
[12] HAHN C, PHAN T, GABOR T, et al. Emergent escape-based flocking behavior using multi-agent reinforcement learning[D]. Ithaca, NY: Cornell University, 2019.
[13] XIONG H, CAO H H, ZHANG L, et al. A dynamics perspective of pursuit-evasion games of intelligent agents with the ability to learn[D]. Ithaca, NY: Cornell University, 2021.
[14] 夏家伟, 朱旭芳, 张建强, 等. 基于多智能体强化学习的无人艇协同围捕方法研究[J]. 控制与决策, 2022: 1–9.
XIA J W, ZHU X F, ZHANG J Q, et al. Research on cooperative hunting method of unmanned surface vehicle based on multi-agent reinforcement learning[J]. Control and Decision, 2022: 1–9 (in Chinese).
[15] 刘峰, 魏瑞轩, 周凯, 等. 基于群体意志统一的无人机协同围捕策略研究[J]. 北京航空航天大学学报, 2021, 48(11): 1–8.
LIU F, WEI R X, ZHOU K, et al. Research on multi-UAV roundup strategy based on the unity of group will[J]. Journal of Beijing University of Aeronautics and Astronautics, 2021, 48(11): 1–8 (in Chinese).
[16] 方宝富, 潘启树, 洪炳镕, 等. 多追捕者-单一逃跑者追逃问题实现成功捕获的约束条件[J]. 机器人, 2012, 34(3): 282–291. doi: 10.3724/SP.J.1218.2012.00282
FANG B F, PAN Q H, HONG B G, et al. Constraint conditions of successful capture in multi-pursuers VS one evader games[J]. Robot, 2012, 34(3): 282–291 (in Chinese). doi: 10.3724/SP.J.1218.2012.00282
[17] 苏义鑫, 石兵华, 张华军, 等. 水面无人艇的抗追捕-逃跑策略[J]. 哈尔滨工程大学学报, 2018, 39(6): 1019–1025. doi: 10.11990/jheu.201705092
SU Y X, SHI B H, ZHANG H J, et al. Anti-pursuit evasion strategy of an unmanned surface vehicle[J]. Journal of Harbin Engineering University, 2018, 39(6): 1019–1025 (in Chinese). doi: 10.11990/jheu.201705092
[18] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[J]. IEEE Transactions on Neural Networks, 2005, 16: 285–286. doi: 10.1109/TNN.2004.842673
[19] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[EB/OL]. (2013-12-19)[2022-09-20]. https://arxiv.org/abs/1312.5602