Citation: YANG Y P, SONG L F, MAO J Q, et al. Unmanned surface vehicle escape strategy based on hybrid sampling deep Q-network[J]. Chinese Journal of Ship Research, 2024, 19(1): 256–263 (in Chinese). DOI: 10.19693/j.issn.1673-3185.03105
This study addresses the problem of planning an escape strategy for an unmanned surface vehicle (USV) that is surrounded by enemy ships employing encirclement tactics.
A hybrid sampling deep Q-network (HS-DQN) reinforcement learning algorithm is proposed, which gradually increases the replay frequency of important samples while retaining a degree of exploration to prevent the algorithm from converging to a local optimum. The state space, action space and reward function are designed to obtain the USV's optimal escape strategy, and its performance is compared with that of the standard deep Q-network (DQN) algorithm in terms of reward and escape success rate.
The simulation results show that training with the HS-DQN algorithm increases the escape success rate by 2% and the convergence speed by 20%.
The HS-DQN algorithm reduces the amount of useless exploration and accelerates convergence, and the simulation results verify the effectiveness of the USV escape strategy.
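The core idea described above, a replay buffer that mixes uniform draws with priority-based draws and gradually shifts toward the important samples, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the class name, the linear annealing schedule, and all parameter values (`beta_start`, `beta_end`, `anneal_steps`) are assumptions made for the example.

```python
import random
from collections import deque


class HybridReplayBuffer:
    """Illustrative hybrid-sampling buffer (assumed design, not the
    paper's exact scheme): each batch mixes uniform and priority-based
    draws, and the priority share is annealed upward over training."""

    def __init__(self, capacity, beta_start=0.1, beta_end=0.8,
                 anneal_steps=10_000):
        self.buffer = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.beta_start = beta_start      # initial fraction of priority draws
        self.beta_end = beta_end          # final fraction of priority draws
        self.anneal_steps = anneal_steps
        self.step = 0

    def push(self, transition, td_error):
        # Use |TD error| as the importance weight; the small constant
        # keeps every transition reachable by weighted sampling.
        self.buffer.append(transition)
        self.priorities.append(abs(td_error) + 1e-6)

    def beta(self):
        # Linear schedule: the priority share grows with training steps,
        # so important samples are replayed more often later on while
        # early training keeps more uniform (exploratory) replay.
        frac = min(1.0, self.step / self.anneal_steps)
        return self.beta_start + frac * (self.beta_end - self.beta_start)

    def sample(self, batch_size):
        self.step += 1
        n_priority = int(round(batch_size * self.beta()))
        n_uniform = batch_size - n_priority  # uniform part preserves exploration
        batch = random.choices(self.buffer, k=n_uniform) if n_uniform else []
        if n_priority:
            batch += random.choices(self.buffer,
                                    weights=self.priorities,
                                    k=n_priority)
        return batch
```

Under this sketch, early batches are dominated by uniform replay (broad exploration of stored experience), while later batches are drawn mostly in proportion to TD error, which matches the abstract's description of gradually increasing the replay frequency of important samples without abandoning exploration entirely.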
[1] SONNENBURG C R, WOOLSEY C A. Modeling, identification, and control of an unmanned surface vehicle[J]. Journal of Field Robotics, 2013, 30(3): 371–398. doi: 10.1002/rob.21452
[2] SARDA E I, QU H, BERTASKA I R, et al. Station keeping control of an unmanned surface vehicle exposed to current and wind disturbances[J]. Ocean Engineering, 2016, 127: 305–324. doi: 10.1016/j.oceaneng.2016.09.037
[3] LARRAZABAL J M, PENAS M S. Intelligent rudder control of an unmanned surface vessel[J]. Expert Systems with Applications, 2016, 55: 106–117. doi: 10.1016/j.eswa.2016.01.057
[4] HAYNES T, SEN S. Evolving behavioral strategies in predator and prey[C]//International Joint Conference on Artificial Intelligence, 2005: 113–126.
[5] 张彬. 基于粒子群算法的群体机器人围捕行为的研究[D]. 兰州: 兰州理工大学, 2013.
ZHANG B. The research on groups of robot hunting behavior based on particle swarm optimization algorithm[D]. Lanzhou: Lanzhou University of Technology, 2013 (in Chinese).
[6] 宋利飞, 徐凯凯, 史晓骞, 等. 多无人艇协同围捕智能逃跑目标方法研究[J]. 中国舰船研究, 2023, 18(1): 52–59. doi: 10.19693/j.issn.1673-3185.02974
SONG L F, XU K K, SHI X Q, et al. Multiple USV cooperative algorithm method for hunting intelligent escaped targets[J]. Chinese Journal of Ship Research, 2023, 18(1): 52–59 (in Chinese). doi: 10.19693/j.issn.1673-3185.02974
[7] 狄小娟. 基于强化学习的移动多智能体自组织协同目标搜索[D]. 南京: 南京邮电大学, 2020.
DI X J. Self-organizing collaborative target search of mobile multi-agent based on reinforcement learning[D]. Nanjing: Nanjing University of Posts and Telecommunications, 2020 (in Chinese).
[8] 吴子沉, 胡斌. 基于态势认知的无人机集群围捕方法[J]. 北京航空航天大学学报, 2021, 42(2): 424–429. doi: 10.13700/j.bh.1001-5965.2020.0274
WU Z C, HU B. Swarm rounding up method of UAV based on situation cognition[J]. Journal of Beijing University of Aeronautics and Astronautics, 2021, 42(2): 424–429 (in Chinese). doi: 10.13700/j.bh.1001-5965.2020.0274
[9] 史帅科. 基于博弈论的多卫星围捕策略研究[D]. 北京: 北京邮电大学, 2017.
SHI S K. Research on multi-satellites pursuit-evasion strategy based on game theory[D]. Beijing: Beijing University of Posts and Telecommunications, 2017 (in Chinese).
[10] RUBINSKY S, GUTMAN S. Three-player pursuit and evasion conflict[J]. Journal of Guidance, Control, and Dynamics, 2014, 37(1): 98–110. doi: 10.2514/1.61832
[11] WANG Y D, DONG L, SUN C Y. Cooperative control for multi-player pursuit-evasion games with reinforcement learning[J]. Neurocomputing, 2019.
[12] HAHN C, PHAN T, GABOR T, et al. Emergent escape-based flocking behavior using multi-agent reinforcement learning[D]. Ithaca, NY: Cornell University, 2019.
[13] XIONG H, CAO H H, ZHANG L, et al. A dynamics perspective of pursuit-evasion games of intelligent agents with the ability to learn[D]. Ithaca, NY: Cornell University, 2021.
[14] 夏家伟, 朱旭芳, 张建强, 等. 基于多智能体强化学习的无人艇协同围捕方法研究[J]. 控制与决策, 2022: 1–9.
XIA J W, ZHU X F, ZHANG J Q, et al. Research on cooperative hunting method of unmanned surface vehicle based on multi-agent reinforcement learning[J]. Control and Decision, 2022: 1–9 (in Chinese).
[15] 刘峰, 魏瑞轩, 周凯, 等. 基于群体意志统一的无人机协同围捕策略研究[J]. 北京航空航天大学学报, 2021, 48(11): 1–8.
LIU F, WEI R X, ZHOU K, et al. Research on multi-UAV roundup strategy based on the unity of group will[J]. Journal of Beijing University of Aeronautics and Astronautics, 2021, 48(11): 1–8 (in Chinese).
[16] 方宝富, 潘启树, 洪炳镕, 等. 多追捕者-单一逃跑者追逃问题实现成功捕获的约束条件[J]. 机器人, 2012, 34(3): 282–291. doi: 10.3724/SP.J.1218.2012.00282
FANG B F, PAN Q H, HONG B G, et al. Constraint conditions of successful capture in multi-pursuers VS one evader games[J]. Robot, 2012, 34(3): 282–291 (in Chinese). doi: 10.3724/SP.J.1218.2012.00282
[17] 苏义鑫, 石兵华, 张华军, 等. 水面无人艇的抗追捕-逃跑策略[J]. 哈尔滨工程大学学报, 2018, 39(6): 1019–1025. doi: 10.11990/jheu.201705092
SU Y X, SHI B H, ZHANG H J, et al. Anti-pursuit evasion strategy of an unmanned surface vehicle[J]. Journal of Harbin Engineering University, 2018, 39(6): 1019–1025 (in Chinese). doi: 10.11990/jheu.201705092
[18] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[J]. IEEE Transactions on Neural Networks, 2005, 16: 285–286. doi: 10.1109/TNN.2004.842673
[19] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[EB/OL]. (2013-12-19)[2022-09-20]. https://arxiv.org/abs/1312.5602