WANG X Z, WANG M, LUO W. Intelligent decision technology in combat deduction based on soft actor-critic algorithm[J]. Chinese Journal of Ship Research, 2021, 16(6): 99–108. DOI: 10.19693/j.issn.1673-3185.02099

Intelligent decision technology in combat deduction based on soft actor-critic algorithm

More Information
  • Received Date: August 30, 2020
  • Revised Date: February 03, 2021
  • Available Online: June 10, 2021
© 2021 The Authors. Published by Editorial Office of Chinese Journal of Ship Research. Creative Commons License
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
  •   Objectives  Existing combat deduction simulation systems mainly make decisions based on operational rules and expert knowledge, and suffer from such problems as limited application scenarios, low decision-making efficiency and poor flexibility. To address the shortcomings of these conventional decision-making methods, an intelligent decision-making model based on deep reinforcement learning (DRL) is proposed.
      Methods  First, the maximum-entropy Markov decision process (MDP) for simulation deduction is established. The agent training network is then constructed on the basis of the actor-critic architecture to generate stochastic policies that improve the agent's exploration ability, and a soft policy iteration update is used to search for better policies and continuously improve the agent's decision-making level. Finally, simulations are carried out on the Mozi AI platform to validate the model.
      Results  The results show that an agent trained with the improved soft actor-critic (SAC) decision-making algorithm can achieve autonomous decision-making. Compared with the deep deterministic policy gradient (DDPG) algorithm, the probability of winning is increased by 24.53%.
      Conclusions  The design scheme of this decision-making model can provide theoretical references for research on intelligent decision-making technology, and has reference value for warfare simulation and deduction.
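The soft policy iteration described in the Methods can be illustrated on a toy problem. The sketch below is not the paper's implementation (which uses deep networks on the Mozi AI platform); it is a minimal tabular example of the standard maximum-entropy backup underlying SAC, with made-up transition and reward tables. The soft state value is V(s) = α log Σₐ exp(Q(s,a)/α), and the improved policy is a softmax over soft Q-values, which is the stochastic policy that gives the agent its exploration ability.

```python
import numpy as np

# Hedged sketch: tabular soft (maximum-entropy) policy iteration.
# All numbers are illustrative, not taken from the paper.

gamma, alpha = 0.9, 0.5          # discount factor, entropy temperature
n_s, n_a = 2, 2                  # hypothetical 2-state, 2-action MDP

# P[s, a, s'] transition probabilities; R[s, a] rewards (assumed)
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.9, 0.1]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

Q = np.zeros((n_s, n_a))
for _ in range(500):
    # soft state value: V(s) = alpha * log sum_a exp(Q(s,a) / alpha)
    V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))
    # soft Bellman backup: Q(s,a) = R(s,a) + gamma * E_{s'}[V(s')]
    Q = R + gamma * P @ V

# policy improvement: softmax over soft Q-values (a stochastic policy)
pi = np.exp(Q / alpha) / np.exp(Q / alpha).sum(axis=1, keepdims=True)
print(pi)
```

As α → 0 the soft value collapses to maxₐ Q(s,a) and the policy becomes greedy; a larger α keeps the policy stochastic, trading reward for entropy as in the SAC objective.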
  • [1]
    胡荟, 吴振齐. 人工智能技术在美国军事情报工作中的当前应用及发展趋势探析[J]. 国防科技, 2020, 41(2): 15–20.

    HU H, WU Z Q. Research on the current application and development trend of artificial intelligence technology in US military intelligence work[J]. National Defense Science & Technology, 2020, 41(2): 15–20 (in Chinese).
    [2]
    付长军, 郑伟明, 葛蕾, 等. 人工智能在作战仿真中的应用研究[J]. 无线电工程, 2020, 50(4): 257–261. doi: 10.3969/j.issn.1003-3106.2020.04.001

    FU C J, ZHENG W M, GE L, et al. Application of artificial intelligence in combat simulation[J]. Radio Engineering, 2020, 50(4): 257–261 (in Chinese). doi: 10.3969/j.issn.1003-3106.2020.04.001
    [3]
    孙鹏, 谭玉玺, 李路遥. 基于态势描述的陆军作战仿真外部决策模型研究[J]. 指挥控制与仿真, 2016, 38(2): 15–19. doi: 10.3969/j.issn.1673-3819.2016.02.004

    SUN P, TAN Y X, LI L Y. Research on external decision model of army operational simulation based on situation description[J]. Command Control & Simulation, 2016, 38(2): 15–19 (in Chinese). doi: 10.3969/j.issn.1673-3819.2016.02.004
    [4]
    董倩, 纪梦琪, 朱一凡, 等. 空中作战决策行为树建模与仿真[J]. 指挥控制与仿真, 2019, 41(1): 12–19. doi: 10.3969/j.issn.1673-3819.2019.01.003

    DONG Q, JI M Q, ZHU Y F, et al. Behavioral tree modeling and simulation for air operations decision[J]. Command Control & Simulation, 2019, 41(1): 12–19 (in Chinese). doi: 10.3969/j.issn.1673-3819.2019.01.003
    [5]
    彭希璐, 王记坤, 张昶, 等. 面向智能决策的兵棋推演技术[C]//2019第七届中国指挥控制大会论文集. 北京: 中国指挥与控制学会, 2019: 193–198.

    PENG X L, WANG J K, ZHANG C, et al. The technology of wargame based on intelligent decision[C]//Proceedings of the 7th China Command and Control Conference in 2019. Beijing: Chinese Institute of Command and Control, 2019: 193–198 (in Chinese).
    [6]
    廖馨, 孙峥皓. 作战推演仿真中的智能决策技术应用探索[C]//第二十届中国系统仿真技术及其应用学术年会论文集. 乌鲁木齐: 中国自动化学会系统仿真专业委员会, 2019: 368–374.

    LIAO X, SUN Z H. Exploration on application of intelligent decision-making in battle deduction simulation[C]//Proceedings of the 20th China Annual Conference on System Simulation Technology and its Application. Urumqi: System Simulation Committee of China Automation Society, 2019: 368–374 (in Chinese).
    [7]
    崔文华, 李东, 唐宇波, 等. 基于深度强化学习的兵棋推演决策方法框架[J]. 国防科技, 2020, 41(2): 113–121.

    CUI W H, LI D, TANG Y B, et al. Framework of wargaming decision-making methods based on deep reinforcement learning[J]. National Defense Science & Technology, 2020, 41(2): 113–121 (in Chinese).
    [8]
    HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: ACM Press, 2018.
    [9]
    SUTTON R S, BARTO A G. Reinforcement Learning: An Introduction[M]. Cambridge: MIT Press, 1998.
    [10]
    SPIELBERG S, GOPALUNI R, LOEWEN P. Deep reinforcement learning approaches for process control[C]//2017 6th International Symposium on Advanced Control of Industrial Processes, [S. 1. ]: IEEE, 2017: 201–203.
    [11]
    HAARNOJA T, ZHOU A, HARTIKAINEN K, et al. Soft actor-critic algorithms and applications [EB/OL]. ArXiv: 1812.05905, 2018(2018-12-13)[2020-08-30]. https://arxiv.org/abs/1812.05905.
    [12]
    MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529–533. doi: 10.1038/nature14236
    [13]
    SCHULMAN J, CHEN X, ABBEEL P. Equivalence between policy gradients and soft Q-learning[EB/OL]. ArXiv: 1704.06440, 2017. (2017-4-21)[2020-08-30]. https://arxiv.org/pdf/1704.06440.pdf.
    [14]
    HAARNOJA T, TANG H, ABBEEL P, et al, Reinforcement learning with deep energy-based policies[C]//Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: ACM Press: MLR. org, 2017: 1352–1361.
    [15]
    LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[C]//Proceedings of the 4th International Conference on Learning Representations. San Juan, Puerto Rico: Elsevier, 2016.