A 3D semantic-enhanced diffusion policy manipulation method for shipborne robot applications

3D semantic enhanced diffusion policy for intelligent shipborne robot

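  • Summary:
    Objective In autonomous operation, shipborne robots are constrained by limited computing and deployment conditions, which makes high-level cognition and decision-making difficult to achieve. Their real-time environmental cognition and autonomous decision-making under resource constraints urgently need to be improved.
    Methods A lightweight 3D semantic-enhanced framework centered on a diffusion policy (SGDP) is proposed. The framework first builds a semantic model of the manipulated objects based on 3D Gaussian splatting (3DGS), then updates the semantic field using real-time pose estimation to maintain semantic consistency in dynamic scenes. A multimodal diffusion policy that fuses semantic, geometric, and joint-state information is also designed, which improves semantic perception while remaining lightweight enough to deploy on shipborne robots.
    Results In three complex tasks carried out on a shipborne robot experimental platform (placing a knife, grasping a marker pen, and pouring water from a bottle), the SGDP algorithm, using only one depth camera for environmental observation, reached a semantic field update frequency of 39.71 Hz and an action inference frequency of 32.16 Hz, making real-time closed-loop control feasible on a shipborne robot platform with limited computing power. The average success rate was 81.67% on tasks with known objects and remained at 78.33% on tasks with unknown objects, indicating that the framework has strong zero-shot generalization capability and reduces the dependence on collecting large amounts of task-specific data.
    Conclusion The results show that the framework provides an efficient and feasible integrated perception-decision solution that achieves efficient coordination between environmental cognition and autonomous decision-making under limited resources, offering an effective technical route for easing the conflict between lightweight deployment and high-level autonomous decision-making in single-unit unmanned equipment such as shipborne robots.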

     

    Abstract:
    Objective With the advancement of artificial intelligence and sensor fusion technologies, shipborne robots equipped with target recognition and autonomous execution capabilities are becoming a vital development direction for future combat systems. However, during autonomous operations, these robots are constrained by limited onboard computing resources and deployment conditions, which hinders high-level cognition and decision-making. To address this challenge, a lightweight 3D semantic-enhanced framework centered on a diffusion policy (SGDP) is proposed, aiming to ensure recognition accuracy, response speed, and decision-making stability under limited resources and dynamic disturbances.
    Methods First, a semantic projection mechanism based on 3D Gaussian splatting is introduced to construct dense semantic point clouds centered on the objects to be manipulated. Multi-view semantic fusion is used to obtain object-level 3D semantic representations, providing more effective semantic priors for complex manipulation tasks. Next, the semantic field is updated in real time using 6D poses from the FoundationPose estimator, which eliminates the need to repeatedly extract multi-view semantic features and thereby improves inference efficiency while maintaining semantic consistency in dynamic scenes. Finally, a multimodal diffusion policy integrating semantic, geometric, and joint-state information is designed to enhance semantic perception while remaining lightweight enough for practical deployment.
    Results Evaluations conducted on a shipborne robot test platform involving three complex tasks (placing a knife, grasping a marker pen, and pouring water from a bottle) demonstrated that the SGDP algorithm, using only a single depth camera for environmental observation, achieved semantic field update and action inference frequencies of 39.71 Hz and 32.16 Hz, respectively. This represents a fivefold increase over the GenDP baseline, enabling real-time closed-loop control on computationally limited platforms. In addition, GPU memory usage was 20.25% lower than that of GenDP, indicating significantly lower computational cost alongside the improved performance. For tasks with known objects, an average success rate of 81.67% was achieved, while a success rate of 78.33% was maintained for tasks involving unknown objects, demonstrating strong zero-shot generalization capability and reduced reliance on extensive task-specific data collection.
    Conclusion The results indicate that the proposed framework offers an efficient and viable integrated perception-decision solution, achieving effective synergy between environmental cognition and autonomous decision-making under resource-limited conditions. Thus, an effective technical pathway is provided to alleviate the inherent conflict between the requirements for lightweight deployment and high-level autonomous decision-making in single-unit unmanned systems such as shipborne robots.
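
    The pose-driven semantic field update described under Methods can be illustrated with a minimal sketch. It assumes the semantic field is stored as object-frame points with per-point feature vectors and that a 6D pose estimator supplies a 4x4 object-to-camera transform; the function name, array shapes, and example values are hypothetical and are not taken from the SGDP implementation.

```python
import numpy as np

def update_semantic_field(points_obj, semantics, pose_obj_to_cam):
    """Re-pose an object-level semantic point cloud with a rigid transform.

    points_obj:      (N, 3) points in the object frame (assumed layout)
    semantics:       (N, D) per-point semantic features (assumed layout)
    pose_obj_to_cam: (4, 4) homogeneous transform from a 6D pose estimator
    """
    rotation = pose_obj_to_cam[:3, :3]
    translation = pose_obj_to_cam[:3, 3]
    # Rigidly move the points; row-vector form of R @ p + t.
    points_cam = points_obj @ rotation.T + translation
    # The features are attached to the object, so they are reused as-is
    # instead of being re-extracted from new camera views.
    return points_cam, semantics

# Hypothetical example: three points, two-dimensional semantic features.
points_obj = np.array([[0.0, 0.0, 0.0],
                       [0.1, 0.0, 0.0],
                       [0.0, 0.1, 0.0]])
semantics = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [0.5, 0.5]])

# Pose: 90 degree rotation about z plus a translation of (0.5, 0.0, 0.3).
pose = np.eye(4)
pose[:3, :3] = np.array([[0.0, -1.0, 0.0],
                         [1.0,  0.0, 0.0],
                         [0.0,  0.0, 1.0]])
pose[:3, 3] = [0.5, 0.0, 0.3]

points_cam, feats = update_semantic_field(points_obj, semantics, pose)
print(points_cam)
```

    In this sketch the per-frame cost is one matrix product over the stored points, which is the kind of saving the abstract attributes to avoiding repeated multi-view feature extraction.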

     
