Yu Xinyi, Zhou Chen, Yu Junxin, Cao Mingzhou, Ou Linlin. Obstacle avoidance and grasping method for robotic arm based on deep reinforcement learning in stack-overlayed environment [J]. High Technology Letters (Chinese), 2025, 35(3): 284-296
Obstacle avoidance and grasping method for robotic arm based on deep reinforcement learning in stack-overlayed environment
|
DOI: 10.3772/j.issn.1002-0470.2025.03.006
Keywords: stack-overlayed environment; obstacle avoidance and grasping; image encoder; deep reinforcement learning (DRL); second-order behavior cloning
Authors: Yu Xinyi, Zhou Chen, Yu Junxin, Cao Mingzhou, Ou Linlin (College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023)
|
Abstract:
Obstacle avoidance and grasping with a robotic arm in a stack-overlayed environment is an important and challenging task. This paper proposes Ec-DSAC (encoder and crop for discrete SAC), an obstacle avoidance and grasping method for robotic arms in stack-overlayed environments based on an image encoder and deep reinforcement learning (DRL). First, an image encoder combining YOLOv5 (you only look once) and a contrastive learning network is designed; it encodes both key features and global features, reducing pixel information to vector information. Second, the image encoder is combined with the discrete soft actor-critic (SAC) algorithm, and a discrete action space and a dense reward function are designed to constrain and guide the learning direction of the policy output, while random image cropping is used to increase the sample efficiency of reinforcement learning. Finally, a second-order behavior cloning method for DRL pre-training is proposed, which enhances the learning ability of the reinforcement learning network and improves the success rate of the control policy. In simulation experiments, the obstacle avoidance and grasping success rate of Ec-DSAC stays above 80.0% across different scenarios, demonstrating better obstacle avoidance and grasping performance than existing methods. In real-world experiments, the success rate is 73.3%, verifying the method's effectiveness in real stack-overlayed environments.
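Note: the abstract mentions random image cropping as an augmentation to raise the sample efficiency of image-based reinforcement learning. The following is a minimal sketch of that general technique only; the function name, crop size, and tensor layout are illustrative assumptions and are not taken from the paper.

import numpy as np

def random_crop(images: np.ndarray, out_size: int = 84) -> np.ndarray:
    """Randomly crop a batch of observations (B, C, H, W) to (B, C, out_size, out_size).

    Each sample gets its own crop location, so replayed frames are seen from
    slightly shifted views, which acts as cheap data augmentation for
    image-based RL.
    """
    b, c, h, w = images.shape
    assert h >= out_size and w >= out_size, "observations must be at least out_size x out_size"
    tops = np.random.randint(0, h - out_size + 1, size=b)
    lefts = np.random.randint(0, w - out_size + 1, size=b)
    out = np.empty((b, c, out_size, out_size), dtype=images.dtype)
    for i, (t, l) in enumerate(zip(tops, lefts)):
        out[i] = images[i, :, t:t + out_size, l:l + out_size]
    return out

In this style of pipeline the crop is typically applied to each sampled replay batch before it enters the image encoder, so the critic and policy are trained on augmented views of the same underlying state.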
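The paper's "second-order behavior cloning" pre-training is not detailed in the abstract. For context only, a standard behavior-cloning warm start for a discrete policy fits the policy's action logits to demonstrated actions with a cross-entropy loss; the sketch below illustrates that baseline under assumed PyTorch-style names and shapes, not the authors' specific method.

import torch
import torch.nn.functional as F

def bc_pretrain_step(policy, optimizer, obs_batch, demo_actions):
    """One behavior-cloning step for a discrete-action policy.

    policy(obs_batch) -> logits of shape (B, num_actions);
    demo_actions is a LongTensor of expert action indices for the same observations.
    """
    logits = policy(obs_batch)
    loss = F.cross_entropy(logits, demo_actions)  # maximize log-prob of demonstrated actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()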
|
|
|