Article Abstract
禹鑫燚, 张鑫, 许成军, 欧林林. Human-robot interaction method and system design by fusing human perception and multimodal gestures [J]. 高技术通讯 (Chinese High Technology Letters), 2025, 35(2): 183-197
Human-robot interaction method and system design by fusing human perception and multimodal gestures
  
DOI: 10.3772/j.issn.1002-0470.2025.02.008
Keywords: human-robot interaction; human perception; multimodal gesture recognition; interaction task
Funding:
Author affiliations
禹鑫燚 (College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023)
张鑫  
许成军  
欧林林  
Abstract:
      To address the problem that existing human-robot interaction (HRI), constrained to pre-programmed forms, cannot perceive human interaction intentions and therefore lacks flexibility and generalization across task scenarios, this paper proposes an HRI method that fuses human perception with multimodal gestures. First, a multimodal hand detection method incorporating human perception is designed: it uses human pose as a prior to obtain multimodal hand features, dynamically adapts to different detection distances, realizes online detection of multi-person interaction gestures, and establishes the correspondence between interaction commands and the identities of the people issuing them. Second, based on the hand detection results, a multimodal interaction gesture dataset is collected and a general gesture interaction instruction set is constructed. Third, a fused multimodal gesture recognition method is designed, using data augmentation and gesture rotation mapping to reduce the impact of complex scenes on recognition. Finally, a framework for the proposed HRI method is built. Experimental results indicate that the proposed hand detection method is practically usable; the fused recognition method reaches an accuracy above 99%, outperforming single-modality recognition and comparing favorably with other methods. Typical HRI tasks, including human-robot collaborative assembly, collaborative transportation, and task-point recording and reproduction, verify the feasibility and effectiveness of the proposed method.
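The abstract does not detail how gesture rotation mapping is performed. A minimal sketch of one common approach, rotating detected hand keypoints about the wrist into a canonical orientation so recognition becomes invariant to in-plane hand rotation, might look as follows. The 21-point hand layout with the wrist at index 0 and the middle-finger base joint at index 9 follows the MediaPipe convention; this indexing is an assumption for illustration, not taken from the paper.

```python
import numpy as np

def normalize_hand_rotation(keypoints: np.ndarray) -> np.ndarray:
    """Rotate 2D hand keypoints so the wrist -> middle-finger-base
    vector points along +y, removing in-plane hand rotation.

    keypoints: (21, 2) array in image coordinates. Index 0 is the
    wrist and index 9 the middle-finger MCP joint (MediaPipe-style
    ordering, assumed here for illustration).
    """
    wrist = keypoints[0]
    ref = keypoints[9] - wrist              # reference direction
    angle = np.arctan2(ref[0], ref[1])      # angle measured from the +y axis
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s],
                    [s,  c]])               # 2D rotation matrix
    # Translate so the wrist is the origin, then rotate every point;
    # afterwards keypoints[9] lies on the +y axis.
    return (keypoints - wrist) @ rot.T
```

Applying such a normalization before feeding keypoints to a classifier is one way a rotated hand (e.g. a sideways "stop" gesture) can map to the same canonical pose as an upright one, which is consistent with the paper's goal of reducing the influence of complex scenes on recognition.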