Kong Erwei*, Dou Zeya**, Zhang Yabang*, Jia Yunhong*** ****, Wang Manli**. GCANet: a label text detection method for the visual Internet of Things[J]. High Technology Letters (Chinese edition), 2025, 35(10): 1059-1068
GCANet: a label text detection method for the visual Internet of Things
DOI: 10.3772/j.issn.1002-0470.2025.10.003
Keywords: visual Internet of Things; text detection; coordinate attention module; global context attention module
Authors: Kong Erwei*, Dou Zeya**, Zhang Yabang*, Jia Yunhong*** ****, Wang Manli**
(*Pingdingshan Tian'an Coal Mining Co., Ltd., Pingdingshan 467000)
(**School of Physics and Electronic Information, Henan Polytechnic University, Jiaozuo 454000)
(***Taiyuan Research Institute Co., Ltd., China Coal Technology and Engineering Group, Taiyuan 030032)
(****National Engineering Laboratory for Coal Mining Machinery Equipment, Taiyuan 030032)
Abstract (Chinese):
To address the difficulty of recording labeled goods in real time in complex environments, a text detection method for the visual Internet of Things (VIoT) is proposed. A text detection network based on global context attention and coordinate attention (GCANet) is designed and introduced into the visual Internet of Things. First, an improved coordinate attention module is proposed, which uses two parallel one-dimensional pooling operations, horizontal and vertical, to avoid the loss of position information caused by two-dimensional global pooling. Then, a global context attention module is introduced to reduce the influence of complex backgrounds on text detection and to prevent densely packed or widely spaced text from being detected incorrectly. The proposed GCANet achieves F-measures of 87.4%, 86.9% and 86.3% on the public datasets ICDAR2015, MSRA-TD500 and Total-Text, respectively. On the industrial label dataset Label-Text, its average precision, average recall and average F-measure reach 93.4%, 90.9% and 92.1%, respectively. In addition, on the underground mine label dataset Mine-Text, GCANet achieves a precision, recall and F-measure of 94.4%, 84.9% and 89.9%, respectively. The experimental results show that the proposed text detection method for the visual Internet of Things achieves excellent performance.
Abstract (English):
Aiming at the difficulty of box label text detection in complex environments, a text detection method for the visual Internet of Things is proposed. In this paper, a text detection network based on global context attention and coordinate attention (GCANet) is designed and introduced into the visual Internet of Things. First, an improved coordinate attention module is proposed, which avoids the loss of location information caused by two-dimensional global pooling through two parallel one-dimensional pooling operations, horizontal and vertical. Then, the global context attention module is introduced to reduce the influence of complex backgrounds on text detection and to prevent dense or distantly spaced text from being detected incorrectly. The F-measure of the proposed GCANet reaches 87.4%, 86.9% and 86.3% on the public datasets ICDAR2015, MSRA-TD500 and Total-Text, respectively. The precision, recall and F-measure of GCANet on the industrial label dataset Label-Text reach 93.4%, 90.9% and 92.1%, respectively. In addition, the precision, recall and F-measure of GCANet on the underground mine label dataset Mine-Text reach 94.4%, 84.9% and 89.9%, respectively. The experimental results show that the text detection method for the visual Internet of Things proposed in this paper achieves excellent performance.
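The abstract describes the improved coordinate attention module only at the level of its mechanism: two parallel one-dimensional poolings, one along the horizontal axis and one along the vertical axis, instead of a single two-dimensional global pooling. The following is a minimal PyTorch sketch of such a block, assuming the generic coordinate-attention layout (direction-wise pooling, channel squeeze, then separate height and width attention maps); the class name CoordinateAttention, the reduction ratio and the ReLU activation are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of a coordinate-attention block with two parallel 1-D poolings.
# Illustrative assumptions: reduction ratio, ReLU activation, layer names.
import torch
import torch.nn as nn


class CoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        # Two parallel 1-D poolings: along the width (keeps H) and along the
        # height (keeps W), rather than a single 2-D global pooling.
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # (N, C, 1, W)
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
        )
        self.attn_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.attn_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Direction-aware descriptors that preserve positions along one axis.
        feat_h = self.pool_h(x)                          # (N, C, H, 1)
        feat_w = self.pool_w(x).permute(0, 1, 3, 2)      # (N, C, W, 1)
        y = self.reduce(torch.cat([feat_h, feat_w], dim=2))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(y_h))                       # (N, C, H, 1)
        a_w = torch.sigmoid(self.attn_w(y_w.permute(0, 1, 3, 2)))   # (N, C, 1, W)
        return x * a_h * a_w                             # position-aware reweighting


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)
    print(CoordinateAttention(64)(x).shape)   # torch.Size([2, 64, 32, 32])
```

The global context attention module is characterized only by its role of suppressing complex backgrounds and relating distant text instances; a common realization of that idea is a GCNet-style global context block, sketched below under the same caveat that GlobalContextAttention, its reduction ratio and normalization are assumptions rather than the paper's actual implementation.

```python
# Sketch of a GCNet-style global context block: a softmax-weighted global
# descriptor is transformed and added back to every spatial position.
# Illustrative assumptions: reduction ratio, LayerNorm, residual fusion.
import torch
import torch.nn as nn


class GlobalContextAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.context = nn.Conv2d(channels, 1, kernel_size=1)  # per-position weight
        self.transform = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.LayerNorm([mid, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Global context: softmax-weighted sum over all spatial positions.
        attn = torch.softmax(self.context(x).view(n, 1, h * w), dim=-1)  # (N, 1, HW)
        ctx = torch.bmm(x.view(n, c, h * w), attn.transpose(1, 2))       # (N, C, 1)
        ctx = ctx.view(n, c, 1, 1)
        # Broadcast the transformed global descriptor back onto the feature map.
        return x + self.transform(ctx)
```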