Sarcasm recognition based on multi-dimensional semantic features and hierarchical attention mechanism

宋留静* **; 赵泽方* **; 马宇翔***; 申罕骥*; 李俊* **

Article abstract

宋留静* **, 赵泽方* **, 马宇翔***, 申罕骥*, 李俊* **. Sarcasm recognition based on multi-dimensional semantic features and hierarchical attention mechanism [J]. 高技术通讯(中文), 2024, 34(5): 453-462

Sarcasm recognition based on multi-dimensional semantic features and hierarchical attention mechanism

DOI: 10.3772/j.issn.1002-0470.2024.05.002

Keywords: sarcasm recognition; natural language processing; multi-dimensional semantic representation; hierarchical attention mechanism

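Author	Affiliation
宋留静* **	(*Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190) (**University of Chinese Academy of Sciences, Beijing 100049) (***School of Computer and Information Engineering, Henan University, Kaifeng 475004)
赵泽方* **
马宇翔***
申罕骥*
李俊* **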

Abstract:

Sarcasm is a complex form of linguistic expression that plays an important role in everyday communication. With the rapid development of artificial intelligence and social networks, sarcasm recognition has become an active research topic in natural language processing. Existing studies often represent the features of sarcastic text along a single dimension, overlooking the subtle differences among these features and their relative importance. This paper treats sarcasm recognition as a text classification task. In the feature extraction stage, sarcastic text is given a multi-dimensional semantic representation built from its inconsistency features, sentiment features, syntactic (dependency) structure features, and style features. In the feature fusion stage, because features from different dimensions differ in how much they contribute to the overall representation and how strongly they correlate with it, a hierarchical attention mechanism adjusts the influence of each linguistic feature on the model's overall performance. Experimental results show that the proposed model extracts latent semantic features of sarcastic text from multiple dimensions and achieves clear performance improvements on the public datasets IAC, Tweets, and Reddit.
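The fusion stage described in the abstract can be illustrated with a small sketch. The abstract gives no equations, so what follows is only an assumed, generic two-level dot-product attention and not the authors' model: the first level pools token vectors inside each feature view, and the second level weights the four pooled views against each other. The function names (`attention_pool`, `fuse`), the query vectors, and the random inputs are all hypothetical stand-ins for components that would be learned in a real system.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_pool(vectors, query):
    """Weight each vector by softmax(dot(vector, query)).

    Returns the weighted sum of the vectors and the attention weights.
    """
    weights = softmax([dot(v, query) for v in vectors])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors))
              for i in range(len(query))]
    return pooled, weights


def fuse(views, token_query, view_query):
    """Two-level fusion (assumed form, not taken from the paper).

    views: dict mapping a feature-view name to a list of token vectors.
    Level 1 pools the tokens inside each view; level 2 pools across views.
    """
    names = list(views)
    pooled = [attention_pool(views[name], token_query)[0] for name in names]
    fused, view_weights = attention_pool(pooled, view_query)
    return fused, dict(zip(names, view_weights))


def random_vector(dim):
    """Hypothetical placeholder for a learned embedding."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


random.seed(0)
DIM, NUM_TOKENS = 8, 5
VIEW_NAMES = ("inconsistency", "sentiment", "syntax", "style")

# Random vectors stand in for the four feature extractors' outputs.
views = {name: [random_vector(DIM) for _ in range(NUM_TOKENS)]
         for name in VIEW_NAMES}
token_query = random_vector(DIM)  # would be a trained parameter
view_query = random_vector(DIM)   # would be a trained parameter

fused, view_weights = fuse(views, token_query, view_query)
print(len(fused))
print({name: round(weight, 3) for name, weight in view_weights.items()})
```

The printed weights sum to one and show how much each feature view contributes to the fused vector, which would then feed a classifier. A trained model would learn the two query vectors rather than sample them.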
