Video expression recognition based on frame level attention mechanism

CHEN Rui（陈瑞）*; TONG Ying*; ZHANG Yiye**; XU Bo**

Article Summary

CHEN Rui (陈瑞), TONG Ying, ZHANG Yiye, XU Bo. Video expression recognition based on frame level attention mechanism[J]. High Technology Letters, 2023, 29(2): 130-139

DOI: 10.3772/j.issn.1006-6748.2023.02.003

Keywords: facial expression recognition (FER), video sequence, attention mechanism, feature extraction, enhanced feature, VGG network, image classification, neural network

Author Name	Affiliation
CHEN Rui (陈瑞)*	(College of Information & Communication Engineering, Nanjing Institute of Technology, Nanjing 211167, P. R. China) (*Jiangsu Future Network Innovation Research Institute, Nanjing 211111, P. R. China)
TONG Ying*	(College of Information & Communication Engineering, Nanjing Institute of Technology, Nanjing 211167, P. R. China) (*Jiangsu Future Network Innovation Research Institute, Nanjing 211111, P. R. China)
ZHANG Yiye**	(College of Information & Communication Engineering, Nanjing Institute of Technology, Nanjing 211167, P. R. China) (*Jiangsu Future Network Innovation Research Institute, Nanjing 211111, P. R. China)
XU Bo**	(College of Information & Communication Engineering, Nanjing Institute of Technology, Nanjing 211167, P. R. China) (*Jiangsu Future Network Innovation Research Institute, Nanjing 211111, P. R. China)

Abstract:

Facial expression recognition (FER) in video has attracted increasing interest, and many approaches have been proposed. The crucial problem in classifying a given video sequence into one of several basic emotions is how to fuse the facial features of individual frames. In this paper, a frame-level attention module is integrated into an improved VGG-based framework, and a lightweight facial expression recognition method is proposed. The proposed network takes a sub-video cut from an experimental video sequence as its input and generates a fixed-dimension representation. The VGG-based network, with an enhanced branch, embeds face images into feature vectors. The frame-level attention module learns weights that are used to adaptively aggregate these feature vectors into a single discriminative video representation. Finally, a regression module outputs the classification results. Experimental results on the CK+ and AFEW databases show that the proposed method can achieve state-of-the-art recognition rates.
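The abstract does not say how the frame-level attention weights are computed. The following is a minimal sketch, assuming a single scoring vector `q` (a hypothetical parameter, not taken from the paper) shared across frames and a softmax over the frames of one sub-video. The VGG backbone is omitted, so per-frame features are given as plain lists, and the linear head `classify` is likewise a hypothetical stand-in for the paper's regression module.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scalars."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_attention_pool(
    frame_feats: Sequence[Sequence[float]],
    q: Sequence[float],
    bias: float = 0.0,
) -> Tuple[Vector, Vector]:
    """Aggregate per-frame feature vectors into one fixed-size video vector.

    Hypothetical scorer (an assumption, not the paper's definition):
    frame i gets s_i = q . f_i + bias. Scores are softmax-normalised across
    frames and the video vector is sum_i w_i * f_i, so its length equals the
    per-frame feature dimension regardless of the number of frames.
    """
    if not frame_feats:
        raise ValueError("need at least one frame")
    dim = len(frame_feats[0])
    scores = [sum(qj * fj for qj, fj in zip(q, f)) + bias for f in frame_feats]
    weights = softmax(scores)
    video = [sum(w * f[d] for w, f in zip(weights, frame_feats)) for d in range(dim)]
    return video, weights


def classify(
    video: Sequence[float],
    W: Sequence[Sequence[float]],
    b: Sequence[float],
) -> Vector:
    """Hypothetical linear head: one row of W and one entry of b per emotion class."""
    logits = [sum(wj * vj for wj, vj in zip(row, video)) + bk for row, bk in zip(W, b)]
    return softmax(logits)


if __name__ == "__main__":
    # Three toy 2-D "frame embeddings" standing in for VGG features.
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    video, weights = frame_attention_pool(frames, q=[2.0, -1.0])
    probs = classify(video, W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0])
    print("frame weights:", [round(w, 3) for w in weights])
    print("class probabilities:", [round(p, 3) for p in probs])
```

Because the weights are normalised over frames, a sub-video of any length maps to a vector of the same dimension, which is what allows a fixed-dimension video representation. In a trained model, `q`, `W` and `b` would be learned by backpropagation rather than set by hand as in this toy example.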
