Article abstract
Wu Jin (吴进), Min Yu, Shi Qianwen, Zhang Weihua, Zhao Bo. [J]. High Technology Letters (English edition), 2020, 26(4): 372-382
Behavior recognition based on the fusion of 3D-BN-VGG and LSTM network
  
DOI:10.3772/j.issn.1006-6748.2020.04.004
Chinese keywords:
English keywords: behavior recognition, deep learning, 3-dimensional batch normalization visual geometry group (3D-BN-VGG), long short-term memory (LSTM) network
Funding:
Author Name (Affiliation)
Wu Jin (吴进) (School of Electronic and Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China) 
Min Yu (School of Electronic and Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China) 
Shi Qianwen (School of Electronic and Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China) 
Zhang Weihua (School of Electronic and Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China) 
Zhao Bo (School of Electronic and Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, P.R.China) 
Chinese abstract:
      
English abstract:
      To address the low accuracy, heavy computation, and complex logic of deep learning algorithms for behavior recognition, a behavior recognition network based on the fusion of a 3-dimensional batch normalization visual geometry group (3D-BN-VGG) network and a long short-term memory (LSTM) network is designed. In this network, 3D convolutional layers extract the spatial-domain and time-domain features of a video sequence simultaneously, and multiple small convolution kernels are stacked in place of large ones, which deepens the network while reducing the number of parameters. In addition, the batch normalization algorithm is added to the 3-dimensional convolutional network to speed up training. The output of the fully connected layer is then sent to the LSTM network as feature vectors to extract sequence information. This method, which directly uses the output of the whole base level without passing through the fully connected layer, reduces the parameters of the whole fusion network to 15324485, nearly twice as many as those of 3D-BN-VGG. Finally, the results show that the proposed network achieves 96.5% and 74.9% accuracy on UCF-101 and HMDB-51, respectively, and that the algorithm runs at 1066 fps with an acceleration ratio of 1, a significant advantage in speed.
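
The claim that stacking small kernels deepens the network while cutting parameters can be checked with simple arithmetic. The sketch below is illustrative only: the abstract does not state kernel sizes or channel widths, so the 3x3x3 and 5x5x5 kernels and the 64-channel width are assumptions, and the helper names `conv3d_params` and `stacked_receptive_field` are hypothetical.

```python
def conv3d_params(kernel, c_in, c_out, bias=True):
    """Learnable parameters in one 3D convolution layer with a cubic kernel."""
    weights = kernel ** 3 * c_in * c_out
    return weights + (c_out if bias else 0)


def stacked_receptive_field(kernel, layers):
    """Per-axis receptive field of `layers` stacked stride-1 convolutions."""
    return layers * (kernel - 1) + 1


channels = 64  # assumed channel width, not taken from the paper

# One large 5x5x5 layer versus two stacked 3x3x3 layers (assumed sizes).
single_large = conv3d_params(5, channels, channels)
stacked_small = 2 * conv3d_params(3, channels, channels)

print("receptive field of two 3x3x3 layers:", stacked_receptive_field(3, 2))
print("parameters, one 5x5x5 layer:", single_large)
print("parameters, two 3x3x3 layers:", stacked_small)
```

Under these assumptions, the two stacked layers cover the same 5x5x5 receptive field as the single large layer with fewer than half the parameters, and they add an extra nonlinearity between the layers.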