Deep Ensemble Machine for Video Classification
This paper aims to classify different types of videos using a deep learning framework built on convolutional neural networks (CNNs).
Two CNNs, VGG (named after the Visual Geometry Group) and the 3-D convolutional network C3-D, are first deployed to extract complementary features from the input videos. VGG captures spatial (appearance) features from individual frames, while C3-D captures temporal (motion) features across consecutive frames, and together they yield comprehensive, informative video representations. These high-dimensional representations are then reduced by random projections into a set of lower-dimensional subspaces, and an ensemble of efficient classifiers is trained on these lower-dimensional features, one classifier per subspace. The RLE layer proposed in this work then encodes the initial outputs of the classifiers, and a weighting layer, learned jointly in the end-to-end framework, combines the classification results.
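The ensemble stage can be illustrated with a small pure-Python sketch. It is not the authors' implementation and rests on several assumptions: the VGG and C3-D features are replaced by synthetic vectors, each base classifier is a nearest-centroid classifier (a stand-in, since the text only says "efficient classifiers"), the RLE layer is omitted, and the learned weighting layer is replaced by fixed uniform weights in a weighted vote. All names (`fuse`, `RandomSubspaceEnsemble`, the action labels) are hypothetical.

```python
import random

# Hypothetical sketch: synthetic features, nearest-centroid base learners,
# no RLE layer, and uniform (not learned) combination weights.


def fuse(spatial, temporal):
    """Concatenate a spatial (VGG-style) and a temporal (C3-D-style) feature vector."""
    return list(spatial) + list(temporal)


def random_projection(in_dim, out_dim, rng):
    """Gaussian random projection matrix (out_dim x in_dim), entries ~ N(0, 1/out_dim)."""
    std = (1.0 / out_dim) ** 0.5
    return [[rng.gauss(0.0, std) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, x):
    """Multiply the projection matrix by the feature vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


class NearestCentroid:
    """Cheap base classifier: assigns the label of the closest class mean."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


class RandomSubspaceEnsemble:
    """One base classifier per random low-dimensional subspace, combined by weighted vote."""

    def __init__(self, n_members=5, sub_dim=4, seed=0):
        self.n_members = n_members
        self.sub_dim = sub_dim
        self.rng = random.Random(seed)

    def fit(self, X, y):
        in_dim = len(X[0])
        self.members = []
        for _ in range(self.n_members):
            P = random_projection(in_dim, self.sub_dim, self.rng)
            clf = NearestCentroid().fit([project(P, x) for x in X], y)
            self.members.append((P, clf))
        # Placeholder: uniform weights. The paper's weighting layer is learned end to end.
        self.weights = [1.0 / self.n_members] * self.n_members
        return self

    def predict(self, x):
        votes = {}
        for w, (P, clf) in zip(self.weights, self.members):
            label = clf.predict(project(P, x))
            votes[label] = votes.get(label, 0.0) + w
        return max(votes, key=votes.get)


# Toy data: 12-D "spatial" + 8-D "temporal" features for two hypothetical actions.
data_rng = random.Random(1)
X, y = [], []
for label, center in (("walking", 0.0), ("running", 5.0)):
    for _ in range(10):
        spatial = [center + data_rng.gauss(0.0, 0.1) for _ in range(12)]
        temporal = [center + data_rng.gauss(0.0, 0.1) for _ in range(8)]
        X.append(fuse(spatial, temporal))
        y.append(label)

ens = RandomSubspaceEnsemble(n_members=5, sub_dim=4).fit(X, y)
print(ens.predict(fuse([0.0] * 12, [0.0] * 8)))  # expected: walking
```

A Gaussian random projection approximately preserves pairwise distances with high probability (the Johnson-Lindenstrauss lemma), so each member sees a compressed but still informative view of the fused features, while the different random matrices give the ensemble its diversity.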
In the proposed system, a convolutional neural network is used for action recognition in video. The input video is captured with a webcam and split into a sequence of frames. The CNN is then applied to each frame to detect the relevant parts of the frame, and the maximum values are taken from the resulting feature maps. Finally, the action in the video is recognized and its label (the action name) is output.
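The frame-level pipeline above can be sketched as follows, assuming that "maximum values" refers to global max pooling over convolutional feature maps and that per-frame class scores are summed over the whole video. Webcam capture and frame decoding are not shown (a video library would normally supply the frames); here each frame is a tiny grayscale image given as nested lists. The kernels, class weights, action names `wave` and `nod`, and helper names (`conv2d_valid`, `frame_features`, `classify_video`) are hypothetical and hand-set rather than learned.

```python
# Hypothetical sketch: hand-set kernels and weights, max pooling assumed for
# "maximum values", per-frame scores summed over the video.


def conv2d_valid(image, kernel):
    """2-D cross-correlation with no padding (what CNN convolution layers compute)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]


def global_max_pool(fmap):
    """Keep only the strongest response in a feature map."""
    return max(max(row) for row in fmap)


def frame_features(frame, kernels):
    """One feature per kernel: convolution -> ReLU -> global max pooling."""
    return [global_max_pool(relu(conv2d_valid(frame, k))) for k in kernels]


def classify_video(frames, kernels, class_weights):
    """Accumulate linear class scores over all frames and return the best label."""
    totals = {label: 0.0 for label in class_weights}
    for frame in frames:
        feats = frame_features(frame, kernels)
        for label, w in class_weights.items():
            totals[label] += sum(wi * fi for wi, fi in zip(w, feats))
    return max(totals, key=totals.get)


# Hand-set (not learned) edge detectors and class weights, purely illustrative.
KERNELS = [
    [[-1, 1], [-1, 1]],   # responds to vertical edges
    [[-1, -1], [1, 1]],   # responds to horizontal edges
]
CLASS_WEIGHTS = {"wave": [1.0, 0.0], "nod": [0.0, 1.0]}

vertical = [[0.0, 0.0, 1.0, 1.0]] * 4
horizontal = [[0.0] * 4] * 2 + [[1.0] * 4] * 2

print(frame_features(vertical, KERNELS))                       # [2.0, 0.0]
print(classify_video([vertical] * 3, KERNELS, CLASS_WEIGHTS))  # wave
```

In a trained CNN the kernels and class weights would be learned from labeled videos; max pooling is used here because it keeps the strongest response of each detector regardless of where in the frame it occurs.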