Deep Ensemble Machine for Video Classification

Tags: , , , ,

Aim:

This paper aim to classify the different types of videos using deep learning framework with convolution neural network (CNN)

Existing System:

The convolution 3-D (C3-D) and VGG (vision and graphics group) are first deployed to extract temporal and spatial features from the input videos cooperatively, which establishes comprehensive and informative representations of videos. VGG and C3-D are chosen due to their strong capability of extracting complementary spatial and temporal features for comprehensive video representations. The resultant high-dimensional representations are further reduced by random projections into a set of lower dimensional subspaces, on which an ensemble of efficient classifiers is trained with these lower dimensional features. The introduced RLE layer is further deployed to encode the initial outputs of classifiers, which is followed by a weighting layer jointly learned in the end-to-end framework to combine classification results.

Proposed System:

In this proposed system, we propose the convolution neural network method for action recognition in video. The input video will be captured by using the webcam. The input video is converted into number of frames. Then the CNN (Convolution Neural Network) algorithm is used in order to detect the particular part of the frame. Then the maximum weight values are taken from the feature extraction frames by using the Convolution neural network. Finally the action will be detected in the videos and then the label (action name) is identified.