Our paper “Video Event Classification using String Kernels” was accepted for publication in the Springer international journal Multimedia Tools and Applications (MTAP), in the special issue on Content-Based Multimedia Indexing.
In this paper we present a method to introduce temporal information into the bag-of-words (BoW) approach for video event recognition. Each event is modeled as a sequence of histograms of visual features, one per frame, computed with the traditional BoW. These sequences are treated as strings in which each histogram is a character. Because the sequences vary in length with the duration of the video clips, event classification is performed using SVM classifiers with a string kernel based on the Needleman-Wunsch edit distance.
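To make the idea concrete, here is a minimal sketch of how such a kernel could be computed. It is not the paper's implementation: the substitution cost between two histograms (half the L1 distance), the constant gap penalty, the kernel form exp(-gamma * d) and all function names are illustrative assumptions. Note that a kernel built this way from an edit distance is not guaranteed to be positive semidefinite.

```python
import math
from typing import List, Sequence

Histogram = Sequence[float]


def frame_histogram(word_ids: Sequence[int], vocab_size: int) -> List[float]:
    """L1-normalised BoW histogram of the visual words detected in one frame."""
    counts = [0.0] * vocab_size
    for w in word_ids:
        counts[w] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def histogram_distance(a: Histogram, b: Histogram) -> float:
    """Assumed substitution cost between two 'characters': half the L1 distance,
    which lies in [0, 1] for normalised histograms."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def needleman_wunsch_distance(s: Sequence[Histogram], t: Sequence[Histogram],
                              gap: float = 1.0) -> float:
    """Minimum-cost global alignment of two histogram sequences (Needleman-Wunsch DP)."""
    n, m = len(s), len(t)
    # dp[i][j] = cost of aligning the first i frames of s with the first j frames of t
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + histogram_distance(s[i - 1], t[j - 1]),  # align frames
                dp[i - 1][j] + gap,  # frame of s aligned to a gap
                dp[i][j - 1] + gap,  # frame of t aligned to a gap
            )
    return dp[n][m]


def string_kernel(s: Sequence[Histogram], t: Sequence[Histogram],
                  gamma: float = 0.5, gap: float = 1.0) -> float:
    """Hypothetical kernel: exponentiated negative alignment distance."""
    return math.exp(-gamma * needleman_wunsch_distance(s, t, gap))


def gram_matrix(videos: Sequence[Sequence[Histogram]],
                gamma: float = 0.5, gap: float = 1.0) -> List[List[float]]:
    """Kernel matrix between variable-length videos, ready for a precomputed-kernel SVM."""
    return [[string_kernel(a, b, gamma, gap) for b in videos] for a in videos]


if __name__ == "__main__":
    h0, h1 = [1.0, 0.0], [0.0, 1.0]
    # A two-frame clip against a one-frame clip: one match plus one gap.
    print(needleman_wunsch_distance([h0, h1], [h0]))
```

Because the dynamic program aligns sequences of any length, clips of different durations can be compared directly without resampling. The resulting matrix could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.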