Model
Digital Document
Publisher
Florida Atlantic University
Description
For an 8-bit grayscale image patch of size n x n, the number of distinguishable
signals is 256^(n^2). Natural images (e.g., photographs of a natural scene) comprise a
very small subset of these possible signals. Traditional image and video processing
relies on band-limited or low-pass signal models. In contrast, we will explore the
observation that most signals of interest are sparse, i.e. in a particular basis most
of the expansion coefficients will be zero. Recent developments in sparse modeling
and L1 optimization have allowed for extraordinary applications such as the single
pixel camera, as well as computer vision systems that can exceed human performance.
Here we present a novel neural network architecture combining a sparse filter model
and locally competitive algorithms (LCAs), and demonstrate the network's ability to
classify human actions from video. Sparse filtering is an unsupervised feature learning
algorithm designed to optimize the sparsity of the feature distribution directly without
having the need to model the data distribution. LCAs are defined by a system of
differential equations where the initial conditions define an optimization problem and
the dynamics converge to a sparse decomposition of the input vector. We applied
this architecture to train a classifier on categories of motion in human action videos.
Inputs to the network were small 3D patches taken from frame differences in the
videos. Dictionaries were derived for each action class and then activation levels for
each dictionary were assessed during reconstruction of a novel test patch. We discuss
how this sparse modeling approach provides a natural framework for multi-sensory
and multimodal data processing including RGB video, RGBD video, hyper-spectral
video, and stereo audio/video streams.
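
To make the LCA description above concrete, here is a minimal sketch of one common formulation of the dynamics (soft-threshold activation, forward-Euler integration). It is not taken from this work: the function names, the random dictionary Phi, and the values of lam, tau, dt and n_steps are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(u, lam):
    # Activation: values with magnitude below lam map to exactly zero.
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def lca(x, Phi, lam=0.1, tau=10.0, dt=1.0, n_steps=500):
    # Hypothetical helper: sparse-code x over the columns of Phi by
    # integrating  tau * du/dt = b - u - (Phi^T Phi - I) a,  a = T_lam(u).
    n_atoms = Phi.shape[1]
    b = Phi.T @ x                        # feed-forward drive to each unit
    G = Phi.T @ Phi - np.eye(n_atoms)    # lateral inhibition between units
    u = np.zeros(n_atoms)                # internal (membrane) states
    for _ in range(n_steps):
        a = soft_threshold(u, lam)
        u += (dt / tau) * (b - u - G @ a)
    return soft_threshold(u, lam)

# Toy data (assumed): a random unit-norm dictionary and a 3-sparse signal.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((64, 128))
Phi /= np.linalg.norm(Phi, axis=0)
a_true = np.zeros(128)
a_true[[3, 40, 99]] = [1.0, -0.8, 0.6]
x = Phi @ a_true

a_hat = lca(x, Phi)
print("active coefficients:", np.count_nonzero(a_hat))
print("reconstruction error:", np.linalg.norm(x - Phi @ a_hat))
```

With one dictionary per action class, a classifier in the spirit of the description could compare the total activation each class dictionary receives when a test patch is reconstructed. That step is left out here because the abstract does not specify how activation levels were scored.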
Member of