Computer vision--Mathematical models.

Model
Digital Document
Publisher
Florida Atlantic University
Description
Research on Multiple Object Tracking (MOT) has typically involved 2D displays where
stimuli move in a single depth plane. However, under natural conditions, objects move in 3D
which adds complexity to tracking. According to the spatial interference model, tracked
objects have an inhibitory surround that when crossed causes tracking errors. How do
these inhibitory fields translate to 3D space? Does multiple object tracking operate on a
2D planar projection, or is it in fact 3D? To investigate this, we used a fully immersive
virtual-reality environment where participants were required to track 1 to 4 moving
objects. We compared performance to a condition where participants viewed the same
stimuli on a computer screen with monocular depth cues. Results suggest that participants
were more accurate in the VR condition than in the computer-screen condition. This
demonstrates that interference is negligible when objects are spatially distant in 3D,
even when they are proximate within the 2D projection.
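As a hedged illustration of this distinction (not code from the study itself), the following Python sketch contrasts an inhibitory surround defined over 3D positions with one defined over their 2D perspective projections; the radius, coordinates, and function names are assumptions made purely for the example.

    import numpy as np

    # Hypothetical illustration: two targets separated mostly in depth are far
    # apart in 3D but project to nearby 2D screen positions. A surround defined
    # in 3D predicts no interference for this pair; a surround defined on the
    # 2D projection predicts interference.

    INHIBITION_RADIUS = 1.0  # assumed surround radius, arbitrary units

    def project_2d(p, focal_length=1.0):
        # Perspective-project a 3D point (x, y, z) onto the image plane.
        x, y, z = p
        return np.array([focal_length * x / z, focal_length * y / z])

    def interferes(p, q, radius=INHIBITION_RADIUS, use_3d=True):
        # True if the pair falls inside the inhibitory surround.
        if use_3d:
            d = np.linalg.norm(np.asarray(p) - np.asarray(q))
        else:
            d = np.linalg.norm(project_2d(p) - project_2d(q))
        return d < radius

    a = np.array([0.1, 0.0, 2.0])
    b = np.array([0.2, 0.0, 6.0])
    print(interferes(a, b, use_3d=True))   # False: well separated in 3D
    print(interferes(a, b, use_3d=False))  # True: nearly overlapping on screen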
Model
Digital Document
Publisher
Florida Atlantic University
Description
Scene understanding attempts to produce a textual description of visible and
latent concepts in an image to describe the real meaning of the scene. Concepts are
either objects, events, or relations depicted in an image. To recognize concepts, the
decisions of an object detection algorithm must be enhanced beyond visual
similarity to semantic compatibility. Semantically relevant concepts convey the
most consistent meaning of the scene.
Object detectors analyze visual properties (e.g., pixel intensities, texture, color
gradient) of sub-regions of an image to identify objects. The initially assigned
object names must be further examined to ensure they are compatible with each
other and with the scene. By enforcing inter-object dependencies (e.g., co-occurrence,
spatial, and semantic priors) and object-to-scene constraints as background
information, a concept classifier predicts the most semantically consistent set of
names for discovered objects. The additional background information that describes
concepts is called context.
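A minimal sketch of how such context might be enforced, assuming a naive-Bayes-style combination of detector confidences with co-occurrence priors; the labels, scores, and prior table below are illustrative placeholders, not values or code from the dissertation:

    import numpy as np

    # Hypothetical refinement step: detector confidences for one ambiguous
    # region are rescored against co-occurrence priors conditioned on the
    # labels already accepted in the scene.

    labels = ["mouse (animal)", "mouse (device)"]
    detector_scores = np.array([0.55, 0.45])  # visual similarity alone

    # Toy co-occurrence prior P(candidate | context object):
    cooccurrence = {
        "keyboard": np.array([0.05, 0.95]),
        "monitor":  np.array([0.10, 0.90]),
    }

    scene_context = ["keyboard", "monitor"]  # high-confidence detections

    # Combine visual evidence with context via a product of priors:
    posterior = detector_scores.copy()
    for obj in scene_context:
        posterior *= cooccurrence[obj]
    posterior /= posterior.sum()

    print(labels[int(np.argmax(posterior))])  # "mouse (device)" wins
    print(posterior)  # context overturns the raw visual ranking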
In this dissertation, a framework for context-based concept detection is
presented that uses a combination of multiple contextual relationships to refine the
results of the underlying feature-based object detectors and produce the most semantically compatible concepts.
In addition to their inability to capture semantic dependencies, object
detectors suffer from the high dimensionality of the feature space, which impairs their performance.
Variations in the image (e.g., quality, pose, articulation, illumination, and occlusion)
can also result in low-quality visual features that impact the accuracy of detected
concepts.
The object detectors used in the context-based framework experiments in this
study are based on state-of-the-art generative and discriminative graphical
models. Graphical models make it easy to describe the relationships between
model variables and to characterize their dependencies precisely. The generative
context-based implementations are extensions of Latent Dirichlet Allocation, a
leading topic modeling approach that is very effective in reducing the
dimensionality of the data. The discriminative context-based approach extends
Conditional Random Fields, which allow efficient and precise model construction
by specifying and including only the cases that are related to and influence the model.
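To make the dimensionality-reduction role of the generative side concrete, here is a minimal sketch assuming scikit-learn and a synthetic bag-of-visual-words corpus; it is not the dissertation's implementation, and the CRF side is omitted for brevity.

    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation

    rng = np.random.default_rng(0)
    n_images, vocab_size = 100, 500
    # Synthetic bag-of-visual-words counts, one row per image:
    X = rng.poisson(lam=1.0, size=(n_images, vocab_size))

    # LDA compresses 500-dimensional word counts into 10 topic proportions,
    # the dimensionality-reduction role described above.
    lda = LatentDirichletAllocation(n_components=10, random_state=0)
    theta = lda.fit_transform(X)  # shape (100, 10)

    print(X.shape, "->", theta.shape)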
The dataset used for training and evaluation is MIT SUN397. The results of the
experiments show an overall 15% increase in annotation accuracy and a 31%
improvement in the semantic saliency of the annotated concepts.