The semantics of interest in a surveillance system can be classified into three categories: Targets (e.g. pedestrians, cars, large vehicles), Actions (e.g. move, stop, enter/exit, accelerate, turn left/right) and Scene Static Features (e.g. road, corridor, door, gate, ATM, desk, bus stop). The proposed general scheme is that Targets perform Actions in an environment consisting of other targets and scene static features. This work is mainly concerned with learning the scene static features.
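To make the proposed scheme concrete, the following minimal sketch encodes the three semantic categories as simple data structures. The class and field names (Target, Event, nearby_features, and so on) are illustrative assumptions for exposition only, not definitions taken from this work.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Tuple


class TargetClass(Enum):
    PEDESTRIAN = auto()
    CAR = auto()
    LARGE_VEHICLE = auto()


class Action(Enum):
    MOVE = auto()
    STOP = auto()
    ENTER = auto()
    EXIT = auto()
    ACCELERATE = auto()
    TURN_LEFT = auto()
    TURN_RIGHT = auto()


class SceneFeature(Enum):
    ROAD = auto()
    CORRIDOR = auto()
    DOOR = auto()
    GATE = auto()
    ATM = auto()
    DESK = auto()
    BUS_STOP = auto()


@dataclass
class Target:
    """A tracked object that performs actions in the scene."""
    track_id: int
    target_class: TargetClass
    position: Tuple[float, float]  # image or ground-plane coordinates


@dataclass
class Event:
    """A target performing an action in the context of the static scene."""
    target: Target
    action: Action
    nearby_features: List[SceneFeature] = field(default_factory=list)


# Example: a pedestrian entering through a door that opens onto a corridor.
event = Event(
    target=Target(track_id=7, target_class=TargetClass.PEDESTRIAN,
                  position=(120.0, 340.0)),
    action=Action.ENTER,
    nearby_features=[SceneFeature.DOOR, SceneFeature.CORRIDOR],
)
print(event)
```

In this reading, the Scene Static Features are the fixed contextual labels that the system must learn, while Targets and Actions are the dynamic quantities observed at runtime.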
Learning in Visual Surveillance
We suggest that learning in visual surveillance must be performed mainly in an unsupervised manner, for two reasons. Firstly, to exploit the vast amount of observations that are available due to the continuous operation of the surveillance system. Secondly, to allow the development of systems that can automatically learn their environment, so that they can be easily installed (plug'n'play) and can adapt.
A Reverse Engineering Approach