Intelligent multi-camera systems that integrate computer vision algorithms are not error free, and thus both false positive and negative detections need to be revised by a specialized human operator. Traditional multi-camera systems usually include a control center with a wall of monitors displaying videos from each camera of the network. Nevertheless, as the number of cameras increases, switching from a camera to another becomes hard for a human operator. In this work we propose a new method that dynamically selects and displays the content of a video camera from all the available contents in the multi-camera system. The proposed method is based on a computational model of human visual attention that integrates top-down and bottom-up cues. We believe that this is the first work that tries to use a model of human visual attention for the dynamic selection of the camera view of a multi-camera system. The proposed method has been experimented in a given scenario and has demonstrated its effectiveness with respect to the other methods and manually generated ground-truth. The effectiveness has been evaluated in terms of number of correct best-views generated by the method with respect to the camera views manually generated by a human operator.
Napoletano, P., Tisato, F. (2014). An attentive multi-camera system. In Image Processing: Machine Vision Applications VII. SPIE [10.1117/12.2042652].
An attentive multi-camera system
NAPOLETANO, PAOLO;TISATO, FRANCESCO
2014
Abstract
Intelligent multi-camera systems that integrate computer vision algorithms are not error free, and thus both false positive and negative detections need to be revised by a specialized human operator. Traditional multi-camera systems usually include a control center with a wall of monitors displaying videos from each camera of the network. Nevertheless, as the number of cameras increases, switching from a camera to another becomes hard for a human operator. In this work we propose a new method that dynamically selects and displays the content of a video camera from all the available contents in the multi-camera system. The proposed method is based on a computational model of human visual attention that integrates top-down and bottom-up cues. We believe that this is the first work that tries to use a model of human visual attention for the dynamic selection of the camera view of a multi-camera system. The proposed method has been experimented in a given scenario and has demonstrated its effectiveness with respect to the other methods and manually generated ground-truth. The effectiveness has been evaluated in terms of number of correct best-views generated by the method with respect to the camera views manually generated by a human operator.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.