Saliency Prediction Based on Multi-Channel Models of Visual Processing

Qiang Li
Image and Signal Processing Lab (ISP), University of Valencia, Valencia, Spain

Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Atlanta, USA



Abstract

Visual attention is one of the most important mechanisms for selecting and understanding the highly redundant visual world. Because of an information bottleneck, the human visual system cannot process all incoming visual information simultaneously; to reduce this redundancy, it focuses on the dominant parts of a scene. Predicting which parts attract attention is commonly known as visual saliency prediction. This paper proposes a new psychophysically oriented saliency prediction architecture inspired by the multi-channel model of the human visual cortex. The model combines opponent color channels, a wavelet transform, wavelet energy maps, and a contrast sensitivity function to extract low-level image features and closely approximate the low-level human visual system. The proposed model is evaluated on several datasets, including MIT1003, MIT300, TORONTO, SID4VAM, and UCF Sports, and its saliency prediction performance is compared quantitatively and qualitatively with that of other state-of-the-art models. Our model achieves stable and better performance across different metrics on natural images, psychophysical synthetic images, and dynamic videos. We further show that Fourier- and spectral-inspired saliency prediction models outperform other state-of-the-art non-neural-network models, and even deep neural network models, on psychophysical synthetic images, and we suggest that deep neural networks need specific architectures and objectives to predict saliency on such images reliably. Finally, the proposed model can serve as a computational model of the primate low-level visual system and help us understand its mechanisms.
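
The pipeline described above can be sketched in a few lines of code. The following is a minimal, illustrative Python sketch of the multi-channel idea (opponent color channels → wavelet decomposition → wavelet energy maps → CSF-like weighting → fused saliency map). The opponent-color mixing matrix, the choice of wavelet (db4), and the per-scale CSF weights are assumptions for illustration only, not the exact values or implementation used in the paper.

```python
# Minimal sketch of the multi-channel saliency pipeline (not the authors'
# reference implementation). Assumes NumPy, SciPy, and PyWavelets.
import numpy as np
import pywt
from scipy.ndimage import zoom, gaussian_filter

def rgb_to_opponent(img):
    """Map an RGB image (H, W, 3, float in [0, 1]) to three opponent channels.
    The mixing below is a common simple choice, not the paper's exact matrix."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    luminance = (r + g + b) / 3.0
    red_green = r - g
    blue_yellow = b - (r + g) / 2.0
    return [luminance, red_green, blue_yellow]

def wavelet_energy_map(channel, wavelet="db4", levels=4, csf_weights=None):
    """Multi-scale wavelet decomposition followed by a per-scale energy map,
    weighted by a coarse band-pass profile that mimics a CSF (assumed values)."""
    if csf_weights is None:
        # Hypothetical coarse-to-fine weights: mid frequencies count most.
        csf_weights = [0.4, 1.0, 0.8, 0.3][:levels]
    coeffs = pywt.wavedec2(channel, wavelet, level=levels)
    h, w = channel.shape
    energy = np.zeros((h, w))
    # coeffs[1:] holds detail sub-bands from the coarsest to the finest scale.
    for weight, (cH, cV, cD) in zip(csf_weights, coeffs[1:]):
        band_energy = cH**2 + cV**2 + cD**2
        band_energy = zoom(band_energy,
                           (h / band_energy.shape[0], w / band_energy.shape[1]),
                           order=1)  # resize back to image resolution
        energy += weight * band_energy
    return energy

def saliency_map(img, sigma=5):
    """Fuse the per-channel wavelet energy maps into one normalized saliency map."""
    channels = rgb_to_opponent(img)
    fused = sum(wavelet_energy_map(c) for c in channels)
    fused = gaussian_filter(fused, sigma)  # light smoothing of the fused map
    fused -= fused.min()
    return fused / (fused.max() + 1e-12)
```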


Partial Datasets

[Figure: sample stimuli from the evaluation datasets]

Left: MIT1003. Right: MIT300.



Partial Results

[Figure: qualitative results on MIT1003]

Performance evaluation on the MIT1003 database. The first row shows color images, the second row shows the ground-truth saliency maps, and the last row shows the saliency maps predicted by the proposed model.

[Figure: qualitative results on MIT300 and TORONTO]

Left: performance evaluation on the MIT300 database. The first and third columns are color images; the second and fourth columns are the saliency maps predicted by the proposed model. Right: performance evaluation on the TORONTO database, with the same column layout.

[Figure: model comparison on MIT1003]

Qualitative saliency prediction results on the MIT1003 database with selected models. The first row shows six stimulus images from the MIT1003 database. The rows beneath show the saliency prediction results of the Achanta, AIM, HFT, ICL, ITTI, and SIM models, the proposed model, and the ground truth (GT), rendered with artificial color for better visualization.

[Figure: model comparison on SID4VAM]

Qualitative saliency prediction results on the SID4VAM dataset with different models. The first row shows six stimulus images from the SID4VAM dataset. The rows beneath show the saliency prediction results of the Achanta, AIM, HFT, ICL, ITTI, and SIM models and the proposed model, together with the ground truth (GT), rendered with artificial color for better visualization. The proposed model successfully explains the "pop-out" effects observed in visual search.

[Figure: AUC and PR curves on three benchmark datasets]

Comparison of the area under the curve (AUC) and PR curves at different thresholds for our method and other state-of-the-art methods on three benchmark datasets.
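
For reference, the AUC scores reported above can be approximated with a simple fixation-based ROC analysis. The sketch below is a simplified AUC-Judd-style computation, assuming NumPy and scikit-learn; it is not the exact benchmark code behind the reported numbers.

```python
# Simplified AUC-Judd-style evaluation (illustrative, not the benchmark code).
# `saliency` and `fixation_map` are same-sized 2-D arrays; fixation_map > 0
# marks fixated pixels.
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_judd(saliency, fixation_map):
    labels = (fixation_map > 0).ravel().astype(int)   # fixated vs. non-fixated pixels
    scores = saliency.ravel().astype(float)           # predicted saliency as ROC scores
    if labels.sum() == 0 or labels.sum() == labels.size:
        raise ValueError("Fixation map must contain both fixated and non-fixated pixels.")
    return roc_auc_score(labels, scores)
```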



Publications

Saliency Prediction Based on Multi-Channel Models of Visual Processing.
Qiang Li
[Spotlight] Machine Vision and Applications, 2023.



Related Research

[Figure: overview of related research]


Acknowledgements

This work was partially funded by the Spanish grant GVA Grisolía-P/2019/035.



Webpage template modified from Richard Zhang.