Figure 1: An overview of our CMA-Net. RGB-D high-level features extracted from the dual-branch encoder are fed into two proposed cascaded mutual attention
modules, followed by a group of (de-)convolutional layers used in BBSNet. The abbreviations in the figure are detailed as follows: AiF Image = all-in-focus
image. GT = ground truth. Resi = the ith ResNet layer. (De)Conv = (de-)convolutional layer. MAi = the ith mutual attention module. CMA = cascaded
mutual attention module. CW = column-wise normalization. RW = row-wise normalization.
We propose CMA-Net, which, similar to SA-Net and SA-Net-V2, consists of two novel cascaded mutual attention modules that fuse high-level features from different modalities. The proposed CMA-Net outperforms 30 state-of-the-art SOD methods on two widely used light field benchmark datasets. Moreover, CMA-Net runs inference at 53 fps, much faster than the top-ranked light field SOD methods.
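To illustrate the idea behind a mutual attention (MA) module with column-wise (CW) and row-wise (RW) normalization as named in Figure 1, here is a minimal, hypothetical PyTorch sketch. The class name, layer choices, and the exact way the affinity matrix is used are our assumptions for illustration, not the repository's actual implementation (see the source code for the real modules):

```python
import torch
import torch.nn as nn


class MutualAttention(nn.Module):
    """Hypothetical sketch of one mutual attention (MA) module.

    An affinity matrix between RGB and depth features is normalized
    column-wise (CW) and row-wise (RW), giving cross-modal attention
    in both directions. Details here are illustrative assumptions.
    """

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions project each modality before computing affinity
        self.proj_rgb = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_depth = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, f_rgb: torch.Tensor, f_depth: torch.Tensor):
        b, c, h, w = f_rgb.shape
        q = self.proj_rgb(f_rgb).flatten(2)      # B x C x HW
        k = self.proj_depth(f_depth).flatten(2)  # B x C x HW

        # Pairwise affinity between all spatial positions: B x HW x HW
        affinity = torch.bmm(q.transpose(1, 2), k)

        # CW: softmax over columns -> attention for enhancing depth features
        att_cw = torch.softmax(affinity, dim=1)
        # RW: softmax over rows -> attention for enhancing RGB features
        att_rw = torch.softmax(affinity, dim=2)

        v_rgb = f_rgb.flatten(2)    # B x C x HW
        v_depth = f_depth.flatten(2)

        # Cross-modal aggregation plus residual connection
        out_rgb = torch.bmm(v_depth, att_rw.transpose(1, 2)).view(b, c, h, w) + f_rgb
        out_depth = torch.bmm(v_rgb, att_cw).view(b, c, h, w) + f_depth
        return out_rgb, out_depth
```

Cascading two such modules (MA1 and MA2, as in the figure) would simply feed the enhanced feature pair from the first module into the second before decoding.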
Download the saliency prediction maps at Google Drive or OneDrive.
Download the pretrained model at Google Drive or OneDrive.
Please refer to CMANet_train.py.