In search mode, the bottom-up saliency map is computed first. In addition, we determine a top-down saliency map that competes with the bottom-up map for saliency. The top-down map is composed of an excitation map $E$ and an inhibition map $I$. The excitation map is the weighted sum of all feature maps that are important for the learned object, namely those with weights $w_i$ greater than 1. The inhibition map is built from the feature maps that are not present in the learned object, namely those with weights smaller than 1:
$$E = \sum_{i:\, w_i > 1} w_i \, \mathrm{Map}_i, \qquad I = \sum_{i:\, w_i < 1} \frac{1}{w_i} \, \mathrm{Map}_i .$$
The top-down saliency map is obtained as the difference between the excitation map $E$ and the inhibition map $I$: $S_{td} = E - I$. The final saliency map $S$ combines bottom-up and top-down influences. When fusing the maps, a top-down factor $t \in [0, 1]$ determines how much each map contributes: $S = (1 - t)\, S_{bu} + t\, S_{td}$.
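The computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the VOCUS implementation: the function names (`weighted_sum`, `top_down_map`, `fuse`), the representation of maps as nested lists, and the toy feature maps are all assumptions made for the example. It also assumes that inhibition maps are weighted by $1/w_i$ and that the fusion is the convex combination $(1 - t)\, S_{bu} + t\, S_{td}$.

```python
from typing import Dict, List

Map = List[List[float]]  # a map stored as rows of pixel values


def weighted_sum(maps: Dict[str, Map], factors: Dict[str, float]) -> Map:
    """Pixel-wise sum of factors[name] * maps[name] over all names in factors."""
    first = next(iter(maps.values()))
    rows, cols = len(first), len(first[0])
    out = [[0.0] * cols for _ in range(rows)]
    for name, factor in factors.items():
        for r in range(rows):
            for c in range(cols):
                out[r][c] += factor * maps[name][r][c]
    return out


def top_down_map(maps: Dict[str, Map], weights: Dict[str, float]) -> Map:
    """Top-down saliency: excitation map minus inhibition map."""
    # Excitation: features with weight > 1, weighted by w_i.
    excite = {n: w for n, w in weights.items() if w > 1.0}
    # Inhibition: features with weight < 1, weighted by 1 / w_i (assumption).
    # Non-positive weights are skipped to avoid division by zero.
    inhibit = {n: 1.0 / w for n, w in weights.items() if 0.0 < w < 1.0}
    e = weighted_sum(maps, excite)
    i = weighted_sum(maps, inhibit)
    return [[ev - iv for ev, iv in zip(er, ir)] for er, ir in zip(e, i)]


def fuse(s_bu: Map, s_td: Map, t: float) -> Map:
    """Blend bottom-up and top-down maps with top-down factor t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [[(1.0 - t) * b + t * d for b, d in zip(br, dr)]
            for br, dr in zip(s_bu, s_td)]


# Hypothetical 1x2 feature maps and learned weights for a red target.
feature_maps = {"red": [[0.0, 1.0]], "blue": [[1.0, 0.0]]}
learned_weights = {"red": 4.0, "blue": 0.5}
bottom_up = [[1.0, 0.0]]  # a bottom-up distractor at the left pixel

s_td = top_down_map(feature_maps, learned_weights)
print(s_td)                        # [[-2.0, 4.0]]
print(fuse(bottom_up, s_td, 1.0))  # [[-2.0, 4.0]]
print(fuse(bottom_up, s_td, 0.5))  # [[-0.5, 2.0]]
```

In this toy setting the right pixel, where the excited feature is present, receives the highest saliency, while the left pixel is suppressed by the inhibited feature. Features with a weight of exactly 1 contribute to neither map.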
With a top-down factor of $t = 1$, VOCUS looks only for the specified target. With $t < 1$, bottom-up cues also have an influence and may divert the focus of attention. This is an important mechanism in human visual attention as well: for example, a person suddenly entering a room immediately catches our attention, independently of the task. For the application discussed in this paper, we always use $t = 1$ and use the bottom-up saliency only to learn the weights of the training objects. Thus, the robot focuses its attention completely on the ball rather than on playing foul on other robots.
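A small worked example may help to illustrate the effect of the top-down factor. The numbers are hypothetical and chosen only for illustration, and the fusion is assumed to take the form $S = (1 - t)\, S_{bu} + t\, S_{td}$. Consider a distractor location $d$ with $S_{bu}(d) = 0.9$, $S_{td}(d) = 0.2$ and a target location $g$ with $S_{bu}(g) = 0.1$, $S_{td}(g) = 0.8$:

$$
\begin{aligned}
t = 1:   &\quad S(d) = 0 \cdot 0.9 + 1 \cdot 0.2 = 0.2, &\quad S(g) &= 0 \cdot 0.1 + 1 \cdot 0.8 = 0.8, \\
t = 0.5: &\quad S(d) = 0.5 \cdot 0.9 + 0.5 \cdot 0.2 = 0.55, &\quad S(g) &= 0.5 \cdot 0.1 + 0.5 \cdot 0.8 = 0.45.
\end{aligned}
$$

With $t = 1$ the target location is the most salient one, whereas with $t = 0.5$ the strong bottom-up cue at $d$ wins and diverts the focus of attention.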