Saturday, January 13, 2007

Interview with Sebastien Billard

Google auto-translation from French to English

The presentation of ViewFinder two days ago raised a number of comments and questions. Rafael Mizrahi, director of technology at Feng-GUI and creator of the algorithm, agreed to answer my questions.

Sebastien Billard: Hello Rafael, could you introduce yourself to our readers?
Rafael Mizrahi: I have worked in the computer industry for more than 16 years. As a musician and painter, I have always had a strong feeling for harmony. These two sides of my personality naturally led me to the study of user interfaces, a branch of computer science research.

SB: When did you start developing this algorithm?
RM: I have been researching and teaching dynamic composition for the last 10 years. The implementation of ViewFinder itself only began about 2 years ago.

SB: What research did you draw on to develop ViewFinder?
RM: We are often asked this question, which is why we will add more information on the subject to the site. But if I had to sum it up in a single word, I would say: saliency (translator's note: the capacity of an element to stand out during the visual perception of a scene, to the point of taking on particular cognitive importance).
More information in this PowerPoint

The ViewFinder algorithm creates a saliency map of the site. Saliency maps have been developed over the last 25 years by computer vision research laboratories. The algorithm was developed and then compared against experimental results from eye-movement research, so that it accurately represents the way humans are drawn to visual elements.
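
To make the idea of comparing a saliency map with eye-movement data concrete, here is a minimal sketch of one commonly used agreement score, the Normalized Scanpath Saliency (NSS). The interview does not say which measure was used for ViewFinder, so the metric choice, the function name, and the toy data are all assumptions for illustration only.

```python
import numpy as np


def normalized_scanpath_saliency(saliency_map, fixations):
    """Mean z-scored saliency at fixated pixels (hypothetical helper).

    saliency_map: 2D array of saliency values.
    fixations: list of (row, col) pixel positions where viewers looked.
    Higher values mean the map agrees better with the fixations.
    """
    s = np.asarray(saliency_map, dtype=float)
    std = s.std()
    if std == 0:
        return 0.0  # a flat map carries no information
    z = (s - s.mean()) / std
    rows, cols = zip(*fixations)
    return float(z[list(rows), list(cols)].mean())


# Toy data (invented): one salient pixel in the middle of a 5x5 map.
toy_map = np.zeros((5, 5))
toy_map[2, 2] = 1.0

on_target = normalized_scanpath_saliency(toy_map, [(2, 2)])
off_target = normalized_scanpath_saliency(toy_map, [(0, 0)])
```

A fixation that lands on the salient pixel scores well above zero, while a fixation on the empty corner scores slightly below zero.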

SB: Does your algorithm analyze only contrast, or does it take other stimuli or behaviors into account?
RM: ViewFinder takes contrast into account, but also color, motion, texture, flow, and other criteria, with the aim of behaving like an eye and a brain (the “bottom-up” model, from the eye to the brain). We are also working on adding text and face detection to the algorithm, since these are key elements of human attention (the “top-down” model).
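
As an illustration of the contrast cue only, the following is a minimal sketch of a center-surround saliency map, a standard building block of bottom-up saliency models. It is not the ViewFinder implementation: the function names, the window radii, and the toy image are assumptions, and color, motion, texture, and flow are left out entirely.

```python
import numpy as np


def box_mean(image, radius):
    """Local mean over a (2*radius+1)^2 window, using an integral image."""
    k = 2 * radius + 1
    padded = np.pad(image, radius, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = image.shape
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return total / (k * k)


def center_surround_saliency(image, center_radius=1, surround_radius=4):
    """Absolute difference between a small and a large local mean, scaled to [0, 1]."""
    image = np.asarray(image, dtype=float)
    diff = np.abs(box_mean(image, center_radius) - box_mean(image, surround_radius))
    peak = diff.max()
    return diff / peak if peak > 0 else diff


# Toy image (invented): uniform gray background with a small bright patch.
toy_image = np.full((32, 32), 0.5)
toy_image[10:12, 20:22] = 1.0

saliency = center_surround_saliency(toy_image)
peak_row, peak_col = np.unravel_index(np.argmax(saliency), saliency.shape)
```

In this toy case the most salient pixel falls inside the bright patch, while the uniform background far from it scores close to zero. A fuller model would combine several such maps, one per cue and scale.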

SB: What exactly do you mean by “flow”?
RM: Flows, movements, textures: all of these relate to the patterns that can be found in images. For example, take a small car (say 2% of the image area) driving along a mountainside road. The motion detection algorithm included in ViewFinder can identify this car, because it breaks the continuity of the mountain's texture.

SB: And as for text, do you mean analyzing the meaning of the text, or only its appearance?

RM: Text detection (in fact, localization), like face detection, is used to determine where text and faces appear. These are classification algorithms that locate patterns, but do not try to match them against a biometric database or perform character recognition. It is simply a matter of knowing that there is something of interest at a given location.

SB: Your tool often indicates visual attention on the edges, even though these areas are empty. Is this a bug? An artifact?

RM: Indeed, a number of people have pointed this out to us, and we plan to provide examples and explain these results. It is not a bug. Very often these are areas that contrast strongly with the interior area, and they attract your attention, even if only subliminally and even if they contain nothing significant. As the article “Psychology of the form and dynamic symmetry” puts it, rhythm is to time what symmetry is to space.

SB: Your tool does not analyze meaning, that is, what the elements signify. To what extent does the content of text or images affect visual attention? Does visual attention depend on what is represented, or only on how things are represented?
RM: Attention can be reflexive and impulsive (“bottom-up”) as well as cognitive and context-dependent (“top-down”). It depends on both the “how” and the “what”.

Take this example: you are driving at night on a circular road. A car is parked on this road with its indicators flashing. At first, your attention is drawn to these lights switching on and off (“bottom-up”). You continue on your way and begin to ignore the lights (“top-down”), because you know that this car will no longer have any influence on you. It is just a parked car.

SB: What are your future projects and developments?
RM: Our company, Feng-GUI, specializes in visual perception, whether attention or attraction. Our business model is to develop various applications of ViewFinder and then integrate them into the products of leading companies such as Apple, Adobe, Google, Yahoo, etc.