The visual phenomenon behind hybrid images is called "visuospatial resonance". This takes us into neuroscience, where the first discoveries date back to the 1970s. The question at the time was to understand how the brain analyzes an image.
On the one hand there are the visual stimuli provided by the eyes, and on the other hand there is the observer's memory. In concrete terms, everything happens in less than 0.1 seconds. Yet it is a relatively laborious iterative process of comparison, which starts with a rough overview and then, detail by detail, arrives at a precise interpretation of the object being looked at.
Obviously, good eyesight is a big advantage. I can guarantee that without glasses, you will not recognize much in less than 0.15 seconds. To qualify the sharpness of an image, we talk about frequency. High-frequency images are images with very sharp edges and fine detail. We see them very well up close, but after a few meters they fade from view. Conversely, low-frequency images appear blurred at close range, but sharp from a distance. It is by combining these two images (hence the "resonance") that we can produce these famous hybrid images.
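The combination described above can be sketched in a few lines of code: blur one image to keep only its low frequencies, subtract the blur of a second image from itself to keep only its high frequencies, and add the two. This is a minimal sketch using NumPy; the function names (`make_hybrid`, `gaussian_blur`) and the sigma value are illustrative choices, not a standard API.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """1-D Gaussian kernel, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur: a low-pass filter on a 2-D grayscale array."""
    radius = int(3 * sigma)
    k = gaussian_kernel(sigma, radius)
    # Convolve each row, then each column, with the 1-D kernel.
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)
    return out

def make_hybrid(img_far, img_near, sigma=6.0):
    """Low frequencies of img_far + high frequencies of img_near.

    From a distance you see img_far (the blurred one); up close,
    the sharp edges of img_near dominate.
    """
    low = gaussian_blur(img_far, sigma)
    high = img_near - gaussian_blur(img_near, sigma)
    return low + high
```

A larger sigma pushes the "switch-over" distance further away: the more you blur the far image, the closer you have to stand before the near image's edges take over.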
But why does the eye work this way? I thought our brain was a lazy one, so why does it take the trouble to watch two visuospatial ranges at the same time? Couldn't it have found a more economical way?
The eye: "Hello brain, I'm sending you an HD picture, I don't know what it is, but figure it out..."
The brain: "Ok, wait 30 seconds, it's downloading..."
The eye: "Go ahead and hurry ! Maybe it's a tiger !"
The brain: "Hey easy, I'm still 56kpbs... OK, it's OK, it's a cat!"
As you will have gathered, this dialogue is pure fiction, and any resemblance to an actual scene would be purely coincidental. But there is a lot of truth in this exchange. In fact, if all the information had to pass between the eye and the brain at once, we would have serious problems of cognitive saturation. To avoid this, nature has given us two receiving channels, each with a different bit rate.
That's where the iterations begin. The low frequencies are sent to the brain very quickly, to get a first impression out. This "coarse" visual information allows a first, rough recognition. You can experience these low frequencies yourself, since they are the blurred areas at the edges of your field of vision. When the tiger surreptitiously enters your living room, sneaking in between the TV and the sofa while you are concentrating on reading this article, it is the low frequencies that will save your life, even before you realize it is only a cat. At that point, the stopwatch reads about 0.08 seconds.
The brain: "Go ahead and raise your head, I'm not sure that the moving red spot isn't a tiger! I'd like to check."
At this point, the brain can then ask the eye to look more closely. The whole body is on alert. The high frequencies will come into play to confirm or reject the coarse image recognition.
The brain: "Can we stop the stopwatch? Would I like to know my performance?"
There's nothing to be afraid of, it was a cat. The stopwatch reads 0.15 seconds.