A family of early-vision neurons reacting to directional transitions from high to low spatial frequency.
Some of the neurons in vision models are features that we aren't particularly surprised to find. Curve detectors, for example, are a pretty natural feature for a vision system to have. In fact, they had already been discovered in the animal visual cortex.
High-low frequency detectors, on the other hand, seem more surprising. They are not a feature that we would have expected a priori to find. Yet, when systematically characterizing mixed3a, we found a family of neurons that appear to detect a high-frequency pattern on one side and a low-frequency pattern on the other.
One worry we might have about the circuits approach
How can we be sure that "high-low frequency detectors" are actually detecting directional transitions from high to low spatial frequency? We will rely on three methods: feature visualizations, dataset examples, and synthetic tuning curves.
Later in the article, we dive into the mechanistic details of both how they are implemented and how they are used. We will be able to understand the algorithm that implements them, confirming that they detect high-to-low frequency transitions.
A feature visualization is a synthetic input optimized to maximally activate a single neuron.
From their feature visualizations, we observe that all of these high-low frequency detectors share these same characteristics:
We can use a diversity term in our feature visualizations to jointly optimize for the activation of a neuron while encouraging different activation patterns in a batch of visualizations. We are thus reasonably confident that if high-low frequency detectors were also sensitive to other patterns, we would see signs of them in these feature visualizations. Instead, the frequency contrast remains an invariant aspect of all these visualizations. (Although other patterns form along the boundary, these are likely outside the neuron’s effective receptive field.)
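To make the diversity term concrete, here is a minimal sketch of one common formulation, assumed here rather than taken from the article: penalize the mean pairwise cosine similarity between the Gram matrices (channel co-activation statistics) of a layer's activations across the batch. The names `diversity_penalty` and `objective` and the `weight` parameter are hypothetical. In practice this would be written in an autodiff framework and maximized by gradient ascent on the input images; NumPy is used here only to show the arithmetic.

```python
import numpy as np

def diversity_penalty(acts: np.ndarray) -> float:
    """Mean pairwise cosine similarity between per-image Gram matrices.

    acts: activations of some layer for a batch of visualizations,
          shape (batch, height, width, channels); batch must be >= 2.
    Lower values mean the images use more dissimilar channel combinations.
    """
    batch, h, w, c = acts.shape
    flat = acts.reshape(batch, h * w, c)
    # Gram matrix per image: how strongly each pair of channels co-activates.
    grams = np.einsum("bpi,bpj->bij", flat, flat).reshape(batch, -1)
    # L2-normalize so the dot product below is a cosine similarity.
    grams = grams / (np.linalg.norm(grams, axis=1, keepdims=True) + 1e-8)
    sim = grams @ grams.T  # (batch, batch)
    # Ignore each image's similarity with itself.
    off_diag = sim[~np.eye(batch, dtype=bool)]
    return float(off_diag.mean())

def objective(neuron_act: np.ndarray, acts: np.ndarray, weight: float = 1.0) -> float:
    """Quantity to maximize: mean neuron activation minus the diversity penalty."""
    return float(neuron_act.mean()) - weight * diversity_penalty(acts)
```

A batch of identical images gets a penalty close to 1, while images that rely on disjoint channels get 0, so the optimizer is pushed toward visibly different solutions that still excite the neuron.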
We generate dataset examples by sampling from a natural data distribution (in this case, the training set) and selecting the images that cause the neurons to maximally activate. Checking against these examples helps ensure we’re not misreading the feature visualizations.
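Selecting dataset examples amounts to ranking images by how strongly they drive the neuron. A minimal sketch, assuming the layer's activations over a sample of the training set are already computed and stored as an array; the function name `top_dataset_examples` is hypothetical:

```python
import numpy as np

def top_dataset_examples(acts: np.ndarray, channel: int, k: int = 9) -> np.ndarray:
    """Indices of the k images with the largest activation for `channel`.

    acts: precomputed activations of one layer over a dataset sample,
          shape (num_images, height, width, channels).
    Each image is scored by its maximum over spatial positions, since a
    convolutional neuron can fire anywhere in the image.
    """
    scores = acts[..., channel].max(axis=(1, 2))
    # argsort is ascending: take the last k, then reverse so the best is first.
    return np.argsort(scores)[-k:][::-1]
```

Taking the spatial maximum (rather than the mean) keeps images where the neuron fires strongly in one small region, such as a single in-focus object edge.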
A wide range of real-world situations can cause high-low frequency detectors to fire. Often it's a highly textured, in-focus foreground object against a blurry background: for example, the foreground might be the microphone's latticework, the hummingbird's tiny head feathers, or the small rubber dots on the Lenovo ThinkPad pointing stick. But not always: we also observe that they fire for the MP3 player's brushed metal finish against its shiny screen, and for the text of a watermark.
In all cases, we see one area with high frequency and another area with low frequency. Although they often fire at an object boundary, they can also fire in cases where there is a frequency change without an object boundary. High-low frequency detectors are therefore not the same as boundary detectors.
Tuning curves show us how a neuron’s response changes with respect to a parameter.
They are a standard method in neuroscience.
To construct such a curve, we'll need a set of synthetic stimuli that cause high-low frequency detectors to fire. We generate images with a high-frequency pattern on one side and a low-frequency pattern on the other. Since we're interested in orientation, we'll rotate this pattern to create a 1D family of stimuli.
But what frequency should we use for each side? How steep does the difference in frequency need to be? To explore this, we'll add a second dimension varying the ratio between the two frequencies.
(Adding a second dimension will also help us see whether the results for the first dimension are robust.)
Now that we have these two dimensions, we sample the synthetic stimuli and plot each neuron's responses to them.
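The two-dimensional sweep can be sketched as follows. Several things here are assumptions, not the article's setup: the stimuli are sinusoidal gratings split by a straight boundary through the image center, frequencies are in cycles per pixel, and `toy_neuron` is a hypothetical stand-in for a real detector (it compares local gradient energy on its preferred high-frequency side against the opposite side). With a real model, you would replace `toy_neuron` with the activation of the unit you are studying.

```python
import numpy as np

def signed_distance(size, angle_deg):
    """Projection of each pixel onto the unit vector at angle_deg.

    Coordinates are centered on the image, x to the right, y downward.
    """
    y, x = np.mgrid[0:size, 0:size].astype(float) - (size - 1) / 2
    t = np.deg2rad(angle_deg)
    return x * np.cos(t) + y * np.sin(t)

def high_low_stimulus(size, angle_deg, low_freq, ratio):
    """Grating at low_freq * ratio where p < 0 and at low_freq where p >= 0.

    angle_deg points from the high-frequency side to the low-frequency side.
    Both gratings equal 1 on the boundary, so there is no intensity step there.
    """
    p = signed_distance(size, angle_deg)
    high = np.cos(2 * np.pi * low_freq * ratio * p)
    low = np.cos(2 * np.pi * low_freq * p)
    return np.where(p < 0, high, low)

def toy_neuron(img, preferred_deg):
    """HYPOTHETICAL detector: gradient energy on the preferred high-frequency
    side minus gradient energy on the opposite side, rectified at zero."""
    gy, gx = np.gradient(img)
    energy = gx ** 2 + gy ** 2
    p = signed_distance(img.shape[0], preferred_deg)
    return max(0.0, energy[p < 0].mean() - energy[p >= 0].mean())

angles = np.arange(0, 360, 15)           # orientation dimension
ratios = np.array([1.0, 2.0, 4.0, 8.0])  # frequency-ratio dimension
responses = np.array([
    [toy_neuron(high_low_stimulus(64, a, 0.01, r), preferred_deg=90) for a in angles]
    for r in ratios
])  # shape (len(ratios), len(angles)); view it with e.g. matplotlib's imshow
```

For this toy unit, the response peaks where the stimulus orientation matches `preferred_deg`, falls to zero for the opposite orientation, and grows with the frequency ratio; a ratio of 1 gives essentially no response because both sides carry the same texture.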
Each high-low frequency detector exhibits a clear preference for a limited range of orientations. As we previously found with curve detectors, high-low frequency detectors are rotationally equivariant: each one selects for a given orientation, and together they span the full 360° space.
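One way to check that a family of detectors spans the full 360° is to summarize each tuning curve by a single preferred orientation. Because orientation wraps around, an ordinary average of angles fails (350° and 10° would average to 180°), so the sketch below uses a response-weighted circular mean. The function name is hypothetical, and this summary statistic is our choice for illustration, not necessarily how the article measured preference.

```python
import numpy as np

def preferred_orientation(angles_deg, responses):
    """Response-weighted circular mean of a tuning curve, in [0, 360)."""
    t = np.deg2rad(np.asarray(angles_deg, dtype=float))
    r = np.asarray(responses, dtype=float)
    # Treat each sample as a unit vector at its angle, scaled by the response,
    # and take the direction of the summed vector.
    mean_vec = np.sum(r * np.exp(1j * t))
    return float(np.rad2deg(np.angle(mean_vec)) % 360)
```

Applying this to every detector's tuning curve and sorting the results shows whether the preferred orientations are spread around the circle or bunched together.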
How are high-low frequency detectors built up from lower-level neurons? One could imagine many different circuits which could implement this behavior. To give just one example, it seems like there are at least two different ways that the oriented nature of these units could form.
To resolve this question — and more generally, to understand how these detectors are implemented — we can look at the weights.
Let's look at a single detector. Glancing at the weights from conv2d2 to mixed3a 110, we see that most conv2d2 neurons can be roughly divided into two categories: those that excite the detector on the left and inhibit it on the right, and those that do the opposite.
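The left/right split can be made mechanical by comparing each input channel's mean weight on the left and right halves of the kernel. This is a minimal sketch assuming `w` is already available as one spatial kernel per input channel, with shape (kh, kw, in_channels), for a single output unit whose preferred axis is horizontal. Obtaining such an array for a real layer is outside this sketch, and `split_by_side` and `threshold` are hypothetical names.

```python
import numpy as np

def split_by_side(w, threshold=0.0):
    """Group input channels by the sign of their weights on each kernel half.

    w: weights into a single output neuron, shape (kh, kw, in_channels).
    Returns (excite_left, excite_right): arrays of input-channel indices.
    For odd kw, the center column belongs to neither half.
    """
    kw = w.shape[1]
    left = w[:, : kw // 2, :].mean(axis=(0, 1))
    right = w[:, kw - kw // 2 :, :].mean(axis=(0, 1))
    excite_left = np.where((left > threshold) & (right < -threshold))[0]
    excite_right = np.where((left < -threshold) & (right > threshold))[0]
    return excite_left, excite_right
```

Channels that fall in neither group (same sign on both halves, or weights near zero) are the minority that does not fit the two-category picture.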
The same also holds for each of the other high-low frequency detectors, but, of course, with different spatial patterns.
Surprisingly, across all high-low frequency detectors, the two clusters of neurons that we get for each are actually the same two clusters! One cluster appears to detect textures with a generally high frequency, and one cluster appears to detect textures with a generally low frequency.
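The claim that every detector uses the same two clusters can be tested by generalizing the left/right split to an arbitrary preferred direction and comparing the resulting groupings. The sketch below assumes each detector comes as a (kh, kw, in_channels) kernel plus a preferred angle (for example, from its tuning curve) that points from its high-frequency side to its low-frequency side. The names `side_contrast` and `cluster_agreement` are hypothetical.

```python
import numpy as np

def side_contrast(w, angle_deg):
    """Per-input-channel mean weight on the p < 0 side of the kernel minus the
    mean weight on the p > 0 side, where p projects each kernel position onto
    the unit vector at angle_deg (x right, y down). Positions on the boundary
    (|p| tiny) are ignored."""
    kh, kw, _ = w.shape
    y, x = np.mgrid[0:kh, 0:kw].astype(float)
    y -= (kh - 1) / 2
    x -= (kw - 1) / 2
    t = np.deg2rad(angle_deg)
    p = x * np.cos(t) + y * np.sin(t)
    near, far = p < -1e-9, p > 1e-9
    return w[near].mean(axis=0) - w[far].mean(axis=0)

def cluster_agreement(detectors):
    """detectors: list of (w, angle_deg) pairs sharing the same input channels.

    For each detector, the fraction of input channels whose sign of
    side_contrast matches that of the first detector.
    """
    signs = [np.sign(side_contrast(w, a)) for w, a in detectors]
    ref = signs[0]
    return [float(np.mean(s == ref)) for s in signs]
```

Agreement near 1.0 for every detector would mean that the same input channels excite the high-frequency side regardless of orientation, which is what the two shared clusters predict.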