High-Low Frequency Detectors

A family of early-vision neurons reacting to directional transitions from high to low spatial frequency.

Published: Jan. 27, 2021

DOI: 10.23915/distill.00024.005

This article is part of the Circuits thread, an experimental format collecting invited short articles and critical commentary delving into the inner workings of neural networks.

Introduction

Some of the neurons in vision models are features that we aren’t particularly surprised to find. Curve detectors, for example, are a pretty natural feature for a vision system to have. In fact, they had already been discovered in the animal visual cortex. It’s easy to imagine how curve detectors are built up from earlier edge detectors, and it’s easy to guess why curve detection might be useful to the rest of the neural network.

High-low frequency detectors, on the other hand, seem more surprising. They are not a feature that we would have expected a priori to find. Yet, when systematically characterizing the early layers of InceptionV1, we found a full fifteen neurons of mixed3a that appear to detect a high frequency pattern on one side, and a low frequency pattern on the other.

One worry we might have about the circuits approach to studying neural networks is that we might only be able to understand a limited set of highly-intuitive features. High-low frequency detectors demonstrate that it’s possible to understand at least somewhat unintuitive features.

Function

How can we be sure that “high-low frequency detectors” are actually detecting directional transitions from high to low spatial frequency? We will rely on three methods: feature visualizations, dataset examples, and synthetic tuning curves.

Later on in the article, we dive into the mechanistic details of how they are both implemented and used. We will be able to understand the algorithm that implements them, confirming that they detect high to low frequency transitions.

Feature Visualization

A feature visualization is a synthetic input optimized to elicit maximal activation of a single, specific neuron. Feature visualizations are constructed starting from random noise, so each and every pixel in a feature visualization that’s changed from random noise is there because it caused the neuron to activate more strongly. This establishes a causal link! The behavior shown in the feature visualization is behavior that causes the neuron to fire:
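The article’s visualizations were produced with real networks (via Distill’s lucid tooling); as a minimal sketch of the underlying idea only, here is a numpy caricature where the “neuron” is a single fixed linear filter (a hypothetical Gabor-like unit of our own), so gradient ascent on the input has a closed form:

```python
import numpy as np

def gabor_filter(size=9, wavelength=4.0, theta=0.0):
    """A fixed oriented filter standing in for a single 'neuron'
    (purely illustrative; real network neurons are nonlinear)."""
    ys, xs = np.mgrid[-(size // 2):size // 2 + 1, -(size // 2):size // 2 + 1]
    rot = xs * np.cos(theta) + ys * np.sin(theta)
    envelope = np.exp(-(xs ** 2 + ys ** 2) / (2 * (size / 4) ** 2))
    return envelope * np.cos(2 * np.pi * rot / wavelength)

def feature_visualization(filt, steps=200, lr=0.1, seed=0):
    """Gradient ascent on the input to maximize activation = <filt, img>.
    For a linear 'neuron' the gradient w.r.t. the image is the filter
    itself, so each step pushes pixels toward whatever excites the unit."""
    rng = np.random.default_rng(seed)
    img = rng.normal(scale=0.01, size=filt.shape)  # start from faint noise
    for _ in range(steps):
        img += lr * filt                # d(activation)/d(img) = filt
        img = np.clip(img, -1.0, 1.0)   # keep pixels in a valid range
    return img

filt = gabor_filter()
vis = feature_visualization(filt)
noise = np.random.default_rng(0).normal(scale=0.01, size=filt.shape)
print(float(np.sum(vis * filt)) > float(np.sum(noise * filt)))  # prints True
```

The optimized image activates the unit far more strongly than the initial noise did, which is the causal link the paragraph above describes.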

1: Feature visualizations of a variety of high-low frequency detectors from InceptionV1’s mixed3a layer.

From their feature visualizations, we observe that all of these high-low frequency detectors share the same characteristics: a high-frequency pattern on one side, a low-frequency pattern on the other, and a consistently oriented boundary between the two.

We can use a diversity term in our feature visualizations to jointly optimize for the activation of a neuron while encouraging different activation patterns in a batch of visualizations. We are thus reasonably confident that if high-low frequency detectors were also sensitive to other patterns, we would see signs of them in these feature visualizations. Instead, the frequency contrast remains an invariant aspect of all these visualizations. (Although other patterns form along the boundary, these are likely outside the neuron’s effective receptive field.)

1-2: Feature visualizations of high-low frequency detector mixed3a 136 from InceptionV1, optimized with a diversity objective. You can learn more about feature visualization and the diversity objective here.

Dataset Examples

We generate dataset examples by sampling from a natural data distribution (in this case, the training set) and selecting the images that cause the neurons to maximally activate. Checking against these examples helps ensure we’re not misreading the feature visualizations.
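The selection procedure just described reduces to a few lines; here is a sketch with a toy activation function standing in for a real neuron (the function and data are ours, for illustration):

```python
import numpy as np

def top_dataset_examples(images, activation_fn, k=3):
    """Score each image by the neuron's maximum activation over spatial
    positions (the 'argmax over spatial locations'), then return the
    indices of the top-k scoring images."""
    scores = np.array([activation_fn(img).max() for img in images])
    return np.argsort(scores)[::-1][:k]

# Toy stand-in for a neuron: activation map = horizontal gradient magnitude.
def toy_activation(img):
    return np.abs(np.diff(img, axis=1))

ramp = np.outer(np.ones(8), np.arange(8, dtype=float))  # constant gradient 1
images = [np.zeros((8, 8)), ramp, 5 * ramp, 2 * ramp]   # scores 0, 1, 5, 2
print(top_dataset_examples(images, toy_activation, k=2))  # → [2 3]
```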

2: Crops taken from ImageNet where mixed3a 136 activated maximally, argmaxed over spatial locations.

A wide range of real-world situations can cause high-low frequency detectors to fire. Oftentimes it’s a highly-textured, in-focus foreground object against a blurry background — for example, the foreground might be the microphone’s latticework, the hummingbird’s tiny head feathers, or the small rubber dots on the Lenovo ThinkPad pointing stick — but not always: we also observe that it fires for the MP3 player’s brushed metal finish against its shiny screen, or the text of a watermark.

In all cases, we see one area with high frequency and another area with low frequency. Although they often fire at an object boundary, they can also fire in cases where there is a frequency change without an object boundary. High-low frequency detectors are therefore not the same as boundary detectors.

Synthetic Tuning Curves

Tuning curves show us how a neuron’s response changes with respect to a parameter. They are a standard method in neuroscience, and we’ve found them very helpful for studying artificial neural networks as well. For example, we used them to demonstrate how the response of curve detectors changes with respect to orientation. Similarly, we can use tuning curves to show how high-low frequency detectors respond.

To construct such a curve, we’ll need a set of synthetic stimuli which cause high-low frequency detectors to fire. We generate images with a high-frequency pattern on one side and a low-frequency pattern on the other. Since we’re interested in orientation, we’ll rotate this pattern to create a 1D family of stimuli:

The first axis of variation of our synthetic stimuli is orientation.

But what frequency should we use for each side? How steep does the difference in frequency need to be? To explore this, we’ll add a second dimension varying the ratio between the two frequencies:

The second axis of variation of our synthetic stimuli is the frequency ratio.

(Adding a second dimension will also help us see whether the results for the first dimension are robust.)
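A generator for this two-dimensional stimulus family can be sketched as follows. As in the article, each texture is a product of orthogonal cosine waves; the parameter names and defaults are our own:

```python
import numpy as np

def hl_stimulus(size=64, theta=0.0, low_wavelength=16.0, ratio=4.0):
    """High-frequency texture on one side of an oriented boundary,
    low-frequency texture on the other. theta rotates the boundary;
    ratio = low wavelength / high wavelength."""
    ys, xs = np.mgrid[:size, :size] - size / 2.0
    d = xs * np.cos(theta) + ys * np.sin(theta)  # signed distance to boundary

    def texture(wavelength):
        return (np.cos(2 * np.pi * xs / wavelength) *
                np.cos(2 * np.pi * ys / wavelength))

    return np.where(d < 0, texture(low_wavelength / ratio),
                    texture(low_wavelength))

# The 2D stimulus family: orientations x frequency ratios.
stimuli = [[hl_stimulus(theta=t, ratio=r) for r in (2.0, 4.0, 8.0)]
           for t in np.linspace(0, np.pi, 8, endpoint=False)]
print(len(stimuli), len(stimuli[0]), stimuli[0][0].shape)  # → 8 3 (64, 64)
```

For theta = 0 the boundary is vertical, with the high-frequency side on the left; rotating theta sweeps the family around the circle.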

Now that we have these two dimensions, we sample the synthetic stimuli and plot each neuron’s responses to them:

Each high-low frequency detector exhibits a clear preference for a limited range of orientations. As we previously found with curve detectors, high-low frequency detectors are rotationally equivariant: each one selects for a given orientation, and together they span the full 360º space.

Implementation

How are high-low frequency detectors built up from lower-level neurons? One could imagine many different circuits which could implement this behavior. To give just one example, there seem to be at least two different ways that the oriented nature of these units could form: from orientation-invariant frequency detectors arranged spatially (Invariant→Equivariant), or from lower-level features that are themselves already oriented (Equivariant→Equivariant).

To resolve this question — and more generally, to understand how these detectors are implemented — we can look at the weights.

Let’s look at a single detector. Glancing at the weights from conv2d2 to mixed3a 110, we see that most of them can be roughly divided into two categories: those that activate on the left and inhibit on the right, and those that do the opposite.

4: Six neurons from conv2d2 contributing weights to mixed3a 110.

The same also holds for each of the other high-low frequency detectors, but, of course, with different spatial patterns on the weights, implementing the different orientations. (As an aside: the 1-2-1 pattern on each column of weights is curiously reminiscent of the structure of the Sobel filter.)

Surprisingly, across all high-low frequency detectors, the two clusters of neurons that we get for each are actually the same two clusters! One cluster appears to detect textures with a generally high frequency, and one cluster appears to detect textures with a generally low frequency.

5: The strongest weights on any high-low frequency detector (here shown: mixed3a 110, mixed3a 136, and mixed3a 112) can be divided into roughly two clusters. Each cluster contributes its weights in similar ways.

Top row: underlying neurons conv2d2 119, conv2d2 102, conv2d2 123, conv2d2 90, conv2d2 89, conv2d2 163, conv2d2 98, and conv2d2 188.

This is exactly what we would expect to see if the Invariant→Equivariant hypothesis is true: each high-low frequency detector composes the same two components in different spatial arrangements, which then in turn govern the detector’s orientation.

These two different clusters are really striking. In the next section, we’ll investigate them in more detail.

High and Low Frequency Factors

It would be nice if we could confirm that these two clusters of neurons are real. It would also be nice if we could create a simpler way to represent them for circuit analysis later.

Factorizing the connections between lower layers and the high-low frequency detectors is one way that we can check whether these two clusters are meaningful, and investigate their significance. (Between two adjacent layers, “connections” reduces to the weights between the two layers. Sometimes we are interested in connectivity between layers that are not directly adjacent. Because our model, a deep convnet, is non-linear, we need to approximate these connections; a simple approach is to linearize the model by removing the non-linearities. While this is not a great approximation of the model’s behavior, it gives a reasonable intuition for counterfactual influence: had the neurons in the intermediate layer fired, how would that have affected neurons in the downstream layers? We treat positive and negative influences separately.) Performing a one-sided non-negative matrix factorization (NMF), in which we require the channel factor to be positive but allow the spatial factor to take both positive and negative values, separates the connections into two factors.
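The article’s one-sided variant constrains only the channel factor to be non-negative while the spatial factor may be signed; as a simplified, hedged sketch of the factorization step, here is plain rank-2 NMF via Lee-Seung multiplicative updates on a made-up non-negative weight matrix (all names and toy data are ours):

```python
import numpy as np

def nmf(W, rank=2, iters=2000, seed=0, eps=1e-9):
    """Rank-k NMF via Lee-Seung multiplicative updates: W ~= A @ B,
    with A (rows x rank) and B (rank x cols) both non-negative.
    (The article's one-sided variant lets the spatial factor be signed;
    plain NMF is used here as a simplification.)"""
    rng = np.random.default_rng(seed)
    n, m = W.shape
    A = rng.random((n, rank))
    B = rng.random((rank, m))
    for _ in range(iters):
        B *= (A.T @ W) / (A.T @ A @ B + eps)
        A *= (W @ B.T) / (A @ B @ B.T + eps)
    return A, B

# Toy "weights": two ground-truth channel patterns mixed with random
# non-negative spatial coefficients (values made up for illustration).
rng = np.random.default_rng(1)
true_channels = np.array([[1., 0., 1., 0.],
                          [0., 1., 0., 1.]])
true_spatial = rng.random((6, 2))
W = true_spatial @ true_channels
A, B = nmf(W, rank=2)
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(rel_err < 0.05)
```

Because the toy matrix is exactly rank 2, the two recovered rows of B separate the two underlying channel patterns, analogous to how the factorization separates high- and low-frequency components of the real weights.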

Each factor corresponds to a vector over neurons, and feature visualization can be used to visualize these linear combinations of neurons. Strikingly, one clearly displays a generic high-frequency image, whereas the other does the same with a low-frequency image. (In InceptionV1 in particular, it’s possible that we recover these two factors so crisply partly because of the 3x3 bottleneck between conv2d2 and mixed3a. For this reason, we’re not looking at direct weights between conv2d2 and mixed3a, but rather at the “expanded weights”: the product of a 1x1 convolution, which reduces down to a small number of neurons, with a 3x3 convolution. This structure is very similar to the factorization we apply. However, as we see later in Universality, we recover similar factors for other models where this bottleneck doesn’t exist; NMF makes it easy to see this abstract circuit across many models whose architectures may not reify it so explicitly.) We’ll call these the HF-factor and the LF-factor:

[Figure: each factor expressed as a weighted sum of feature visualizations of its top five contributing neurons. For mixed3a → conv2d2, the HF-factor weights are 0.93, 0.73, 0.66, 0.59, and 0.55, and the LF-factor weights are 0.44, 0.41, 0.38, 0.36, and 0.34; for mixed3a → conv2d1, the HF-factor weights are 0.86, 0.81, 0.64, 0.53, and 0.52, and the LF-factor weights are 0.49, 0.48, 0.45, 0.43, and 0.42.]

6: NMF recovers the neurons that contribute to the two NMF factors, plus the weighted amount each contributes to its factor. Here shown: NMF against both conv2d2 and an earlier layer, conv2d1. The left side of the equals sign shows feature visualizations of the NMF factors.

The feature visualizations are suggestive, but how can we be sure that these factors really correspond to high and low frequency in general, rather than specific high or low frequency patterns? One thing we can do is to create synthetic stimuli again, but now plotting the responses of those two NMF factors.

Since our factors don’t correspond to an edge, our synthetic stimuli will only have one frequency region for each stimulus. To add a second dimension and again demonstrate robustness, we also vary the rotation of that region. (The frequency texture is not exactly rotationally invariant because we construct the stimulus out of orthogonal cosine waves.)

Unlike last time, these activations now mostly ignore the image’s orientation, but are sensitive to its frequency. We can average these results over all orientations in order to produce a simple tuning curve of how each factor responds to frequency. As predicted, the HF-factor responds to high frequency and the LF-factor responds to low frequency.

8: Tuning curve for HF-factor and LF-factor from conv2d2 against images with synthetic frequency, averaged across orientation. Wavelength as a proportion of the full input image ranges from 1:1 to 1:10.

Now that we’ve confirmed what these factors are, let’s look at how they’re combined into high-low frequency detectors.

Construction of High-Low Frequency Detectors

NMF factors the weights into both a channel factor and a spatial factor. So far, we’ve looked at the two parts of the channel factor. The spatial factor shows the spatial weighting that combines the HF and LF factors into high-low frequency detectors.

Unsurprisingly, these weights reproduce the same pattern we previously saw in Figure 5 with its two clusters of neurons: where the HF-factor inhibits, the LF-factor activates, and vice versa. (As an aside, the HF-factor for InceptionV1, as well as some of its NMF components such as conv2d2 123, also appears to be lightly activated by bright greens and magentas. This might be responsible for the feature visualizations of these high-low frequency detectors showing only greens and magentas on the high-frequency side.)

[Figure: spatial weight maps of the HF-factor and LF-factor for each detector.]

9: Using NMF factorization on the weights connecting six high-low frequency detectors in InceptionV1 to the two directly preceding convolutional layers, conv2d2 and conv2d1. The spatial arrangement is very clear, with the LF-factor activating areas in which high-low frequency detectors expect low frequencies and inhibiting areas in which they expect high frequencies. The two factors are very close to symmetric. Weight magnitudes are normalized between -1 and 1.

High-low frequency detectors are therefore built up by circuits that arrange high frequency detection on one side and low frequency detection on the other.

There are some exceptions that aren’t fully captured by the NMF factorization perspective. For example, conv2d2 181 is a texture contrast detector that appears to already have spatial structure. This is the kind of feature that we would expect to be involved through an Equivariant→Equivariant circuit. If that were the case, however, we would expect its weights to the high-low frequency detector mixed3a 70 to be a solid positive stripe down the middle. What we instead observe is that it contributes as a component of high frequency detection, though perhaps with a slight positive overall bias. Although conv2d2 181 has a spatial structure, perhaps it responds more strongly to high frequency patterns.

The weights from conv2d2 181 to mixed3a 70 are consistent with conv2d2 181 contributing via the HF-factor, not via the existing spatial structure of its texture contrast detection.

Now that we understand how they are constructed, how are high-low frequency detectors used by higher-level features?

Usage

mixed3b is the layer immediately following the high-low frequency detectors. Here, high-low frequency detectors contribute to a variety of features. Their most important role seems to be supporting boundary detectors, but they also contribute to bumps and divots, line-like and curve-like shapes, and at least one each of center-surrounds, patterns, and textures.

10: Examples of neurons that high-low frequency detectors contribute to: (1) mixed3b 345 (a boundary detector), (2) mixed3b 276 (a center-surround texture detector), (3) mixed3b 314 (a double boundary detector), and (4) mixed3b 365 (an hourglass shape detector).

These aren’t the only contributors to these neurons – for example, mixed3b 276 also relies heavily on certain center-surrounds and textures – but they are strong contributors.

Oftentimes, downstream features appear to ignore the “polarity” of a high-low frequency detector, responding roughly the same way regardless of which side is high frequency. For example, the vertical boundary detector mixed3b 345 (see above) is strongly excited by high-low frequency detectors that detect frequency change across a vertical line in either direction.

Whereas activation from a high-low frequency detector can help detect boundaries between different objects, inhibition from a high-low frequency detector can also add structure to an object detector by detecting regions that must be contiguous along some direction — essentially, indicating the absence of a boundary.

11: Some of mixed3b 314’s weights, extracted for emphasis. Polarity doesn’t matter so much for how these weights are used by mixed3b 314, but their 180º-invariant orientation does!

You may notice that strong excitation (left) is correlated with the presence of a boundary at a particular angle, whereas strong inhibition (right) is correlated with object continuity where a boundary might otherwise have been.

As we’ve mentioned, by far the primary downstream contribution of high-low frequency detectors is to boundary detectors. Of the top 20 neurons in mixed3b with the highest L2-norm of weights across all high-low frequency detectors, eight participate in boundary detection of some sort: double boundary detectors, miscellaneous boundary detectors, and especially object boundary detectors.
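The selection criterion in this paragraph is easy to state in code; a sketch with made-up weights (the shapes and values are illustrative, not the model’s):

```python
import numpy as np

def top_downstream_units(weights, k=20):
    """weights: (n_downstream_units, n_hlf_detectors) matrix of the weights
    each downstream unit places on all high-low frequency detectors.
    Rank units by the L2-norm of those incoming weights."""
    norms = np.linalg.norm(weights, axis=1)
    return np.argsort(norms)[::-1][:k]

rng = np.random.default_rng(0)
w = rng.normal(size=(10, 15)) * 0.1  # made-up weights, illustrative shapes
w[4] += 1.0   # unit 4 leans heavily (positively) on the detectors
w[7] -= 0.8   # unit 7 leans heavily (negatively) on them
print(top_downstream_units(w, k=2))  # → [4 7]
```

Note that the L2-norm deliberately ignores sign, so units that are strongly inhibited by the detectors rank just as highly as units that are strongly excited.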

Role in object boundary detection

Object boundary detectors are neurons which detect boundaries between objects, whether that means the boundary between one object and another or the transition from foreground to background. They are different from edge detectors or curve detectors: although they are sensitive to edges (indeed, some of their strongest weights are contributed by lower-level edge detectors!), object boundary detectors are also sensitive to other indicators such as color contrast and high-low frequency detection.

12: mixed3b 345 is a boundary detector activated by high-low frequency detectors, edges, color contrasts, and end-of-line detectors. It is specifically sensitive to vertically-oriented high-low frequency detectors, regardless of their polarity, along a vertical line of positive weights.

High-low frequency detectors contribute to these object boundary detectors by providing one piece of evidence that an object has ended and something else has begun. Some examples of object boundary detectors are shown below, along with their weights to a selection of high-low frequency detectors, grouped by orientation (ignoring polarity).

In particular, note how similar the weights are within each grouping! This shows us again that the later layers ignore the high-low frequency detectors’ polarity. Furthermore, the arrangement of excitatory and inhibitory weights contributes to each boundary detector’s overall shape, following the principles outlined above.

13: Four examples of object boundary detectors that high-low frequency detectors contribute to: mixed3b 345, mixed3b 376, mixed3b 368, and mixed3b 151.

Beyond mixed3b, high-low frequency detectors ultimately play a role in detecting more sophisticated object shapes in mixed4a and beyond, by continuing to contribute to the detection of boundaries and contiguity.

So far, the scope of our investigation has been limited to InceptionV1. How common are high-low frequency detectors in convolutional neural networks generally?

Universality

High-Low Frequency Detectors in Other Networks

It’s always good to ask if what we see is the rule or an interesting exception — and high-low frequency detectors seem to be the rule. High-low frequency detectors similar to ones in InceptionV1 can be found in a variety of architectures.

InceptionV1, layer mixed3a (at ~33% CNN depth)

AlexNet, layer Conv2D_2 (at ~29% CNN depth)

InceptionV4, layer Mixed_5a (at ~33% CNN depth)

ResNetV2-50, layer B2_U1_conv2 (at ~29% CNN depth)

14: High-low frequency detectors that we’ve found in AlexNet, InceptionV4, and ResNetV2-50 (right), compared to their most similar counterpart from InceptionV1 (left). These are individual neurons, not linear combinations approximating the detectors in InceptionV1.

Notice that these detectors are found at very similar depths within the different networks, between 29% and 33% network depth! (Network depth is here defined as the index of the layer divided by the total number of layers.) While the particular orientations each network’s high-low frequency detectors respond to may vary slightly, each network has its own family of detectors that together cover the full 360º and comprise a rotationally equivariant family. Architecture aside, what about networks trained on substantially different datasets? In the extreme case, one could imagine a synthetic dataset where high-low frequency detectors don’t arise. For most practical datasets, however, we expect to find them. For example, we even find some candidate high-low frequency detectors in AlexNet (Places): down-up, left-right, and up-down.
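The depth measure used here is just a ratio; as a trivial illustration (the layer counts below are hypothetical, not taken from any of these networks):

```python
def relative_depth(layer_index, total_layers):
    """Network depth as defined in the text: index of the layer
    divided by the total number of layers."""
    return layer_index / total_layers

# Hypothetical layer counts, purely for illustration.
print(f"{relative_depth(3, 9):.0%}")  # → 33%
```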

Even though these families are from three completely different networks, we also discover that their high-low frequency detectors are built up from high and low frequency components.

HF-factor and LF-factor in Other Networks

As we did with InceptionV1, we can again perform NMF on the weights of the high-low frequency detectors in each network in order to extract the strongest two factors.

[Figure: feature visualizations and weights of the HF-factor and LF-factor recovered in AlexNet, InceptionV3_slim, and ResnetV2_50_slim.]

15: NMF of high-low frequency detectors in AlexNet’s Conv2D_2 with respect to conv1_1, InceptionV3_slim’s Conv2d_4a with respect to Conv2d_3b, and ResnetV2_50_slim’s B2_U1_conv2 with respect to B2_U1_conv1, showing activations and inhibitions.

The feature visualizations of the two factors reveal one clear HF-factor and one clear LF-factor, just like what we found in InceptionV1. Furthermore, the weights on the two factors are again very close to symmetric.

Our earlier conclusions therefore also hold across these different networks: high-low frequency detectors are built up from the specific spatial arrangement of a high frequency component and a low frequency component.

Conclusion

Although high-low frequency detectors represent a feature that we didn’t necessarily expect to find in a neural network, we find that we can still explore and understand them using the interpretability tools we’ve built up for exploring circuits: NMF, feature visualization, synthetic stimuli, and more.

We’ve also learned that high-low frequency detectors are built up from comprehensible lower-level parts, and we’ve shown how they contribute to later, higher-level features. Finally, we’ve seen that high-low frequency detectors are common across multiple network architectures.

Given the universality observations, we might wonder whether the existence of high-low frequency detectors isn’t so unnatural after all. We even find approximate high-low frequency detectors in AlexNet (Places), with its substantially different training data. Beyond neural networks, the aesthetic quality imparted by the blurriness of an out-of-focus region of an image is already known to photographers as bokeh. And in VR, visual blur can either provide an effective depth-of-field cue or, conversely, induce nausea in the user when implemented in a dissonant way. Frequency detection may well be commonplace in both natural and artificial vision systems as yet another type of informational cue.

Nevertheless, whether their existence is natural or not, we find that high-low frequency detectors are possible to characterize and understand.

This article is part of the Circuits thread, a collection of short articles and commentary by an open scientific collaboration delving into the inner workings of neural networks.

Author Contributions

As with many scientific collaborations, the contributions to the high-low frequency detectors paper are difficult to separate because it was a collaborative effort that we wrote together.

Conceptual Contributions. Christopher Olah originally noted the high-low frequency detectors as a research direction.

Experiments. Ludwig Schubert wrote the code for generating and measuring synthetic tuning curves for the high-low frequency detectors, for performing NMF on the high-low frequency detectors to extract the HF-factor and the LF-factor, and for performing NMF on the high-low frequency detectors from other networks. Chelsea Voss wrote the code for generating and measuring synthetic stimuli for the HF-factor and LF-factor, with help from Chris for extracting and using the NMF components. This investigation was done in the context of and informed by collaborative research into circuits by Nick Cammarata, Gabe Goh, Chelsea, Ludwig, and Chris.

Figures. Ludwig designed the visualization of the high-low frequency detector synthetic tuning curves, the visualization of the HF-factor and LF-factor NMF vectors, the visualization of the HF-factor and LF-factor NMF weights from conv2d1 and conv2d2, and the figures demonstrating high-low frequency detectors from other networks shown in the Universality section. Chelsea designed the visualization of the HF-factor and LF-factor synthetic stimuli results and the figures articulating the downstream use of high-low frequency detectors, and edited some of the final figures. Two figures were borrowed from Zoom In and Early Vision.

Writing. Chelsea and Ludwig wrote the paper, with feedback from Chris.

We are also grateful to Jennifer Lin, Stefan Sietzen, and Vincent Tjeng for comments on a draft of this paper.

References

  1. Complex pattern selectivity in macaque primary visual cortex revealed by large-scale two-photon imaging
    Tang, S., Lee, T.S., Li, M., Zhang, Y., Xu, Y., Liu, F., Teo, B. and Jiang, H., 2018. Current Biology, Vol 28(1), pp. 38--48. Elsevier.
  2. Shape representation in area V4: position-specific tuning for boundary conformation
    Pasupathy, A. and Connor, C.E., 2001. Journal of neurophysiology, Vol 86(5), pp. 2505--2519. American Physiological Society Bethesda, MD.
  3. Discrete neural clusters encode orientation, curvature and corners in macaque V4[link]
    Jiang, R., Li, M. and Tang, S., 2019. bioRxiv. Cold Spring Harbor Laboratory. DOI: 10.1101/808907
  4. An Overview of Early Vision in InceptionV1
    Olah, C., Cammarata, N., Schubert, L., Goh, G., Petrov, M. and Carter, S., 2020. Distill. DOI: 10.23915/distill.00024.002
  5. Going deeper with convolutions[PDF]
    Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A. and others, 2015. DOI: 10.1109/cvpr.2015.7298594
  6. Zoom In: An Introduction to Circuits
    Olah, C., Cammarata, N., Schubert, L., Goh, G., Petrov, M. and Carter, S., 2020. Distill. DOI: 10.23915/distill.00024.001
  7. Visualizing higher-layer features of a deep network[PDF]
    Erhan, D., Bengio, Y., Courville, A. and Vincent, P., 2009. University of Montreal, Vol 1341, pp. 3.
  8. Visualizing and understanding convolutional networks[PDF]
    Zeiler, M.D. and Fergus, R., 2014. European conference on computer vision, pp. 818--833.
  9. Understanding neural networks through deep visualization[PDF]
    Yosinski, J., Clune, J., Nguyen, A., Fuchs, T. and Lipson, H., 2015. arXiv preprint arXiv:1506.06579.
  10. Visualizing and understanding recurrent networks[PDF]
    Karpathy, A., Johnson, J. and Fei-Fei, L., 2015. arXiv preprint arXiv:1506.02078.
  11. Feature Visualization[link]
    Olah, C., Mordvintsev, A. and Schubert, L., 2017. Distill. DOI: 10.23915/distill.00007
  12. ImageNet: A large-scale hierarchical image database
    Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K. and Fei-Fei, L., 2009. 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248--255. DOI: 10.1109/CVPR.2009.5206848
  13. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex
    Hubel, D.H. and Wiesel, T.N., 1962. The Journal of physiology, Vol 160(1), pp. 106--154. Wiley Online Library.
  14. Convergent learning: Do different neural networks learn the same representations?[PDF]
    Li, Y., Yosinski, J., Clune, J., Lipson, H. and Hopcroft, J.E., 2015. FE@ NIPS, pp. 196--212.
  15. ImageNet Classification with Deep Convolutional Neural Networks[PDF]
    Krizhevsky, A., Sutskever, I. and Hinton, G.E., 2012. Advances in Neural Information Processing Systems 25, pp. 1097--1105. Curran Associates, Inc.
  16. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning[PDF]
    Szegedy, C., Ioffe, S. and Vanhoucke, V., 2016. CoRR, Vol abs/1602.07261.
  17. Deep Residual Learning for Image Recognition[PDF]
    He, K., Zhang, X., Ren, S. and Sun, J., 2015. CoRR, Vol abs/1512.03385.

Updates and Corrections

If you see mistakes or want to suggest changes, please create an issue on GitHub.

Reuse

Diagrams and text are licensed under Creative Commons Attribution CC-BY 4.0 with the source available on GitHub, unless noted otherwise. The figures that have been reused from other sources don’t fall under this license and can be recognized by a note in their caption: “Figure from …”.

Citation

For attribution in academic contexts, please cite this work as

Schubert, et al., "High-Low Frequency Detectors", Distill, 2021.

BibTeX citation

@article{schubert2021high-low,
  author = {Schubert, Ludwig and Voss, Chelsea and Cammarata, Nick and Goh, Gabriel and Olah, Chris},
  title = {High-Low Frequency Detectors},
  journal = {Distill},
  year = {2021},
  note = {https://distill.pub/2020/circuits/frequency-edges},
  doi = {10.23915/distill.00024.005}
}