We reverse engineer a non-trivial learned algorithm from the weights of a neural network and use its core ideas to craft an artificial artificial neural network from scratch that reimplements it.
Many approaches to interpretability give high level explanations, but it's hard to know if they're true. Circuits takes the opposite approach: building up from individual neurons and individual weights to easily verifiable explanations of tiny slices of neural networks. But it faces a different question: can we ever hope to understand a full neural network this way?
This paper uses the Circuits approach to reverse-engineer curve detectors, a neuron family we studied in a previous article. We find that although curve detection involves more than 50,000 parameters, those parameters actually implement a simple algorithm that can be read off the weights and described in just a few English sentences. Based on this understanding, we re-implement curve detectors by hand, writing out weights to create an artificial artificial neural network that mimics curve detectors.
The curve circuit is relatively simple largely because of the equivariance motif. Rotational equivariance reduces the complexity by a factor of 10-20x, and scale equivariance reduces it by an additional 2-3x, for a total reduction of ~50x.
While the curve circuit in InceptionV1 is quaint next to the 175B weights in GPT-3, it's a big step up from the tiny circuits in Zoom In. We think the surprising simplicity of the curve circuit is a glimmer of hope that the Circuits approach may scale to letting us reverse-engineer big neural networks into small verifiable explanations.
In this article we'll reverse-engineer the 10 curve neurons we studied in Curve Detectors.
To get a high-level understanding of the first four layers in the context of curve detectors, we use a method we call decomposed feature visualization. Decomposed feature visualization renders a grid of feature visualizations that shows the expanded weights of an upstream layer to a downstream neuron of interest (in this case 3b:379).
We can take the highest-magnitude position from each layer above and see which neuron families contribute to it, to get a sense for how each shape is built.
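As a concrete illustration of the bookkeeping behind this kind of figure, here is a minimal sketch of finding the highest-magnitude position and the strongest-contributing channels in a weight tensor. It assumes we already have the (expanded) weights to the neuron of interest as a numpy array; the shapes and the `strongest_contributors` helper are ours for illustration, not the article's actual figure code.

```python
import numpy as np

def strongest_contributors(W, top_k=5):
    """Given a weight tensor W of shape (in_channels, height, width) connecting
    an upstream layer to one downstream neuron (e.g. 3b:379), return the spatial
    position with the largest total weight magnitude and the upstream channels
    that contribute most strongly at that position."""
    magnitude = np.abs(W).sum(axis=0)                    # (height, width)
    y, x = np.unravel_index(magnitude.argmax(), magnitude.shape)
    contributions = W[:, y, x]                           # (in_channels,)
    top = np.argsort(-np.abs(contributions))[:top_k]
    return (int(y), int(x)), [(int(c), float(contributions[c])) for c in top]

# Stand-in weights; in practice W would be the expanded weights to 3b:379.
W = np.random.randn(256, 5, 5)
position, channels = strongest_contributors(W)
```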
For each family we can also visualize how they connect to each other, showing us a bird's-eye view of the curve detection algorithm. Again, we see the layers progressively building more complex shapes, with shapes that closely resemble curve detectors, like early curves and well-defined lines, being built in 3a, only one layer before curve detectors.
Now that we know which neuron families are most important for curve detection, we can invest in understanding the weights connecting them. Luckily this is sometimes easy, since weights connecting families tend to follow patterns that we call motifs. Many families in this diagram implement the rotational equivariance motif, meaning that each neuron in a family is approximately a rotated version of another neuron in that family. We can see this in the weights connecting 3a early curves and 3b curves.
When neuron families implement rotational equivariance we learn a lot by looking at the strongest positive and negative neuron connections, because the others are just rotated versions of them. In the weights we see a general pattern, as each layer builds a contrast detector by tiling a simpler contrast detector along its tangent. When the two shapes are aligned the weight is most positive, and when they are perpendicular it is most negative.
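One way to make the rotational equivariance motif concrete is to write the channel-to-channel weight as a function of the orientation difference between the upstream and downstream neurons. The sketch below is an idealization we made up for illustration, not the network's learned weights; cos(2Δθ) is simply one convenient function that is most positive when the shapes are aligned and most negative when they are perpendicular.

```python
import numpy as np

def equivariant_weight_matrix(downstream_angles, upstream_angles, scale=1.0):
    """Idealized weights between two rotationally equivariant families: the
    weight depends only on the orientation difference, peaking when the two
    shapes are aligned and most negative when they are perpendicular
    (appropriate for line-like shapes with 180-degree symmetry; for curves,
    which distinguish all 360 degrees, cos(delta) would be the analogue)."""
    down = np.asarray(downstream_angles)[:, None]   # (n_down, 1)
    up = np.asarray(upstream_angles)[None, :]       # (1, n_up)
    return scale * np.cos(2.0 * (down - up))

# Ten evenly rotated orientations in each family.
angles = np.linspace(0, np.pi, 10, endpoint=False)
W = equivariant_weight_matrix(angles, angles)       # (10, 10)
```

With both families sorted by orientation, a matrix like this shows a positive diagonal band falling off smoothly into negative weights, mirroring what we see in the learned weight matrices below.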
So far we’ve looked at the neuron families that are most important to curve detection, and also at the circuit motifs connecting them. This high-level “circuit schematic” is useful for seeing a complex algorithm at a glance, and it tells us the main components we'd need to build if we wanted to reimplement the circuit from scratch.
The circuit schematic also makes it easy to describe a few-sentence English story of how curve detection works. Gabor filters turn into proto-lines, which build lines and early curves. Finally, lines and early curves are composed into curves. In each case, each shape family (e.g. conv2d2 lines) has positive weight along the tangent of the shape family it builds (e.g. 3a early curves). Each shape family implements the rotational equivariance motif, containing multiple rotated copies of approximately the same neuron.
In the next few sections we'll zoom in from this high-level description to a weight-level analysis of how each of the 3a families we've looked at so far contributes to curve detectors.
Early curves in 3a contribute more to the construction of curve detectors than any other neuron family. At every position curve neurons are excited by early curves in a similar orientation and inhibited by ones in the opposing orientation, closely following the general pattern we saw earlier.
If you look closely you can see the weights shift slightly over the course of the curve to track the change in local curvature.
The weight matrix connecting early curves and curves shows a striking set of positive weights lining the diagonal where the two shapes have similar orientations. This band of positive weights is surrounded by negative weights where the early curve and curve orientations differ. The transition is smooth — if a curve strongly wants to see an early curve it wants to see its neighbor a bit too.
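If you want to pull out this kind of weight matrix yourself, the extraction is mostly indexing. The sketch below assumes a raw conv weight tensor of shape (out, in, kh, kw) and hypothetical lists of curve and early-curve channel indices, sorted by orientation so the diagonal band is visible.

```python
import numpy as np

def family_weight_matrix(weights, curve_units, early_curve_units, position):
    """Slice the channel-to-channel weights between two neuron families out of
    a raw conv weight tensor of shape (out, in, kh, kw) at one spatial offset.
    The unit index lists are hypothetical placeholders."""
    ky, kx = position
    return weights[np.ix_(curve_units, early_curve_units)][:, :, ky, kx]

# e.g. family_weight_matrix(W_3b, CURVE_UNITS, EARLY_CURVE_UNITS, position=(2, 2))
```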
Why do we see that curve detectors which are rotated 180 degrees from each other are inhibitory, but not ones rotated 90 degrees? Recall from our previous article that early curve detectors respond slightly to curves that are 180 degrees rotated from their preferred orientation, which we called an echo. This makes sense: a curve in the opposite orientation runs tangent to the curve we're trying to detect in the middle of the receptive field, causing similar edge detectors to fire.
These negative weights are the reason curve neurons in 3b have no echoes, which we can validate by using circuit editing to remove them and confirming that echoes appear.
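A rough sketch of that circuit edit, assuming a PyTorch InceptionV1 where the relevant convolution can be looked up by name (the layer name and channel indices below are placeholders, not the real ones):

```python
import torch

def remove_negative_weights(model, layer_name, curve_channel, early_curve_channels):
    """Circuit-editing sketch: clamp away the negative weights from a set of
    upstream early-curve channels into one downstream curve channel, so we can
    check whether an echo appears in the curve neuron's responses."""
    conv = dict(model.named_modules())[layer_name]
    with torch.no_grad():
        block = conv.weight[curve_channel, early_curve_channels]  # (n, kh, kw)
        conv.weight[curve_channel, early_curve_channels] = block.clamp(min=0.0)
    return model

# e.g. remove_negative_weights(inception, 'mixed3b', 379, EARLY_CURVE_UNITS)
```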
Using negative weights in this way follows a more general trend: our experience is that negative weights are used for things that are in some way similar enough that they could be mistaken. This seems potentially analogous to lateral inhibition in biological neural networks.
Overview of Early Vision separates line-like shapes in 3a into several families: lines, line misc, angles, thick lines, and line ends. Since they have similar roles in the context of curve detectors we’ll discuss them together, while pointing out their subtle differences.
Like early curves, lines align to the tangent of curve detectors, with more positive weights when the neurons have the same orientation. However, this pattern is more nuanced and discontinuous with lines than with early curves. A line with a similar orientation to a curve will excite it, but a line that's rotated a little may inhibit it. This makes it hard to see a general pattern by looking directly at the weight matrix.
Instead, we can study which line orientations excite curve detectors using synthetic stimuli. We can take a similar approach to decomposed feature visualization to see which lines different spatial positions respond to. This shows us that each 3b curve neuron is excited by edges along its tangent line, with a tolerance of between about 10° and 45°.
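The synthetic stimuli themselves are easy to generate. Here is a minimal sketch of a line stimulus renderer and an orientation sweep; `get_activation` stands in for whatever forward hook you use to read out a unit's response, and isn't a real function in our codebase.

```python
import numpy as np

def line_stimulus(angle, size=224, thickness=4):
    """Render a grayscale image of a single line through the image center at
    the given orientation (in radians), similar in spirit to the synthetic
    stimuli used to probe line and curve neurons."""
    ys, xs = np.mgrid[0:size, 0:size] - size / 2.0
    # Perpendicular distance of each pixel from the line through the origin.
    distance = np.abs(xs * np.sin(angle) - ys * np.cos(angle))
    return np.where(distance < thickness, 1.0, 0.0)

stimuli = [line_stimulus(a) for a in np.linspace(0, np.pi, 36)]
# Hypothetical read-out of a curve neuron's response at each orientation:
# responses = [get_activation(model, '3b', 379, img) for img in stimuli]
```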
This tolerance isn't always symmetric, which we can see in 3b:406 below. On its left side it is most excited by lines oriented about 10° upwards. If the line is oriented more than 10° upwards the neuron is still excited, but if it is oriented less than 10° upwards the neuron switches to being inhibited.
This view tells us how each curve detector responds to different orientations of lines in general. We can connect this back to individual line neurons by studying which orientations those line neurons respond to using a radial tuning curve.
There are 11 simple line neurons in 3a that mostly fire to one orientation, although some activate more weakly to an "echo" 90 degrees away.
Five neurons in 3a respond to lines that are perpendicular to the orientation where they are longest. These neurons mostly detect fur, but they also contribute to 3b curves.
Finally, there are five line neurons with curiously sharp transitions. These neurons want a line oriented towards a particular direction, and tolerate error in that direction, but definitely don't want it rotated the other way, even slightly. In curve detection, this is useful for handling imperfections, like bumps.
We find cliff-like line neurons an interesting example of non-linear behavior. We usually think of neurons as measuring distance from some ideal. For instance, we may expect car neurons to prefer stimuli that look as close to an ideal car as possible, responding less the further a stimulus deviates from that ideal in any direction. Cliff-like line neurons instead tolerate deviation in one direction while rejecting even small deviations in the other.
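To make the contrast concrete, here are two toy tuning functions: one that measures symmetric distance from an ideal orientation, and one with a cliff. These are illustrative shapes we chose, not curves fitted to the network.

```python
import numpy as np

def symmetric_tuning(delta, width=15.0):
    """Toy 'distance from an ideal' tuning: the response falls off the same
    way on both sides of the preferred orientation (delta in degrees)."""
    return np.exp(-(delta / width) ** 2)

def cliff_tuning(delta, width=30.0):
    """Toy cliff-like tuning: error towards one side is tolerated, while even
    a small rotation towards the other side flips to inhibition."""
    return np.where(delta >= 0, np.exp(-(delta / width) ** 2), -0.5)

deltas = np.linspace(-45, 45, 91)
symmetric, cliff = symmetric_tuning(deltas), cliff_tuning(deltas)
```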
The different types of line neurons we looked at above each have different behaviors, which is part of why the weight matrix between 3a lines and 3b curves is indecipherable. However, if we go back one more layer and look at how conv2d2 lines connect to 3b curves, we see structure.
We think this points to an interesting property of both curves and lines in InceptionV1. The line family in conv2d2 are roughly "pure" line detectors, detecting lines in an elegant pattern. The next layer (3a) builds lines too, but they're more applied and nuanced, with seemingly-awkward behavior like cliff-lines where the network finds them useful. Similarly, the 3b curve detectors are surprisingly elegant detectors of curves behaviorally, and mostly follow clean patterns in their construction. In contrast, the curves in the next layer (4a) are more applied and nuanced, mostly corresponding to 3d geometry and seemingly specialized for detecting real-world objects like the tops of cups and pans. Perhaps this points to a yet unnamed motif of pure shape detectors directly followed by applied ones.
In Curve Detectors we saw how curve neurons seem to be robust to several cosmetic properties, with similar behavior across textures like metal and wood in a variety of lighting. How do they do it?
We believe this reflects a more widespread phenomenon in early vision. As progressively sophisticated shapes are built in each layer, new shapes incorporate cosmetic neuron families like colors and textures. For instance, 3b curve neurons are built primarily from the line and early curve neuron families, but they also incorporate a family of 65 texture neurons. This means they both inherit the cosmetic robustness of the line and early curve neurons and strengthen it by incorporating more textures.
We won't do a detailed weight-level analysis in this article of how cosmetic robustness propagates through the shapes of early vision, since that is a broader topic than curve detection, but we think it is an exciting direction for future research.
How do we know this story about the mechanics of curve detectors is true? One way is to use it to reimplement curve detectors from scratch. We manually set the weights of a blank neural network to implement the neuron families and circuit motifs from this article, crafting an artificial artificial neural network, and made the Python code available and runnable in this Colab notebook. This was initially a few-hour process for one person (Chris Olah), who did not look at the original neural network's weights while constructing it, since that would go against the spirit of the exercise. Later, before publishing, the weights were tweaked, in particular by adding negative weights to remove echoes in the activations.
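To give a flavor of what writing weights by hand looks like, here is a minimal sketch of one such kernel: tracing the ideal curve through the kernel's spatial extent and setting each weight by how well an early curve's orientation matches the curve's local orientation. The radius, kernel size, and orientation conventions are illustrative stand-ins, not the values used in the Colab notebook.

```python
import numpy as np

def curve_kernel(curve_angle, early_curve_angle, size=5, radius=6.0):
    """Sketch of one hand-written spatial kernel connecting an early-curve
    channel (preferred orientation `early_curve_angle`) to a curve channel
    (preferred orientation `curve_angle`). Positions along the ideal curve get
    weight set by how well the early curve's orientation matches the curve's
    local orientation; everywhere else the weight stays near zero."""
    kernel = np.zeros((size, size))
    center = (size - 1) / 2.0
    # Center of the circle whose arc the curve detector looks for, placed so
    # the arc passes through the middle of the kernel.
    cx = center - radius * np.cos(curve_angle)
    cy = center - radius * np.sin(curve_angle)
    for a in curve_angle + np.linspace(-0.9, 0.9, 64):
        x, y = cx + radius * np.cos(a), cy + radius * np.sin(a)
        ix, iy = int(round(x)), int(round(y))
        if 0 <= ix < size and 0 <= iy < size:
            # +1 where the early curve faces the same way as the local arc,
            # -1 where it faces the opposite way.
            kernel[iy, ix] += np.cos(a - early_curve_angle)
    return kernel / (np.abs(kernel).max() + 1e-9)

# The full early-curve -> curve weight block is just this kernel for every
# pair of (roughly evenly rotated) orientations in the two families.
angles = np.linspace(0, 2 * np.pi, 10, endpoint=False)
W = np.stack([[curve_kernel(c, e) for e in angles] for c in angles])  # (10, 10, 5, 5)
```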
To compare our artificial curve detectors against InceptionV1's naturally learned ones we have the full palette of techniques we used in Curve Detectors available to us. We'll choose three: feature visualization, dataset examples, and synthetic stimuli. From there we'll run two additional comparisons by leveraging circuits and model editing.
First we'll look at feature visualization and responses to synthetic curve stimuli together. We see the feature visualizations indeed render curves, except they are grayscale, since we did not include cosmetic features such as color-contrast detectors in our artificial curves. We see their response to curve stimuli approximates the natural curve detectors across a range of radii and all orientations. One difference is that our artificial neurons have a slight echo.
We can also get a qualitative sense for the differences by looking at a saliency map of dataset examples that cause artificial curve detectors to fire strongly.
Next we can compare the weights for the circuits connecting neuron families in the two models, alongside feature visualizations for each of those families. We see they follow approximately the same circuit structure.
We can also zoom into specific weight matrices that we've already studied in this article. We see the raw weights between early curves and curves, as well as between lines and curves, look approximately like the ones in InceptionV1, but cleaner since we set them programmatically.
Finally, a preliminary experiment suggests that adding artificial curve detectors helps recover some of the classification accuracy that is lost when the natural curve detectors are removed from the model entirely.
Additionally, there are two caveats worth mentioning about our experimental setup. First, our ImageNet evaluation likely doesn't mimic the exact conditions the model was trained under (e.g. data preprocessing), since the original model was trained at Google using a precursor to TensorFlow. Secondly, the reason we ran it on less than the full validation set was operational, not a result of cherry-picking. We initially ran it on a small set in a prototype experiment to validate our hypothesis. We planned to run it on the full set before our publication date, but due to OpenAI infrastructure changes our setup broke and we were unable to rebuild it in time. For this reason we emphasize that the experiment is preliminary, although we suspect it's likely to hold on the full validation set as well.
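For readers who want to replicate the spirit of this experiment, a rough sketch of the ablation-and-evaluation loop looks like the following. The layer name, channel indices, and `splice_artificial_curves` helper are hypothetical placeholders that depend on your InceptionV1 implementation.

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device='cuda'):
    """Top-1 accuracy over a (possibly partial) ImageNet validation loader."""
    model.eval().to(device)
    correct = total = 0
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

def ablate_channels(model, layer_name, channels):
    """Remove a set of channels (e.g. the ten 3b curve detectors) by zeroing
    their incoming weights and biases."""
    conv = dict(model.named_modules())[layer_name]
    with torch.no_grad():
        conv.weight[channels] = 0.0
        if conv.bias is not None:
            conv.bias[channels] = 0.0
    return model

# Sketch of the comparison (splice_artificial_curves is a placeholder for
# writing the hand-crafted weights back into the ablated channels):
# acc_base     = top1_accuracy(model, val_loader)
# acc_ablated  = top1_accuracy(ablate_channels(model, 'mixed3b', CURVE_UNITS), val_loader)
# acc_restored = top1_accuracy(splice_artificial_curves(model, CURVE_UNITS), val_loader)
```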
Overall, we believe these five experiments show our artificial curve detectors are roughly analogous to the naturally trained ones. Since they are nearly a direct translation from the neuron families and circuit motifs we've described in this article into Python code for setting weights, we think this is strong evidence these patterns accurately reflect the underlying circuits that construct curve detectors.
While this article focused mostly on how curve detection works upstream of the curve detector, it's also worth briefly considering how curves are used downstream in the model. It's easiest to see their mark on the next layer, 4a, where they're used to construct more sophisticated shapes.
The curve neurons in 3b are used to build more complex and specific shape detectors such as sophisticated curves, circles, S-shapes, spirals, divots, and "evolutes" (a term we've repurposed to describe units detecting curves facing away from the middle).
Many of these shapes, such as circles and evolutes, look for individual curves at different spatial positions. These shapes often reappear across different branches of 4a, such as the 3x3 and 5x5 branches.
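A toy sketch of that construction: assembling a circle-like unit by giving each curve orientation positive weight at one spot on a ring. Which side of the ring each curve belongs on depends on the orientation convention, so the sign choice below is arbitrary; everything here is illustrative rather than the network's actual 4a weights.

```python
import numpy as np

def circle_from_curves(curve_angles, size=7, ring_radius=2.5):
    """Sketch of assembling a circle-like unit from curve detectors: each
    curve orientation gets positive weight at one position on a ring. An
    evolute-like unit would flip which side of the ring each curve sits on."""
    center = (size - 1) / 2.0
    kernels = np.zeros((len(curve_angles), size, size))
    for i, angle in enumerate(curve_angles):
        x = int(round(center + ring_radius * np.cos(angle)))
        y = int(round(center + ring_radius * np.sin(angle)))
        kernels[i, y, x] = 1.0   # one curve channel, one spot on the ring
    return kernels

angles = np.linspace(0, 2 * np.pi, 10, endpoint=False)
circle_weights = circle_from_curves(angles)   # (10, 7, 7)
```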
Layer 4a also constructs a series of curve detectors, mostly in the 5x5 branch that specializes in 3d geometry. However, we believe they should be thought of less as pure abstract shapes and more as corresponding to specific worldly objects, like 4a:406 which often detects the top of cups and pans.
As we mentioned in Curve Detectors, our first investigation into curve neurons, it’s hard to separate author contributions between different papers in the Circuits project. Much of the original research on curve neurons came before we decided to separate the publications into the behavior of curve neurons and how they are built. In this section we’ve tried to isolate contributions specific to the mechanics of the curve neurons.
Interface Design & Prototyping. Many weight diagrams were first prototyped by Chris during his first investigations of different families of neurons in early vision, and some of these were turned into presentations. Nick extended them for use in this paper. Chris designed and implemented the decomposed feature visualization figure in the first section. Many of the other interfaces were designed by Nick with the help of Shan and Chris. In particular, Shan helped design the figure showing how the different families of early vision connect leading up to the curve family.
Conceptual Contributions. The earliest understanding of how curve neurons are built from lines and edges came from Chris, and the details came from further investigation by Nick. Nick investigated the line families in detail, including finding cliff line neurons and studying how they are used. Nick worked through the neuron families in the early layers, studying how shape neurons incrementally incorporate increasingly sophisticated texture and cosmetic neurons, working towards the neuron families diagram in the first section. The artificial artificial neural network was built by Chris, and Nick expanded on it for use in the article. Gabe was instrumental in helping discover many of the techniques used for closely studying circuits, and provided input and suggestions at many steps throughout our investigation of the curve circuit.
Writing. Nick and Chris wrote the text of the article with significant help editing from Chelsea.
Infrastructure. Nick built the infrastructure for extracting figures from the paper for reproduction in Colab. Ludwig is responsible for the distributed infrastructure that was used for many experiments.
Our article was greatly improved thanks to the detailed feedback by Patricia Robinson, Jennifer Lin, Adam Shimi, Sam Havens, Stefan Sietzen, Dave Vladman, Maxim Liu, Fred Hohman, Vincent Tjeng, and Humza Iqbal.
We also really appreciate the conversations in the #circuits channel of the open Distill Slack, which at the time of publishing contains more than 600 people.
If you see mistakes or want to suggest changes, please create an issue on GitHub.
Diagrams and text are licensed under Creative Commons Attribution CC-BY 4.0 with the source available on GitHub, unless noted otherwise. The figures that have been reused from other sources don’t fall under this license and can be recognized by a note in their caption: “Figure from …”.
For attribution in academic contexts, please cite this work as
Cammarata, et al., "Curve Circuits", Distill, 2021.
BibTeX citation
@article{cammarata2021curve,
  author = {Cammarata, Nick and Goh, Gabriel and Carter, Shan and Voss, Chelsea and Schubert, Ludwig and Olah, Chris},
  title = {Curve Circuits},
  journal = {Distill},
  year = {2021},
  note = {https://distill.pub/2020/circuits/curve-circuits},
  doi = {10.23915/distill.00024.006}
}