Neural Cellular Automata Model of Pattern Formation
Neural Cellular Automata (NCA
In this work, we apply NCA to the task of texture synthesis. This task involves reproducing the general appearance of a texture template, as opposed to making pixel-perfect copies. We are going to focus on texture losses that allow for a degree of ambiguity. After training NCA models to reproduce textures, we subsequently investigate their learned behaviors and observe a few surprising effects. Starting from these investigations, we make the case that the cells learn distributed, local, algorithms.
To do this, we apply an old trick: we employ neural cellular automata as a differentiable image parameterization
Zebra stripes are an iconic texture. Ask almost anyone to identify zebra stripes in a set of images, and they will have no trouble doing so. Ask them to describe what zebra stripes look like, and they will gladly tell you that they are parallel stripes of slightly varying width, alternating in black and white. And yet, they may also tell you that no two zebra have the same set of stripes
Put another way, patterns and textures are ill-defined concepts. The Cambridge English Dictionary defines a pattern as “any regularly repeated arrangement, especially a design made from repeated lines, shapes, or colours on a surface”. This definition falls apart rather quickly when looking at patterns and textures that impart a feeling or quality, rather than a specific repeating property. A coloured fuzzy rug, for instance, can be considered a pattern or a texture, but is composed of strands pointing in random directions with small random variations in size and color, and there is no discernable regularity to the pattern. Penrose tilings do not repeat (they are not translationally invariant), but show them to anyone and they’ll describe them as a pattern or a texture. Most patterns in nature are outputs of locally interacting processes that may or may not be stochastic in nature, but are often based on fairly simple rules. There is a large body of work on models which give rise to such patterns in nature; most of it is inspired by Turing’s seminal paper on morphogenesis.
Such patterns are very common in developmental biology
As a result, when having any model learn to produce textures or patterns, we want it to learn a generative process for the pattern. We can think of such a process as a means of sampling from the distribution governing this pattern. The first hurdle is to choose an appropriate loss function, or qualitative measure of the pattern. To do so, we employ ideas from Gatys et. al
NCA are well suited for generating textures. To understand why, we’ll demonstrate parallels between texture generation in nature and NCA. Given these parallels, we argue that NCA are a good model class for texture generation.
In “The Chemical Basis of Morphogenesis”
To tackle the problem of reproducing our textures, we propose a more general version of the above systems, described by a simple Partial Differential Equation (PDE) over the state space of an image.
Here, is a function that depends on the gradient () and Laplacian () of the state space and determines the time evolution of this state space. represents a k dimensional vector, whose first three components correspond to the visible RGB color channels.
Intuitively, we have defined a system where every point of the image changes with time, in a way that depends on how the image currently changes across space, with respect to its immediate neighbourhood. Readers may start to recognize the resemblance between this and another system based on immediately local interactions.
Differential equations governing natural phenomena are usually evaluated using numerical differential equation solvers. Indeed, this is sometimes the only way to solve them, as many PDEs and ODEs of interest do not have closed form solutions. This is even the case for some deceptively simple ones, such as the three-body problem. Numerically solving PDEs and ODEs is a vast and well-established field. One of the biggest hammers in the metaphorical toolkit for numerically evaluating differential equations is discretization: the process of converting the variables of the system from continuous space to a discrete space, where numerical integration is tractable. When using some ODEs to model a change in a phenomena over time, for example, it makes sense to advance through time in discrete steps, possibly of variable size.
We now show that numerically integrating the aforementioned PDE is equivalent to reframing the problem as a Neural Cellular Automata, with assuming the role of the NCA rule.
The logical approach to discretizing the space the PDE operates on is to discretize the continuous 2D image space into a 2D raster grid. Boundary conditions are of concern but we can address them by moving to a toroidal world where each dimension wraps around on itself.
Similarly to space, we choose to treat time in a discretized fashion and evaluate our NCA at fixed-sized time steps. This is equivalent to explicit Euler integration. However, here we make an important deviation from traditional PDE numerical integration methods for two reasons. First, if all cells are updated synchronously, initial conditions must vary from cell-to-cell in order to break the symmetry. Second, the physical implementation of the synchronous model would require the existence of a global clock, shared by all cells. One way to work around the former is by initializing the grid with random noise, but in the spirit of self organisation we instead choose to decouple the cell updates by asynchronously evaluating the CA. We sample a subset of all cells at each time-step to update. This introduces both asynchronicity in time (cells will sometimes operate on information from their neighbours that is several timesteps old), and asymmetry in space, solving both aforementioned issues.
Our next step towards representing a PDE with cellular automata is to discretize the gradient and Laplacian operators. For this we use the sobel operator and the 9-point variant of the discrete Laplace operator, as below.
With all the pieces in place, we now have a space-discretized version of our PDE that looks very much like a Cellular Automata: the time evolution of each discrete point in the raster grid depends only on its immediate neighbours. These discrete operators allow us to formalize our PDE as a CA. To double check that this is true, simply observe that as our grid becomes very fine, and the asynchronous updates approach uniformity, the dynamics of these discrete operators will reproduce the continuous dynamics of the original PDE as we defined it.
The final step in implementing the above general PDE for texture generation is to translate it to the language of deep learning. Fortunately, all the operations involved in iteratively evaluating the generalized PDE exist as common operations in most deep learning frameworks. We provide both a Tensorflow and a minimal PyTorch implementation for reference, and refer readers to these for details on our implementation.
We build on the Growing CA NCA model
We use a well known deep convolutional network for image recognition, VGG (Visual Geometry Group Net
The template images for this dataset are from the Oxford Describable Textures Dataset
After a few iterations of training, we see the NCA converge to a solution that at first glance looks similar to the input template, but not pixel-wise identical. The very first thing to notice is that the solution learned by the NCA is not time-invariant if we continue to iterate the CA. In other words it is constantly changing!
This is not completely unexpected. In Differentiable Parametrizations, the authors noted that the images produced when backpropagating into image space would end up different each time the algorithm was run due to the stochastic nature of the parametrizations. To work around this, they introduced some tricks to maintain alignment between different visualizations. In our model, we find that we attain such alignment along the temporal dimension without optimizing for it; a welcome surprise. We believe the reason is threefold. First, reaching and maintaining a static state in an NCA appears to be non-trivial in comparison to a dynamic one, so much so that in Growing CA a pool of NCA states at various iteration times had to be maintained and sampled as starting states to simulate loss being applied after a time period longer than the NCAs iteration period, to achieve a static stability. We employ the same sampling mechanism here to prevent the pattern from decaying, but in this case the loss doesn’t enforce a static fixed target; rather it guides the NCA towards any one of a number of states that minimizes the style loss. Second, we apply our loss after a random number of iterations of the NCA. This means that, at any given time step, the pattern must be in a state that minimizes the loss. Third, the stochastic updates, local communication, and quantization all limit and regularize the magnitude of updates at each iteration. This encourages changes to be small between one iteration and the next. We hypothesize that these properties combined encourage the NCA to find a solution where each iteration is aligned with the previous iteration. We perceive this alignment through time as motion, and as we iterate the NCA we observe it traversing a manifold of locally aligned solutions.
We now posit that finding temporally aligned solutions is equivalent to finding an algorithm, or process, that generates the template pattern, based on the aforementioned findings and qualitative observation of the NCA. We proceed to demonstrate some exciting behaviours of NCA trained on different template images.
Here, we see that the NCA is trained using a template image of a simple black and white grid.
We notice that:
Such behaviour is not entirely unlike what one would expect in a hand-engineered algorithm to produce a consistent grid with local communication. For instance, one potential hand-engineered approach would be to have cells first try and achieve local consistency, by choosing the most common colour from the cells surrounding them, then attempting to form a diamond of correct size by measuring distance to the four edges of this patch of consistent colour, and moving this boundary if it were incorrect. Distance could be measured by using a hidden channel to encode a gradient in each direction of interest, with each cell decreasing the magnitude of this channel as compared to its neighbour in that direction. A cell could then localize itself within a diamond by measuring the value of two such gradient channels. The appearance of such an algorithm would bear resemblance to the above - with patches of cells becoming either black, or white, diamonds then resizing themselves to achieve consistency.
In this video, the NCA has learned to reproduce a texture based on a template of clear bubbles on a blue background. One of the most interesting behaviours we observe is that the density of the bubbles remains fairly constant. If we re-initialize the grid states, or interactively destroy states, we see a multitude of bubbles re-forming. However, as soon as two bubbles get too close to each other, one of them spontaneously collapses and disappears, ensuring a constant density of bubbles throughout the entire image. We regard these bubbles as ”solitons″ in the solution space of our NCA. This is a concept we will discuss and investigate at length below.
If we speed the animation up, we see that different bubbles move at different speeds, yet they never collide or touch each other. Bubbles also maintain their structure by self-correcting; a damaged bubble can re-grow.
This behaviour is remarkable because it arises spontaneously, without any external or auxiliary losses. All of these properties are learned from a combination of the template image, the information stored in the layers of VGG, and the inductive bias of the NCA. The NCA learned a rule that effectively approximates many of the properties of the bubbles in the original image. Moreover, it has learned a process that generates this pattern in a way that is robust to damage and looks realistic to humans.
Here we see one of our favourite patterns: a simple geometric “weave”. Again, we notice the NCA seems to have learned an algorithm for producing this pattern. Each “thread” alternately joins or detaches from other threads in order to produce the final pattern. This is strikingly similar to what one would attempt to implement, were one asked to programmatically generate the above pattern. One would try to design some sort of stochastic algorithm for weaving individual threads together with other nearby threads.
Here, misaligned stripe fragments travel up or down the stripe until either they merge to form a single straight stripe or a stripe shrinks and disappears. Were this to be implemented algorithmically with local communication, it is not infeasible that a similar algorithm for finding consistency among the stripes would be used.
This foray into pattern generation is by no means the first. There has been extensive work predating deep-learning, in particular suggesting deep connections between spatial patterning of anatomical structure and temporal patterning of cognitive and computational processes (e.g., reviewed in
Early work in pattern generation focused on texture sampling. Patches were often sampled from the original image and reconstructed or rejoined in different ways to obtain an approximation of the texture. This method has also seen recent success with the work of Gumin
Gatys et. al’s work
Other work has focused on using a convolutional generator combined with path sampling and trained using an adversarial loss to produce textures of similar quality
Perhaps the most unconventional approach, with which we find kinship, is laid out in Interactive Evolution of Camouflage
Two other noteworthy examples of similar work are Portilla et. al’s work with the wavelet transform
We have now explored some of the fascinating behaviours learned by the NCA when presented with a template image. What if we want to see them learn even more “unconstrained” behaviour?
Some butterflies have remarkably lifelike eyes on their wings. It’s unlikely the butterflies are even aware of this incredible artwork on their own bodies. Evolution placed these there to trigger a response of fear in potential predators or to deflect attacks from them
Even more remarkable is the fact that the individual cells composing the butterfly’s wings can self assemble into coherent, beautiful, shapes far larger than an individual cell - indeed a cell is on the order of
A common approach to investigating neural networks is to look at what inhibits or excites individual neurons in a network
We can explore this idea with minimal effort by taking our pattern-generating NCA and exploring what happens if we task it to enter a state that excites a given neuron in Inception. One of the common resulting NCAs we notice is eye and eye-related shapes - such as the video below - likely as a result of having to detect various animals in ImageNet. In the same way that cells form eye patterns on the wings of butterflies to excite neurons in the brains of predators, our NCA’s population of cells has learned to collaborate to produce a pattern that excites certain neurons in an external neural network.
We use a model identical to the one used for exploring pattern generation, but with a different discriminator network: Imagenet-trained Inception v1 network
Our loss maximizes the activations of chosen neurons, when evaluated on the output of the NCA. We add an auxiliary loss to encourage the outputs of the NCA to be , as this is not inherently built into the model. We keep the weights of the Inception frozen and use ADAM
There is no explicit dataset for this task. Inception is trained on ImageNet. The layers and neurons we chose to excite are chosen qualitatively using OpenAI Microscope.
Similar to the pattern generation experiment, we see quick convergence and a tendency to find temporally dynamic solutions. In other words, resulting NCAs do not stay still. We also observe that the majority of the NCAs learn to produce solitons of various kinds. We discuss a few below, but encourage readers to explore them in the demo.
Solitons in the form of regular circle-like shapes with internal structure are quite commonly observed in the inception renderings. Two solitons approaching each other too closely may cause one or both of them to decay. We also observe that solitons can divide into two new solitons.
In textures that are composed of threads or lines, or in certain excitations of Inception neurons where the resulting NCA has a “thread-like” quality, the threads grow in their respective directions and will join other threads, or grow around them, as required. This behaviour is similar to the regular lines observed in the striped patterns during pattern generation.
We encode local information flow within the NCA using the same fixed Laplacian and gradient filters. As luck would have it, these can be defined for most underlying manifolds, giving us a way of placing our cells on various surfaces and in various configurations without having to modify the learned model. Suppose we want our cells to live in a hexagonal world. We can redefine our kernels as follows:
Our model, trained in a purely square environment, works out of the box on a hexagonal grid! Play with the corresponding setting in the demo to experiment with this. Zooming in allows observation of the individual hexagonal or square cells. As can be seen in the demo, the cells have no problem adjusting to a hexagonal world and producing identical patterns after a brief period of re-alignment.
In theory, the cells can be evaluated on any manifold where one can define approximations to the Sobel kernel and the Laplacian kernel. We demonstrate this in our demo by providing an aforementioned “hexagonal” world for the cells to live in. Instead of having eight equally-spaced neighbours, each cell now has six equally-spaced neighbours. We further demonstrate this versatility by rotating the Sobel and Laplacian kernels. Each cell receives an innate global orientation based on these kernels, because they are defined with respect to the coordinate system of the state. Redefining the Sobel and Laplacian kernel with a rotated coordinate system is straightforward and can even be done on a per-cell level. Such versatility is exciting because it mirrors the extreme robustness found in biological cells in nature. Cells in most tissues will generally continue to operate whatever their location, direction, or exact placement relative to their neighbours. We believe this versatility in our model could even extend to a setting where the cells are placed on a manifold at random, rather than on an ordered grid.
Stochastic updates teach the cells to be robust to asynchronous updates. We investigate this property by taking it to an extreme and asking how do the cells react if two manifolds are allowed to communicate but one runs the NCA at a different speed than the other? The result is surprisingly stable; the CA is still able to construct and maintain a consistent texture across the combined manifold. The time discrepancy between the two CAs sharing the state is far larger than anything the NCA experiences during training, showing remarkable robustness of the learned behaviour. Parallels can be drawn to organic matter self repairing, for instance a fingernail can regrow in adulthood despite the underlying finger already having fully developed; the two do not need to be sync. This result also hints at the possibility of designing distributed systems without having to engineer for a global clock, synchronization of compute units or even homogenous compute capacity.
An even more drastic example of this robustness to time asynchronicity can be seen above. Here, an NCA is iterated until it achieves perfect consistency in a pattern. Then, the state space is expanded, introducing a border of new cells around the existing state. This border quickly interfaces with the existing cells and settles in a consistent pattern, with almost no perturbation to the already-converged inner state.
The failure modes of a complex system can teach us a great deal about its internal structure and process. Our model has many quirks and sometimes these prevent it from learning certain patterns. Below are some examples.
Some patterns are reproduced somewhat accurately in terms of structure, but not in colour, while some are the opposite. Others fail completely. It is difficult to determine whether these failure cases have their roots in the parametrization (the NCA), or in the hard-to-interpret gradient signals from VGG, or Inception. Existing work with style transfer suggests that using a loss on Gram matrices in VGG can introduce instabilities
When biological cells communicate with each other, they do so through a multitude of available communication channels. Cells can emit or absorb different ions and proteins, sense physical motion or “stiffness” of other cells, and even emit different chemical signals to diffuse over the local substrate
There are various ways to visualize communication channels in real cells. One of them is to add to cells a potential-activated dye. Doing so gives a clear picture of the voltage potential the cell is under with respect to the surrounding substrate. This technique provides useful insight into the communication patterns within groups of cells and helps scientists visualize both local and global communication over a variety of time-scales.
As luck would have it, we can do something similar with our Neural Cellular Automata. Our NCA model contains 12 channels. The first three are visible RGB channels and the rest we treat as latent channels which are visible to adjacent cells during update steps, but excluded from loss functions. Below we map the first three principle components of the hidden channels to the R,G, and B channels respectively. Hidden channels can be considered “floating,” to abuse a term from circuit theory. In other words, they are not pulled to any specific final state or intermediate state by the loss. Instead, they converge to some form of a dynamical system which assists the cell in fulfilling its objective with respect to its visible channels. There is no pre-defined assignment of different roles or meaning to different hidden channels, and there is almost certainly redundancy and correlation between different hidden channels. Such correlation may not be visible when we visualize the first three principal components in isolation. But this concern aside, the visualization yields some interesting insights anyways.
In the principal components of this coral-like texture, we see a pattern which is similar to the visible channels. However, the “threads” pointing in each diagonal direction have different colours - one diagonal is green and the other is a pale blue. This suggests that one of the things encoded into the hidden states is the direction of a “thread”, likely to allow cells that are inside one of these threads to keep track of which direction the thread is growing, or moving, in.
The chequerboard pattern likewise lends itself to some qualitative analysis and hints at a fairly simple mechanism for maintaining the shape of squares. Each square has a clear gradient in PCA space across the diagonal, and the values this gradient traverses differ for the white and black squares. We find it likely the gradient is used to provide a local coordinate system for creating and sizing the squares.
We find surprising insight in NCA trained on Inception as well. In this case, the structure of the eye is clearly encoded in the hidden state with the body composed primarily of one combination of principal components, and an halo, seemingly to prevent collisions of the eye solitons, composed of another set of principal components.
Analysis of these hidden states is something of a dark art; it is not always possible to draw rigorous conclusions about what is happening. We welcome future work in this direction, as we believe qualitative analysis of these behaviours will be useful for understanding more complex behaviours of CAs. We also hypothesize that it may be possible to modify or alter hidden states in order to affect the morphology and behaviour of NCA.
In this work, we selected texture templates and individual neurons as targets and then optimized NCA populations so as to produce similar excitations in a pre-trained neural network. This procedure yielded NCAs that could render nuanced and hypnotic textures. During our analysis, we found that these NCAs have interesting and unexpected properties. Many of the solutions for generating certain patterns in an image appear similar to the underlying model or physical behaviour producing the pattern. For example, our learned NCAs seem to have a bias for treating objects in the pattern as individual objects and letting them move freely across space. While this effect was present in many of our models, it was particularly strong in the bubble and eye models. The NCA is forced to find algorithms that can produce such a pattern with purely local interaction. This constraint seems to produce models that favor high-level consistency and robustness.
We’d like to thank Sam Greydanus for especially thoughtful proofreading and giving extensive feedback throughout the article. We also extend our gratitude to the other reviewers; Maximilian Otte, Aleksandr Groznykh, and Smitty van Bodegom. Finally, we would like to acknowledge the continued support from Blaise Agüera y Arcas and Dominik Roblek, without whom this work wouldn’t be possible.
Research: Alexander proposed using Neural CA for texture synthesis and feature visualization and prototyped the TF and PyTorch implementations. Eyvind refined the implementation and performed most of the article experiments.
Demos: Alexander implemented the WebGL NCA engine. Eyvind implemented the demo UI.
Writing and Diagrams: Eyvind wrote most of the article text and created most of the diagrams and videos. Michael provided the biological context for the article. Alexander and Ettore contributed to the content.
The motion of waves propagating through a medium can be described using the classical wave equation. The equation below defines the change of some quantity (be it the surface height map of a body of water, the position of a vibrating string, etc.) with respect to the laplacian of . ends up being the propagation speed of the wave.
One can imagine waves to come either as a single wave, or as a larger mixture of waves of different frequencies (a phenomenon referred to as a group or a packet). Physical phenomena, however, are rarely as structured and regular as we would like them to be. To describe the propagation of real-world waves in most physical media, such as waves in water, or sound, we must use somewhat more complex partial differential equations. Many of these systems, with the notable exception of light, share a property that waves of different frequencies will travel at different speeds. “Speed” in such a context is a tricky thing to define - however in this case we are referring to the speed of any localized quantity of energy - how fast its peak travels in space. In the above, classic, wave equation this corresponds to . However, in the real world, when waves of different frequencies have different speeds, groups of waves are no longer cohesive and will experience “dispersion”: the envelope of the group of waves will change shape over time and potentially not remain cohesive. Even in wave groups with dispersive properties, it is possible to find solutions to their partial differential equations where nonlinearities in the propagation and interaction between waves will counteract the dispersive properties of the wave group. This phenomenon, while lacking a strict definition, is called a “soliton”. It describes a wave packet which retains its shape during propagation.
Recall the idea (touched upon in Growing Neural Cellular Automata) that a grid of communicating NCAs can be thought of as a finite difference approximation of a partial differential equation in both time and space. Several of the patterns we render in the pattern-generation experiment, as well as in the inception experiment, consist of well-defined structures such as circles or polygons. We consider such structures to be functionally equivalent to solitons and refer to them as such. Such a classification is inspired by B. Chan’s reference to solitons in “Lenia.” He defines them as solid, self-maintaining structures which arise in the continuous approximation of the Game of Life.
The ”mouse″ and ”swipe″ icons in the demo are licensed under CC-BY.
If you see mistakes or want to suggest changes, please create an issue on GitHub.
Diagrams and text are licensed under Creative Commons Attribution CC-BY 4.0 with the source available on GitHub, unless noted otherwise. The figures that have been reused from other sources don’t fall under this license and can be recognized by a note in their caption: “Figure from …”.
For attribution in academic contexts, please cite this work as
Niklasson, et al., "Self-Organising Textures", Distill, 2021.
BibTeX citation
@article{niklasson2021self-organising, author = {Niklasson, Eyvind and Mordvintsev, Alexander and Randazzo, Ettore and Levin, Michael}, title = {Self-Organising Textures}, journal = {Distill}, year = {2021}, note = {https://distill.pub/selforg/2021/textures}, doi = {10.23915/distill.00027.003} }