The presence of a large number of inhibitory contacts at the soma and axon initial segment of cortical pyramidal cells has inspired a large and influential class of neural network models that use post-integration lateral inhibition as a mechanism for competition between nodes. However, inhibitory synapses also target the dendrites of pyramidal cells. The role of this dendritic inhibition in competition between neurons has not previously been addressed. We demonstrate, using a simple computational model, that such pre-integration lateral inhibition provides networks of neurons with useful representational and computational properties that are not provided by post-integration inhibition.
Lateral inhibition between cortical excitatory cells plays an important role in determining the receptive field properties of those cells. Such lateral inhibition provides a mechanism through which cells compete to respond to the current pattern of stimulation. Inhibitory inputs are concentrated on the soma and axon initial segment of pyramidal cells (Somogyi and Martin, 1985; Mountcastle, 1998) where they can be equally effective at inhibiting responses to excitatory inputs stimulating any part of the dendritic tree.
This observation has formed the basis for many theories of receptive field formation, and is an essential feature of many computational (neural network) models of cortical function (von der Malsburg, 1973; Rumelhart and Zipser, 1985; Grossberg, 1987; Földiák, 1989, 1990, 1991; Oja, 1989; Sanger, 1989; Hertz et al., 1991; Ritter et al., 1992; Sirosh and Miikkulainen, 1994; Marshall, 1995; Swindale, 1996; Wallis, 1996; Kohonen, 1997; O'Reilly, 1998). Such neural network algorithms have also found application beyond the neurosciences as a means of data analysis, classification and visualization in a huge variety of fields. These algorithms vary greatly in the details of their implementation. In some, competition is achieved explicitly by using lateral connections between the nodes of the network (von der Malsburg, 1973; Földiák, 1989, 1990; Oja, 1989; Sanger, 1989; Sirosh and Miikkulainen, 1994; Marshall, 1995; Swindale, 1996; O'Reilly, 1998), while in others competition is implemented implicitly through a selection process which chooses the ‘winning’ node(s) (Rumelhart and Zipser, 1985; Grossberg, 1987; Földiák, 1991; Hertz et al., 1991; Ritter et al., 1992; Wallis, 1996; Kohonen, 1997). However, in all of these algorithms nodes compete for the right to generate a response to the current pattern of input activity. A node's success in this competition is dependent on the total strength of the stimulation it receives and nodes which compete unsuccessfully have their output activity suppressed. This class of models can thus be described as implementing ‘post-integration inhibition’.
Inhibitory contacts also occur on the dendrites of cortical pyramidal cells (Kim et al., 1995; Rockland, 1998) and certain classes of interneuron (e.g. double bouquet cells) specifically target dendritic spines and shafts (Tamas et al., 1997; Mountcastle, 1998). Such contacts would have relatively little impact on excitatory inputs more proximal to the cell body or on the action of synapses on other branches of the dendritic tree. Thus these synapses do not appear to contribute to post-integration inhibition. However, such synapses are likely to have strong inhibitory effects on inputs within the same dendritic branch that are more distal to the site of inhibition (Rall, 1964; Koch et al., 1983; Segev, 1995; Borg-Graham et al., 1998; Koch and Segev, 2000). Hence, they could potentially selectively inhibit specific groups of excitatory inputs. Related synapses cluster together within the dendritic tree so that local operations are performed by multiple, functionally distinct, dendritic subunits before integration at the soma (Mel, 1994, 1999; Segev, 1995; Segev and Rall, 1998; Häusser et al., 2000; Koch and Segev, 2000; Häusser, 2001). Dendritic inhibition could thus act to ‘block’ the output from individual functional compartments. It has long been recognized that a dendrite composed of multiple subunits would provide a significant enhancement to the computational powers of an individual neuron (Mel, 1993, 1994, 1999) and that dendritic inhibition could contribute to this enhancement (Koch et al., 1983; Segev and Rall, 1998; Koch and Segev, 2000). However, the role of dendritic inhibition in competition between cells and its subsequent effect on neural coding and receptive field properties has not previously been investigated.
We introduce a neural network model which demonstrates that competition via dendritic inhibition significantly enhances the computational properties of networks of neurons. As with models of post-integration inhibition we simplify reality by combining the action of inhibitory interneurons into direct inhibitory connections between nodes. Furthermore, we group all the synapses contributing to a dendritic compartment together as a single input. Dendritic inhibition is then modeled as (linear) inhibition of this input. The algorithm is described fully in the Methods section, but essentially it operates by causing each node to attempt to ‘block’ its preferred inputs from activating other nodes. It is thus described as ‘pre-integration inhibition’.
We illustrate the advantages of this form of competition with the aid of a few simple tasks that have been used previously to demonstrate the pattern recognition abilities required by models of the human perceptual system (Nigrin, 1993; Marshall, 1995; Marshall and Gupta, 1998). Although these tasks appear to be trivial, succeeding in all of them is beyond the abilities of single-layer neural networks using post-integration inhibition. These tasks demonstrate that pre-integration inhibition (in contrast to post-integration inhibition) enables a neural network to respond simultaneously to multiple stimuli, to distinguish overlapping stimuli, and to deal correctly with incomplete and ambiguous stimuli.
A simple, two-node, neural network in which there is pre-integration inhibition is shown in Figure 1. The essential idea is that each node inhibits other nodes from responding to the same inputs. Hence, if a node is active and has a strong synaptic weight to a certain input then it should inhibit other nodes from responding to that input. A simple implementation of this idea for a two-node network would be:
In order to apply pre-integration lateral inhibition to larger networks, a more complex formulation was used that is suitable for networks containing an arbitrary number of nodes (n) and receiving an arbitrary number of inputs (m):
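As an illustration only, the idea that each node 'blocks' its preferred inputs from activating other nodes can be sketched for n nodes and m inputs as follows. The specific form of the inhibition term (scaled by the strongest weight from each input and by the most active node), the schedule on which α is increased, the rule that a fully silenced network stays silent, and the function name `pre_integration_response` are all assumptions made for this sketch; it should not be read as the formulation used in the simulations.

```python
def pre_integration_response(weights, x, alpha_max=4.0, alpha_step=0.25):
    """Hypothetical sketch of competition via pre-integration inhibition.

    weights[j][i] is the weight from input i to node j; x[i] is the input.
    """
    n, m = len(weights), len(x)
    # Strongest weight received from each input by any node.
    w_max = [max(weights[j][i] for j in range(n)) for i in range(m)]
    # With alpha = 0 there is no inhibition: plain weighted sums.
    y = [sum(weights[j][i] * x[i] for i in range(m)) for j in range(n)]
    steps = int(round(alpha_max / alpha_step))
    for s in range(1, steps + 1):
        alpha = s * alpha_step  # inhibition strength grows each iteration
        y_max = max(y)
        if y_max <= 0.0:
            break  # assumption: once every node is silent, it stays silent
        new_y = []
        for j in range(n):
            total = 0.0
            for i in range(m):
                # Strongest inhibition of input i by any competing node k.
                inhib = 0.0
                if w_max[i] > 0.0:
                    inhib = max(
                        ((weights[k][i] / w_max[i]) * (y[k] / y_max)
                         for k in range(n) if k != j),
                        default=0.0,
                    )
                # Each input is inhibited before it is integrated by node j.
                total += weights[j][i] * x[i] * max(0.0, 1.0 - alpha * inhib)
            new_y.append(total)
        y = new_y
    return y


# Two nodes with normalized weights, preferring 'ab' and 'abc'.
W = [[0.5, 0.5, 0.0], [1 / 3, 1 / 3, 1 / 3]]
print(pre_integration_response(W, [1, 1, 0]))  # input 'ab'
print(pre_integration_response(W, [1, 1, 1]))  # input 'abc'
```

Under these assumptions the first node alone remains active for input 'ab' and the second alone for input 'abc', even though the total weight to each node is normalized.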
For the simulation shown in Figure 5 a bias was added to the activation of one node. This was implemented by adding 0.1 to the activation of that node during competition. Experiments showed that this bias could occur at any time (and for any duration) prior to α reaching a value of 1.5 to generate the same result. Although results have not been shown here, this method is not restricted to working with binary encodings of input patterns and works equally well with analog encodings.
In many situations distinct sensory events will share many features in common. If such situations are to be distinguished it is necessary for different sets of neurons to respond despite this overlap in input features. As a simple example, consider the task of representing two overlapping patterns: ‘ab’ and ‘abc’. A network consisting of two nodes receiving input from three sources (labeled ‘a’, ‘b’ and ‘c’) should be sufficient. However, because these input patterns overlap, when the pattern ‘ab’ is presented the node representing ‘abc’ will be partially activated, while when the pattern ‘abc’ is presented the node representing ‘ab’ will be fully activated.
When the synaptic weights have certain values both nodes will respond with equal strength to the same pattern. For example, when the weights are all equal, both nodes will respond to pattern ‘ab’ with equal strength (Marshall, 1995). Similarly, when the total synaptic weight from each input is normalized (‘post-synaptic normalization’) both nodes will respond equally to pattern ‘ab’ (Marshall, 1995). When the total synaptic weight to each node is normalized (‘pre-synaptic normalization’) both nodes will respond to pattern ‘abc’ with equal activation (Marshall, 1995). Under all these conditions the response fails to distinguish between distinct input patterns and post-integration inhibition can do nothing to resolve the situation (and will, in general, result in a node chosen at random winning the competition).
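The three tie conditions can be checked with simple weighted sums. The sketch below is illustrative: the concrete weight values (for example, splitting each shared input equally between the two nodes) and the names `responses`, `EQUAL`, `PER_INPUT` and `PER_NODE` are assumptions chosen to satisfy each normalization constraint.

```python
import math


def responses(weights, x):
    """Weighted sum of the inputs for each node (no inhibition)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


AB, ABC = [1, 1, 0], [1, 1, 1]  # input patterns over sources a, b, c

# Row 0: node representing 'ab'; row 1: node representing 'abc'.
EQUAL = [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]            # all weights equal
PER_INPUT = [[0.5, 0.5, 0.0], [0.5, 0.5, 1.0]]        # weights from each input sum to 1
PER_NODE = [[0.5, 0.5, 0.0], [1 / 3, 1 / 3, 1 / 3]]   # weights to each node sum to 1

for name, weights, pattern in [
    ("equal weights, input 'ab'", EQUAL, AB),
    ("normalized per input, input 'ab'", PER_INPUT, AB),
    ("normalized per node, input 'abc'", PER_NODE, ABC),
]:
    y = responses(weights, pattern)
    print(name, y, "tie" if math.isclose(y[0], y[1]) else "no tie")
```

In each case the two nodes receive the same total stimulation, so a competition based only on integrated activity has nothing with which to separate them.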
Several solutions to this problem have been suggested. Some require adjusting the activations using a function of the total synaptic weight received by the node [i.e. using the Weber Law (Marshall, 1995) or a masking field (Cohen and Grossberg, 1987; Marshall, 1995)]. These solutions scale badly with the number of overlapping inputs, and do not work when (as is common practice in many neural network models) the total synaptic weight to each node is normalized. Other suggestions have involved tailoring the lateral weights to ensure the correct node wins the competition (Földiák, 1990; Marshall, 1995). These methods work well (Marshall, 1995), but fail to meet other criteria as discussed below.
The most obvious, but most overlooked, solution would be to remove constraints placed on allowable values for synaptic weights (e.g. normalization) which serve to prevent the input patterns being distinguished in weight space. It is simple to invent sets of weights which unambiguously classify the two overlapping patterns (e.g. if both weights to the node representing ‘ab’ are 0.5 and each weight to the node representing ‘abc’ is 0.4 then each node responds most strongly to its preferred pattern and could then successfully inhibit the activation of the other node).
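The arithmetic behind this example can be verified directly. The sketch below uses the weights quoted above (0.5 and 0.4); the helper names `responses` and `winner` are hypothetical.

```python
def responses(weights, x):
    """Weighted sum of the inputs for each node (no inhibition)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def winner(weights, x):
    """Index of the most strongly stimulated node."""
    y = responses(weights, x)
    return max(range(len(y)), key=lambda j: y[j])


# Row 0: node representing 'ab'; row 1: node representing 'abc'.
WEIGHTS = [[0.5, 0.5, 0.0], [0.4, 0.4, 0.4]]

print(winner(WEIGHTS, [1, 1, 0]))  # input 'ab': 1.0 versus 0.8, node 0 wins
print(winner(WEIGHTS, [1, 1, 1]))  # input 'abc': 1.0 versus about 1.2, node 1 wins
```

Because the total weight differs between the two nodes (1.0 and 1.2), neither form of normalization holds, which is what allows the patterns to be separated by integrated activity alone.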
Using pre-integration lateral inhibition, overlapping patterns can be successfully distinguished even when normalization is used (either pre- or post-synaptic normalization). Figure 2 shows the response of such a network to all possible input patterns. The two networks on the right show that the correct response is generated to input patterns ‘ab’ and ‘abc’. The other networks show that when partial input patterns are presented the node that represents the most similar pattern is activated in proportion to the degree of overlap between the partial pattern and the preferred input of that node. Hence, when the input is ‘a’ or ‘b’, which partially matches both of the training patterns, the node representing the smaller pattern responds, since these partial patterns are more similar to ‘ab’ than to ‘abc’. When the input is ‘c’ this partially matches only one of the training patterns and hence the node representing ‘abc’ responds. Similarly, patterns ‘bc’ and ‘ac’ most strongly resemble ‘abc’ and hence cause activation of that node.
While it is sufficient in certain circumstances for a single node to represent the input (local coding) it is desirable in many other situations to have multiple nodes providing a factorial or distributed representation. As an extremely simple example consider three inputs (‘a’, ‘b’ and ‘c’), each of which is represented by one of three nodes. Any pattern of inputs can be represented by having zero, one or multiple nodes active. In this particular case the input to the network provides just as good a representation as the output so there is little to be gained. However, this example captures the essence of other, more realistic, tasks in which multiple nodes, each of which represent multiple inputs, may need to be active.
Post-integration lateral inhibition can be modified to enable multiple nodes to be active (Földiák, 1990; Marshall, 1995) by weakening the strength of the competition between those pairs of nodes that require to be co-active (the lateral weights need to reach a compromise strength which provides sufficient competition for distinct patterns while allowing multiple nodes to respond to multiple patterns). This either requires a priori knowledge of which nodes will be co-active or the ability to learn appropriate lateral weights. However, information locally available at a synapse is insufficient to determine if the correct compromise weights have been reached (Spratling, 1999) and it is thus necessary to add further constraints to derive a learning rule. The proposed constraints require that all input patterns occur with equal probability and that pairs of nodes are co-active with equal frequency (Földiák, 1990; Marshall, 1995). These constraints severely restrict the class of problems that can be successfully represented to those in which all input patterns are mutually exclusive or in which all pairs of input patterns occur simultaneously with equal frequency. As an example of a case for which these networks would fail, consider using a single network to represent the color and shape of an object. At any given time only one node (or group of nodes) representing a single color and one node (or group of nodes) representing a single shape should be active. There thus needs to be strong inhibition between nodes representing properties within the same class, and weak inhibition between nodes representing different properties. This task fails to match the requirements implicitly defined in the learning rules, and application of those rules would lead to weakening of lateral inhibition within each class until multiple color nodes and multiple shape nodes were co-active with equal frequency. 
Hence, post-integration lateral inhibition, implemented using explicit lateral weights, fails to provide factorial coding except for the exceptional case in which all pairs of patterns co-occur, or in which external knowledge is available to set appropriate lateral weights.
Networks in which competition is implemented using a selection mechanism can also be modified to allow multiple nodes to be simultaneously active (e.g. k-winners-take-all). However, these networks also restrict the types of task that can be successfully represented to those in which a pre-defined number of nodes need to be active in response to every pattern of stimuli.
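The restriction imposed by such a selection mechanism can be seen in a minimal sketch. The function name `k_winners_take_all` and the activation values are illustrative assumptions, and this is one simple variant rather than any specific published implementation.

```python
def k_winners_take_all(activations, k):
    """Keep the k most active nodes and suppress all others."""
    ranked = sorted(range(len(activations)),
                    key=lambda j: activations[j], reverse=True)
    winners = set(ranked[:k])
    return [a if j in winners else 0.0 for j, a in enumerate(activations)]


# One pattern present: a weakly stimulated second node is still selected.
print(k_winners_take_all([0.9, 0.1, 0.05], k=2))  # [0.9, 0.1, 0.0]

# Three patterns present: one strongly stimulated node is still suppressed.
print(k_winners_take_all([0.9, 0.8, 0.7], k=2))   # [0.9, 0.8, 0.0]
```

In both cases exactly k nodes remain active, regardless of how many patterns are actually present in the input.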
In contrast, pre-integration lateral inhibition places no restrictions on the number of active nodes, nor on the frequency with which nodes, or pairs of nodes, are active. Such a network can thus respond appropriately to any combination of input patterns; for example, it can directly solve the problem of representing any arbitrary combination of the inputs ‘a’, ‘b’ and ‘c’. A more challenging problem is shown in Figure 3. Here nodes represent six overlapping patterns. The network responds correctly to each of these patterns and to multiple, overlapping, patterns (even in cases where only partial patterns are presented).
In some circumstances there simply is no correct parsing of the input pattern. Consider a neural network with two nodes and three inputs (‘a’, ‘b’ and ‘c’). If one node represents the pattern ‘ab’ and the other represents the pattern ‘bc’ then the input ‘b’ is ambiguous since it equally matches the preferred input of both nodes. In this situation, most implementations of post-integration lateral inhibition would allow one node, chosen at random, to be active at half its normal strength. An alternative implementation (Marshall, 1995) is to use weaker lateral weights to enable both nodes to respond with one-quarter of the maximum response (Marshall and Gupta, 1998). However, this approach is also unsatisfactory since it suggests that one-quarter of each pattern is present, when this is not the case. Neither of these activity patterns seems to provide an appropriate representation. Any response in which both nodes generate equal activity suggests that a single piece of data provides evidence for two interpretations simultaneously, while any response in which one node has higher activity than the other makes an unjustified, arbitrary selection. Pre-integration lateral inhibition avoids generating responses that are not justified by the available data by preventing any response (Fig. 4). It thus produces no representation of the input rather than a potentially misleading representation.
As an example of a situation in which such an approach would be advantageous, consider again using a network to represent the color and shape of an object. However, in this situation the network is wired up to generate localist representations of conjunctions of color and shape from a distributed input representation of these separate features. For example, consider a network with four nodes representing ‘black squares’, ‘white squares’, ‘black triangles’ and ‘white triangles’ (with the inputs to this network signaling ‘black’, ‘white’, ‘square’ and ‘triangle’). In this case the ambiguous situation occurs when multiple objects are presented to the network simultaneously: a black square and a white triangle would produce the same input pattern as a black triangle and a white square (Thorpe, 1995). Given such a situation it is important to prevent illusory conjunctions from being represented (Roelfsema et al., 2000); pre-integration lateral inhibition does so by suppressing all responses (Fig. 5). One solution to this ‘binding’ problem would be the action of expectation or attention in disambiguating the situation (Reynolds and Desimone, 1999; Roelfsema et al., 2000). If such modulatory effects are modeled by adding a small increase to the activity of one node during competition then this succeeds in causing a response from those nodes compatible with the biased interpretation, while suppressing activity in the other two nodes (Fig. 5). A similar bias applied to a network using post-integration inhibition would cause the biased node to be the most active, but would also suppress the response of the node representing the second object. An alternative solution would be for inputs representing the features of one object to be active simultaneously but out-of-phase with those inputs representing the other object (von der Malsburg, 1981; Gray, 1999; Singer, 1999).
In this case the network succeeds (as would a network using the standard method of competition) by responding alternately to the non-ambiguous patterns generated by each individual object presented in isolation.
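The source of the ambiguity in this example is that the distributed input code discards which color belongs to which shape. A minimal sketch, in which the feature ordering and the name `encode` are assumptions, makes this explicit:

```python
FEATURES = ["black", "white", "square", "triangle"]


def encode(objects):
    """Binary feature vector for a scene given as (color, shape) pairs."""
    present = set()
    for color, shape in objects:
        present.update((color, shape))
    return [1 if feature in present else 0 for feature in FEATURES]


scene_1 = [("black", "square"), ("white", "triangle")]
scene_2 = [("black", "triangle"), ("white", "square")]

print(encode(scene_1))                     # [1, 1, 1, 1]
print(encode(scene_1) == encode(scene_2))  # True
```

Both scenes activate all four inputs, so no network driven by this input alone can tell them apart; a single object, such as a black square, gives the unambiguous code [1, 0, 1, 0].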
The above examples have shown that pre-integration lateral inhibition provides useful computational capacities that cannot be generated using post-integration lateral inhibition. A network of neurons competing through pre-integration lateral inhibition is thus capable of generating correct representations based on the ‘knowledge’ stored in the synaptic weights of the neural network. Specifically, it is capable of generating a local encoding of individual input patterns as well as responding simultaneously to multiple patterns, when they are present, in order to generate a factorial or distributed encoding. It can produce an appropriate representation even when patterns overlap. It is able to respond to partial patterns such that the response is proportional to how well that input matches the stored pattern, and it can detect ambiguities and suppress responses to them. Our algorithm simplifies reality by assuming that the role of inhibitory cells can be approximated by direct inhibitory weights from excitatory cells, and that these lateral weights have the same strength as corresponding afferent weights. The latter simplification can be justified since weights that have identical values also have identical pre- and post-synaptic activation values and hence could be learnt independently. Such a learning mechanism would require inhibitory synapses contacting the dendrite to be modified as a function of the local dendritic activity rather than the output activity of the inhibited cell. More complex models, which include a separate inhibitory cell population and which use multi-compartmental models of dendritic processes, could relate our proposal more directly to physiology. We hope that our demonstration of the computational and representational advantages that could arise from dendritic inhibition will serve to stimulate more detailed studies of this kind.
Computational considerations have led us to suggest that competition via dendritic inhibition could significantly enhance the information-processing capacities of networks of cortical neurons. This claim is anatomically plausible since it has been shown that cortical pyramidal cells innervate inhibitory cell types, which in turn form synapses on the dendrites of pyramidal cells (Buhl et al., 1997; Tamas et al., 1997). However, determining the functional role of these connections will require further experimental evidence. Our model predicts that it should be possible to find pairs of cortical pyramidal cells for which action potentials generated by one cell induce inhibitory post-synaptic potentials within the dendrites of the other. Independent of such experimental support, the algorithm we have presented could have immediate advantages for a great number of neural network applications in a huge variety of fields.
This work was funded by MRC Research Fellowship number G81/512.