## Abstract

Synaptic changes impair previously acquired memory traces. The smaller this impairment, the longer memories last. Two strategies have been suggested to keep memories from being overwritten too rapidly while preserving receptiveness to new contents: either introducing synaptic meta levels that store the history of synaptic state changes or reducing the number of synchronously active neurons, which decreases interference. We find that synaptic metaplasticity can indeed prolong memory lifetimes, but only under the restriction that the neuronal population code is not too sparse. For sparse codes, metaplasticity may actually hinder memory longevity. This is important because in memory-related brain regions such as the hippocampus, population codes are sparse. Comparing 2 different synaptic cascade models with binary weights, we find that a serial topology of synaptic state transitions gives rise to larger memory capacities than a model with cross transitions. For the serial model, memory capacity is virtually independent of network size and connectivity.

## Introduction

Since Hebb (1949), synaptic plasticity has been seen as a graded change of the amplitude of a postsynaptic current. A huge body of theoretical literature has applied and discussed this paradigm in the light of associative memories (Hopfield 1982), self-organized map formation (Linsker 1986), and, more recently, also spike-timing–dependent synaptic plasticity (Gerstner et al. 1996; Kempter et al. 1999; Song et al. 2000). Synaptic physiology, however, has shifted the view on synaptic plasticity from a continuous modification of coupling strength to a switching between stable discrete states (for review, see Lüscher et al. 2000; Lüscher and Frerking 2001; Kullmann 2003; Montgomery and Madison 2004). Two remarkable discoveries are all-or-none potentiation and depression (Petersen et al. 1998; O'Connor et al. 2005) and silent synapses (Isaac et al. 1995; Liao et al. 1995; Montgomery et al. 2001). In addition to state changes that alter the efficacy of a synaptic connection, there are also changes that leave the efficacy unaltered but modify a synapse's capability to undergo subsequent plastic changes. The latter state transitions are generally referred to as metaplasticity (Abraham and Bear 1996; Montgomery and Madison 2002; Sajikumar and Frey 2004).

We are presently only beginning to understand how learning with discrete synapses affects functional and computational properties of neural systems (e.g., Brunel et al. 2004; Fusi and Abbott 2007). Obviously, the larger the synaptic state space, the more information can be stored in the synaptic configuration. But how can this information be retrieved in a dynamical neural network? This is a nontrivial problem, in particular because metaplastic changes do not affect postsynaptic currents and, hence, information about synaptic meta states cannot be derived from the activity of neurons.

Metaplasticity is commonly thought to prolong the lifetime of memories while keeping the network receptive to the storage of new memories (Abraham and Bear 1996; Montgomery and Madison 2004). This idea has recently been corroborated by Fusi et al. (2005). Assuming a homogeneous population of synapses, they have derived that a certain number of synaptic meta states is optimal with respect to maximizing memory lifetime. Their model, however, does not consider the dynamics of a neural network, that is, how the evoked postsynaptic potentials sum up and give rise to suprathreshold network activity reflecting the stored memories. In this paper, we show that the necessity to retrieve synaptically stored information from the activity of neurons poses a crucial constraint. In particular, the advantage of metaplasticity may be outweighed by sparse representations of memories (Tsodyks and Feigel'man 1988; Amit and Fusi 1994). For highly distributed codes, on the other hand, we propose a new metaplasticity model with an optimal memory capacity that is independent of network size and connectivity.

## Results

Memories are considered to be successively imprinted into the synapses of a recurrently connected network of *N* neurons through discrete changes of synaptic states. These memories degrade over time because of ongoing storage of new memories. Here, a memory trace is regarded as the association between 2 random assemblies of *M* neurons, a cue and a target assembly (e.g., Willshaw et al. 1969; Nadal 1991; Leibold and Kempter 2006). The association is said to be present in the network if an active cue assembly evokes a sufficient amount of activity in the target assembly and does not significantly activate other assemblies.

Learning is thought to establish such an association between 2 formerly unrelated assemblies via synaptic alterations in a supervised fashion. Synapses from cue to target neurons receive a signal that is proposed to initiate long-term potentiation (LTP) with some probability. Synapses in the reverse direction (target to cue) may undergo long-term depression (LTD). The learning rule is thus assumed to be local: it triggers plasticity signals only at synapses that connect neurons of the cue and target assembly, whereas all other synapses remain unaffected. In addition, we assume that synapses related to neurons that are members of both the cue and the target assembly receive an indecisive signal and do not change states (Discussion). If *f* = *M*/*N* denotes the probability that a randomly chosen neuron belongs to a specific assembly, the indecisive signal is conveyed to the synapses of the *Mf* neurons in the overlap between both assemblies. The abbreviation *f* = *M*/*N* is also called the coding ratio.
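The assignment of plasticity signals can be sketched in a few lines of Python (a minimal illustration; the rule that every synapse between assembly members touching the overlap receives the indecisive signal is our reading of the text, not a quoted algorithm):

```python
def plasticity_signals(cue, target):
    """Classify the synapses (pre, post) addressed when one association is stored.

    Synapses from cue to target neurons receive an LTP signal and synapses in
    the reverse direction an LTD signal; any synapse between assembly members
    that touches the overlap of cue and target receives the indecisive signal
    and does not change state (our interpretation of the rule in the text).
    """
    overlap = cue & target
    members = cue | target
    ltp = {(i, j) for i in cue - overlap for j in target - overlap}
    ltd = {(j, i) for (i, j) in ltp}
    indecisive = {(i, j) for i in members for j in members
                  if i != j and (i in overlap or j in overlap)}
    return ltp, ltd, indecisive
```

For two assemblies of *M* = 3 neurons with one shared neuron, only the (*M* − overlap)² cue-to-target pairs are potentiated, as assumed in the signal estimate below.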

Synaptic alterations are considered to occur via switch-like transitions between the discrete synaptic states that are induced by the plasticity events associated with the storage of new memories. Three models of synapses with discrete states are considered: 1) two-state synapses, which are either silent or activated; 2) binary synapses with multiple states and complex metaplasticity, which resemble the cascade model of Fusi et al. (2005); and 3) binary synapses with multiple serial state transitions, in which the probabilities of all transitions equal one.

In the present framework, the overall distribution of synaptic states in the whole network resides in equilibrium at any time. The specific shape of the equilibrium distribution is determined by the specific choices of synapse model and learning rules (see below and Appendix A.5.1, A.5.2, and A.5.3). Intuitively, if the probability for LTD exceeds the probability for LTP, more synapses are in a depotentiated state and vice versa. In contrast to this overall distribution, the state distribution attached to one specific memory (association) may be far from equilibrium: Right after learning a new association, a more than average fraction of synapses connecting cue to target neurons is in the potentiated state. This association-specific distribution then asymptotically converges to the equilibrium distribution while successively storing other associations. The convergence to equilibrium thus corresponds to the overwriting of this particular memory.

As a measure of memory lifetime, we define the number *P* of successive associations that are necessary to overwrite a specific initial association such that the latter no longer can be retrieved. Given a constant rate ρ of new associations per time, the time *P*/ρ can be reinterpreted as the duration a memory remains stored in the synaptic configuration of the network (Amit and Fusi 1994). The learning rate ρ, however, is difficult to estimate and will be substantially influenced by a variety of factors like species, brain region, environment, behavioral state, attention, etc. In what follows, we thus simply neglect ρ and, instead, use *P* directly as a measure of memory lifetime. In other words, we measure time in units of newly acquired memories. Technically, this is equivalent to setting ρ = 1.

### Two-State Synapses

To understand how memory lifetime (longevity) arises in a dynamical recurrent network, we start out by considering binary synapses without meta states, meaning that each synapse can exist in only one of 2 states, silent or activated. Immediately after having stored an association, all synapses that connect the cue neurons to the target neurons shall be activated, that is, the learning rule is such that these synapses switch from the silent to the activated state with probability one. Lower transition probabilities not only enhance memory lifetime but also reduce the network's receptiveness to new memories (Amit and Fusi 1994; Fusi et al. 2005) and, thus, are not considered here. Furthermore, let us assume that synapses from the target neurons back to the cue neurons are silenced with probability one. This leads to an equilibrium distribution of synaptic states, in which half of the synapses are activated and the other half is silent. For a given probability *c*_{m} that a neuron is synaptically connected to another one, the number of activated synapses in a network of *N* neurons therefore fluctuates around a mean value of *c*_{m}*N*^{2}/2. The probability *c*_{m} of a morphological connection is also called the morphological connectivity.

As a result of the equilibrium distribution of synaptic states, an arbitrary pair of pre- and postsynaptic neuron is connected via an activated synapse with probability *c*_{m}/2. Thus, at neurons that do not belong to a particular target assembly, *M* simultaneously spiking cue cells give rise to *c*_{m}*M*/2 inputs from nonsilent synapses, which is thus considered as a baseline or background depolarization in what follows. We therefore define the “signal” of a memory trace to be the excess depolarization with respect to this baseline. Because, for the 2-state model, we have modeled the storage of a memory through the activation of all synapses connecting the cue to the target assembly, the initial memory signal equals the number of synapses at a target cell that were switched on. Because we have assumed that the synapses at the *Mf* neurons in the overlap of cue and target assembly do not change state, the initial memory signal at one target neuron amounts to *M*(1 – *f*)*c*_{m}/2 synapses on average.

#### Sparse Representations

To understand the temporal evolution of the memory signal, let us first focus on the case of sparse coding, *f*=*M/N*≪1. This scenario allows for an easy analytic treatment and interpretation. A more general scenario is outlined in the Appendix.

The initial signal *c*_{m}*M*/2 of a memory trace at *t* = 0 decays because of the storage of further memories. The storage of a single new memory in a subsequent time step switches off synapses in the initial memory trace with probability *f*^{2}. The number of activated synapses in a specific association therefore decays to a resting level of *c*_{m}*M*/2 with time constant τ = 1/(2*f*^{2}), and thus the signal after *t* further associations amounts to *c*_{m}*M*/2 exp[–2*f*^{2}*t*]. This decay of the number of activated synapses is illustrated in Figure 1A.
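The exponential decay of the excess signal can be checked against a direct iteration of the per-memory update (a sketch under the assumptions stated above; `iterate_signal` tracks the expected number of activated trace synapses at one target cell):

```python
import math

def excess_signal(t, cm, M, f):
    """Closed-form excess signal (c_m*M/2) * exp(-2*f**2*t) after t further memories."""
    return cm * M / 2 * math.exp(-2 * f**2 * t)

def iterate_signal(steps, cm, M, f):
    """Iterate the per-memory update: each activated synapse of the trace is
    depressed with probability f**2, each silent one potentiated with f**2."""
    S = cm * M        # synapses from cue cells onto one target cell, all active at t = 0
    x = S             # expected number of activated synapses among them
    for _ in range(steps):
        x = x * (1 - f**2) + (S - x) * f**2
    return x - S / 2  # excess over the equilibrium level c_m*M/2
```

The iteration relaxes with factor (1 − 2*f*²) per stored memory, which for small *f* matches the continuous-time form exp[−2*f*²*t*] with time constant τ = 1/(2*f*²).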

The ongoing storage of associations between pairs of random cell assemblies also induces fluctuations of the memory signal. These fluctuations define a noise level above which the signal must be detectable. The standard deviation of these fluctuations equals the square root of the average number of activated synapses for random selections of *M* presynaptic neurons, that is, (*c*_{m}*M*/2)^{1/2}.

The lifetime *P* of a memory is the time *t* at which, on average, the signal arrives at the noise level. We again note that time is measured in units of stored associations, so that *P* also equals the total number of associations that can be stored in the network. More specifically, we define *P* as the time at which the signal-to-noise ratio of a memory imprinted at time *t* = 0 equals a threshold *K* > 0, which is determined by the desired amount of activation of the target assembly. It will turn out that the specific choice of the threshold *K* does not change the general dependence of *P* on system parameters such as the network size *N*, assembly size *M*, and connectivity *c*_{m}. We emphasize that here the detectability of the memory signal is defined via an average over the ensemble of target neurons, and thus the memory signal is no single-cell quantity, though it may seem so (Appendix A.4).

Given a memory signal *c*_{m}*M*/2 exp[–2*f*^{2}*t*] and a noise level (*c*_{m}*M*/2)^{1/2}, the signal-to-noise ratio attains a value of *K* at time

*P* = [*N*^{2}/(4*M*^{2})] ln[*c*_{m}*M*/(2*K*^{2})].

This is an encouraging result because the lifetime *P* of a memory increases quadratically with network size *N*. Moreover, it is proportional to the total number *c*_{m}*N*^{2} of synapses in the network, leaving *c*_{m} and *M* fixed. In contrast, for a fixed coding ratio *f* = *M*/*N*, *P* scales logarithmically with the average number *c*_{m}*M* of synapses per target neuron involved in a particular association. Maximum longevity *P*^{max} is obtained from d*P*/d*M* = 0, which yields an optimal assembly size

*M*^{opt} = 2e^{1/2}*K*^{2}/*c*_{m}

that is independent of the network size *N*. This independence of *N* reflects the assumption that there is no spurious (or spontaneous) activity in the network. Spurious activity would induce fluctuations of the memory signal that increase with network size and thus would break up the *N* independence, as is the case for attractor-type networks (e.g., Golomb et al. 1990). Moreover, the *N* independence of *M*^{opt} can also be broken up by further constraints, as, for example, the assumption of a fixed number *c*_{m}*N* of synapses per neuron (Leibold and Kempter 2006) (Discussion).

At the optimal assembly size, the memory lifetime reaches its maximum

*P*^{max} = *c*_{m}^{2}*N*^{2}/(32e*K*^{4}),

which is proportional to the square of the number *c*_{m}*N* of synapses per neuron. For biologically reasonable choices, 0.01 < *c*_{m} < 0.3, *N* > 10^{4}, and *K* ≈ 1, we find the optimal coding ratio *f*^{opt} = *M*^{opt}/*N* to be small, which is consistent with the initial assumption *M* ≪ *N*. We note that throughout the paper (cf., Appendix A.5), analytical results concerning *M*^{opt} and *P*^{max} are derived in the limit of sparse coding *M* ≪ *N* and checked for self-consistency.
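The sparse-limit optimization can be verified numerically (a sketch of the formulas derived above; the parameter values are illustrative only):

```python
import math

def lifetime(M, N, cm, K):
    """Memory lifetime of the 2-state model in the sparse limit:
    P(M) = (N**2 / (4*M**2)) * ln(cm*M / (2*K**2))."""
    return N**2 / (4 * M**2) * math.log(cm * M / (2 * K**2))

def optimal_assembly(cm, K):
    """M_opt = 2*sqrt(e)*K**2 / cm, independent of the network size N."""
    return 2 * math.sqrt(math.e) * K**2 / cm

def max_lifetime(N, cm, K):
    """P_max = cm**2 * N**2 / (32*e*K**4), quadratic in the synapses per neuron."""
    return cm**2 * N**2 / (32 * math.e * K**4)
```

Evaluating `lifetime` at `optimal_assembly` reproduces `max_lifetime`, and perturbing *M* in either direction lowers *P*, confirming that the stationary point is the maximum.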

#### General Solutions

Analytical results obtained in a regime of sparse representations are corroborated by numerical evaluations of the readout criterion (Appendix A.4), which allowed us to also consider nonsparse coding ratios.

To compare memory performance of networks with different sizes *N* and connectivities *c*_{m}, one defines the memory capacity α = *P*/(*c*_{m}*N*), that is, the number *P* of storable associations divided by the number *c*_{m}*N* of synapses per neuron. The functional dependence of α on the coding ratio *f* is stereotypic (Fig. 1B). For infinitely low *f*, the assembly size is too small to make the signal (∝ *M*) overcome the noise (∝ *M*^{1/2}). If *f* grows beyond some minimal value, the capacity rapidly rises toward its maximum, which is taken at some optimal, sparse level *f*^{opt} ≪ 1. As *f* is further increased, the capacity drops again because larger assembly sizes generate larger overlaps between memory traces, which in turn leads to a higher amount of synaptic interference during learning and, hence, faster forgetting.

The numerical results shown in Figure 1C confirm our analytical results that the storage capacity α at the optimal coding ratio *f*^{opt} scales linearly with *N*. Classical approaches to memory capacity (e.g., Golomb et al. 1990; Nadal 1991), however, do not optimize with respect to *f* but treat it as a free parameter. Yet, for fixed *f*, the capacity exhibits inferior scaling behavior and may even decrease with *N* (e.g., *f* = 0.3 in Fig. 1C).

### Binary Synapses with Complex Metaplasticity

The lifetime of a memory can be prolonged by reducing the probability of synaptic changes (Amit and Fusi 1994). An obvious drawback of such a strategy is that memories become more rigid and, hence, it is more difficult to store new associations. This problem is the well-known dilemma of finding a compromise between providing a high amount of plasticity and, at the same time, ensuring a high longevity of memories.

Fusi et al. (2005) have recently shown for *M/N*≲1 that synaptic metaplasticity might offer an elegant solution to this dilemma. Metaplasticity means that synaptic changes do not necessarily alter a synapse's influence on the postsynaptic membrane potential but rather modulate the probability that the synapse undergoes a weight change the next time it is exposed to an LTP or LTD stimulus. Figure 2A illustrates the transition probabilities for a slightly modified (see Appendix A.5.2) version of the cascade model defined by Fusi et al. (2005) for *n* = 4 meta levels. The synaptic weight either attains the value *w* = 0 (silent) or *w* = 1 (activated), as in the case of 2-state synapses. The probabilities of state transitions for *n* > 1 are constructed as follows: if the synapse is silent (activated) and is exposed to an LTP (LTD) stimulus, it becomes potentiated (depotentiated) with probability (1/2)^{μ}, where μ = 0, 1, …, *n* − 1 counts the meta levels. Thus, the smallest transition probability amounts to (1/2)^{*n*−1}. If a synapse is silent (activated) and receives an LTD (LTP) stimulus, it switches to level μ + 1 with probability (1/2)^{μ}. In that case, the synaptic weight *w* remains unchanged. However, if a silent (activated) synapse is in the “lowest” meta level μ = *n* – 1 and receives an LTD (LTP) stimulus, no state change occurs.
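These transition rules can be condensed into a single update function (a sketch; that a weight flip lands in the top meta level μ = 0 of the opposite chain is our assumption, following the original cascade model of Fusi et al. 2005, and is not spelled out in the text above):

```python
import random

def cascade_step(state, stimulus, n, rng=random.random):
    """One plasticity event for the (slightly modified) cascade model.

    state    : (w, mu) with weight w in {0, 1} and meta level mu in 0..n-1
    stimulus : +1 for an LTP signal, -1 for an LTD signal
    A weight flip occurs with probability (1/2)**mu and is assumed to land in
    the top meta level mu = 0 of the opposite chain; a congruent stimulus
    instead pushes the synapse one meta level deeper with probability
    (1/2)**mu, except in the deepest level mu = n - 1, where nothing happens.
    """
    w, mu = state
    wants_flip = (w == 0 and stimulus == +1) or (w == 1 and stimulus == -1)
    if wants_flip:
        if rng() < 0.5**mu:
            return (1 - w, 0)      # plastic transition: weight changes
    elif mu < n - 1 and rng() < 0.5**mu:
        return (w, mu + 1)         # metaplastic transition: weight unchanged
    return (w, mu)
```

Passing a deterministic `rng` makes the branch structure easy to test: a silent synapse in the top level always potentiates under LTP, while one in the deepest level ignores further LTD.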

Metaplasticity is thought to increase memory lifetime because the synaptic meta level reflects the “history of the synapse,” and future plasticity is “dictated by previous plastic changes” (Montgomery and Madison 2004; see also Abraham and Bear 1996). The decay of memory traces is thus considered to be represented through a trajectory in the synaptic state space. As the number of synaptic meta levels increases, the history of a synapse can be maintained for a longer amount of time and, hence, memory longevity is enhanced.

Memory longevity, however, depends not only on the number *n* of meta levels but also on the sparseness of the code, that is, the assembly size *M*. The two quantities cannot be optimized independently of each other: the larger *n*, the fewer synapses are driven into a potentiated state by an LTP signal, because more synapses reside at meta levels with a decreased probability of potentiation. As a result, the mean memory signal is smaller, the more meta levels the synapse provides. To nevertheless produce a sufficiently high memory signal, the number *M* of cells in a synchronously active assembly must increase with the number *n* of meta levels. Larger assemblies, however, are detrimental for memory lifetime because of the increased amount of interference between the stored associations. It is a nontrivial problem to understand whether the advantages of sparsification outweigh those of an increased synaptic state space or vice versa.

Here we discuss the trade-off between sparse coding and the number of synaptic meta levels in the light of our network model. The longevity *P* of memories as a function of the assembly size *M* is derived numerically (Appendix) and is illustrated in Figure 2B for several different numbers *n* of meta levels. We observe that increasing *n* reduces the maximal memory lifetime *P*^{max} while the optimal assembly size *M*^{opt} at the maximum becomes larger. Analytical results derived for *f* = *M*/*N* → 0 show the optimal assembly size to scale like (*n* + 1)^{2} and the maximal lifetime to decrease like (*n* + 1)^{−4} (Appendix, eq. [30]). Only for large assembly sizes *M*, that is, nonsparse representations (inset in Fig. 2B), do models with many meta levels become superior, but there the memory longevity can be orders of magnitude below the sparse-coding maximum. This finding is consistent with the results reported by Fusi et al. (2005), who have derived the existence of a lifetime-optimal number *n*^{opt} of meta levels in a framework that corresponds to an assembly size of the same order of magnitude as the number *N* of neurons in the network, that is, *f* = *M*/*N* ≈ 0.5.

### Binary Synapses with Serial State Transitions

The above findings about the longevity of memories may either reflect a specific feature of the transition topology of the model of Fusi et al. (2005) or they may be a general property of metaplasticity rules. To further investigate this question, we propose a simpler topology of transitions between synaptic states, in which synaptic states are connected one after the other. This model is illustrated in Figure 3A for *n* = 3 meta levels. The transition probabilities between all states equal one, that is, after a plasticity signal every possible synaptic state change occurs. However, only state changes within meta level μ = 0 are also associated with a weight change.
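The serial rule amounts to a deterministic walk along a chain of 2*n* states (a minimal sketch; the numbering of states 1…2*n* with silent states ≤ *n* follows the convention used for Fig. 4A):

```python
def serial_step(state, stimulus, n):
    """One plasticity event in the serial model with 2*n states.

    States 1..n are silent (w = 0), states n+1..2*n are activated (w = 1).
    Every LTP (+1) or LTD (-1) stimulus shifts the state by one with
    probability one; the end states saturate under further stimuli of the
    same sign, and only the step between states n and n+1 flips the weight.
    """
    return min(2 * n, max(1, state + stimulus))

def weight(state, n):
    """Synaptic weight associated with a serial-model state."""
    return 1 if state > n else 0
```

A single LTP stimulus thus flips the weight only for a synapse sitting directly at the silent/activated boundary; all other stimuli merely move the synapse along its meta levels.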

Numerical evaluation of memory lifetime for the serial model (Fig. 3B) reveals a similar behavior as for the complex metaplasticity model. The smaller the number *n* of synaptic meta levels, the higher the maximal lifetime *P*^{max}. For large assembly sizes *M*, meta levels (*n* > 1) can again be advantageous. An estimate of *P* in the case *f* = *M*/*N* → 0 reveals that the optimal assembly size *M*^{opt} scales with *n*^{2} and that the maximal lifetime decreases like *n*^{−4} (Appendix, eq. [34]). The serial model exhibits 2*n* − 1 different timescales of forgetting that correspond to the nonzero eigenvalues of the transition matrix (Appendix A.5.3). These timescales vary between (2*f*)^{−2} and approximately [π*f*/(2*n*)]^{−2} (Amit and Fusi 1994) and account for an approximate power-law forgetting in the same way as for the complex cascade model of Fusi et al. (2005).
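These relaxation timescales can be recovered numerically from the per-memory transition matrix of a single synapse (a sketch assuming, as in the sparse-coding argument above, that each stored memory delivers an LTP or an LTD signal with probability *f*² each, giving a lazy random walk with reflecting ends):

```python
import numpy as np

def serial_forgetting_timescales(n, f):
    """Relaxation timescales of the serial chain under ongoing random storage.

    Per stored memory a synapse steps right (LTP) or left (LTD) with
    probability f**2 each, on 2*n states with reflecting boundaries.
    Returns the 2*n - 1 timescales -1/log(lambda) of the nonstationary
    eigenvalues of the column-stochastic transition matrix.
    """
    L, p = 2 * n, f**2
    T = np.zeros((L, L))
    for s in range(L):
        T[s, s] = 1 - 2 * p
        if s > 0:
            T[s - 1, s] = p          # step left
        else:
            T[s, s] += p             # reflect at the depressed end
        if s < L - 1:
            T[s + 1, s] = p          # step right
        else:
            T[s, s] += p             # reflect at the potentiated end
    lam = np.sort(np.linalg.eigvals(T).real)   # birth-death chain: real spectrum
    return -1.0 / np.log(lam[:-1])             # drop the stationary eigenvalue 1
```

For *n* = 4 and *f* = 0.1 the fastest mode relaxes on a timescale close to (2*f*)⁻² = 25 stored memories, while the slowest mode is more than an order of magnitude longer, producing the broad spectrum behind the approximate power-law forgetting.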

### Unbalanced Plasticity

In all models presented so far, LTP and LTD are precisely balanced, which is defined by requiring the equilibrium depolarization of nontarget cells to be half the maximum depolarization, that is, *c*_{m}*M*/2 (Appendix A.1). To rule out that the assumption of balanced LTP and LTD accounts for the longest memory lifetimes of 2-state synapses, we also investigated variants of the serial model in which we reduced the transition probabilities for either LTP or LTD stimuli, which serves the purpose of unbalancing the learning rule.

The numerical results of Figure 4A reveal that an LTD-prone regime results in a dramatically altered equilibrium state distribution: depressed states (≤*n*) are more strongly occupied than potentiated states (>*n*). Interestingly, in such an LTD-prone regime, with more silent (*w* = 0) synapses than activated (*w* = 1) ones, the maximum memory lifetime *P*^{max} can even be slightly enhanced compared with a balanced plasticity rule (Fig. 4B; e.g., Brunel et al. 2004; Leibold and Kempter 2006). This enhancement is essentially due to the increase of the initial memory signal, which is roughly the difference between the maximal depolarization *c*_{m}*M* and the equilibrium depolarization. The latter is smaller in an LTD-prone regime giving rise to a larger initial memory signal as compared with a balanced or LTP-prone regime. The equilibrium depolarization, however, must not be too small, because noise becomes more influential, the fewer synapses are involved in an association.

The maximal memory lifetime *P*^{max} is quite robust against unbalancing LTP and LTD for a small number *n* of synaptic meta levels (Fig. 4B). With increasing *n*, the memory performance of the network becomes more sensitive to unbalancing. However, even for *n* = 15, where the maximum memory lifetime is obtained at an LTP/LTD ratio of about 90%, a deviation of 10% from this optimal LTP/LTD ratio still accounts for about 90% of the largest possible longevity *P*^{max}.
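For the serial chain, the equilibrium occupancies under unbalanced plasticity follow from detailed balance (a sketch; the parameter `ltp_ltd_ratio`, which scales only the relative step probabilities, is our simplification of the unbalancing used above):

```python
def serial_equilibrium(n, ltp_ltd_ratio):
    """Stationary occupancies of the 2*n serial states when each LTP step is
    ltp_ltd_ratio times as probable as each LTD step.

    Detailed balance for the birth-death chain gives
    pi[s+1] / pi[s] = ltp_ltd_ratio, so occupancies form a geometric series.
    """
    w = [ltp_ltd_ratio**s for s in range(2 * n)]
    Z = sum(w)
    return [x / Z for x in w]
```

With `ltp_ltd_ratio < 1` (LTD-prone), the geometric profile concentrates mass on the depressed states, reproducing the shifted equilibrium distribution of Figure 4A; a ratio of 1 recovers the balanced, uniform occupancy.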

### Highly Distributed Representations

We have shown that 2-state synapses are better suited for maximizing the lifetime of associations between random assemblies if a neuronal system is capable of optimizing the number *M* of neurons firing synchronously in an assembly. This optimization of *M* may, however, not always be feasible. For large assembly sizes, we suspect that more complex synapses may become superior in general (see insets in Figs 2B and 3B).

To compare the performances of both synapse models at high coding ratios, we fixed *f* at a value of 0.3 and numerically calculated the optimal number *n*^{opt} of meta levels and the respective memory capacity α = *P*/(*c*_{m}*N*); see Figure 5. We find that for both models, the optimal number *n*^{opt} of synaptic meta levels is an increasing function of network size and morphological connectivity. For the model with complex transitions, Fusi et al. (2005) have shown *n*^{opt} to scale logarithmically with the total number *c*_{m}*N*^{2} of synapses in the network. In the model with serial state transitions, *n*^{opt} grows considerably faster (not shown). However, with respect to memory capacity α, the serial model is superior to the model with complex transitions. More specifically, the complex model's capacity decreases with increasing network size and connectivity, whereas the serial model exhibits a level of capacity that is virtually independent of both network parameters.

## Discussion

Synaptic metaplasticity is thought to enhance memory lifetimes via storing the history of past synaptic changes (Abraham and Bear 1996; Montgomery and Madison 2004; Fusi et al. 2005). Though intuitive, this idea, as the present paper shows, is not generally valid. In fact, we find that memory lifetime is governed by an intricate interplay between synaptic state changes and population coding. More specifically, if neuronal representations are allowed to be arbitrarily sparse, an increase of the number *n* of synaptic meta levels reduces the longevity of memory traces: the more synaptic meta levels, the fewer synapses are potentiated when storing a new memory. The readout of a memory trace, however, requires spiking activity, which needs a sufficient amount of postsynaptic depolarization in response to a synchronously active set of presynaptic neurons. Thus, the number of synchronously firing cells must increase as the number *n* of synaptic meta levels grows. The use of larger assemblies, however, increases the interference between memories, which strongly reduces memory longevity. Figures 2 and 3 show that this disadvantage cannot be overcome by using cascade models of synaptic plasticity. As a result, 2-state synapses are optimal for online learning of sparsely encoded associations. If, however, the assembly size is large and representations are highly distributed, binary synapses with multiple meta levels are generally superior to 2-state synapses, and the network can make use of the history of synaptic changes, which is stored in the synaptic meta levels.

### Model of Online Learning

The mathematical model we use for online learning is motivated by the central idea that old memories are gradually replaced by new memories. Online learning is described by the dynamics of the distribution of synaptic states related to a specific association that is stored in a recurrent network (Amit and Fusi 1994). This dynamics of the state distribution is a multidimensional linear iterative map. The eigenvalues of the linear map provide the inverse timescales for overwriting of memories. As a criterion to test whether these memories can still be read out, Amit and Fusi (1994) use a fixed specific value of the signal-to-noise ratio of the subthreshold membrane potential. In contrast, we optimize the neuronal firing threshold θ for a given assembly size *M* and network size *N* so as to fulfill a signal-detection criterion based on the suprathreshold activity of the target assembly (Leibold and Kempter 2006).

### Scaling Laws of Memory Lifetime

For the 2-state synaptic model, we find the maximal memory lifetime to grow linearly with the total number of synapses in the network. This result seems to contradict the findings by Fusi et al. (2005), who showed memory lifetimes *P* to grow “logarithmically as a function of the number of synapses used to store the memory.” Both results are, however, consistent, since Fusi et al. (2005) assumed that about as many synapses are involved in any single stored memory as there are synapses in the network, whereas we also allow sparse representations that require only a small number (∝ *M*^{2} ≪ *N*^{2}) of synapses to be involved in a memory trace.

Moreover, Amit and Fusi (1994) have already pointed out that the logarithmic dependence of *P* can be broken up by aptly relating the probability *q* of having a synaptic state change to parameters like network size, coding ratio, or the number of synaptic states: in the case of sparse coding, *q* is proportional to the probability that a particular synapse is used in an association. If the storage of a new memory requires potentiation of a random set of *c*_{m}*M*^{2} synapses and there are a total of *c*_{m}*N*^{2} synapses in the network, the probability that any one synapse receives a plasticity signal is (*M*/*N*)^{2} = *f*^{2}.

### Models of Metaplasticity

We consider 2 different models of synaptic metaplasticity. As a starting point, we used the model by Fusi et al. (2005). As a second model of synaptic metaplasticity, we studied a serial topology of state transitions, which turned out to provide longer memory lifetimes than the original model with cross transitions. A possible disadvantage of the serial topology, however, is that its optimal number *n*^{opt} of meta levels increases faster with network size *N* and connectivity *c*_{m} than does *n*^{opt} in the model with cross transitions (Fig. 5). Additional costs for each meta level thus might favor the latter model.

### Symmetric Learning Rules and Attractor-Type Memories

The learning rule discussed in this paper is asymmetric in the sense that synapses between cue and target neurons are strengthened, whereas synapses in the reverse direction are weakened. As a result, the memory traces are sequence-type associations (e.g., Willshaw et al. 1969; Nadal 1991; Leibold and Kempter 2006). The model does not consider pattern completion, as the latter requires symmetric synaptic connections. Though not explicitly shown here, the fundamental conflict between storing plasticity history and reducing memory interference also remains for pattern completion. However, because pattern completion is generally discussed in the light of dynamical attractors (e.g., Hopfield 1982; Golomb et al. 1990; Treves and Rolls 1992), stability of firing patterns requires larger assembly sizes and, hence, more distributed representations than sequence-type memories do. We thus expect that for attractor-type memories, synaptic meta levels are even more useful for enhancing memory lifetimes than for the presently investigated sequence-type associations.

### On Disregarding the Overlap

Our results are based on the assumption that synapses that connect neurons belonging to both the cue and the target assembly do not change when storing a new memory. For small coding ratios *f* = *M*/*N*, this effect is negligible because the fraction of neurons in the overlap between cue and target assembly is small (∝*f*^{2}). For higher coding ratios, this assumption is important, though, and its validity depends on how synapses are changed by triplets and quadruplets of pre- and postsynaptic spikes. Froemke and Dan (2002) suggest that these synapses are likely to be depressed, which would correspond to a decrease of the target assembly. This reduction of the assembly size *M* effectively decreases the quality of readout, and thus our results overestimate the memory capacity for large *f*. We, however, did not take this into account, mainly because the effects of spike triplets and quadruplets on synaptic changes are still not completely described and, moreover, the most important part of our results is observed for low coding ratios.

### Application to Hippocampal CA3 Network

Sparse coding, though beneficial for high storage capacities, may not always be a feasible mode of operation. Constraints that limit the degree of sparseness may favor complex synaptic cascades if the network is large enough, for example, *N*≳10^{4} in Figure 5A. As an example, we calculate the memory lifetimes for parameters corresponding to the hippocampal CA3 region of rats (*N* ≈ 250 000, *c*_{m} = 0.05, not shown in figures), where assemblies have been estimated to contain a few thousand neurons (Csicsvari et al. 2000). In this system, a sparser code could be prohibited by requiring dynamical stability of the replay of a sequence of activity patterns (Lee and Wilson 2002; Leibold and Kempter 2006). The evaluation of both metaplasticity models in the CA3-like parameter regime yields a maximal lifetime of about *P*^{max} ≈ 7 000 at a number *n*^{opt} = 2 of synaptic meta levels for the model with complex metaplasticity and a lifetime *P*^{max} ≈ 13 500 at *n*^{opt} = 3 for the serial topology. We thus conclude that a few meta levels could increase memory longevity in the hippocampus. For a 2-state synaptic model (*n* = 1) to be optimal for a CA3-type regime, the required representation would have to be sparser than reported, that is, assemblies should contain only a few hundred neurons.

### Population Sparseness

The level *f* of sparseness as referred to in this paper is generally termed population sparseness (Olshausen and Field 2004). This quantity is hard to assess experimentally because it requires identifying and measuring a large number of nonactive neurons. Most experiments therefore address the temporal sparseness derived from the firing rate distributions of single neurons (Rolls and Tovee 1995).

Experimental estimates of population sparseness are scarce. In the hippocampus, one finds *f* ≈ 10^{−2} (Csicsvari et al. 2000), whereas the barrel cortex (Brecht and Sakmann 2002) and the visual cortex (Weliky et al. 2003) exhibit much larger coding ratios of *f* ≈ 0.5. For networks with such highly distributed representations, our framework predicts that maximizing memory longevity requires a larger number of synaptic meta levels than expected in the hippocampus.

### If Connectivity Depends on Network Size

Network size *N* and morphological connectivity *c*_{m} were assumed to be independent variables. As a result, the optimal assembly size *M*^{opt} is independent of *N* (eq. 1). However, if one assumes a constant number *c*_{m}*N* of synapses per neuron, the connectivity *c*_{m} decreases with network size like 1/*N*, and therefore *M*^{opt} is proportional to *N*. In this case, the optimal coding ratio *f*^{opt} = *M*^{opt}/*N* and the maximal memory lifetime *P*^{max} are independent of *N* (eq. 2). That is to say, the memory performance of the network for constant *c*_{m}*N* is determined by the number of synapses a neuron can support, that is, the “size” of single neurons rather than the network size (Leibold and Kempter 2006).

### Contributions of Noise

The analytical results presented here are based on a mean-field approach and an evaluation of signal-to-noise ratios, with an inherent source of noise owing to given random morphological connectivities; see Appendix. We neglect several additional sources of noise that may be present in biology. One can distinguish external from internal noise contributions. Possible external sources of noise are fluctuations of the neuronal firing threshold, errors while activating a cue pattern, or variations of assembly sizes. Other noise sources can be considered as internal, such as variations of the synaptic state distributions between different cells of a target assembly. Independent of their nature, these additional noise sources will always increase the variance of the postsynaptic depolarization; see Figure 1A and equation (13) in the Appendix. As a consequence, memory lifetime, on average, will decrease, although some specific associations may even have an enhanced lifetime. For example, if we consider a variability of the assembly sizes *M*, an association between 2 assemblies that occasionally are larger than average will remain stored longer than it would be in the case of a network with all assemblies having identical size. On average, however, the lifetime would be reduced because associations with fewer synapses are forgotten faster and, moreover, an association between larger-than-average assemblies overwrites more synapses than average, including synapses required by earlier memory traces.

To conclude, our mean-field results provide an upper bound of memory lifetimes. This upper bound is a good approximation if there is little additional noise. Internal noise is small specifically for the case of large network size *N* and large assembly sizes *M*.

### Alternative Roles for Metaplasticity

Though it may seem elementary that increasing the complexity of synaptic state transitions prolongs memory lifetimes, our results clearly demonstrate that this is not the case in general. For sparsely encoded memories, increasing the level of metaplasticity can even be detrimental. Metaplasticity prolongs memory longevity only if the neuronal encoding is highly distributed. One thus might speculate that besides the prolongation of memory lifetime, metaplasticity could also serve a different functional purpose. For example, metaplasticity may provide a substrate to evaluate memories. More important memory traces could be assigned to meta levels with lower transition probabilities than less important memories. Evaluation may be reward-based, repetition-based (Sajikumar and Frey 2004), or context-based. A functional understanding of the design of synaptic plasticity will hence also be closely related to specific behavioral contexts.

### Appendix

We consider a recurrent network of *N* randomly coupled McCulloch–Pitts neurons (McCulloch and Pitts 1943) in discrete time. The network's morphological connectivity *c*_{m} is the probability that one neuron is synaptically connected to another. A neuron fires when its postsynaptic potential *h* crosses a firing threshold θ. The potential *h* is determined by the synaptic inputs arising from the network activity in the previous time step.

We assume that each synapse can exist in one of 2*n* discrete states. In the case *n* = 1, we have 2-state synapses: a synapse is either silent (state 1) and has zero weight, *w*_{1} = 0, and therefore no effect on the postsynaptic neuron, or it is activated (state 2) and may increase the postsynaptic depolarization *h* by weight *w*_{2} = 1. In general, the state-specific weights are described by **w** = (*w*_{1}, …, *w*_{2n})^{T} in which 0 ≤ *w*_{ν} ≤ 1 is the synaptic weight assigned to state ν∈{1,…,2*n*}.

We define an assembly as a group of *M* randomly selected neurons. The collective activation of this specific set of neurons is thought to represent some particular external event. The probability that a randomly selected neuron in a network of *N* cells belongs to a specific assembly of size *M* is called coding ratio *f* = *M*/*N*. A sparse representation of memories is then reflected by *f*≪1.

An association is a link between a random pair of externally predefined assemblies such that synchronous firing of the neurons forming a cue assembly activates a sufficiently large portion of the neurons in a target assembly.

Because we consider a recurrent network, *fM* neurons on average belong to both the cue and the target assembly. These *fM* neurons are also referred to as the overlap between the 2 assemblies. For small *f*, the overlap is negligible and the framework becomes essentially a feed-forward network.

The set of synapses contributing to one particular association are described by the state distribution **z**=(*z*_{1},…,*z*_{2n})^{T}∈[0,1]^{2n} (Amit and Fusi 1994), which determines the occupancies of the 2*n* states and is normalized to (1,…,1)**z**=∑_{ν=1}^{2n}*z*_{ν}=1.

#### A.1 Dynamics of the State Distribution

During learning, synapses change states. The storage of a new association requires that many of the synapses from cue to target neurons undergo LTP. Synapses in the reverse direction, from target to cue, preferentially experience LTD.

We assume that synapses stay unaltered if they connect neurons that belong to both the cue and the target assembly, that is, synapses that are related to the *fM* neurons in the overlap (see above). As a result, while learning a new association, the number of synapses that receive an LTP stimulus equals *c*_{m}[*M*(1 – *f*)]^{2}, owing to the *M*(1 – *f*) cells in the cue and target assemblies that are not in the overlap. In the same way, the identical number *c*_{m}[*M*(1 – *f*)]^{2} of synapses from target to cue neurons are exposed to an LTD stimulus. We note that this exclusion of the overlap synapses is not equivalent to considering a feed-forward network with assembly size *M*(1 – *f*) because the overlap synapses also contribute to both the mean and the variance of postsynaptic depolarization.

After being exposed to an LTP stimulus, a synapse may change its state from ν to ν′ > ν. For an LTD stimulus, synapses may switch from ν to ν′ < ν. The probabilities that these state changes actually occur are denoted as *q*_{ν′ν}. The coefficients *q*_{ν′ν} constitute the nondiagonal elements of the plasticity matrix **Q**. Its diagonal elements *q*_{νν}=−∑_{ν′≠ν}*q*_{ν′ν} are constructed such that **Q** has vanishing column sums, that is, (1,…,1) **Q**=0, which preserves the normalization of **z** (see end of Appendix A.1).

The fraction of synapses that connect the disjoint subsets of cells in a given pair of cue and target assemblies equals *f*^{2}(1 – *f*)^{2}. Storing another association, we thus find the probability *f*^{2}(1 – *f*)^{2}*q*_{ν′ν} that an “arbitrary” synapse in state ν changes its state to ν′ ≠ ν. Similarly, the probability of an arbitrary synapse to remain in state ν amounts to 1 + *f*^{2}(1 – *f*)^{2}*q*_{νν}. For a given plasticity matrix **Q**, online learning of random associations can then be described by a linear iterated map (Amit and Fusi 1994). Storing one further random association maps the state distribution **z**(*t*) of an association at time *t* to the distribution

**z**(*t* + 1) = [**1** + *f*^{2}(1 – *f*)^{2}**Q**] **z**(*t*). (5)

Iterating equation (5) yields

**z**(*t*) = [**1** + *f*^{2}(1 – *f*)^{2}**Q**]^{*t*} **z**(0), (6)

where **z**(0) denotes the state distribution of an association right after learning.

The fixed point z̄ of the dynamics equation (5) is defined via **Q**z̄ = 0. If a learning rule is such that **w** · z̄ = 1/2, we also say that this learning rule is “balanced.” Otherwise, if **w** · z̄ ≠ 1/2, we call the learning rule unbalanced.

To fully specify the dynamics in equation (6), we have to find an expression for **z**(0) immediately after imprinting a particular association. The state distribution **z**(0) can be expressed as a superposition of (1 – *f*)^{2} times a potentiated state **z**_{LTP} and 1 – (1 – *f*)^{2} times the equilibrium state z̄, that is,

**z**(0) = (1 – *f*)^{2} **z**_{LTP} + [1 – (1 – *f*)^{2}] z̄. (7)

In the limit *f* → 0, the initial state **z**(0) equals **z**_{LTP}, and in the limit *f* → 1, we have **z**(0) = z̄, which means no synaptic state changes occur. Both **z**_{LTP} and z̄ are determined by the specific choice of the synapse model reflected by the plasticity matrix **Q**. Combining equations (6) and (7), we arrive at an expression for the temporal evolution of the state distribution during online learning,

**z**(*t*) = z̄ + [**1** + *f*^{2}(1 – *f*)^{2}**Q**]^{*t*} **δz**, (8)

with the state excess

**δz** = (1 – *f*)^{2} (**z**_{LTP} – z̄). (9)

To decide whether the state distribution **z**(*t*) after storing *t* further associations is adequate to still retrieve the respective memory, we introduce in Appendix A.4 a readout criterion based on equation (9).

As already announced, the dynamics of **z** preserves the normalization (1, …, 1) **z** = 1 because of the vanishing column sums of **Q**. This can be shown from equation (5) because (1, …, 1) **z**(*t* + 1) = (1, …, 1)[**1** + *f*^{2}(1 – *f*)^{2}**Q**] **z**(*t*) = (1, …, 1) **z**(*t*).
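The linear iterated map of Appendix A.1 is easy to sketch numerically. The following minimal example assumes a serial 4-state plasticity matrix with unit transition probabilities and an illustrative coding ratio; these are not the paper's parameter values, only a demonstration that the map preserves the normalization of **z**.

```python
import numpy as np

# Minimal sketch of the linear iterated map of Appendix A.1.
# The 2n = 4 serial states, unit transition probabilities, and the
# coding ratio f are illustrative assumptions, not the paper's values.
f = 0.05
eps = f**2 * (1 - f)**2          # probability that a synapse is touched per stored association

Q = np.array([[-1.,  1.,  0.,  0.],   # LTD: 2 -> 1
              [ 1., -2.,  1.,  0.],   # LTP: 1 -> 2, LTD: 3 -> 2
              [ 0.,  1., -2.,  1.],
              [ 0.,  0.,  1., -1.]])
assert np.allclose(Q.sum(axis=0), 0.0)   # vanishing column sums, (1,...,1) Q = 0

step = np.eye(4) + eps * Q               # one stored association
z = np.array([0.0, 0.0, 0.0, 1.0])       # fully potentiated initial distribution
for _ in range(1000):                    # store 1000 further associations
    z = step @ z

assert np.isclose(z.sum(), 1.0)          # normalization is preserved
assert np.all(z >= 0.0)                  # z stays a probability distribution
```

Because the column sums of **Q** vanish, the one-step matrix is column-stochastic for small *f*, so any initial distribution stays normalized and relaxes toward the equilibrium state.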

#### A.2 Mean Membrane Depolarization and Variance

Given a state distribution **z**, we can calculate the first and second moments of the postsynaptic potential *h*, which are needed below to calculate the readout quality of the target assembly. If the whole group of cue neurons fires simultaneously, the mean membrane depolarization of a target cell amounts to

〈*h*〉 = *c*_{m} *M* **w** · **z**, (10)

which, by means of equation (8), can be expressed as a function of the number *t* of subsequently stored memories as

〈*h*〉(*t*) = *c*_{m} *M* **w** · **z**(*t*). (11)

In analogy to equation (10), we also find an expression for the variance of the membrane potential *h*,

σ^{2} = *M* [*c*_{m} **w**^{2} · **z** – *c*_{m}^{2} (**w** · **z**)^{2}], (12)

where **w**^{2} denotes the vector of squared weights. For binary weights (*w*_{ν} ∈ {0, 1}), this expression simplifies to

σ^{2}(*t*) = *c*_{m} *M* **w** · **z**(*t*) [1 – *c*_{m} **w** · **z**(*t*)]. (13)

In what follows, we will assume the assembly size *M* to be large, that is, only groups of *M* ≫ 1 simultaneously active neurons are considered to carry meaningful information. As a result, the distribution of *h* can be approximated to be Gaussian and, hence, the first 2 moments are sufficient in the sense of this approximation.
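The moment formulas above can be checked against direct sampling. In this sketch, each of the *M* cue neurons independently contributes weight 1 with probability *c*_{m}(**w** · **z**), so *h* is binomially distributed; the parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

# Monte Carlo check of the moment formulas for binary weights: each of the
# M cue neurons independently contributes weight 1 with probability
# c_m (w . z), so h is binomial. Parameter values are illustrative.
rng = np.random.default_rng(0)
M, cm = 500, 0.1
w = np.array([0.0, 1.0])       # binary weights of a 2-state synapse
z = np.array([0.3, 0.7])       # state distribution of the association

p = cm * (w @ z)                       # per-neuron contribution probability
h = rng.binomial(M, p, size=200_000)   # samples of the depolarization h

mean_theory = cm * M * (w @ z)                         # mean, eq. (10)
var_theory = cm * M * (w @ z) * (1 - cm * (w @ z))     # variance, eq. (13)
assert abs(h.mean() - mean_theory) < 0.5
assert abs(h.var() - var_theory) < 1.0
```

The binomial model also makes the Gaussian approximation plausible: for *M* ≫ 1 the sampled histogram of *h* is well described by its first two moments.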

#### A.3 Example: 2-State Synapses

The case of 2-state synapses (*n* = 1) allows an explicit illustration of the dynamics of the state distribution **z** = (*z*_{1}, *z*_{2})^{T}. There, the fraction of silent synapses is denoted by *z*_{1}, and the fraction of activated synapses supporting an association is *z*_{2} = 1 – *z*_{1}. For a weight vector **w** = (0, 1)^{T}, we find the mean membrane depolarization to be 〈*h*〉 = *c*_{m}*Mz*_{2} and the variance to equal σ^{2} = *c*_{m}*Mz*_{2}(1 – *c*_{m}*z*_{2}).

Assuming that a new association activates all possible cue-to-target synapses and depresses all possible target-to-cue connections, we have a plasticity matrix

**Q** = (−1, 1; 1, −1)

(rows separated by semicolons) and a potentiated state **z**_{LTP} = (0, 1)^{T}. Then, the equilibrium state equals z̄ = (1/2, 1/2)^{T}, and the state excess is given by **δz** = (1 – *f*)^{2} (−1/2, 1/2)^{T}. Together with equation (7), we find an initial state distribution

**z**(0) = (1/2, 1/2)^{T} + (1 – *f*)^{2} (−1/2, 1/2)^{T}.

Because **Qδz** = −2 **δz**, we then derive the dynamics of the activated synapses,

*z*_{2}(*t*) = 1/2 + (1/2)(1 – *f*)^{2} [1 – 2*f*^{2}(1 – *f*)^{2}]^{*t*}.

The fraction *z*_{2} decays with a time constant τ = 1/|ln[1 – 2*f*^{2}(1 – *f*)^{2}]| toward an equilibrium value of z̄_{2} = 1/2; see Figure 1. For low coding ratios *f*, we shall use the approximation τ = 1/(2*f*^{2}).
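The 2-state dynamics can be verified numerically by iterating the one-step map and comparing against the closed-form decay of *z*_{2}; the coding ratio below is an arbitrary example value.

```python
import numpy as np

# Sketch of the 2-state dynamics of Appendix A.3: iterate the map and
# compare with the closed-form decay of z2. The coding ratio f is an
# arbitrary example value.
f = 0.05
eps = f**2 * (1 - f)**2
Q = np.array([[-1.0,  1.0],
              [ 1.0, -1.0]])
zbar = np.array([0.5, 0.5])                    # equilibrium state
dz = (1 - f)**2 * np.array([-0.5, 0.5])        # state excess
z = zbar + dz                                  # initial distribution z(0)

for t in range(1, 201):
    z = (np.eye(2) + eps * Q) @ z
    z2_closed = 0.5 + 0.5 * (1 - f)**2 * (1 - 2 * eps)**t
    assert np.isclose(z[1], z2_closed)         # iteration matches closed form

# for low f, the decay time constant is close to 1/(2 f^2)
tau = 1 / abs(np.log(1 - 2 * eps))
assert abs(tau - 1 / (2 * f**2)) / tau < 0.15
```

The agreement is exact because **δz** is an eigenvector of **Q** with eigenvalue −2, so each learning step simply multiplies the state excess by 1 – 2*f*^{2}(1 – *f*)^{2}.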

#### A.4 Readout Criterion

The number *P* of associations that can be stored in a network is determined via signal-detection theory as described in Leibold and Kempter (2006). There, an activity pattern is said to be read out if the fraction *p*_{1} of correctly activated neurons (hits) exceeds the fraction *p*_{0} of incorrectly activated neurons (false alarms) by some constant detection threshold γ > 0. The maximum lifetime *P* of a memory is then derived from the condition γ = *p*_{1}(*P*) – *p*_{0}.

Assuming Gaussian statistics, the fraction of false alarms equals

*p*_{0} = (1/2) erfc[(θ – 〈h̄〉)/(√2 σ̄)], (14)

where 〈h̄〉 and σ̄ denote the mean and standard deviation of the depolarization generated by synapses in the equilibrium state z̄. The fraction *p*_{1}(*t*) of hits analogously depends on the mean 〈*h*〉(*t*) and variance σ^{2}(*t*) at the target neurons and on the firing threshold θ,

*p*_{1}(*t*) = (1/2) erfc[(θ – 〈*h*〉(*t*))/(√2 σ(*t*))]. (15)

Hence, the detection criterion amounts to

γ = *p*_{1}(*t*) – *p*_{0}, (16)

which implicitly determines for how many subsequently stored associations *t* an association remains in memory. Thus (for a subset of thresholds), we can numerically determine *t* as a function **F** of the firing threshold θ. One then defines the maximum number *P* of storable associations via *P* = max_{θ} **F**(θ). The number *P* of maximally stored associations is also termed the lifetime or longevity of a memory trace, which implies the idea that the network is persistently exposed to plasticity signals that drive the storage of new associations.

All results are derived with a readout quality of γ = 0.7. It has been shown in Leibold and Kempter (2006) that different values for γ do not change the scaling laws of memory capacity.
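The readout criterion can be evaluated numerically. The sketch below estimates the lifetime *P* for 2-state synapses under Gaussian statistics by scanning the firing threshold; all network parameters are illustrative choices, not the CA3 estimates of the Discussion.

```python
import math

# Numerical sketch of the readout criterion (Appendix A.4) for 2-state
# synapses, assuming Gaussian statistics for hits p1 and false alarms p0.
# All network parameters are illustrative choices, not fits to data.
N, cm, M, gamma = 10_000, 0.1, 300, 0.7
f = M / N
eps = f**2 * (1 - f)**2

def gauss_tail(theta, mean, var):
    # probability that a Gaussian variable exceeds the firing threshold
    return 0.5 * math.erfc((theta - mean) / math.sqrt(2.0 * var))

def z2(t):
    # fraction of activated synapses t associations after learning (A.3)
    return 0.5 + 0.5 * (1 - f)**2 * (1 - 2 * eps)**t

def lifetime(theta):
    # number of subsequent associations for which p1 - p0 >= gamma
    p0 = gauss_tail(theta, cm * M * 0.5, cm * M * 0.5 * (1 - cm * 0.5))
    t = 0
    while True:
        s = z2(t)
        p1 = gauss_tail(theta, cm * M * s, cm * M * s * (1 - cm * s))
        if p1 - p0 < gamma:
            return t
        t += 1

# maximize memory lifetime over the firing threshold
P = max(lifetime(theta) for theta in range(1, int(cm * M)))
assert 0 < P < 5000
```

Scanning θ reproduces the qualitative picture of Appendix A.4: too low a threshold inflates false alarms, too high a threshold loses hits, and the lifetime is maximal in between.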

#### A.5 Limit of Low Coding Ratios

For low coding ratios *f* ≪ 1, we can find explicit formulas to approximate the memory lifetime *P*. We therefore expand equation (11) to the lowest order in *f*^{2}. Defining **L** to be the lowest exponent *t* ≥ 1 at which **w** · **Q**^{*t*}**δz** ≠ 0, this expansion yields

For binary weights (*w*_{ν} ∈ {0, 1}), we can combine equations (15) and (16) through elimination of θ and solve the resulting quadratic equation for 〈*h*〉(*P*) – 〈h̄〉. Together with equations (10) and (13), we find that the memory signal is given by

The constant *K* for the signal-to-noise ratio reflects both the neuronal firing threshold θ and the readout quality γ. In what follows, we assume *K* to be optimized with respect to θ.

We now use Stirling's formula to approximate the binomial coefficient in equation (19) for *P* ≫ **L**, that is, (*P* choose **L**) ≈ *P*^{**L**}/**L**!. Then, together with equation (20) and *f* → 0, we obtain the following expression for the maximum number *P* of stored associations

Equation (21) allows the coding ratio *f* to be optimized; the resulting optimum (eq. 22) satisfies *f* ≪ 1 for a large number *c*_{m}*N* of synapses per neuron. The longevity of the memory trace then amounts to

##### A.5.1 Two-State Synapses

In the case of 2-state synapses (*n* = 1), we have **w** · **δz** = 1/2 and **w** · **Qδz** = −1 with **L** = 1. From equation (22), we then find an optimal coding ratio

The optimal assembly size for 2-state synapses derived in the Results is equivalent to equation (24) in the sense of the linear approximation *e*^{−Pf2} = 1–*Pf*^{2} of the exponential function used in equation (19). More specifically, the prefactor approximates the factor 2*e*^{1/2} in equation (1).

From equation (24), we derive the maximal number of associations as

##### A.5.2 Synapses with Complex Metaplasticity

The states in the metaplasticity model by Fusi et al. (2005) are enumerated such that ν = 1 is the most depressed state (bottom left in Fig. 2A). The other states are counted clockwise such that ν = 2*n* denotes the most potentiated state (bottom right in Fig. 2A). Then, the plasticity matrix for *n* = 4 reads

We choose the transition probabilities of the model (shown here for *n* = 4) to be smaller by a factor. This is done in order to break the degeneracy of the models for *n* = 1 and *n* = 2, which are equivalent in the original formulation. As a drawback, our mathematical expressions are slightly more complicated; for example, the steady state in the original model is exactly equally distributed, whereas in our formulation it is not (see below). This slight alteration, however, does not substantially change the behavior of the model because the main results of Fusi et al. (2005) can be reproduced. In our model, the steady-state distribution reads

Because **L** = 1, the linearization in equation (23) requires expressions for **w** · **δz** and **w** · **Qδz**. With the above formulas and the weight vector **w** = (0, …, 0, 1, …, 1)^{T}, we obtain
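For concreteness, a cascade-type plasticity matrix can be sketched in code. The version below follows the original cascade rules of Fusi et al. (2005), not the modified matrix used here; the chain indexing and the cascade factor *x* = 1/2 are assumptions for illustration only.

```python
import numpy as np

# Sketch of a cascade-type plasticity matrix in the spirit of the
# ORIGINAL model of Fusi et al. (2005), not the modified matrix used in
# the paper. States 0..n-1 form the weak chain (0 = most depressed),
# states n..2n-1 the strong chain (2n-1 = most potentiated); the cascade
# factor x = 1/2 and this indexing are assumptions for illustration.
n = 4
S = 2 * n
x = 0.5

Q = np.zeros((S, S))
for k in range(n):                 # k = 0 is the top of each chain
    weak, strong = n - 1 - k, n + k
    Q[n, weak] += x**k;       Q[weak, weak] -= x**k       # LTP jump to top strong state
    Q[n - 1, strong] += x**k; Q[strong, strong] -= x**k   # LTD jump to top weak state
    if k < n - 1:                  # metaplastic transition one level deeper
        Q[weak - 1, weak] += x**(k + 1);     Q[weak, weak] -= x**(k + 1)
        Q[strong + 1, strong] += x**(k + 1); Q[strong, strong] -= x**(k + 1)

assert np.allclose(Q.sum(axis=0), 0.0)   # vanishing column sums

# steady state = normalized null vector of Q
vals, vecs = np.linalg.eig(Q)
zbar = np.real(vecs[:, np.argmin(np.abs(vals))])
zbar = zbar / zbar.sum()
assert np.isclose(zbar.sum(), 1.0) and np.all(zbar > 0.0)
```

Extracting the steady state as the null vector of **Q** mirrors the definition **Q**z̄ = 0 of Appendix A.1 and makes it easy to inspect how the cascade factor shapes the occupancies of the deeper meta levels.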

##### A.5.3 Synapses with Serial State Transitions

Another model in which the state transitions connect the synaptic states one after the other is depicted in Figure 3A. We again enumerate the states clockwise such that ν = 1 corresponds to the most depressed state (bottom left) and ν = 2*n* corresponds to the most potentiated state *w*_{2n} = 1 (bottom right). Then, the plasticity matrix (for *n* = 3) reads

The weight vector **w** = (0, …, 0, 1, …, 1)^{T} defines the scalar products **w** · **δz** and **w** · **Q**^{*t*}**δz**. The value **L** = *n* can be obtained from the property of vanishing column sums, 0 = (1, …, 1)**Q**^{**L**}. From that and equation (23), one finds the scaling law
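The statement **L** = *n* for the serial topology can be checked numerically. The sketch below builds a serial plasticity matrix for *n* = 3, assuming unit transition probabilities between neighboring states (an illustrative choice), and verifies that **w** · **Q**^{*t*}**δz** first becomes nonzero at *t* = *n*.

```python
import numpy as np

# Sketch of the serial topology: state transitions connect neighboring
# states only. Here n = 3 (2n = 6 states) and all transition
# probabilities are set to 1 as an illustrative assumption.
n = 3
S = 2 * n
Q = np.zeros((S, S))
for nu in range(S - 1):
    Q[nu + 1, nu] += 1.0; Q[nu, nu] -= 1.0          # LTP: nu -> nu + 1
    Q[nu, nu + 1] += 1.0; Q[nu + 1, nu + 1] -= 1.0  # LTD: nu + 1 -> nu
assert np.allclose(Q.sum(axis=0), 0.0)

w = np.array([0.0] * n + [1.0] * n)    # binary weights
zbar = np.full(S, 1.0 / S)             # equilibrium state (uniform for this Q)
assert np.allclose(Q @ zbar, 0.0)

z_ltp = np.zeros(S); z_ltp[-1] = 1.0   # fully potentiated state
dz = z_ltp - zbar                      # state excess in the limit f -> 0

# L is the lowest exponent t >= 1 with w . Q^t dz != 0; here L = n
v = dz.copy()
for t in range(1, n):
    v = Q @ v
    assert abs(w @ v) < 1e-12          # vanishes for t < n
v = Q @ v
assert abs(w @ v) > 1e-12              # first nonzero at t = n
```

Intuitively, a perturbation of the most potentiated state has to travel through *n* serial transitions before it changes the summed weight on the potentiated half of the chain, which is why the leading term of the expansion appears only at order *t* = *n*.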

We thank Stefano Fusi for discussions and comments on the manuscript and Walter Senn for discussions. We are also indebted to Tim Gollisch and Roland Schaette for valuable suggestions and careful reading. This research was supported by the Deutsche Forschungsgemeinschaft (Emmy Noether Programm: Ke 788/1-3, SFB 618) and the Bundesministerium für Bildung und Forschung (Bernstein Center for Computational Neuroscience Berlin, 01GQ0410). *Conflict of Interest*: None declared.