Abstract

Synaptic changes impair previously acquired memory traces. The smaller this impairment, the greater the longevity of memories. Two strategies have been suggested to keep memories from being overwritten too rapidly while preserving receptiveness to new contents: either introducing synaptic meta levels that store the history of synaptic state changes or reducing the number of synchronously active neurons, which decreases interference. We find that synaptic metaplasticity can indeed prolong memory lifetimes, but only under the restriction that the neuronal population code is not too sparse. For sparse codes, metaplasticity may actually hinder memory longevity. This is important because population codes are sparse in memory-related brain regions such as the hippocampus. Comparing 2 different synaptic cascade models with binary weights, we find that a serial topology of synaptic state transitions gives rise to larger memory capacities than a model with cross transitions. For the serial model, memory capacity is virtually independent of network size and connectivity.

Introduction

Since Hebb (1949), synaptic plasticity has been seen as a graded change of the amplitude of a postsynaptic current. A huge body of theoretical literature has applied and discussed this paradigm in the light of associative memories (Hopfield 1982), self-organized map formation (Linsker 1986), and, more recently, also spike-timing–dependent synaptic plasticity (Gerstner et al. 1996; Kempter et al. 1999; Song et al. 2000). Synaptic physiology, however, has shifted the view on synaptic plasticity from a continuous modification of coupling strength to a switching between stable discrete states (for review, see Lüscher et al. 2000; Lüscher and Frerking 2001; Kullmann 2003; Montgomery and Madison 2004). Two remarkable discoveries are all-or-none potentiation and depression (Petersen et al. 1998; O'Connor et al. 2005) and silent synapses (Isaac et al. 1995; Liao et al. 1995; Montgomery et al. 2001). In addition to state changes that alter the efficacy of a synaptic connection, there are also changes that leave the efficacy unaltered but modify a synapse's capability to undergo subsequent plastic changes. The latter state transitions are generally referred to as metaplasticity (Abraham and Bear 1996; Montgomery and Madison 2002; Sajikumar and Frey 2004).

We are presently only beginning to understand how learning with discrete synapses affects functional and computational properties of neural systems (e.g., Brunel et al. 2004; Fusi and Abbott 2007). Obviously, the larger the synaptic state space, the more information can be stored in the synaptic configuration. But how can this information be retrieved in a dynamical neural network? This is a nontrivial problem, in particular because metaplastic changes do not affect postsynaptic currents and, hence, information about synaptic meta states cannot be derived from the activity of neurons.

Metaplasticity is commonly thought to prolong the lifetime of memories while keeping the network receptive to the storage of new memories (Abraham and Bear 1996; Montgomery and Madison 2004). This idea has recently been corroborated by Fusi et al. (2005): assuming a homogeneous population of synapses, they derived that a certain number of synaptic meta states maximizes memory lifetime. Their model, however, does not consider the dynamics of a neural network, that is, how the evoked postsynaptic potentials sum up and give rise to suprathreshold network activity reflecting the stored memories. In this paper, we show that the necessity to retrieve synaptically stored information from the activity of neurons poses a crucial constraint. In particular, the advantage of metaplasticity may be outweighed by sparse representations of memories (Tsodyks and Feigel'man 1988; Amit and Fusi 1994). For highly distributed codes, on the other hand, we propose a new metaplasticity model with an optimal memory capacity that is independent of network size and connectivity.

Results

Memories are considered to be successively imprinted into the synapses of a recurrently connected network of N neurons through discrete changes of synaptic states. These memories degrade over time because of ongoing storage of new memories. Here, a memory trace is regarded as the association between 2 random assemblies of M neurons, a cue and a target assembly (e.g., Willshaw et al. 1969; Nadal 1991; Leibold and Kempter 2006). The association is said to be present in the network if an active cue assembly evokes a sufficient amount of activity in the target assembly and does not significantly activate other assemblies.

Learning is thought to establish such an association between 2 formerly unrelated assemblies via synaptic alterations in a supervised fashion. Synapses from cue to target neurons receive a signal that is proposed to initiate long-term potentiation (LTP) with some probability. Synapses in the reverse direction (target to cue) may undergo long-term depression (LTD). The learning rule is thus assumed to be local: it triggers plasticity signals only at synapses that connect neurons of the cue and target assembly, whereas all other synapses remain unaffected. In addition, we assume that synapses related to neurons that are members of both the cue and the target assembly receive an indecisive signal and do not change states (Discussion). If f = M/N denotes the probability that a randomly chosen neuron belongs to a specific assembly, the indecisive signal is conveyed to the synapses of the Mf neurons in the overlap between both assemblies. The ratio f = M/N is also called the coding ratio.

Synaptic alterations are considered to occur via switch-like transitions between the discrete synaptic states that are induced by the plasticity events associated with the storage of new memories. Three models of synapses with discrete states are considered: 1) two-state synapses, which are either silent or activated; 2) binary synapses with multiple states and complex metaplasticity, which resemble the cascade model of Fusi et al. (2005); and 3) binary synapses with multiple serial state transitions, in which the probabilities of all transitions equal one.

In the present framework, the overall distribution of synaptic states in the whole network resides in equilibrium at any time. The specific shape of the equilibrium distribution is determined by the specific choices of synapse model and learning rules (see below and Appendix A.5.1, A.5.2, and A.5.3). Intuitively, if the probability for LTD exceeds the probability for LTP, more synapses are in a depotentiated state and vice versa. In contrast to this overall distribution, the state distribution attached to one specific memory (association) may be far from equilibrium: Right after learning a new association, a more than average fraction of synapses connecting cue to target neurons is in the potentiated state. This association-specific distribution then asymptotically converges to the equilibrium distribution while successively storing other associations. The convergence to equilibrium thus corresponds to the overwriting of this particular memory.

As a measure of memory lifetime, we define the number P of successive associations that are necessary to overwrite a specific initial association such that the latter no longer can be retrieved. Given a constant rate ρ of new associations per time, the time P/ρ can be reinterpreted as the duration a memory remains stored in the synaptic configuration of the network (Amit and Fusi 1994). The learning rate ρ, however, is difficult to estimate and will be substantially influenced by a variety of factors like species, brain region, environment, behavioral state, attention, etc. In what follows, we thus simply neglect ρ and, instead, use P directly as a measure of memory lifetime. In other words, we measure time in units of newly acquired memories. Technically, this is equivalent to setting ρ = 1.

Two-State Synapses

To gain an understanding of how memory lifetime (longevity) arises in a dynamical recurrent network, we start out by considering binary synapses without meta states, which means that each single synapse can only exist in one of 2 states, silent or activated. Immediately after having stored an association, all synapses that connect the cue neurons to the target neurons shall be activated, that is, the learning rule is such that these synapses switch from the silent to the activated state with probability one. Lower transition probabilities would enhance memory lifetime but also reduce the network's receptiveness to new memories (Amit and Fusi 1994; Fusi et al. 2005) and, thus, are not considered here. Furthermore, let us assume that synapses from the target neurons back to the cue neurons are silenced with probability one. This leads to an equilibrium distribution of synaptic states, in which half of the synapses are activated and the other half is silent. For a given probability cm that a neuron is synaptically connected to another one, the number of activated synapses in a network of N neurons therefore fluctuates around a mean value of cmN²/2. The probability cm of a morphological connection is also called the morphological connectivity.

As a result of the equilibrium distribution of synaptic states, an arbitrary pair of pre- and postsynaptic neurons is connected via an activated synapse with probability cm/2. Thus, at neurons that do not belong to a particular target assembly, M simultaneously spiking cue cells give rise to cmM/2 inputs from nonsilent synapses, which is considered as a baseline or background depolarization in what follows. We therefore define the “signal” of a memory trace to be the excess depolarization with respect to this baseline. Because, for the 2-state model, we have modeled the storage of a memory through the activation of all synapses connecting the cue to the target assembly, the initial memory signal equals the number of synapses at a target cell that were switched on. Because we have assumed that the synapses at the Mf neurons in the overlap of cue and target assembly do not change state, the initial memory signal at one target neuron amounts to M(1 − f)cm/2 synapses on average.

Sparse Representations

To understand the temporal evolution of the memory signal, let us first focus on the case of sparse coding, f = M/N ≪ 1. This scenario allows for an easy analytic treatment and interpretation. A more general scenario is outlined in the Appendix.

The initial signal cmM/2 of a memory trace at t = 0 decays because of the storage of further memories. The storage of a single new memory in a subsequent time step switches off synapses in the initial memory trace with probability f². The number of activated synapses in a specific association therefore decays to a resting level of cmM/2 with time constant τ = 1/(2f²), and thus the signal after t further associations amounts to (cmM/2) exp[−2f²t]. This decay of the number of activated synapses is illustrated in Figure 1A.
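This decay is easy to reproduce in simulation. The following minimal sketch (Python/NumPy; the parameter values are illustrative and not those of Figure 1) switches each synapse of a freshly stored association on or off with probability f² per further association, the sparse-coding limit of the rates derived in Appendix A.1, and compares the count of activated synapses with the analytic curve cmM/2 + (cmM/2) exp[−2f²t]:

import numpy as np

rng = np.random.default_rng(1)

N, M, cm = 100_000, 1_000, 0.1     # network size, assembly size, connectivity (assumed values)
f = M / N                          # coding ratio
T = 30_000                         # number of further stored associations

state = np.ones(int(cm * M), dtype=bool)   # cue-to-target synapses, all activated
trace = np.empty(T)
for t in range(T):
    # a given synapse lies in the cue->target (LTP) or target->cue (LTD)
    # direction of a new random association with probability ~f^2 each
    state |= rng.random(state.size) < f**2
    state &= rng.random(state.size) >= f**2
    trace[t] = state.sum()

analytic = cm * M / 2 + cm * M / 2 * np.exp(-2 * f**2 * np.arange(T))
print(trace[::10_000])      # noisy counts, relaxing from 100 toward 50
print(analytic[::10_000])   # smooth exponential decay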

Figure 1.

Online learning with 2-state synapses. (A) The 2 noisy traces display the number of activated synapses at 2 target neurons as a function of the number of subsequently stored associations between pairs of cell assemblies of size M = 1000. For a morphological connectivity cm = 0.1, the number of initially activated synapses equals cmM = 100. On average (thick line), the number of activated synapses decays exponentially as a function of the number of successively stored associations. For a network size of N = 10⁶, the decay time constant τ equals 1/(2f²) = 5 × 10⁵ associations. After storing many associations, the number of activated synapses of single cells fluctuates around a steady-state value of cmM/2 = 50, which reflects the equilibrium distribution of synaptic states of the current learning rule. The lifetime P of the memory is determined by the intersection between the mean signal and a detection threshold (dashed line) obtained from signal-detection theory (Appendix A.4). (B) As a consequence of the applied readout criterion, the memory capacity α = P/(cmN) of online learning is a function of the coding ratio f = M/N. The maximum memory capacity αmax occurs at small f. (C) The maximum capacity αmax scales like N (discs). For fixed and high coding ratios, for example, f = 0.3 (squares), however, αmax decreases with increasing N.

The ongoing storage of associations between pairs of random cell assemblies also induces fluctuations of the memory signal. These fluctuations define a noise level above which the signal must be detectable. The standard deviation of these fluctuations equals the square root of the average number of activated synapses for random selections of M presynaptic neurons, that is, √(cmM/2).

The lifetime P of a memory is the time t at which, on average, the signal arrives at the noise level. We again note that time is measured in units of stored associations, so that P also equals the total number of associations that can be stored in the network. More specifically, we define P as the time at which the signal-to-noise ratio of a memory imprinted at time t = 0 equals a threshold K > 0, which is determined by the desired amount of activation of the target assembly. It will turn out that the specific choice of the threshold K does not change the general dependence of P on system parameters such as the network size N, assembly size M, and connectivity cm. We emphasize that here the detectability of the memory signal is defined via an average over the ensemble of target neurons, and thus the memory signal is no single-cell quantity, though it may seem so (Appendix A.4).

Given a memory signal (cmM/2) exp[−2f²t] and a noise level √(cmM/2), the signal-to-noise ratio attains a value of K at time P = [N²/(4M²)] ln[cmM/(2K²)]. This is an encouraging result because the lifetime P of a memory increases quadratically with network size N. Moreover, it is proportional to the total number cmN² of synapses in the network, leaving cm and M fixed. In contrast, for a fixed coding ratio f = M/N, P scales logarithmically with the average number cmM of synapses per target neuron involved in a particular association. Maximum longevity Pmax is obtained from dP/dM = 0, which yields an optimal assembly size 

(1)
Mopt = 2√e K²/cm
that is independent of N. This independence of N reflects the assumption that there is no spurious (or spontaneous) activity in the network. Spurious activity would induce fluctuations of the memory signal that increase with network size and thus would break up N independence, as it is the case for attractor-type networks (e.g., Golomb et al. 1990). Moreover, the N independence of Mopt can also be broken up by further constraints, as, for example, the assumption of a fixed number Ncm of synapses per neuron (Leibold and Kempter 2006) (Discussion).

At the optimal assembly size, the memory lifetime reaches its maximum 

(2)
Pmax = (cmN)²/(32 e K⁴)
that is proportional to the square of the average number cmN of synapses per neuron. For biologically reasonable choices, 0.01 < cm < 0.3, N > 10⁴, and K ≈ 1, we find the optimal coding ratio fopt = Mopt/N to be small, which is consistent with the initial assumption M ≪ N. We note that throughout the paper (cf., Appendix A.5), analytical results concerning Mopt and Pmax are derived in the limit of sparse coding M ≪ N and checked for self-consistency.
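These closed-form results are easy to evaluate numerically. A minimal sketch (Python; the value K = 1 is an assumed detection threshold used only for illustration):

import numpy as np

def lifetime(M, N, cm, K):
    """Sparse-coding lifetime P(M) = [N^2/(4 M^2)] ln[cm M/(2 K^2)]."""
    return N**2 / (4.0 * M**2) * np.log(cm * M / (2.0 * K**2))

N, cm, K = 1e6, 0.1, 1.0
M_opt = 2.0 * np.sqrt(np.e) * K**2 / cm        # equation (1)
P_max = (cm * N)**2 / (32.0 * np.e * K**4)     # equation (2)

print(M_opt)                                   # optimal assembly size, independent of N
print(P_max, lifetime(M_opt, N, cm, K))        # the two values coincide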

General Solutions

Analytical results obtained in a regime of sparse representations are corroborated by numerical evaluations of the readout criterion (Appendix A.4), which allowed us to also consider nonsparse coding ratios.

To compare memory performance of networks with different sizes N and connectivities cm, one defines the memory capacity 

α = P/(cmN),
that is, the number of stored memories P divided by the number cmN of synapses per neuron. The functional dependence of α on the coding ratio f is stereotypic (Fig. 1B). For very low f, the assembly size is too small for the signal (∝M) to overcome the noise (∝√M). If f grows beyond some minimal value, the capacity rapidly rises toward its maximum, which is attained at some optimal, sparse level fopt ≪ 1. As f is further increased, the capacity drops again because larger assembly sizes generate larger overlaps between memory traces, which in turn leads to a higher amount of synaptic interference during learning and, hence, faster forgetting.

The numerical results shown in Figure 1C confirm our analytical results that the storage capacity α at the optimal coding ratio fopt scales linearly with N. Classical approaches to memory capacity (e.g., Golomb et al. 1990; Nadal 1991), however, do not optimize with respect to f but treat it as a free parameter. Yet, for fixed f, the capacity exhibits inferior scaling behavior and may even decrease with N (e.g., f = 0.3 in Fig. 1C).

Binary Synapses with Complex Metaplasticity

The lifetime of a memory can be prolonged by reducing the probability of synaptic changes (Amit and Fusi 1994). An obvious drawback of such a strategy is that memories become more rigid and, hence, it is more difficult to store new associations. This problem is the well-known dilemma of finding a compromise between providing a high amount of plasticity and, at the same time, ensuring a high longevity of memories.

Fusi et al. (2005) have recently shown for M/N ≲ 1 that synaptic metaplasticity might offer an elegant solution to this dilemma. Metaplasticity means that synaptic changes do not necessarily alter a synapse's influence on the postsynaptic membrane potential but rather modulate the probability that the synapse undergoes a weight change the next time it is exposed to an LTP or LTD stimulus. Figure 2A illustrates the transition probabilities for a slightly modified (see Appendix A.5.2) version of the cascade model defined by Fusi et al. (2005) for n = 4 meta levels. The synaptic weight either attains the value w = 0 (silent) or w = 1 (activated), as in the case of 2-state synapses. The probabilities of state transitions for n > 1 are constructed as follows: if the synapse is silent (activated) and is exposed to an LTP (LTD) stimulus, it becomes potentiated (depotentiated) with probability (1/2)^μ, where μ = 0, 1, …, n − 1 counts the meta levels. Thus, the smallest transition probability amounts to (1/2)^(n−1). If a synapse is silent (activated) and receives an LTD (LTP) stimulus, it switches to level μ + 1 with probability (1/2)^μ. In that case, the synaptic weight w remains unchanged. However, if a silent (activated) synapse is in the “lowest” meta level μ = n − 1 and receives an LTD (LTP) stimulus, no state change occurs.
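These rules translate directly into a plasticity matrix (cf. eq. [26] in the Appendix). The sketch below builds it for arbitrary n; the state enumeration (ν = 1 the most depressed state, counted clockwise) follows Appendix A.5.2, and the convention that potentiation and depotentiation land in the respective meta level μ = 0 is an assumption consistent with the cross-transition topology of Figure 2A:

import numpy as np

def cascade_Q(n):
    """Plasticity matrix of the cascade model with n meta levels.
    0-based states 0..n-1 are silent (state n-1 is mu = 0, state 0 the
    deepest level), states n..2n-1 are activated (state n is mu = 0)."""
    Q = np.zeros((2 * n, 2 * n))
    for mu in range(n):
        p = 0.5 ** mu
        s = n - 1 - mu              # silent state at meta level mu
        a = n + mu                  # activated state at meta level mu
        Q[n, s] += p                # LTP on silent: potentiate to mu = 0
        Q[n - 1, a] += p            # LTD on activated: depotentiate to mu = 0
        if mu < n - 1:              # deepest level: no cascade transition
            Q[s - 1, s] += p        # LTD on silent: one meta level deeper
            Q[a + 1, a] += p        # LTP on activated: one meta level deeper
    np.fill_diagonal(Q, -Q.sum(axis=0))   # vanishing column sums
    return Q

Q = cascade_Q(4)
assert np.allclose(Q.sum(axis=0), 0.0)
lam, vec = np.linalg.eig(Q)
z_eq = np.real(vec[:, np.argmin(np.abs(lam))])
print(z_eq / z_eq.sum())    # equilibrium occupancies of the 2n states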

Figure 2.

Memory lifetime for synapses with complex metaplasticity. (A) Synapses have either weight w = 0 (blue, left) or weight w = 1 (red, right) and exist in distinct meta levels μ = 0, 1, 2, 3 (top to bottom), as in the cascade model proposed by Fusi et al. (2005). Arrows indicate possible transitions between states. Transition probabilities for an LTP stimulus are depicted in brown and transition probabilities for an LTD stimulus in cyan. (B) The lifetime P of a memory as a function of assembly size M shows a maximum Pmax (black disc) at a low assembly size M ≪ N. The 2-state model (n = 1, red line; cf., Fig. 1B) has the highest maximum. The maximal lifetime Pmax decreases with increasing number n of meta levels (n = 1, …, 20). For a large assembly size M (inset), there exists an optimal number nopt > 1 of meta levels. The numerical results were obtained with parameters cm = 0.1 and N = 10⁶.

Metaplasticity is thought to increase memory lifetime because the synaptic meta level reflects the “history of the synapse,” and future plasticity is “dictated by previous plastic changes” (Montgomery and Madison 2004; see also Abraham and Bear 1996). The decay of memory traces is thus considered to be represented through a trajectory in the synaptic state space. As the number of synaptic meta levels increases, the history of a synapse can be maintained for a longer amount of time and, hence, memory longevity is enhanced.

Memory longevity, however, depends not only on the number n of meta levels but also on the sparseness of the code, that is, the assembly size M. Both quantities cannot be optimized independently of each other to maximize memory longevity: the larger n, the fewer synapses are driven into a potentiated state by an LTP signal, because more synapses reside at meta levels with a decreased probability of potentiation. As a result, the mean memory signal is smaller, the more meta levels the synapse provides. To nevertheless produce a sufficiently high memory signal, the number M of cells in a synchronously active assembly must increase with the number n of meta levels. Larger assemblies, however, are detrimental for memory lifetime because of the increased amount of interference between the stored associations. It is a nontrivial problem to understand whether the advantages of sparsification outweigh those of an increased synaptic state space or vice versa.

Here we discuss the trade-off between sparse coding and the number of synaptic meta levels in the light of our network model. The longevity P of memories as a function of the assembly size M is derived numerically (Appendix) and is illustrated in Figure 2B for several different numbers n of meta levels. We observe that increasing n reduces the maximal memory lifetime Pmax while the optimal assembly size Mopt at the maximum becomes larger. Analytical results derived for f = M/N → 0 show the optimal assembly size to scale like (n + 1)² and the maximal lifetime to decrease like (Appendix, eq. [30]) 

(3)
Pmax ∝ (cmN)²/[K⁴ (n + 1)⁴]
Thus, the gain in memory longevity due to the introduction of meta levels is by far lower than the loss owing to the increase of the assembly size needed to maintain a sufficiently high postsynaptic signal. In contrast, for large assembly sizes M, that is, nonsparse representations (inset in Fig. 2B), models with many meta levels become superior, but the memory longevity can be orders of magnitude below the sparse-coding maximum. This finding is consistent with the results reported by Fusi et al. (2005), who have derived the existence of a lifetime-optimal number nopt of meta levels in a framework that corresponds to an assembly size of the same order of magnitude as the number N of neurons in the network, that is, f = M/N ≈ 0.5.

Binary Synapses with Serial State Transitions

The above findings about the longevity of memories may either reflect a specific feature of the transition topology of the model of Fusi et al. (2005) or they may be a general property of metaplasticity rules. To further investigate this question, we propose a simpler topology of transitions between synaptic states, in which synaptic states are connected one after the other. This model is illustrated in Figure 3A for n = 3 meta levels. The transition probabilities between all states equal one, that is, after a plasticity signal every possible synaptic state change occurs. However, only state changes within meta level μ = 0 are also associated with a weight change.
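In matrix form, the serial topology is a simple tridiagonal random walk with reflecting ends (cf. eq. [31] in the Appendix). A sketch that builds the matrix and lists the resulting forgetting timescales (converting eigenvalues into per-association timescales uses the rate f²(1 − f)² of Appendix A.1; the parameter values are assumptions for illustration):

import numpy as np

def serial_Q(n):
    """Serial chain of 2n states: an LTP stimulus shifts a synapse one
    state up, an LTD stimulus one state down, both with probability one."""
    k = 2 * n
    Q = np.diag(np.ones(k - 1), -1) + np.diag(np.ones(k - 1), 1)
    np.fill_diagonal(Q, -Q.sum(axis=0))    # -2 inside, -1 at the two ends
    return Q

n, f = 3, 0.01
lam = np.linalg.eigvalsh(serial_Q(n))      # Q is symmetric here
nonzero = lam[np.abs(lam) > 1e-12]
tau = 1.0 / (f**2 * (1 - f)**2 * np.abs(nonzero))
print(nonzero.size)     # 2n - 1 nonzero eigenvalues
print(np.sort(tau))     # forgetting timescales, the fastest about (2f)^-2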

Figure 3.

Memory lifetime for synapses with serial state transitions. (A) Synaptic states are connected one after another. All transition probabilities (green) equal one. (B) The lifetime P of a memory as a function of the assembly size M shows a maximum (black disc) at low values of M; compare with Figure 2B. The 2-state model (red line) has the highest maximum. With increasing number n = 1, …, 20 of synaptic meta levels, the maximum decreases, but not as fast as in Figure 2B for complex metaplasticity. For large assembly size M (inset), there exists an optimal number nopt of meta levels that is larger than in the case of complex metaplasticity. Further parameters were cm = 0.1, N = 10⁶.

Numerical evaluation of memory lifetime for the serial model (Fig. 3B) reveals a similar behavior as for the complex metaplasticity model. The smaller the number n of synaptic meta levels, the higher we find the maximal lifetime Pmax. For large assembly sizes M, meta levels (n > 1) can again be advantageous. An estimate of P in the case f = M/N → 0 reveals that the optimal assembly size Mopt scales with n² and that the maximal lifetime decreases like (Appendix, eq. [34]) 

(4)
Pmax ∝ (cmN)²/(K⁴ n³)
We note that, although all probabilities for state transitions equal one, the serial model contains 2n − 1 different timescales of forgetting that correspond to the nonzero eigenvalues of the transition matrix (Appendix A.5.3). These timescales vary between (2f)⁻² and [2f sin(π/(4n))]⁻² (Amit and Fusi 1994) and account for an approximate power-law forgetting in the same way as for the complex cascade model of Fusi et al. (2005).

Unbalanced Plasticity

In all models presented so far, LTP and LTD are precisely balanced, which is defined via the depolarization of nontarget cells being half the maximum depolarization, that is, ⟨h⟩∞ = cmM/2 (Appendix A.1). To rule out that the assumption of balanced LTP and LTD accounts for the longest memory lifetimes of 2-state synapses, we also investigated variants of the serial model in which we reduced the transition probabilities for either LTP or LTD stimuli, which serves the purpose of unbalancing the learning rule.

The numerical results of Figure 4A reveal that an LTD-prone regime results in a dramatically altered equilibrium state distribution: depressed states (ν ≤ n) are more strongly occupied than potentiated states (ν > n). Interestingly, in such an LTD-prone regime, with more silent (w = 0) synapses than activated (w = 1) ones, the maximum memory lifetime Pmax can even be slightly enhanced compared with a balanced plasticity rule (Fig. 4B; e.g., Brunel et al. 2004; Leibold and Kempter 2006). This enhancement is essentially due to the increase of the initial memory signal, which is roughly the difference between the maximal depolarization cmM and the equilibrium depolarization. The latter is smaller in an LTD-prone regime, giving rise to a larger initial memory signal as compared with a balanced or LTP-prone regime. The equilibrium depolarization, however, must not be too small, because noise becomes more influential, the fewer synapses are involved in an association.
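A sketch of this manipulation (Python; the 10% LTP reduction mirrors Figure 4A, and computing the equilibrium as the null vector of the plasticity matrix is one standard choice):

import numpy as np

def serial_Q_biased(n, ltp=0.9, ltd=1.0):
    """Serial chain in which LTP transitions occur with probability ltp
    and LTD transitions with probability ltd."""
    k = 2 * n
    Q = ltp * np.diag(np.ones(k - 1), -1)   # upward (potentiating) moves
    Q += ltd * np.diag(np.ones(k - 1), 1)   # downward (depressing) moves
    np.fill_diagonal(Q, -Q.sum(axis=0))
    return Q

n = 15
lam, vec = np.linalg.eig(serial_Q_biased(n))
z_eq = np.real(vec[:, np.argmin(np.abs(lam))])
z_eq /= z_eq.sum()
print(z_eq[:n].sum(), z_eq[n:].sum())   # most weight on the depressed states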

Figure 4.

Unbalanced plasticity for serial state transitions. (A) Reduction of the transition probabilities after an LTP stimulus by 10% yields an equilibrium state distribution that is strongly biased toward more depotentiated states. The 30 synaptic states result from n = 15 meta levels. (B) The maximum memory lifetime Pmax (cf., black discs in Figs 2B and 3B) is robust against unbalancing of the LTP/LTD probability ratio for few (n ⪅ 5) synaptic meta levels. In general, Pmax is more robust against LTD bias as compared with LTP bias. In an LTD-prone regime, memory lifetimes can even be slightly prolonged. Further parameters were cm = 0.1, N = 10⁶.

The maximal memory lifetime Pmax is quite robust against unbalancing LTP and LTD for a small number n of synaptic meta levels (Fig. 4B). With increasing n, the memory performance of the network becomes more sensitive to unbalancing. However, even for n = 15, where the maximum memory lifetime is obtained at an LTP/LTD ratio of about 90%, a deviation of 10% from this optimal LTP/LTD ratio still accounts for about 90% of the largest possible longevity Pmax.

Highly Distributed Representations

We have shown that 2-state synapses are better suited for maximizing the lifetime of associations between random assemblies if a neuronal system is capable of optimizing the number M of neurons firing synchronously in an assembly. This optimization of M may, however, not always be feasible. For large assembly sizes, we suspect that more complex synapses may become superior in general (see insets in Figs 2B and 3B).

To compare the performances of both synapse models at high coding ratios, we fixed f at a value of 0.3 and numerically calculated the optimal number nopt of meta levels and the respective memory capacity α = P/(cmN); see Figure 5. We find that for both models, the optimal number nopt of synaptic meta levels is an increasing function of network size and morphological connectivity. For the model with complex transitions, Fusi et al. (2005) have shown nopt to scale logarithmically with the total number cmN² of synapses in the network. In the model with serial state transitions, nopt grows considerably faster (approximately ∝ √(cmN); not shown). However, with respect to memory capacity α, the serial model is superior to the model with complex transitions. More specifically, the complex model's capacity decreases with increasing network size and connectivity, whereas the serial model exhibits a level of capacity that is virtually independent of both network parameters.

Figure 5.

Serial versus complex metaplasticity at a fixed coding ratio f = M/N = 0.3. (A) With increasing network size N, the optimal number nopt of synaptic meta levels increases for both synaptic models (top, cm = 0.1). The memory capacity α at nopt is always higher for the model with a serial synaptic transition topology as compared to the one with complex topology (bottom). In the serial model, α is virtually independent of N for N > 10⁴, whereas in the complex model, α decreases with increasing N. (B) The optimal number nopt of meta levels is an increasing function of the morphological connectivity cm (top, N = 10⁶). In the serial model, α is virtually independent of cm, whereas for the complex model, α decreases with increasing cm.

Discussion

Synaptic metaplasticity is thought to enhance memory lifetimes via storing the history of past synaptic changes (Abraham and Bear 1996; Montgomery and Madison 2004; Fusi et al. 2005). The present paper shows that this intuitive idea is not generally valid. In fact, we find that memory lifetime is governed by an intricate interplay between synaptic state changes and population coding. More specifically, if neuronal representations are allowed to be arbitrarily sparse, an increase of the number n of synaptic meta levels reduces the longevity of memory traces: the more synaptic meta levels, the fewer synapses are potentiated when storing a new memory. The readout of a memory trace, however, requires spiking activity, which needs a sufficient amount of postsynaptic depolarization in response to a synchronously active set of presynaptic neurons. Thus, the number of synchronously firing cells must increase as the number n of synaptic meta levels grows. The use of larger assemblies, however, increases the interference between memories, which strongly reduces memory longevity. Figures 2 and 3 show that this disadvantage cannot be overcome by using cascade models of synaptic plasticity. As a result, 2-state synapses are optimal for online learning of sparsely encoded associations. If the assembly size, however, is large and representations are highly distributed, binary synapses with multiple meta levels are generally superior to 2-state synapses, and the network can make use of the history of synaptic changes, which is stored in the synaptic meta levels.

Model of Online Learning

The mathematical model we use for online learning is motivated by the central idea that old memories are gradually replaced by new memories. Online learning is described by the dynamics of the distribution of synaptic states related to a specific association that is stored in a recurrent network (Amit and Fusi 1994). This dynamics of the state distribution is a multidimensional linear iterative map. The eigenvalues of the linear map provide the inverse timescales for overwriting of memories. As a criterion to test whether these memories can still be read out, Amit and Fusi (1994) use a fixed specific value of the signal-to-noise ratio of the subthreshold membrane potential. In contrast, we optimize the neuronal firing threshold θ for a given assembly size M and network size N so as to fulfill a signal-detection criterion based on the suprathreshold activity of the target assembly (Leibold and Kempter 2006).

Scaling Laws of Memory Lifetime

For the 2-state synaptic model, we find the maximal memory lifetime to grow linearly with the total number of synapses in the network. This result seems to contradict the findings by Fusi et al. (2005), who showed memory lifetimes P to grow “logarithmically as a function of the number of synapses used to store the memory.” Both results are, however, consistent, since Fusi et al. (2005) assumed that about as many synapses are involved in any single stored memory as there are synapses in the network, whereas we also allow sparse representations that require only a small number (∝M² ≪ N²) of synapses to be involved in a memory trace.

Moreover, Amit and Fusi (1994) already pointed out that the logarithmic dependence of P can be broken up by aptly relating the probability q of a synaptic state change to parameters like network size, coding ratio, or the number of synaptic states: in the case of sparse coding, q is proportional to the probability that a particular synapse is used in an association. If the storage of a new memory requires potentiation of a random set of cmM² synapses and there are a total of cmN² synapses in the network, the probability that one particular synapse receives a plasticity signal is (M/N)² = f².

Models of Metaplasticity

We consider 2 different models of synaptic metaplasticity. As a starting point, we used the model by Fusi et al. (2005). As a second model of synaptic metaplasticity, we studied a serial topology of state transitions, which turned out to provide longer memory lifetimes than the original model with cross transitions. A possible disadvantage of the serial topology, however, is that the optimal number nopt of meta levels increases faster with network size N and connectivity cm than nopt in the model with cross transitions (Fig. 5). Additional costs for each meta level thus might favor the latter model.

Symmetric Learning Rules and Attractor-Type Memories

The learning rule discussed in this paper is asymmetric in the sense that synapses between cue and target neurons are strengthened, whereas synapses in the reverse direction are weakened. As a result, the memory traces are sequence-type associations (e.g., Willshaw et al. 1969; Nadal 1991; Leibold and Kempter 2006). The model does not consider pattern completion, as the latter requires symmetric synaptic connections. Though not explicitly shown here, the fundamental conflict between storing plasticity history and reducing memory interference also remains for pattern completion. However, because pattern completion is generally discussed in the light of dynamical attractors (e.g., Hopfield 1982; Golomb et al. 1990; Treves and Rolls 1992), stability of firing patterns requires larger assembly sizes and, hence, more distributed representations as compared with sequence-type memories. We thus expect that for attractor-type memories, synaptic meta levels are even more useful for enhancing memory lifetimes as compared with the presently investigated sequence-type associations.

On Disregarding the Overlap

Our results are based on the assumption that synapses that connect neurons belonging to both the cue and the target assembly do not change when storing a new memory. For small coding ratios f = M/N, this effect is negligible because the fraction of neurons in the overlap between cue and target assembly is small (∝f2). For higher coding ratios, this assumption is important, though, and its validity depends on how synapses are changed by triplets and quadruplets of pre- and postsynaptic spikes. Froemke and Dan (2002) suggest that these synapses are likely to be depressed, which would correspond to a decrease of the target assembly. This reduction of the assembly size M effectively decreases the quality of readout, and thus our results overestimate the memory capacity for large f. We, however, did not take this into account, mainly because the effects of spike triplets and quadruplets on synaptic changes are still not completely described and, moreover, the most important part of our results is observed for low coding ratios.

Application to Hippocampal CA3 Network

Sparse coding, though beneficial for high storage capacities, may not always be a feasible mode of operation. Constraints that limit the degree of sparseness may favor complex synaptic cascades if the network is large enough, for example, N ≳ 10⁴ in Figure 5A. As an example, we calculate the memory lifetimes for parameters corresponding to the hippocampal CA3 region of rats (N ≈ 250 000, cm = 0.05; not shown in figures), where assemblies have been estimated to contain a few thousand neurons (Csicsvari et al. 2000). In this system, a sparser code could be prohibited by requiring dynamical stability of the replay of a sequence of activity patterns (Lee and Wilson 2002; Leibold and Kempter 2006). The evaluation of both metaplasticity models in the CA3-like parameter regime yields a maximal lifetime of about Pmax ≈ 7 000 at nopt = 2 synaptic meta levels for the model with complex metaplasticity and a lifetime of Pmax ≈ 13 500 at nopt = 3 for the serial topology. We thus conclude that a few meta levels could increase memory longevity in the hippocampus. For a 2-state synaptic model (n = 1) to be optimal in a CA3-type regime, the required representation would have to be sparser than reported, that is, assemblies should contain only a few hundred neurons.

Population Sparseness

The level f of sparseness as referred to in this paper is generally termed population sparseness (Olshausen and Field 2004). This quantity is hard to assess experimentally because it requires identifying and measuring a large number of nonactive neurons. Most experiments therefore address the temporal sparseness derived from the firing rate distributions of single neurons (Rolls and Tovee 1995).

Experimental estimates of population sparseness are few. In the hippocampus, one finds f ≈ 10⁻² (Csicsvari et al. 2000), whereas the barrel cortex (Brecht and Sakmann 2002) and the visual cortex (Weliky et al. 2003) exhibit coding ratios as large as f ≈ 0.5. For networks with such highly distributed representations, our framework predicts that maximization of memory longevity requires a larger number of synaptic meta levels than expected in the hippocampus.

If Connectivity Depends on Network Size

Network size N and morphological connectivity cm were assumed to be independent variables. As a result, the optimal assembly size Mopt is independent of N (eq. 1). However, if one assumes a constant number cm N of synapses per neuron, the connectivity cm decreases with network size like 1/N, and therefore Mopt is proportional to N. In this case, the optimal coding ratio fopt = Mopt/N and the maximal memory lifetime Pmax are independent of N (eq. 2). That is to say, the memory performance of the network for constant cmN is determined by the number of synapses a neuron can support, that is, the “size” of single neurons rather than the network size (Leibold and Kempter 2006).

Contributions of Noise

The analytical results presented here are based on a mean-field approach and an evaluation of signal-to-noise ratios, with an inherent source of noise owing to the given random morphological connectivity; see Appendix. We neglect several additional sources of noise that may be present in biology. One can distinguish between external and internal noise contributions. Possible external sources of noise are fluctuations of the neuronal firing threshold, errors while activating a cue pattern, or variations of assembly sizes. Other noise sources can be considered internal, such as variations of the synaptic state distributions between different cells of a target assembly. Independent of their nature, these additional noise sources will always increase the variance of the postsynaptic depolarization; see Figure 1A and equation (13) in the Appendix. As a consequence, memory lifetime, on average, will decrease, although some specific associations may even have an enhanced lifetime. For example, if we consider a variability of the assembly sizes M, an association between 2 assemblies that happen to be larger than average will remain stored longer than it would in a network with all assemblies of identical size. On average, however, the lifetime is reduced because associations with fewer synapses are forgotten faster and, moreover, an association between larger-than-average assemblies overwrites more synapses than average, among them synapses required by earlier memory traces.

To conclude, our mean-field results provide an upper bound of memory lifetimes. This upper bound is a good approximation if there is little additional noise. Internal noise is small specifically for the case of large network size N and large assembly sizes M.

Alternative Roles for Metaplasticity

Though it may seem elementary that increasing the complexity of synaptic state transitions prolongs memory lifetimes, our results clearly demonstrate that this is not the case in general. For sparsely encoded memories, increasing the level of metaplasticity can even be detrimental. Metaplasticity prolongs memory longevity only if the neuronal encoding is highly distributed. One thus might speculate that besides the prolongation of memory lifetime, metaplasticity could also serve a different functional purpose. For example, metaplasticity may provide a substrate to evaluate memories: more important memory traces could be represented by meta levels with lower transition probabilities than those of less important memories. Evaluation may be reward-based, repetition-based (Sajikumar and Frey 2004), or context-based. A functional understanding of the design of synaptic plasticity will hence also be closely related to specific behavioral contexts.

Appendix

We consider a recurrent network of N randomly coupled McCulloch–Pitts neurons (McCulloch and Pitts 1943) in discrete time. The network's morphological connectivity cm is the probability that one neuron is synaptically connected to another. A neuron fires when its postsynaptic potential h crosses a firing threshold θ. The potential h is determined by the synaptic inputs arising from the network activity in the previous time step.

We assume that each synapse can exist in one of 2n discrete states. In the case n = 1, we have 2-state synapses: a synapse is either silent (state 1) and has zero weight, w1 = 0, and therefore no effect on the postsynaptic neuron, or it is activated (state 2) and may increase the postsynaptic depolarization h by weight w2 = 1. In general, the state-specific weights are described by w = (w1, …, w2n)T in which 0 ≤ wν ≤ 1 is the synaptic weight assigned to state ν∈{1,…,2n}.

We define an assembly as a group of M randomly selected neurons. The collective activation of this specific set of neurons is thought to represent some particular external event. The probability that a randomly selected neuron in a network of N cells belongs to a specific assembly of size M is called coding ratio f = M/N. A sparse representation of memories is then reflected by f≪1.

An association is a link between a random pair of externally predefined assemblies such that synchronous firing of the neurons forming a cue assembly activates a sufficiently large portion of the neurons in a target assembly.

Because we consider a recurrent network, fM neurons on average belong to both the cue and the target assembly. These fM neurons are also referred to as the overlap between the 2 assemblies. For small f, the overlap is negligible and the framework becomes essentially a feed-forward network.

The set of synapses contributing to one particular association is described by the state distribution z = (z1, …, z2n)^T ∈ [0, 1]^(2n) (Amit and Fusi 1994), which determines the occupancies of the 2n states and is normalized to (1, …, 1)·z = Σν zν = 1.

A.1 Dynamics of the State Distribution

During learning, synapses change states. The storage of a new association requires that many of the synapses from cue to target neurons undergo LTP. Synapses in the reverse direction, from target to cue, preferentially experience LTD.

We assume that synapses stay unaltered if they connect neurons that belong to both the cue and the target assembly, that is, synapses that are related to the fM neurons in the overlap (see above). As a result, while learning a new association, the number of synapses that receive an LTP stimulus equals cm[M(1 − f)]², owing to the M(1 − f) cells in the cue and target assemblies that are not in the overlap. In the same way, the identical number cm[M(1 − f)]² of synapses from target to cue neurons are exposed to an LTD stimulus. We note that this exclusion of the overlap synapses is not equivalent to considering a feed-forward network with assembly size M(1 − f) because the overlap synapses also contribute to both the mean and the variance of postsynaptic depolarization.

After being exposed to an LTP stimulus, a synapse may change its state from ν to ν′ > ν. For an LTD stimulus, synapses may switch from ν to ν′ < ν. The probabilities that these state changes actually occur are denoted as qν′ν. The coefficients qν′ν constitute the nondiagonal elements of the plasticity matrix Q. Its diagonal elements qνν = −Σν′≠ν qν′ν are constructed such that Q has vanishing column sums, that is, (1, …, 1)·Q = 0, which preserves the normalization of z (see end of Appendix A.1).

The fraction of synapses that connect the disjoint subsets of cells in a given pair of cue and target assemblies equals f²(1 − f)². Storing another association, we thus find the probability f²(1 − f)² qν′ν that an “arbitrary” synapse in state ν changes its state to ν′ ≠ ν. Similarly, the probability of an arbitrary synapse to remain in state ν amounts to 1 + f²(1 − f)² qνν. For a given plasticity matrix Q, online learning of random associations can then be described by a linear iterated map (Amit and Fusi 1994). Storing of one further random association maps the state distribution z(t) of an association at time t to the distribution: 

(5)
z(t + 1) = [1 + f²(1 − f)² Q] z(t)
The explicit solution to equation (5) is given by 
(6)
z(t) = ẑ + [1 + f²(1 − f)² Q]^t [z(0) − ẑ],
in which z(0) denotes the state distribution of an association right after learning.

The fixed point ẑ of the dynamics equation (5) is defined via Q ẑ = 0. If a learning rule is such that w·ẑ = 1/2, we also say that this learning rule is “balanced.” Otherwise, if w·ẑ ≠ 1/2, we call the learning rule unbalanced.

To fully specify the dynamics in equation (6), we have to find an expression for z(0) immediately after imprinting a particular association. The state distribution z(0) can be expressed as a superposition of (1 − f)² times a potentiated state zLTP, and 1 − (1 − f)² times the equilibrium state ẑ, that is, 

(7)
z(0) = (1 − f)² zLTP + [1 − (1 − f)²] ẑ = ẑ + (1 − f)² δz,

with the initial state excess 

(8)
δz = zLTP − ẑ.

For f → 0, the initial state z(0) equals zLTP, and in the limit f → 1, we have z(0) = ẑ, which means no synaptic state changes occur. Both zLTP and ẑ are determined by the specific choice of the synapse model reflected by the plasticity matrix Q. Combining equations (6) and (7), we arrive at an expression for the temporal evolution of the state distribution during online learning, 

(9)
z(t) = ẑ + (1 − f)² [1 + f²(1 − f)² Q]^t δz.
To check whether the state distribution after learning of t further associations is adequate to still retrieve the respective memory, in Appendix A.4, we introduce a readout criterion based on equation (9).

As already announced, the dynamics of z preserves the normalization (1, …, 1)·z = 1 because of the vanishing column sums of Q. This can be shown from equation (5) because 

(1, …, 1)·z(t + 1) = (1, …, 1)·z(t) + f²(1 − f)² (1, …, 1)·Q z(t) = (1, …, 1)·z(t)
for every realization of z(t).
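The whole dynamics of this section fits in a few lines of code. A minimal sketch of equations (5)-(9), using the 2-state plasticity matrix of Appendix A.3 as the example and an assumed coding ratio:

import numpy as np

Q = np.array([[-1.0, 1.0],
              [1.0, -1.0]])            # 2-state plasticity matrix (A.3)
z_ltp = np.array([0.0, 1.0])           # state right after an LTP stimulus
f = 0.05                               # coding ratio M/N (assumed)

lam, vec = np.linalg.eig(Q)
z_eq = np.real(vec[:, np.argmin(np.abs(lam))])
z_eq /= z_eq.sum()                     # fixed point, Q z_eq = 0

dz = z_ltp - z_eq                      # initial state excess, equation (8)
z = z_eq + (1 - f)**2 * dz             # initial distribution, equation (7)
A = np.eye(2) + f**2 * (1 - f)**2 * Q  # one-association map, equation (5)

for t in range(4):
    print(t, z, z.sum())               # normalization stays exactly one
    z = A @ z                          # equation (5); iterated, equation (9)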

A.2 Mean Membrane Depolarization and Variance

Given a state distribution z, we can calculate the first and second moment of the postsynaptic potential h, which are needed below to calculate the readout quality of the target assembly. If the whole group of cue neurons fires simultaneously, the mean membrane depolarization ⟨h⟩ of a target cell amounts to 

(10)
⟨h⟩ = cmM w·z.
Here, the brackets denote an average over the ensemble of all target neurons. From equation (9), we then derive the explicit dynamics of ⟨h⟩ as a function of the number t of subsequently stored memories as 

(11)
⟨h⟩(t) = ⟨h⟩∞ + cmM (1 − f)² w·[1 + f²(1 − f)² Q]^t δz,

with the mean membrane depolarization of a nontarget cell (equilibrium potential) defined as 

(12)
⟨h⟩∞ = cmM w·ẑ.
The “signal” of the memory trace is then defined as the difference ⟨h⟩(t) − ⟨h⟩∞ between mean depolarization and equilibrium potential.

In analogy to equation (10), we also find an expression for the variance of the membrane potential h, 

(13)
σ²(t) = cmM [Σν wν² zν(t) − cm (w·z(t))²].

For binary synapses (wν ∈ {0, 1}), this expression simplifies to σ²(t) = cmM w·z(t) [1 − cm w·z(t)].

In what follows, we will assume the assembly size M to be large, that is, only groups of M ≫ 1 simultaneously active neurons are considered to carry meaningful information. As a result, the distribution of h can be approximated to be Gaussian and, hence, the first 2 moments are sufficient in the sense of this approximation.
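Both moments are directly computable from z. A minimal sketch of equations (10) and (13) (the binary weight vector and the uniform state distribution are only example inputs; any weights 0 ≤ wν ≤ 1 work):

import numpy as np

def moments(z, w, cm, M):
    """Mean (eq. 10) and variance (eq. 13) of the depolarization h evoked
    by the M synchronously firing cue neurons."""
    mean = cm * M * np.dot(w, z)
    var = cm * M * (np.dot(w**2, z) - cm * np.dot(w, z)**2)
    return mean, var

z = np.array([0.25, 0.25, 0.25, 0.25])   # example state distribution, 2n = 4
w = np.array([0.0, 0.0, 1.0, 1.0])       # binary weights: silent, activated
print(moments(z, w, cm=0.1, M=1000))     # -> (50.0, 47.5)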

A.3 Example: 2-State Synapses

The case of 2-state synapses (n = 1) allows an explicit illustration of the dynamics of the state distribution z = (z1, z2)^T. There, the fraction of silent synapses is denoted by z1 and the fraction of activated synapses supporting an association is z2 = 1 − z1. For a weight vector w = (0, 1)^T, we find the mean membrane depolarization to be ⟨h⟩ = cmM z2 and the variance to equal σ² = cmM z2 (1 − cm z2).

Assuming that a new association activates all possible cue-to-target synapses and depresses all possible target-to-cue connections, we have a plasticity matrix 

Q =
( −1   1 )
(  1  −1 )

and a potentiated state zLTP = (0, 1)^T. Then, the equilibrium state equals ẑ = (1/2, 1/2)^T and the state excess is given by δz = (−1/2, 1/2)^T. Together with equation (7), we find an initial state distribution: 

z(0) = ẑ + (1 − f)² δz = ( [1 − (1 − f)²]/2, [1 + (1 − f)²]/2 )^T.

From Q δz = −2 δz, we then derive the dynamics of the activated synapses, 

(14)
z2(t) = 1/2 + [(1 − f)²/2] [1 − 2f²(1 − f)²]^t.
The fraction z2 decays with a time constant of τ = 1/|ln[1 − 2f²(1 − f)²]| toward an equilibrium value of ẑ2 = 1/2; see Figure 1. For low coding ratios f, we shall use the approximation τ = 1/(2f²).
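A quick consistency check of the closed form (14) against the iterated map (5), with assumed values of f and t:

import numpy as np

f, T = 0.05, 2000
eps = f**2 * (1 - f)**2
A = np.eye(2) + eps * np.array([[-1.0, 1.0], [1.0, -1.0]])
z0 = np.array([(1 - (1 - f)**2) / 2, (1 + (1 - f)**2) / 2])  # z(0)

z_T = np.linalg.matrix_power(A, T) @ z0
closed = 0.5 + 0.5 * (1 - f)**2 * (1 - 2 * eps)**T           # equation (14)
print(z_T[1], closed)    # the two numbers agree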

A.4 Readout Criterion

The number P of associations that can be stored in a network is determined via signal-detection theory as described in Leibold and Kempter (2006). There, an activity pattern is said to be read out if the fraction p1 of correctly activated neurons (hits) exceeds the fraction p0 of incorrectly activated neurons (false alarms) by some constant detection threshold γ > 0. The maximum lifetime P of a memory is then derived from the condition γ = p1(P) – p0.

Assuming Gaussian statistics, the fraction of false alarms equals 

p0 = (1/2) erfc(κ0/√2),

in which 

(15)
κ0 = [θ − ⟨h⟩∞]/σ(∞)

is the difference between the neuronal firing threshold θ and the equilibrium potential ⟨h⟩∞ normalized by the equilibrium standard deviation σ(∞); see equation (13). Similarly, the fraction 

p1(t) = (1/2) erfc(−κ1(t)/√2)

of hits is determined by another parameter 

(16)
κ1(t) = [⟨h⟩(t) − θ]/σ(t)

that measures the normalized distance between the mean membrane depolarization ⟨h⟩(t) at the target neurons and the firing threshold θ. Hence, the detection criterion amounts to 

(17)
γ = (1/2) erfc(−κ1(t)/√2) − (1/2) erfc(κ0/√2).
Equation (17) provides an implicit relation between the threshold θ and the time t an association remains in memory. Thus (for a subset of thresholds), we can numerically determine t as a function F of the firing threshold θ. One then defines the maximum number P of storable associations via 

(18)
P = max_θ F(θ).
This number P of maximally stored associations is also termed the lifetime or longevity of a memory trace, which implies the idea that the network is persistently exposed to plasticity signals that drive the storage of new associations.

All results are derived with a readout quality of γ = 0.7. It has been shown in Leibold and Kempter (2006) that different values for γ do not change the scaling laws of memory capacity.
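For the 2-state model, the whole readout procedure reduces to a short script. The sketch below implements equations (15)-(18) (Python with SciPy's erfc; the approximation σ(t) ≈ σ(∞) and the finite grid over thresholds are simplifications of the optimization in eq. [18], and the parameter values are assumptions):

import numpy as np
from scipy.special import erfc

def readout_lifetime(N, M, cm, gamma=0.7, T=50_000):
    f = M / N
    t = np.arange(T)
    h_eq = cm * M / 2                          # equilibrium potential
    sd = np.sqrt(cm * M / 2 * (1 - cm / 2))    # equilibrium std, eq. (13)
    h = h_eq + cm * M / 2 * np.exp(-2 * f**2 * t)   # mean target potential
    best = 0
    for theta in np.linspace(h_eq, cm * M, 200):    # scan firing thresholds
        p0 = 0.5 * erfc((theta - h_eq) / (sd * np.sqrt(2)))   # false alarms
        p1 = 0.5 * erfc(-(h - theta) / (sd * np.sqrt(2)))     # hits
        ok = np.nonzero(p1 - p0 >= gamma)[0]
        if ok.size:
            best = max(best, int(ok[-1]))      # last time the criterion holds
    return best

print(readout_lifetime(N=100_000, M=1_000, cm=0.1))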

A.5 Limit of Low Coding Ratios

For low coding ratios f ≪ 1, we can find explicit formulas to approximate the memory lifetime P. We therefore expand equation (11) to the lowest order in f². Defining L ≥ 1 to be the lowest exponent for which w·Q^L δz ≠ 0, this expansion yields 

(19)
⟨h⟩(t) − ⟨h⟩∞ ≈ cmM (1 − f)² { w·δz + C(t, L) [f²(1 − f)²]^L w·Q^L δz },

where C(t, L) denotes the binomial coefficient “t choose L.”
Moreover, for binary synapses (wν ∈ {0, 1}), we can combine equations (15) and (16) through elimination of θ and solve the resulting quadratic equation for ⟨h⟩(P) − ⟨h⟩∞. Together with equations (10) and (13), we find that the memory signal is given by 

(20)
⟨h⟩(P) − ⟨h⟩∞ = K √( cmM w·ẑ [1 − cm w·ẑ] ),

with 

K = κ0 + κ1.
We note that, owing to equations (15) and (16), the threshold K for the signal-to-noise ratio reflects both the neuronal firing threshold θ and the readout quality γ. In what follows, we assume K to be optimized with respect to θ.

We now use Stirling's formula to approximate the binomial coefficient in equation (19) for P ≫ L,

\binom{P}{L} \approx \frac{1}{\sqrt{2\pi L}} \left(\frac{\mathrm{e}\,P}{L}\right)^{L}.

Then, together with equation (20) and f → 0, we obtain the following expression for the maximum number P of stored associations

(21) P \approx \frac{L}{\mathrm{e}\,f^{2}} \left[\sqrt{2\pi L}\; \frac{w \cdot \delta z - K\bar\sigma/(c_m M)}{\left|\,w \cdot Q^{L}\delta z\,\right|}\right]^{1/L}.
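As a quick numerical check of the Stirling approximation entering equation (21), the following few lines (with arbitrary values of P and L) illustrate its accuracy in the regime P ≫ L:

from math import comb, e, pi, sqrt

for P, L in [(1000, 1), (1000, 3), (10_000, 5)]:
    approx = (e * P / L)**L / sqrt(2 * pi * L)   # Stirling-based binomial approximation
    print(P, L, comb(P, L), round(approx), round(approx / comb(P, L), 3))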
If we assume the coding ratio f to be optimized such that dP/df = 0, we find

(22) f_{\mathrm{opt}} = \left(\frac{4L+1}{4L}\right)^{2} \frac{K^{2}\,(w \cdot \bar z)\,(1 - c_m\, w \cdot \bar z)}{(w \cdot \delta z)^{2}\; c_m N},

that is consistent with the initial assumption f ≪ 1 for a large number c_mN of synapses per neuron. The longevity of the memory trace then amounts to

(23) P_{\max} \approx \frac{L}{\mathrm{e}\, f_{\mathrm{opt}}^{2}} \left[\sqrt{2\pi L}\; \frac{w \cdot \delta z}{(4L+1)\left|\,w \cdot Q^{L}\delta z\,\right|}\right]^{1/L}.

A.5.1 Two-State Synapses

In the case of 2-state synapses (n = 1), we have w · δz = 1/2 and w · Qδz = −1 with L = 1. From equation (22), we then find an optimal coding ratio

(24) f_{\mathrm{opt}} = \frac{25}{8}\; \frac{K^{2}\,(1 - c_m/2)}{c_m N},

in which equation (20) defines the constant K.

The optimal assembly size for 2-state synapses derived in the Results is equivalent to equation (24) in the sense of the linear approximation e^{−2Pf²} ≈ 1 − 2Pf², which connects the exponential decay in equation (14) to the lowest-order expansion in equation (19). More specifically, the prefactor 25/8 in equation (24) approximates the factor 2e^{1/2} in equation (1).

From equation (24), we derive the maximal number of associations as

(25) P_{\max} \approx \frac{64\sqrt{2\pi}}{6250\,\mathrm{e}} \left[\frac{c_m N}{K^{2}(1 - c_m/2)}\right]^{2},

that is, for 2-state synapses the lifetime grows with the square of the number c_mN of synapses per neuron.
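The closed forms (24) and (25) can be cross-checked by maximizing the reconstructed expression (21) numerically. In the following sketch, c_m, N, and K carry arbitrary illustrative values; in practice K would follow from equation (20) and the optimization over θ.

import numpy as np

cm, N, K = 0.1, 50_000, 2.0        # illustrative values only
A, C, L = 0.5, 1.0, 1              # w.dz, |w.Q dz|, and the lowest order L for 2-state synapses
zw = 0.5                           # equilibrium value w.zbar

def lifetime(f):
    """P(f) following the reconstructed equation (21)."""
    noise = K * np.sqrt(zw * (1.0 - cm * zw) / (cm * N * f))   # K * sigma_bar / (cm * M)
    margin = A - noise                                          # remaining signal excess
    return np.where(margin > 0,
                    (L / (np.e * f**2)) * (np.sqrt(2 * np.pi * L) * margin / C)**(1.0 / L),
                    0.0)

f = np.logspace(-5, -1, 4000)
P = lifetime(f)
i = int(np.argmax(P))
print("numerical f_opt:", f[i], " closed form (24):", 25 / 8 * K**2 * (1 - cm / 2) / (cm * N))
print("numerical P_max:", P[i])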

A.5.2 Synapses with Complex Metaplasticity

The states in the metaplasticity model by Fusi et al. (2005) are enumerated such that ν = 1 is the most depressed state (bottom left in Fig. 2A). The other states are counted clockwise such that ν = 2n denotes the most potentiated state (bottom right in Fig. 2A). Then, the plasticity matrix for n = 4 reads

(26) [8 × 8 cascade plasticity matrix Q, with transitions as depicted in Fig. 2A]

We note that our definition is slightly different from the original one (Fusi et al. 2005) in that we assume the least transition probabilities (e.g., those of the most stable states for n = 4) to be smaller by a constant factor. This is done in order to break the degeneracy of the models for n = 1 and n = 2, which are equivalent in the original formulation. As a drawback, our mathematical expressions are slightly more complicated; for example, the steady state in the original model is exactly equally distributed, whereas in our formulation it is not (see below). This slight alteration, however, does not substantially change the behavior of the model because the main results of Fusi et al. (2005) can be reproduced. In our model, the steady-state distribution reads

[steady-state distribution \bar z of the cascade model]

and the initial excess is given by

(27) [initial state excess δz of the cascade model]
Because it turns out that L = 1, the linearization in equation (23) requires expressions for w · δz and w · Qδz. With the above formulas and the weight vector w = (0, …, 0, 1, …, 1)^T, we obtain

(28) [w · δz of the cascade model]

(29) [w · Qδz of the cascade model]
Combining equations (28) and (29) with equation (23) reveals the maximum lifetime to scale like

(30) [scaling of P_max with the number n of cascade states]
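Although the modified transition probabilities are not reproduced here, the computational recipe behind equations (26)-(30) can be sketched for the original cascade parameterization of Fusi et al. (2005) with x = 1/2. The matrix construction below is our reading of that model (an assumption, since the paper's altered least-transition factor is not available), so the resulting numbers are only indicative.

import numpy as np

def cascade_matrices(n, x=0.5):
    """Potentiation/depression transition matrices of a cascade with 2n states.
    States 0..n-1 are depressed (deepest first), n..2n-1 potentiated (most plastic first).
    Transition probabilities x**k follow the original model of Fusi et al. (2005)."""
    Mpot, Mdep = np.eye(2 * n), np.eye(2 * n)
    for k in range(n):                       # depressed states, k = n-1 is most plastic
        q = x**(n - 1 - k)                   # potentiation: jump to most plastic potentiated state
        Mpot[k, k] -= q
        Mpot[n, k] += q
        if k > 0:                            # depression: one metaplastic level deeper
            p = x**(n - k)
            Mdep[k, k] -= p
            Mdep[k - 1, k] += p
    for j in range(n):                       # potentiated states, j = 0 is most plastic
        q = x**j                             # depression: jump to most plastic depressed state
        Mdep[n + j, n + j] -= q
        Mdep[n - 1, n + j] += q
        if j < n - 1:                        # potentiation: one metaplastic level deeper
            p = x**(j + 1)
            Mpot[n + j, n + j] -= p
            Mpot[n + j + 1, n + j] += p
    return Mpot, Mdep

n = 4
Mpot, Mdep = cascade_matrices(n)
Q = Mpot + Mdep - 2 * np.eye(2 * n)          # average generator per stored association
assert np.allclose(np.ones(2 * n) @ Q, 0)    # vanishing column sums: probability is conserved
vals, vecs = np.linalg.eig(Q)
zbar = np.real(vecs[:, np.argmin(np.abs(vals))])
zbar /= zbar.sum()                           # steady-state distribution (null vector of Q)
dz = Mpot @ zbar - zbar                      # initial excess after one potentiation event
w = np.r_[np.zeros(n), np.ones(n)]
print("w.dz  =", w @ dz)                     # quantity entering equation (28)
print("w.Qdz =", w @ (Q @ dz))               # quantity entering equation (29)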

A.5.3 Synapses with Serial State Transitions

Another model, in which the state transitions connect the synaptic states one after the other, is depicted in Figure 3A. We again enumerate the states clockwise such that ν = 1 corresponds to the most depressed state (bottom left) and ν = 2n corresponds to the most potentiated state w_{2n} = 1 (bottom right). Then, the plasticity matrix (for n = 3) reads

(31) Q = \begin{pmatrix} -1 & 1 & 0 & 0 & 0 & 0 \\ 1 & -2 & 1 & 0 & 0 & 0 \\ 0 & 1 & -2 & 1 & 0 & 0 \\ 0 & 0 & 1 & -2 & 1 & 0 \\ 0 & 0 & 0 & 1 & -2 & 1 \\ 0 & 0 & 0 & 0 & 1 & -1 \end{pmatrix}.

Furthermore, we obtain the steady state

\bar z = \frac{1}{2n}\,(1, \dots, 1)^{T}
and the initial excess

(32) \delta z = \frac{1}{2n}\,(-1, 0, \dots, 0, 1)^{T}.

The weight vector w = (0, …, 0, 1, …, 1)^T defines the scalar products

(33) w \cdot \delta z = \frac{1}{2n}, \qquad w \cdot Q^{n}\delta z = -\frac{1}{n},
in which the minimal nonvanishing exponent L = n can be obtained from the property of vanishing column sums, (1, …, 1)Q = 0, together with the tridiagonal structure of Q. From that and equation (23), one finds the scaling law
(34) P_{\max} \propto \frac{1}{n^{3}} \left[\frac{c_m N}{K^{2}(1 - c_m/2)}\right]^{2},
as announced in equation (4).
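The quantities entering equations (31)-(34) can be verified numerically. The following sketch constructs the tridiagonal matrix Q for arbitrary n and checks the vanishing column sums, the uniform steady state, and that w · Q^t δz vanishes for t < n and equals −1/n at t = n. It tests our reconstruction of these equations rather than any code of the authors.

import numpy as np

def serial_Q(n):
    """Tridiagonal plasticity matrix of the serial model with 2n states:
    per event, one potentiation step up and one depression step down,
    with reflecting boundary states; see equation (31) for n = 3."""
    Q = np.zeros((2 * n, 2 * n))
    for v in range(2 * n):
        if v > 0:                            # depression: one state down
            Q[v - 1, v] += 1; Q[v, v] -= 1
        if v < 2 * n - 1:                    # potentiation: one state up
            Q[v + 1, v] += 1; Q[v, v] -= 1
    return Q

for n in (1, 2, 3, 5, 8):
    Q = serial_Q(n)
    assert np.allclose(np.ones(2 * n) @ Q, 0)            # vanishing column sums
    assert np.allclose(Q @ np.full(2 * n, 0.5 / n), 0)   # uniform steady state
    dz = np.zeros(2 * n)
    dz[0], dz[-1] = -0.5 / n, 0.5 / n                    # initial excess, equation (32)
    w = np.r_[np.zeros(n), np.ones(n)]                   # weight vector
    print("n =", n, " w.dz =", w @ dz)                   # equals 1/(2n), equation (33)
    v = dz
    for t in range(1, n + 1):
        v = Q @ v
        print("   t =", t, " w.Q^t dz =", round(w @ v, 12))  # 0 for t < n, -1/n at t = n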

We thank Stefano Fusi for discussions and comments on the manuscript and Walter Senn for discussions. We are also indebted to Tim Gollisch and Roland Schaette for valuable suggestions and careful reading. This research was supported by the Deutsche Forschungsgemeinschaft (Emmy Noether Programm: Ke 788/1-3, SFB 618) and the Bundesministerium für Bildung und Forschung (Bernstein Center for Computational Neuroscience Berlin, 01GQ0410). Conflict of Interest: None declared.

References

Abraham WC, Bear MF. 1996. Metaplasticity: the plasticity of synaptic plasticity. Trends Neurosci. 19:126-130.

Amit DJ, Fusi S. 1994. Learning in neural networks with material synapses. Neural Comput. 6:957-982.

Brecht M, Sakmann B. 2002. Dynamic representation of whisker deflection by synaptic potentials in spiny stellate and pyramidal cells in the barrels and septa of layer 4 rat somatosensory cortex. J Physiol. 543:49-70.

Brunel N, Hakim V, Isope P, Nadal J-P, Barbour B. 2004. Optimal information storage and the distribution of synaptic weights: perceptron versus Purkinje cell. Neuron. 43:745-757.

Csicsvari J, Hirase H, Mamiya A, Buzsaki G. 2000. Ensemble patterns of hippocampal CA3-CA1 neurons during sharp-wave associated population events. Neuron. 28:585-594.

Froemke RC, Dan Y. 2002. Spike-timing-dependent plasticity induced by natural spike trains. Nature. 416:433-438.

Fusi S, Drew PJ, Abbott LF. 2005. Cascade models of synaptically stored memories. Neuron. 45:599-611.

Fusi S, Abbott LF. 2007. Limits on the memory storage capacity of bounded synapses. Nat Neurosci. 10:485-493.

Gerstner W, Kempter R, van Hemmen JL, Wagner H. 1996. A neuronal learning rule for sub-millisecond temporal coding. Nature. 383:76-78.

Golomb D, Rubin N, Sompolinsky H. 1990. Willshaw model: associative memory with sparse coding and low firing rates. Phys Rev A. 41:1843-1854.

Hebb DO. 1949. The organization of behavior. New York (NY): Wiley.

Hopfield JJ. 1982. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA. 79:2554-2558.

Isaac JTR, Nicoll RA, Malenka RC. 1995. Evidence for silent synapses: implications for the expression of LTP. Neuron. 15:427-434.

Kempter R, Gerstner W, van Hemmen JL. 1999. Hebbian learning and spiking neurons. Phys Rev E. 59:4498-4514.

Kullmann DM. 2003. Silent synapses: what are they telling us about long-term potentiation? Philos Trans R Soc Lond B Biol Sci. 358:727-733.

Lee AK, Wilson MA. 2002. Memory of sequential experience in the hippocampus during slow wave sleep. Neuron. 36:1183-1194.

Leibold C, Kempter R. 2006. Memory capacity for sequences in a recurrent network with biological constraints. Neural Comput. 18:904-941.

Liao D, Hessler NA, Malinow R. 1995. Activation of postsynaptically silent synapses during pairing-induced LTP in CA1 region of hippocampal slice. Nature. 375:400-404.

Linsker R. 1986. From basic network principles to neural architecture: emergence of orientation columns. Proc Natl Acad Sci USA. 83:8779-8783.

Lüscher C, Nicoll RA, Malenka RC, Muller D. 2000. Synaptic plasticity and dynamic modulation of the postsynaptic membrane. Nat Neurosci. 3:545-550.

Lüscher C, Frerking M. 2001. Restless AMPA receptors: implications for synaptic transmission and plasticity. Trends Neurosci. 24:665-670.

McCulloch WS, Pitts W. 1943. A logical calculus of the ideas immanent in nervous activity. Bull Math Biol. 5:115-133.

Montgomery JM, Pavlidis P, Madison DV. 2001. Pair recordings reveal all-silent synaptic connections and the postsynaptic expression of long-term potentiation. Neuron. 29:691-701.

Montgomery JM, Madison DV. 2002. State-dependent heterogeneity in synaptic depression between pyramidal cell pairs. Neuron. 33:765-777.

Montgomery JM, Madison DV. 2004. Discrete synaptic states define a major mechanism of synapse plasticity. Trends Neurosci. 27:744-750.

Nadal J-P. 1991. Associative memory: on the (puzzling) sparse coding limit. J Phys A Math Gen. 24:1093-1101.

O'Connor DH, Wittenberg GM, Wang SS-H. 2005. Graded bidirectional synaptic plasticity is composed of switch-like unitary events. Proc Natl Acad Sci USA. 102:9679-9684.

Olshausen BA, Field DJ. 2004. Sparse coding of sensory inputs. Curr Opin Neurobiol. 14:481-487.

Petersen CCH, Malenka RC, Nicoll RA, Hopfield JJ. 1998. All-or-none potentiation at CA3-CA1 synapses. Proc Natl Acad Sci USA. 95:4732-4737.

Rolls ET, Tovee MJ. 1995. Sparseness of the neuronal representation of stimuli in the primate temporal visual cortex. J Neurophysiol. 73:713-726.

Sajikumar S, Frey JU. 2004. Late-associativity, synaptic tagging, and the role of dopamine during LTP and LTD. Neurobiol Learn Mem. 82:12-25.

Song S, Miller KD, Abbott LF. 2000. Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nat Neurosci. 3:919-926.

Treves A, Rolls ET. 1992. Computational constraints suggest the need for two distinct input systems to the hippocampal CA3 network. Hippocampus. 2:189-200.

Tsodyks MV, Feigel'man MV. 1988. The enhanced storage capacity in neural networks with low activity level. Europhys Lett. 6:101-105.

Weliky M, Fiser J, Hunt RH, Wagner DN. 2003. Coding of natural scenes in primary visual cortex. Neuron. 37:703-718.

Willshaw DJ, Buneman OP, Longuet-Higgins HC. 1969. Non-holographic associative memory. Nature. 222:960-962.