Working memory supports the recognition of objects in the environment. Memory models have postulated that recognition relies on 2 processes: assessing the degree of similarity between an external stimulus and memory representations and testing the resulting summed-similarity value against a critical level for recognition. Here, we varied the similarity between samples held in working memory and a probe to investigate these 2 processes with magnetoencephalography. Two separable components matched our expectations. First, from 280 ms after probe onset, clearly nonmatching probes differed from both similar nonmatches and matches over left frontal cortex. At 350–400 ms, these signals evolved into a pattern of gradually increasing activation as a function of sample-probe similarity, as expected for a neural representation of summed similarity. Second, a signal potentially reflecting criterion testing was observed at 600–700 ms at right frontotemporal sensors; it differentiated between matches and nonmatches without further differences between similar and dissimilar probes. Thus, analysis of the time course of recognition provided strong evidence that similarity summation and criterion testing have separable neural bases. Because both working memory and long-term memory recognition probably draw on these processes, they may be involved in many domains of behavior.
Identifying a sought-after item in the environment draws upon working memory mechanisms for recognition. For example, at an airport, you may have to remember the terminal and gate number in order to recognize and follow them on the direction signs. In experimental settings, recognition is typically studied by asking participants to retain a few sample items in memory and to decide after a short delay whether a single probe matches one of the items held in memory. This task requires a mechanism that matches external stimuli to the contents of working memory and assesses the degree of accordance between them. The matching mechanism has been described as a process of mental scanning (Sternberg 1966) and may proceed in a fashion similar to visual search (Hyun et al. 2009; Kuo et al. 2009).
But how is the degree of accordance between the probe and the contents of working memory actually computed? An influential class of models from behavioral research assumes that the accordance is computed as the sum of the similarities of each memory item to the probe (“summed-similarity models”; see, e.g., Nosofsky 1988; Kahana and Sekuler 2002; Sekuler and Kahana 2007). Consequently, increased similarity of a probe to any of the samples, the so-called sample-probe similarity, reliably provokes more “yes” responses. The scope of these models is not limited to visual information; they also apply to verbal and auditory information (Nosofsky and Zaki 2003; Visscher et al. 2007).
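To make the summed-similarity idea concrete, the following Python sketch computes a probe's summed similarity to the colors held in memory, using the 15-step color circle described in Materials and Methods. It is an illustration under stated assumptions, not the model fitted in any of the cited studies: distance is counted in steps around the circle, and similarity decays exponentially with distance (a common choice in exemplar-based models) with a hypothetical decay parameter `tau`.

```python
import math

N_COLORS = 15  # number of colors on the circle used in the study


def circular_distance(a: int, b: int, n: int = N_COLORS) -> int:
    """Number of steps separating two colors on a circle of n colors."""
    d = abs(a - b) % n
    return min(d, n - d)


def similarity(a: int, b: int, tau: float = 1.0) -> float:
    """Exponentially decaying similarity; tau is a hypothetical free parameter."""
    return math.exp(-tau * circular_distance(a, b))


def summed_similarity(probe: int, samples: list, tau: float = 1.0) -> float:
    """Sum of the probe's similarity to every item held in memory."""
    return sum(similarity(probe, s, tau) for s in samples)


samples = [0, 5, 10]  # three mutually dissimilar colors (load 3)
for label, probe in [("match", 5), ("similar", 6), ("dissimilar", 8)]:
    print(label, round(summed_similarity(probe, samples), 3))
```

With these assumed values the three probe types are ordered match > similar > dissimilar, which is the graded pattern a neural correlate of summed similarity should show.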
The similarity-based mechanism is unlikely to be specific to short-term memory. For example, behavioral studies have suggested that the same retrieval mechanism that underlies “familiarity” in long-term memory also subserves short-term memory (McElree 2006). In both cases, the probe can be thought of as a cue that activates matching memory representations, whether these are maintained in working memory or linger in long-term memory. In line with the notion of a single familiarity/similarity-based mechanism, recent functional magnetic resonance imaging (fMRI) results have suggested a large degree of overlap between working memory and long-term memory recognition (Nee and Jonides 2008; Bledowski et al. 2009; Oztekin et al. 2009).
Similarity/familiarity operates on the item level, and recognition from working memory is most often conceived as an itemwise process. In contrast, dual-process models of long-term memory assume that recognition may also rely on a qualitatively different retrieval process named “recollection” (Rugg and Curran 2007). Recollection is thought to recover an item together with the specific episodic context in which it was encountered, such as when you remember not only that a person is familiar but also where and when you met them. As with familiarity, recollection can be expected to play a role in working memory when the retrieval of an item's episodic context is required (cf. McElree 2006). In contrast, summing similarity seems to be ubiquitously involved in context-free recognition from both working and long-term memory.
Even when an item seems highly familiar or yields high summed-similarity values, this does not strictly determine that a person will respond “yes, I recognize the item.” Rather, summed-similarity models assume that whether the degree of similarity is high enough to elicit recognition depends on a second component. This “criterion testing” consists of comparing the summed similarity against a criterion level of evidence for responding: a “yes, I recognize the item” response occurs only if summed similarity exceeds this critical level. Individuals differ in their tendency to apply conservative or liberal criteria, a tendency commonly indexed by measures of decision “bias.” Crucially, similarity summation and criterion testing are regarded as independent processes. Thus, subjects may apply different criterion levels of summed similarity for responding “yes,” independent of their ability to retain and match information in memory.
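Criterion testing can be sketched in the same spirit. In the following minimal simulation, each trial's summed similarity is perturbed by Gaussian noise and compared with a fixed criterion, and a “yes” response is given only if the value exceeds it. The noise model, its standard deviation, and the mean summed-similarity values per probe type are illustrative assumptions, not estimates from the data.

```python
import random


def p_yes(mean_similarity: float, criterion: float, noise_sd: float = 0.2,
          n_trials: int = 10_000, seed: int = 0) -> float:
    """Proportion of 'yes' responses when a noisy summed-similarity value is
    compared with a fixed criterion on each simulated trial."""
    rng = random.Random(seed)
    n_yes = sum(
        mean_similarity + rng.gauss(0.0, noise_sd) > criterion
        for _ in range(n_trials)
    )
    return n_yes / n_trials


# Hypothetical mean summed similarity per probe type (match > similar > dissimilar).
means = {"match": 1.0, "similar": 0.4, "dissimilar": 0.2}
for criterion in (0.7, 0.5):  # conservative vs. liberal criterion
    rates = {label: round(p_yes(m, criterion), 3) for label, m in means.items()}
    print(f"criterion {criterion}: {rates}")
```

Lowering the criterion raises the proportion of “yes” responses for every probe type while leaving the underlying similarity values untouched, which illustrates why bias and similarity summation can vary independently.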
Despite the general interest in visual working memory, the neural correlates of similarity summation and criterion testing, which theoretically form the basis of item recognition, have remained unidentified. For example, Agam et al. (2009) varied sample-probe similarity, but because they were interested only in the relative timing of brain regions during recognition, they did not include matching probes in their analysis. Consequently, potential neural reflections of summed similarity and criterion testing were not examined.
Here, we sought to specify the neural implementation of similarity summation and criterion testing in a delayed match-to-sample task, assessing the time course of recognition with magnetoencephalography (MEG). Specifically, we asked subjects to encode, retain, and recognize color stimuli. Sample-probe similarity was varied by presenting a probe stimulus that was either dissimilar to all samples, similar to one sample, or actually matched one. In order to minimize the potential behavioral impact of similarity between the memory samples (Kahana and Sekuler 2002), on every trial, all colors that had to be retained were dissimilar to each other (for details, see Materials and Methods).
In search of a neural basis of similarity summation and criterion testing, we looked for 2 dissociable neural signals. Similarity summation should be reflected by a gradual variation of physiological response amplitudes ordered by sample-probe similarity. We thus expected the strongest responses for matching probes, followed by similar probes, and the weakest responses for dissimilar probes. In contrast, the outcome of criterion testing is a classification of trials into matches (correct “yes” responses) and nonmatches (correct “no” responses). Thus, matches can be expected to yield a neural signal that discriminates them from nonmatches, crucially without further differentiation between similar and dissimilar probes. Concerning the time course of recognition, a neural reflection of criterion testing should not precede a potential substrate of similarity summation. Rather, criterion testing may follow similarity summation, potentially with some temporal overlap.
In addition to sample-probe similarity, we independently varied the number of samples to be retained. Like sample-probe similarity, this “memory load” affects task difficulty, with longer response times and lower accuracy under increased load. By including it, we aimed to isolate MEG signals related to task difficulty, which could then be separated from those of similarity summation and criterion testing.
Materials and Methods
Twenty-one healthy volunteers gave written informed consent to participate in the MEG measurements. All subjects had normal or corrected-to-normal vision. Participants were screened for deficits in color vision with the colors used in the main experiment, presented at the same size and in the same ambient light conditions. None of them showed difficulties in perceptually discriminating these colors. Also, none of the subjects reported any history of neurological or psychiatric disorders. The data of 4 subjects were excluded from the analysis due to artifactual recordings in 1 case and low memory performance in 3 cases. Seventeen right-handed subjects remained (11 females, 6 males; age: 26.6 ± 3.9 years, mean ± standard deviation). The study was approved by the local ethics committee.
Stimuli and Design
A delayed match-to-sample task was implemented using colored squares as stimuli (Fig. 1a). We used 15 colors that varied along the whole range of hues while avoiding large differences in luminance. More specifically, a “circle” of colors was selected so that each color had exactly 2 neighboring colors. Pilot testing included ratings of color similarity to ensure that directly neighboring colors were perceptually similar but readily discriminable by the participants, whereas nonneighboring colors (separated by one or more intermediate colors) were perceptually dissimilar. Coordinates of the colors in CIE Yxy color space were measured with a Konica Minolta CS 100A chromameter (Supplementary Table 1).
Although each individual color could easily be assigned to a global category such as red, yellow, or green, the stimulus set contained multiple colors for each of these global labels. Moreover, many trials required discrimination between matching and similar nonmatching colors that were hard to discriminate verbally. We therefore assume that colors were not retained by verbal coding. In line with this, participants reported during the practice session before the main experiment that they had initially tried to name the colors and store them verbally but had quickly abandoned this strategy because it did not work. Also speaking against the use of verbal labels, we observed a large load effect in accuracy even though at most 3 items had to be stored, which is within the capacity limit of verbal working memory.
In the encoding phase of our visual working memory task, the sample array subtended 4.2° × 4.2° of visual angle and consisted of 4 gray squares (1.5° × 1.5°) with a thick white frame centered at 45°, 135°, 225°, and 315° around a fixation point, presented on a gray background. On each trial, either 1 (memory load 1) or 3 (memory load 3) squares of the sample array were filled with color(s) selected from the set of 15 colors. The selection of the color(s) was pseudorandomized such that colors presented in a given trial had not been presented in the preceding trial. In addition, colors selected for the memory load 3 condition were separated by at least 3 intermediate colors on the color circle, so that each encoded color was dissimilar to every other to-be-remembered color and a probe placed between any 2 sample colors could be dissimilar to all samples. After the delay, a probe display was presented, consisting of a single white-framed colored square that subtended 1.5° × 1.5° and was displayed at the center of the screen. During the retention phase and the intertrial interval (ITI), a central white fixation square was presented. Each trial consisted of a sample array (700 ms), a retention interval (1700 ms), a probe item (700 ms), and a varying ITI (2500–3500 ms in steps of 500 ms).
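The color-selection constraints can be expressed as a small trial generator. This is a hypothetical sketch, not the stimulus code used in the study. It assumes that colors are indexed 0–14 around the circle, that “similar” means directly neighboring (1 step), that “dissimilar” means at least 2 steps from every sample, and that load-3 samples lie at least 4 steps apart (3 intermediate colors). For simplicity, the no-repeat rule is applied to sample colors only, with `previous` holding the colors shown on the preceding trial.

```python
import random

N_COLORS = 15
MIN_SAMPLE_STEPS = 4  # at least 3 intermediate colors between any two samples


def circ_dist(a: int, b: int, n: int = N_COLORS) -> int:
    """Steps between two colors on the color circle."""
    d = abs(a - b) % n
    return min(d, n - d)


def draw_samples(load: int, previous: set, rng: random.Random) -> list:
    """Draw `load` mutually dissimilar sample colors, none of which appeared
    on the preceding trial (rejection sampling)."""
    allowed = [c for c in range(N_COLORS) if c not in previous]
    while True:
        samples = rng.sample(allowed, load)
        if all(circ_dist(a, b) >= MIN_SAMPLE_STEPS
               for i, a in enumerate(samples) for b in samples[i + 1:]):
            return samples


def draw_probe(samples: list, condition: str, rng: random.Random) -> int:
    """Probe that matches, neighbors exactly one sample, or is >= 2 steps from all."""
    if condition == "match":
        return rng.choice(samples)
    if condition == "similar":
        # Samples are >= 4 steps apart, so a neighbor of one sample is
        # >= 3 steps from every other sample.
        return (rng.choice(samples) + rng.choice([-1, 1])) % N_COLORS
    if condition == "dissimilar":
        candidates = [c for c in range(N_COLORS)
                      if all(circ_dist(c, s) >= 2 for s in samples)]
        return rng.choice(candidates)
    raise ValueError(f"unknown condition: {condition}")


rng = random.Random(42)
samples = draw_samples(3, previous={0, 5, 10, 1}, rng=rng)
print(samples, [draw_probe(samples, c, rng) for c in ("match", "similar", "dissimilar")])
```

The rejection loop cannot run out of options: on a 15-color circle there are 50 sample triples with all pairs at least 4 steps apart, each color belongs to 10 of them, and so excluding the at most 4 colors of the preceding trial still leaves at least 10 valid triples.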
In summary, participants had to encode and maintain either 1 (memory load 1) or 3 (memory load 3) colors of the sample array for a short retention interval of 1700 ms. In the recognition phase, the color probe either matched one of the learned colors, was similar to exactly one of them, or was dissimilar to all encoded color(s). Participants had to judge whether the probe matched the memorized item(s) and responded by lifting the right index finger for matching probes and the right middle finger for nonmatching (similar or dissimilar) probes, thereby triggering a light barrier. This corresponded to a 2 × 3 full factorial design with the independent repeated-measures factors memory load (memory load 1 vs. memory load 3) and sample-probe similarity (match vs. similar vs. dissimilar).
The experiment consisted of 5 runs. Each run lasted about 11.6 min and included 117 or 118 trials (588 trials overall). Because nonmatch responses were required for both similar and dissimilar trials, an equal probability for each condition would have required twice as many nonmatch as match responses, which could have caused a response bias. In addition, because accuracy differs across conditions, the number of trials with a correct response would have varied across conditions. On the basis of a behavioral pilot study with 6 subjects, we set the number of trial presentations for memory load 1 to 108, 78, and 66 and for memory load 3 to 162, 96, and 78 for match, similar, and dissimilar probes, respectively, in order to yield comparable numbers of correct match and nonmatch trials at each memory load level.
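The trial counts above can be checked with a few lines of arithmetic. The sketch below only re-adds the numbers given in the text; how the 588 trials were actually distributed over the 5 runs is not specified beyond “117 or 118 trials.”

```python
# Planned presentations per cell: memory load -> condition -> count.
design = {
    1: {"match": 108, "similar": 78, "dissimilar": 66},
    3: {"match": 162, "similar": 96, "dissimilar": 78},
}
N_RUNS = 5

for load, cells in design.items():
    nonmatch = cells["similar"] + cells["dissimilar"]
    print(f"load {load}: {sum(cells.values())} trials "
          f"({cells['match']} match vs. {nonmatch} nonmatch)")

total = sum(sum(cells.values()) for cells in design.values())
base, extra = divmod(total, N_RUNS)  # `extra` runs get one trial more than `base`
print(f"total: {total} trials = {extra} runs x {base + 1} + {N_RUNS - extra} runs x {base}")
```

The totals come to 252 trials at memory load 1 (108 match vs. 144 nonmatch) and 336 at memory load 3 (162 vs. 174), that is, 588 trials, which split into 3 runs of 118 and 2 runs of 117 trials.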
Data Recording and Analysis
Data Recording and Preprocessing
MEG was recorded using a whole-head system (Omega 2005, CTF-MEG, VSM MedTech Inc., Coquitlam, Canada) comprising 275 magnetometers with an average distance between sensors of about 2.2 cm. The signals were recorded continuously at a sampling rate of 600 Hz with a low-pass filter at 150 Hz, in a synthetic third-order axial gradiometer configuration. Signals from 7 defunct channels, distributed across the whole head, were discarded (MLO21, MLP12, MLT41, MRC14, MRC25, MRP56, and MRT21). Each subject's head position was determined with localization coils fixed at the nasion and the preauricular points at the beginning and at the end of each recording to ensure that head movements did not exceed 5 mm. Subjects were placed such that the inion and vertex of their head touched the sensor helmet. The head localization software of the MEG vendor (Acquisition 5.4) was then used for fine-tuning to ensure comparable head orientation across subjects. Subsequently, the head position was fixed by means of foam cushions. As a result of this procedure, all head positions lay within −0.91 to 0.92 cm, −1.06 to 0.59 cm, and −1.59 to 0.72 cm of the mean position in the x, y, and z directions, respectively. Because the sensor spacing on the helmet surface was 2.2 cm, position differences were less than 1 sensor spacing. All head orientations were within 8.1° of the mean head direction.
For preprocessing, MEG data were filtered with a 0.1 Hz high-pass filter and cut into epochs of 5400 ms duration (a 500 ms presample baseline, a 2400 ms sample-probe interval, and a 2500 ms post-probe interval). After manual rejection of epochs containing nonstereotyped artifacts (e.g., muscle activity, swallowing, and temporary sensor noise), eye blinks were corrected by submitting the concatenated single-trial data to extended infomax independent component analysis (Bell and Sejnowski 1995), estimating 80 components per subject and functional run with the EEGLAB 5.03 toolbox (www.sccn.ucsd.edu/eeglab/). Independent components reflecting eye blinks were identified visually on the basis of their time course and topography and were discarded by back-projecting all but these components to the data space.
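The final step of the blink correction, back-projecting all but the rejected components, is a linear operation. The NumPy sketch below assumes that an unmixing matrix `W` has already been estimated (in the study, by extended infomax in EEGLAB, which is not reproduced here) and that the blink components have been chosen by inspection; the function name `remove_components` is hypothetical. When fewer components than channels are estimated, as with 80 components here, the pseudoinverse reconstructs only the data subspace spanned by the components.

```python
import numpy as np


def remove_components(data: np.ndarray, unmixing: np.ndarray, reject) -> np.ndarray:
    """Back-project all but the rejected independent components.

    data:     (n_channels, n_samples) sensor time series
    unmixing: (n_components, n_channels) unmixing matrix W from ICA
    reject:   indices of components judged to reflect eye blinks
    """
    sources = unmixing @ data                 # component activations
    mixing = np.linalg.pinv(unmixing)         # (n_channels, n_components)
    keep = np.setdiff1d(np.arange(unmixing.shape[0]), reject)
    return mixing[:, keep] @ sources[keep]


# Toy example: 4 channels mixed from 4 sources, source 0 playing the "blink".
rng = np.random.default_rng(0)
true_mixing = rng.normal(size=(4, 4))
true_sources = rng.normal(size=(4, 1000))
data = true_mixing @ true_sources
cleaned = remove_components(data, np.linalg.inv(true_mixing), reject=[0])
```

In this toy case the unmixing matrix is known exactly, so removing component 0 leaves precisely the contribution of the other 3 sources.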
Event-Related Field Analyses
Event-related fields (ERFs) were calculated separately for each subject and experimental condition (match, similar, and dissimilar, each at memory load 1 and memory load 3), including correct trials only. For the analysis of recognition-related activity, MEG data were epoched from −100 to 1000 ms around probe onset, baseline-corrected using the 100 ms preceding probe onset, and averaged across trials.
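At the 600 Hz sampling rate, the 100 ms baseline spans 60 samples and a −100 to 1000 ms epoch spans 660 samples. The following sketch shows the baseline correction and averaging for one subject and condition; the array layout and the function name `compute_erf` are assumptions for illustration. Because both steps are linear, subtracting the baseline before or after averaging gives the same ERF.

```python
import numpy as np

FS = 600                              # sampling rate in Hz
BASELINE_SAMPLES = FS * 100 // 1000   # 100 ms pre-probe baseline = 60 samples


def compute_erf(epochs: np.ndarray, correct: np.ndarray,
                onset: int = BASELINE_SAMPLES) -> np.ndarray:
    """Baseline-correct probe-locked epochs and average correct trials.

    epochs:  (n_trials, n_sensors, n_samples), probe onset at sample `onset`
    correct: boolean (n_trials,) mask selecting correctly answered trials
    """
    kept = epochs[correct]
    baseline = kept[:, :, onset - BASELINE_SAMPLES:onset].mean(axis=2, keepdims=True)
    return (kept - baseline).mean(axis=0)
```

Applied to an array of shape (n_trials, 268, 660), for example, the function returns a (268, 660) ERF whose pre-probe interval averages to zero on every sensor.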
ERF data were statistically tested by repeated-measures analyses of variance (ANOVAs) with the factors memory load (memory load 1 vs. memory load 3) and sample-probe similarity (match vs. similar vs. dissimilar), separately for each sensor and sample. The resulting set of P values was corrected for multiple comparisons (across both sensors and time points) using the false discovery rate (FDR) with a critical P value of 0.05. In addition, of the effects that survived FDR correction, only those that lasted at least 25 ms within a continuous time window of 30 ms were accepted. The 25-of-30 ms criterion was chosen a priori to prevent an otherwise sustained effect from being discarded merely because it was interrupted by one or a few interspersed samples at which it did not reach significance.
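Both thresholding steps can be sketched in a few lines. The text does not name the exact FDR procedure, so the sketch assumes the standard Benjamini–Hochberg step-up rule applied jointly to all sensor-by-sample P values. It also implements one reading of the duration rule: a significant sample is kept if it lies in some 30 ms window in which at least 25 ms are significant. At 600 Hz, 25 ms and 30 ms correspond to 15 and 18 samples. Function names are hypothetical.

```python
import numpy as np


def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg: boolean mask of P values significant at FDR q."""
    p = np.asarray(pvals, dtype=float)
    flat = p.ravel()
    order = np.argsort(flat)
    m = flat.size
    below = flat[order] <= q * np.arange(1, m + 1) / m
    mask = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank meeting its bound
        mask[order[:k + 1]] = True
    return mask.reshape(p.shape)


def duration_filter(sig, min_on=15, window=18):
    """Keep significant samples that lie in a window of `window` samples
    containing at least `min_on` significant samples (25 of 30 ms at 600 Hz)."""
    sig = np.asarray(sig, dtype=bool)
    out = np.zeros_like(sig)
    for start in range(sig.size - window + 1):
        seg = sig[start:start + window]
        if seg.sum() >= min_on:
            out[start:start + window] |= seg
    return out
```

For a sensors-by-samples mask, `duration_filter` would be applied to each sensor's row separately. A run of 20 significant samples interrupted by a single nonsignificant sample survives intact, whereas an isolated 10-sample run is discarded.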
To evaluate the spatial and temporal separability of the identified components (to preview our results, C1–C3), to identify the specific ordering of signal amplitudes across conditions, and to relate component activation to behavioral performance, the all-sensors-all-time-points ANOVAs described above were supplemented by follow-up analyses focusing on sensor groups of interest (SOIs) and time windows of interest (TOIs). For each component, the significant sensors and time points (as indicated by these ANOVAs) defined the component's SOI and TOI, and activity was averaged across them to form a measure of the component's activation. Component-specific information is given in the Results.
Recognition accuracy and reaction times (RTs) are depicted in Figure 1b. For statistical inference, separate 2 (memory load) × 3 (sample-probe similarity) repeated-measures ANOVAs were performed with accuracy and RTs as dependent variables. For the factor memory load, we observed a decrease in accuracy (F1,16 = 129.1, P < 0.001) and an increase in RT (F1,16 = 78.4, P < 0.001) from memory load 1 (accuracy: 88.6% ± 1.1; RT: 869.6 ms ± 89.9; mean ± standard error of the mean [SEM]) to memory load 3 (accuracy: 79.6% ± 1.0; RT: 994.4 ms ± 88.8; mean ± SEM). There was also a significant main effect of similarity for both accuracy (F2,32 = 56.2, P < 0.001) and RT (F2,32 = 24.7, P < 0.001). Pairwise comparisons showed significant decreases in accuracy from dissimilar (97.6% ± 0.8) to match (86.7% ± 2.0) and to similar probes (68.0% ± 2.5; all |t| > 5.0, P < 0.001). Responses to dissimilar (883.3 ms ± 93.4) and match (888.9 ms ± 84.6) probes were faster than responses to similar probes (1023.8 ms ± 91.8; both |t| > 5.4, P < 0.001).
In addition, we observed significant memory load × sample-probe similarity interactions for both performance measures (accuracy: F2,32 = 7.4, P = 0.015; RT: F2,32 = 17.5, P = 0.001). Post hoc contrasts (P < 0.05, Bonferroni corrected) revealed that accuracy dropped more sharply for match than for dissimilar probes when 3 colors rather than 1 had to be remembered (match 94.9% ± 1.0 and 78.5% ± 3.3 vs. dissimilar 98.2% ± 0.9 and 96.9% ± 1.0, for memory load 1 and memory load 3, respectively). As evident from the corresponding post hoc contrasts (all P < 0.001, Bonferroni corrected), match trials showed a stronger RT increase with increasing memory load than similar and dissimilar trials (match 794.0 ms ± 82.9 and 983.7 ms ± 87.4 vs. similar 975.7 ms ± 93.2 and 1071.9 ms ± 91.7 and dissimilar 839.0 ms ± 95.7 and 927.5 ms ± 91.4, for memory load 1 and memory load 3, respectively).
In summary, in line with previous studies on memory load and sample-probe similarity (for reviews, see, e.g., Fukuda et al. 2010 and Sekuler and Kahana 2007, respectively), our behavioral results confirmed that both higher memory load and increased sample-probe similarity prolonged RT and increased error rates. We also found that both manipulations interacted: higher memory load reduced recognition performance for match trials, whereas dissimilar probes were almost perfectly rejected regardless of memory load.
To evaluate at which sensors and time points sample-probe similarity and memory load affected the neural activity at recognition, we analyzed the magnetic responses in the first second following probe presentation. Figure 2a–d depicts field distribution maps (averaged within 100 ms time windows) for the grand average ERF and the corresponding statistical maps thresholded at P < 0.05 (FDR corrected) for main effects and the interaction of sample-probe similarity and memory load. (In the field distribution maps, red areas indicate that the magnetic flux is leaving the head [an efflux] and blue areas indicate that the magnetic flux is entering the head [an influx]. Please note that, unlike ERP voltage maps, MEG maps show opposite polarity effects over the left and right hemispheres due to the physics of the magnetic signal.) Follow-up analyses (reported below) were conducted on SOIs/TOIs to identify the specific pattern of results that contributed to the observed main effects.
The effect of sample-probe similarity was present between 278 and 875 ms (P < 0.05, FDR corrected). Based on the sign of the observed effects and their specific temporal and spatial patterns, 3 components could be distinguished. A left frontotemporal component, “C1,” started at about 280 ms and lasted until 440 ms after probe onset. Upon inspection, magnetic activity was strongest for dissimilar probes (Fig. 2e, first row). Responses to match and similar probes were comparable until about 350 ms but then tended to diverge, with stronger responses for match than for similar probes and for similar than for dissimilar probes. A comparable activity pattern was also present between 320 and 420 ms after probe onset at right central sensors. At 420 ms, a temporoparietal component, “C2,” emerged that persisted until 640 ms after probe onset (Fig. 2e, second row). This component was mainly present at right temporoparietal sensors, but there was also a focus over the left hemisphere that reached significance only briefly, between about 500 and 530 ms. Similar probes elicited the strongest amplitudes at parietotemporal sensors. Over more inferior sensors, the magnetic field polarity was inverted, so that dissimilar probes elicited the strongest amplitudes. Finally, between about 600 and 880 ms, a right frontotemporal component, “C3,” reflected a significant increase in magnetic activity for match trials compared with both nonmatch types (Fig. 2e, third row).
Based on the overall main effect of similarity, SOIs and TOIs were defined to extract an aggregated measure of each component's activation, which was then used to evaluate the specific pattern of results between conditions in follow-up analyses. First, we selected sensors that showed a significant sample-probe similarity effect (P < 0.001, uncorrected) with a minimum duration of 30 time points (corresponding to 50 ms) around 400 ms for the SOI reflecting component C1 (C1-SOI), and with a minimum duration of 60 time points (100 ms) around 500 and 700 ms for the SOIs reflecting components C2 and C3, respectively (C2-SOI, C3-SOI). The longer duration requirement for components C2 and C3 reflected their longer temporal extent. Compared with the whole-head analysis, we slightly tightened the criteria (P value and minimum duration) in order to include only the most reliable signals in the SOI/TOI analyses. The resulting SOIs comprised 6, 11, and 6 neighboring sensors for C1-SOI, C2-SOI, and C3-SOI, respectively (see Fig. 2e). Next, we defined 3 TOIs (C1-TOI, C2-TOI, and C3-TOI) by selecting consecutive data points that showed significant sample-probe similarity effects (P < 0.001, uncorrected) at all sensors within the SOI of the corresponding component. The resulting C1-TOI, C2-TOI, and C3-TOI spanned 345–411, 455–565, and 618–735 ms, respectively. Finally, for each component (C1–C3), we extracted the mean magnetic activity within its SOI and TOI.
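The SOI selection and the extraction of a component's activation can be sketched as follows. The sketch assumes a probe-locked epoch starting at −100 ms, so that probe onset falls on sample 60 at 600 Hz. It also assumes that the P-value array has already been cropped to the search window around the component's latency (for example, around 400 ms for C1), a step the text describes but that is not reproduced here. All function names are hypothetical.

```python
import numpy as np

FS = 600        # Hz
ONSET = 60      # probe onset sample in a -100..1000 ms epoch (assumption)


def ms_to_sample(ms: float) -> int:
    """Convert a post-probe latency in ms to a sample index."""
    return ONSET + int(round(ms * FS / 1000))


def longest_run(mask) -> int:
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for value in mask:
        current = current + 1 if value else 0
        best = max(best, current)
    return best


def select_soi(pvals: np.ndarray, alpha: float = 0.001, min_samples: int = 30) -> list:
    """Sensors whose P values stay below alpha for >= min_samples consecutive samples.
    pvals: (n_sensors, n_samples), already cropped to the search window."""
    return [s for s in range(pvals.shape[0])
            if longest_run(pvals[s] < alpha) >= min_samples]


def component_activation(erf: np.ndarray, soi: list, toi_ms: tuple) -> float:
    """Mean ERF amplitude across the SOI sensors and TOI samples (inclusive)."""
    start, stop = ms_to_sample(toi_ms[0]), ms_to_sample(toi_ms[1])
    return float(erf[np.ix_(soi, np.arange(start, stop + 1))].mean())
```

With `min_samples=30` the sensor criterion corresponds to the 50 ms used for C1 (60 samples for C2 and C3), and `component_activation(erf, c1_soi, (345, 411))` would return the C1 measure for one subject and condition.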
Follow-up analyses were then conducted as a series of t-tests to test for differences between conditions (match vs. similar, match vs. dissimilar, and similar vs. dissimilar). To additionally evaluate whether the ordering of conditions observed at the group level was present consistently across subjects, subjects were categorized by their nominal ordering of activation (e.g., for the comparison of matches and similar probes: match > similar vs. similar > match), and the observed frequencies of these 2 categories were tested against an equal split with χ² statistics.
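The consistency test is a χ² goodness-of-fit test with 1 degree of freedom that compares the number of subjects showing a given ordering with the 50:50 split expected by chance. The sketch below uses only the standard library. The upper-tail P value of a χ² variable with 1 df equals erfc(√(χ²/2)), because such a variable is the square of a standard normal variable.

```python
import math


def consistency_chi2(n_pattern: int, n_subjects: int) -> tuple:
    """Chi-square goodness-of-fit test (1 df) of how many subjects show a given
    ordering against the 50:50 split expected if orderings were random."""
    expected = n_subjects / 2
    observed = (n_pattern, n_subjects - n_pattern)
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


for k in (17, 16, 15, 14, 13, 12, 11, 8):
    chi2, p = consistency_chi2(k, 17)
    print(f"{k}/17: chi2(1) = {chi2:.3f}, P = {p:.3f}")
```

For 17 subjects, this yields χ² values of 17.000, 13.235, 9.941, 7.118, 4.765, 2.882, 1.471, and 0.059 for 17, 16, 15, 14, 13, 12, 11, and 8 subjects showing the ordering. These match the values reported in the following paragraphs.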
For component C1, follow-up t-tests fully confirmed the graded pattern of activity observed in the ERF time courses: matches elicited greater activation than similar (t16 = 2.927, P = 0.009) and dissimilar probes (t16 = 4.853, P < 0.001), and similar probes elicited more activation than dissimilar probes (t16 = 3.156, P = 0.006). Stronger activation for matches than for dissimilar nonmatches was evident in 15 of 17 subjects (χ²(1) = 9.941, P = 0.002), and similar probes elicited stronger C1 activation than dissimilar probes in 13/17 subjects (χ²(1) = 4.765, P = 0.029). These effects were thus highly consistent within the present group of subjects. In 12/17 subjects, match activation exceeded that for similar nonmatches, a nonsignificant trend toward consistency (χ²(1) = 2.882, P = 0.089).
The time courses of component C1 suggested a biphasic response, with similar probes differing from matches at late but not early time points. To quantify this observation statistically, we split the total component duration from 280 to 410 ms after probe onset into 2 halves of equal duration and computed the difference between matches and similar probes for each half. A t-test comparing these difference signals revealed that matches and similar probes differed more strongly in the late phase of C1 (t16 = −3.56, P = 0.002). This pattern was evident in 15/17 subjects (χ²(1) = 9.941, P = 0.002). In contrast, no such change over time was observed for matches versus dissimilar nonmatches (t16 = −1.219, P = 0.240), with only 11/17 subjects showing a nominally larger difference in late C1 (χ²(1) = 1.471, P = 0.225). Thus, while dissimilar probes differed from matches by an equal amount throughout C1, the differentiation between matches and similar nonmatches evolved over time.
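The early-versus-late comparison can be sketched as follows. The window boundaries (280–410 ms, split at its midpoint) come from the text. The input layout (one SOI-averaged time course per subject and condition), the hand-computed paired t statistic, and the function names are assumptions. The sign of t depends on which half is subtracted from which; here the late difference is subtracted from the early one, so a larger late difference gives a negative t.

```python
import numpy as np


def early_late_difference(match, similar, times_ms, start=280.0, stop=410.0):
    """Mean match-minus-similar difference per subject in the early and late
    halves of a component window.

    match, similar: (n_subjects, n_samples) SOI-averaged time courses
    times_ms:       (n_samples,) latency of each sample relative to probe onset
    """
    mid = (start + stop) / 2
    early = (times_ms >= start) & (times_ms < mid)
    late = (times_ms >= mid) & (times_ms <= stop)
    diff = np.asarray(match) - np.asarray(similar)
    return diff[:, early].mean(axis=1), diff[:, late].mean(axis=1)


def paired_t(a, b):
    """Paired t statistic for a - b and its degrees of freedom."""
    d = np.asarray(a) - np.asarray(b)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size)), d.size - 1
```

For 17 subjects, `paired_t(early, late)` returns a statistic with 16 degrees of freedom, matching the t16 values reported here.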
Upon visual inspection, the activation of C2 was ordered inversely to task difficulty (dissimilar > match > similar). Post hoc t-tests revealed that all conditions differed significantly from each other (match vs. similar: t16 = 2.950, P = 0.009; match vs. dissimilar: t16 = −3.867, P = 0.001; and similar vs. dissimilar: t16 = −5.213, P < 0.001). These effects were also consistent across subjects: dissimilar probes elicited stronger activation than similar probes in all 17 subjects (χ²(1) = 17, P < 0.001) and exceeded match activation in 14/17 subjects (χ²(1) = 7.118, P = 0.008), and C2 activation was enhanced for matches compared with similar probes in 14/17 subjects (χ²(1) = 7.118, P = 0.008).
Descriptively, C3 showed a pattern of activation that differed between matches and both types of nonmatches, without further discrimination between the latter. Post hoc t-tests again confirmed our expectations: matches yielded stronger activation than both similar (t16 = 4.947, P < 0.001) and dissimilar (t16 = 4.638, P < 0.001) probes, whereas similar and dissimilar probes did not differ (t16 = −0.721, P = 0.481). These results were highly consistent within the subject group, with 16 and 15 participants showing stronger activation for matches than for similar (χ²(1) = 13.235, P < 0.001) and dissimilar (χ²(1) = 9.941, P = 0.002) probes, respectively. Differences between similar and dissimilar probes were inconsistent: only about half of the subjects (8/17) showed stronger activation for similar than for dissimilar probes (χ²(1) = 0.059, P = 0.808).
Hence, the analysis of sample-probe similarity–related MEG signals confirmed separable neuronal bases for different subprocesses of recognition in working memory, which were highly consistent within the subject group. Specifically, the activity of component C1 increased monotonically with greater perceptual similarity between the probe and the memory samples, matching expectations for a similarity summation process. In contrast, the criterion-testing process, which is assumed to discriminate matches from both types of nonmatches (similar and dissimilar), may be reflected by component C3. In addition, a further component, C2, was temporally located between C1 and C3. Its activity was ordered inversely to matching difficulty (dissimilar > match > similar) and thus corresponded to overall behavioral accuracy.
For the factor memory load, we observed a significant (P < 0.05, FDR corrected) increase in ERF activity from memory load 1 to memory load 3 between about 350 and 450 ms at right parietooccipital sensors (“C4”; Fig. 2e, fourth row). Whereas for the similarity effect the FDR-corrected threshold of P < 0.05 corresponded to an uncorrected value of P < 0.0016, for memory load it corresponded to an uncorrected value of P < 0.000067. Using a more liberal threshold of P < 0.001 (uncorrected), we observed that the memory load–related modulation of the ERFs at parietooccipital sensors started earlier, at about 210 ms, and ended at about 500 ms. In addition, stronger activity for memory load 3 than for memory load 1 was also present at left temporal (290–370 ms), central (370–470 ms), and left frontal and temporal sensors (440–500 and 740–870 ms). Thus, at comparable uncorrected thresholds, effects of memory load and similarity were evident to a similar degree. The FDR results should therefore not be taken to indicate that sample-probe similarity played a more dominant role in recognition than memory load.
There was no significant memory load × similarity interaction for any time point at any sensor.
Spatial and Temporal Separability of the Components
So far, our results have shown that the ordering of conditions in the sample-probe similarity effect changed over time, from similarity-based ordering (match > similar > dissimilar, C1) via difficulty-related sorting (dissimilar > match > similar, C2) to a categorical difference between matches and nonmatches (match > similar = dissimilar, C3). In addition, these response patterns reached significance at different sensors and time intervals, for example, component C1 (280–440 ms) at left frontotemporal sensors and component C3 (600–880 ms) at right frontotemporal sensors. However, visual inspection of activity alone does not permit conclusions about spatial and temporal selectivity.
We therefore directly tested the spatial and temporal separability of the identified components. In these analyses, we used a leave-one-subject-out procedure to ensure that the statistical tests were independent of the selection of SOIs and TOIs: the SOIs and TOIs for each subject were defined using the other subjects' data. That is, the data of all but one subject were analyzed with ANOVAs comprising all sensors and time points to identify significant effects and to define the corresponding SOIs and TOIs, and the left-out subject's data were then extracted from these SOIs and TOIs. This process was repeated in round-robin fashion until all subjects' data had been extracted. To ensure that, across the 17 analyses, the individual data were aggregated from broadly comparable spatial and temporal positions, we prespecified 3 SOIs (all left anterior sensors, SOI1; all right posterior sensors, SOI2; and all right anterior sensors, SOI3). Within each SOI, only significant sensors were selected, and their average activation was extracted for 3 TOIs (300–450 ms, TOI1; 450–600 ms, TOI2; and 600–750 ms, TOI3).
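The round-robin logic of the leave-one-subject-out procedure can be sketched as follows. This is a simplified, hypothetical stand-in rather than the analysis used here. The group-level ANOVA is replaced by a pluggable `select_fn`, and the example selector applies a one-sample t-test to a condition-difference signal inside a prespecified sensor-by-time box. Its `t_crit` of 2.131 is the two-tailed 0.05 critical value for 15 df, that is, for the 16 remaining subjects.

```python
import numpy as np


def loso_extract(data: np.ndarray, select_fn) -> np.ndarray:
    """Leave-one-subject-out extraction.

    data:      (n_subjects, n_sensors, n_samples), e.g. a condition-difference ERF
    select_fn: maps the remaining subjects' data to a boolean (n_sensors, n_samples)
               mask; a hypothetical stand-in for the group-level ANOVA
    Returns each subject's mean value inside the mask derived without them.
    """
    values = np.full(data.shape[0], np.nan)
    for i in range(data.shape[0]):
        mask = select_fn(np.delete(data, i, axis=0))
        if mask.any():
            values[i] = data[i][mask].mean()
    return values


def example_select(others, soi, toi, t_crit=2.131):
    """Significant cells (|one-sample t| > t_crit) inside a prespecified SOI/TOI box."""
    n = others.shape[0]
    t = others.mean(axis=0) / (others.std(axis=0, ddof=1) / np.sqrt(n))
    box = np.zeros(t.shape, dtype=bool)
    box[np.ix_(soi, toi)] = True
    return box & (np.abs(t) > t_crit)
```

Because the mask used for subject i never sees that subject's data, the extracted values can enter a subsequent ANOVA without the circularity that would arise from selecting and testing on the same data.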
The resulting data from the independently defined SOIs and TOIs were then entered into a 3 (SOI) × 3 (TOI) × 2 (memory load) × 3 (sample-probe similarity) repeated-measures ANOVA, with Greenhouse–Geisser correction for nonsphericity where indicated. The assumed temporal and spatial selectivity of the components should be reflected in a significant three-way interaction of SOI, TOI, and sample-probe similarity. This is exactly what we found (F(3.54, 56.69) = 10.82, P < 0.001): the similarity-related activation pattern changed depending on the SOI and TOI under consideration, revealing both spatial and temporal selectivity of components C1–C3. A complete overview of all results is given in Supplementary Table 2 and Supplementary Figure 1.
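For readers checking the reported statistic, the corrected degrees of freedom can be unpacked as follows. This is a worked reading under two assumptions: that the Greenhouse–Geisser correction scales both degrees of freedom by a common factor ε, and that n = 17 participants contributed data (one per leave-one-out analysis). The value of ε is inferred from the reported degrees of freedom; it is not stated in the text.

```latex
\begin{aligned}
df_{\text{effect}} &= (3-1)(3-1)(3-1) = 8, \qquad
df_{\text{error}} = 8\,(n-1) = 8 \times 16 = 128 \quad (n = 17)\\
\hat{\varepsilon} &\approx \frac{3.54}{8} \approx \frac{56.69}{128} \approx 0.44\\
F &= 10.82 \ \text{on}\ \left(8\hat{\varepsilon},\ 128\hat{\varepsilon}\right) \approx (3.54,\ 56.69)\ \text{degrees of freedom}
\end{aligned}
```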
Correlations with Behavior
Correlations between brain activity and behavior can reveal activations of direct behavioral relevance. We therefore assessed whether individual component activity predicted behavioral performance.
We used the nonparametric discrimination index A′ (Grier 1971) as a measure of each participant's ability to differentiate match from similar trials. A′ is a sensitivity score calculated from the observed proportions of “hits” (“match” responses to match probes) and “false alarms” (“match” responses to similar probes); when hits exceed false alarms, it ranges from 0.5 (chance-level discrimination) to 1 (perfect discrimination) (Grier 1971). The A′ index was then correlated (Spearman's rho, ρ) with the ERF amplitude difference between match and similar probes (match minus similar) for each component.
We found a significant brain–behavior association for component C3 only (ρ = 0.527, P = 0.032); all other correlations were far from significance (C1: ρ = −0.245, P = 0.342; C2: ρ = 0.127, P = 0.625). Thus, a larger signal difference between match and similar trials over right frontal sensors was associated with better discrimination performance.
Apart from discrimination as such, participants may adopt a liberal or a conservative criterion level of summed similarity, producing a larger or a smaller number of “yes/match” responses, respectively. Such interindividual variations in criterion level are captured by indices of “bias” toward match or nonmatch responses. Here, we used the nonparametric index B″ (Grier 1971), which ranges from −1 to 1, with lower values in our case reflecting a predominance of “yes/match” responses. Correlating brain activation with response bias, we observed an association for C3 (ρ = −0.706, P = 0.002). That is, participants who responded “yes/match” more frequently displayed a larger difference signal between match and similar trials. Again, no other correlation reached significance (C1: ρ = −0.017, P = 0.951; C2: ρ = −0.311, P = 0.223).
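The bias index can be computed in the same way as A′. The sketch below is an assumption-laden illustration: the function name is hypothetical, the branch for false alarms exceeding hits follows a commonly used extension of Grier's formula, and returning 0.0 when both rates sit at 0 or 1 is a placeholder choice for an undefined value.

```python
def b_double_prime(hit_rate, fa_rate):
    """Nonparametric response bias B'' (Grier 1971), ranging from -1 to 1.

    Negative values indicate a liberal bias (many "yes/match" responses),
    positive values a conservative bias. The branch for fa_rate > hit_rate
    follows a commonly used extension of Grier's formula."""
    h, f = hit_rate, fa_rate
    num_h, num_f = h * (1 - h), f * (1 - f)
    if num_h + num_f == 0:
        return 0.0  # both rates at 0 or 1: bias undefined; 0.0 is a placeholder
    if h >= f:
        return (num_h - num_f) / (num_h + num_f)
    return (num_f - num_h) / (num_f + num_h)


print(b_double_prime(0.9, 0.5))  # about -0.47: liberal, many "match" responses
print(b_double_prime(0.5, 0.1))  # about +0.47: conservative
```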
Finally, we related individual component activity to the observed RTs. To account for interindividual differences in overall RT, the RT difference between matches and similar nonmatches was normalized, for each participant, by the mean RT of these two conditions. Like the accuracy measures, this RT difference was associated only with component C3 (ρ = −0.777, P < 0.001). There was a trend for component C2 (ρ = −0.490, P = 0.048; not significant after Bonferroni correction), whereas the correlation for component C1 was far from significance (ρ = 0.032, P = 0.910). Thus, a larger signal difference between match and similar trials at right frontal sensors was associated not only with better discrimination performance but also with faster RTs to matches relative to similar nonmatches.
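The RT normalization and the Bonferroni check can be sketched as follows. Two assumptions are made explicit here: the sign convention (match minus similar, so that faster responses to matches give negative values, consistent with the negative correlation reported for C3) and the number of tests (one correlation per component, that is, 3). The function names are hypothetical.

```python
from statistics import mean


def normalized_rt_difference(rt_match, rt_similar):
    """Match-minus-similar RT difference divided by the mean of the two
    condition means. Negative values mean faster responses to matches;
    the sign convention is an assumption."""
    m, s = mean(rt_match), mean(rt_similar)
    return (m - s) / ((m + s) / 2)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under Bonferroni correction."""
    return alpha / n_tests


# Assumed: one correlation per component (C1, C2, C3), so 3 tests
threshold = bonferroni_alpha(0.05, 3)
print(round(threshold, 4))  # 0.0167
print(0.048 < threshold)    # False: the C2 trend does not survive correction
```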
At the model level, similarity summation and criterion testing are conceived of as independent processes. Consequently, their potential neural representations should not be correlated across individuals. An exploratory correlation between components C1 and C3 was consistent with this assumption: it was small and not significant (r = −0.297, P = 0.246). We acknowledge, however, that this analysis, which aims at demonstrating the absence of an effect, should be interpreted with caution.
Component C2 showed an ordering of response strengths by the ease of recognition (dissimilar > match > similar). This activity did not seem to relate directly to the summation of similarity or to the test against a criterion. To explore whether C2 nevertheless plays a role in memory matching or instead reflects task demands other than memory search and choice, we computed correlations between C2 and the other components. Component C2 activation correlated strongly with the later activity of C3 (r = 0.627, P = 0.007) but not with C1 (r = −0.348, P = 0.170).
Encoding and Retention-Related ERFs
Finally, as most working memory studies have examined activity during encoding and retention, we evaluated ERF activity with respect to these task phases. The results are presented in Supplementary Figure 2.
Working memory provides a mental workspace for cognitive operations and for the recognition of objects in the environment. While its capacity and the mechanisms underlying the maintenance of information have been studied fruitfully with neuroimaging, the neural basis of working memory recognition has received little investigation. From a cognitive perspective, recognition has been proposed to rely on 2 separate processes: extracting the degree to which the samples and the probe match (similarity summation) and evaluating the resulting similarity against a critical level that serves as a criterion for deciding on “match” or “nonmatch” (criterion testing). Here, we systematically varied sample-probe similarity to identify the neural correlates of these processes in the time course of recognition. Our study revealed 2 separable ERF components (C1 and C3) with specific temporospatial profiles that were associated with similarity summation and criterion testing, respectively. In addition, we observed a third temporospatially distinct component (C2) that reflected the difficulty of matching during recognition. Simultaneous variation of memory load did not significantly affect these 3 components but led to a fourth component that showed stronger activation under increased memory load. Our results thus provide evidence for dissociable neural bases of the hypothesized subprocesses of item recognition in working memory.
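The two hypothesized processes can be made concrete with a toy model. The sketch below is a deterministic, single-dimension simplification in the spirit of summed-similarity exemplar models (e.g., Kahana and Sekuler 2002). The exponential similarity function, the decay parameter `tau`, the criterion value, and the feature scale are illustrative assumptions, and the noise terms such models include are omitted.

```python
import math


def summed_similarity(probe, samples, tau=0.1):
    """Sum of exponentially decaying similarities between the probe and each
    sample on a single hypothetical feature dimension (tau sets the decay)."""
    return sum(math.exp(-tau * abs(probe - s)) for s in samples)


def criterion_test(summed, criterion=0.5):
    """Categorical decision: 'match' only if summed similarity exceeds the criterion."""
    return "match" if summed > criterion else "nonmatch"


samples = [0.0, 90.0]  # two samples held in memory
probes = {"match": 0.0, "similar": 10.0, "dissimilar": 45.0}
for label, probe in probes.items():
    s = summed_similarity(probe, samples)
    print(label, round(s, 3), criterion_test(s))
# match 1.0 match
# similar 0.368 nonmatch
# dissimilar 0.022 nonmatch
```

Summed similarity is graded (match > similar > dissimilar), whereas the criterion test yields a categorical split (match versus similar = dissimilar), the two patterns expected for C1 and C3. Lowering `criterion` to 0.3 would turn the similar probe into a “match” response, which corresponds to a liberal bias.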
Summed similarity, and consequently the probability of a “yes/match” response, increases with greater similarity between the probe and the memory samples (Kahana and Sekuler 2002; Zhou et al. 2004). As predicted by the model, we observed a left frontotemporal component C1 (280–440 ms) of the MEG signal that was systematically related to sample-probe similarity. From about 280 ms after probe onset, the brain differentiated between probes that were perceptually close (similar and match) and perceptually distant (dissimilar) to the sample(s). Only later, at about 350 ms, did the ERF also begin to diverge between matches and similar nonmatches, then fully reflecting the expected gradual variation of neural activity. Matches yielded the strongest activation even though similar probes were more difficult to classify, which is strong evidence that these signals were evoked by the degree of matching and not by the difficulty or duration of computing and summing similarity. If this component represents a neural implementation of summed similarity, our data suggest that summation does not proceed uniformly but follows a temporal gradient: neural signals of a clear nonmatch (matches vs. dissimilar probes) appear first, and signals related to the distinction between similar nonmatches and matches appear only later. We suggest that similarity summation follows a coarse-to-fine progression, as has been proposed for visual information processing (Hegde 2008; Goffaux et al. 2011; Peyrin et al. 2010). As differentiating match from similar probes may require a more detailed perceptual analysis, it should appear only after an initial phase of coarse analysis, which in turn could suffice to separate both from dissimilar probes.
The exact content of coarse-to-fine processing may differ with the specific materials used, for example, spatial frequency or contrast, or, in our case, color categories: whereas dissimilar probes always belonged to a color category that differed from those of all samples, both matches and similar probes were “matches” in terms of coarse color categories. For example, a probe designed to be similar to a bluish memory sample would have differed in hue but would still have matched the coarse color category “blue,” whereas a dissimilar probe most likely fell into another category. Whether probes from a different color category engage other cognitive processes than strongly differing stimuli that do not fall into different categories remains an open question. Potentially, the same matching process is used in either case, but its progression depends on the evidence gathered from the probes and may be modulated by categorical information.
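A toy version of the single-process, coarse-to-fine account for colors is sketched below. It is purely illustrative: the six equal-width hue categories, the 5° tolerance, and the function names are hypothetical (real color categories are not equal-width), and the sketch does not model the alternative, multiprocess account.

```python
CATEGORY_WIDTH = 60.0  # hypothetical: six equal-width hue categories in degrees


def hue_category(hue):
    """Coarse category index of a hue on a 360-degree circle."""
    return int((hue % 360.0) // CATEGORY_WIDTH)


def hue_distance(a, b):
    """Shortest angular distance between two hues."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def coarse_to_fine_match(probe, samples, tolerance=5.0):
    """Two-stage matcher returning (stage at which the decision is made, decision).

    Coarse stage: a probe whose category differs from every sample is rejected.
    Fine stage: otherwise the exact hue is compared within `tolerance` degrees."""
    candidates = [s for s in samples if hue_category(s) == hue_category(probe)]
    if not candidates:
        return "coarse", "nonmatch"
    if any(hue_distance(probe, s) <= tolerance for s in candidates):
        return "fine", "match"
    return "fine", "nonmatch"


sample = [215.0]  # a bluish hue in this toy scale (category 180-240)
print(coarse_to_fine_match(215.0, sample))  # ('fine', 'match')
print(coarse_to_fine_match(230.0, sample))  # ('fine', 'nonmatch'): similar probe
print(coarse_to_fine_match(100.0, sample))  # ('coarse', 'nonmatch'): dissimilar probe
```

Dissimilar probes are resolved at the coarse stage, whereas matches and similar probes can only be told apart at the fine stage, mirroring the earlier separation of dissimilar probes and the later separation of matches from similar nonmatches.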
Alternatively, the biphasic pattern may reflect the operation of 2 distinct processes, for example, an attentional and a mnemonic subcomponent. Assuming that items are maintained in the focus of attention, both matches and similar probes may be attentionally gated with high priority, producing the relatively early differentiation of matches and similar probes from dissimilar nonmatches. Discriminating similar probes from matches may, in contrast, require exact matching, perhaps based on a mnemonic search that runs more slowly or is activated only after attentional processes (see also Hyun et al. 2009). Thus, at the level of cognitive processes, the observed biphasic pattern could reflect coarse-to-fine processing by a single matching process or by several processes operating in parallel or in series. Our data indicate that similarity summation does not proceed uniformly but shows a temporal gradient; however, they do not allow a clear decision between these 2 alternative explanations.
Critically, later in time, we found a categorization-specific modulation of activity in a different, right frontotemporal component (C3) of the MEG signal. This component was sensitive to the distinction between match and nonmatch trials, regardless of whether the nonmatching probes were similar or dissimilar to the sample(s), and thus meets the expectations for neural activity involved in “criterion testing.” Significant differences between match and nonmatch trials were observed from 600 ms to about 900 ms after probe onset over right frontal sensors. In line with the presumed function of criterion testing in classifying probes as match or nonmatch, the degree of neural differentiation between match and similar trials significantly predicted behavioral discrimination performance. This association was specific to the criterion-testing component C3 and was not evident for the earlier components (C1 and C2). Criterion testing includes the evaluation of summed similarity against a criterion level. Behaviorally, the criterion is expressed as a bias toward responding “yes/match” or “no/nonmatch” for liberal and conservative criterion levels, respectively, which is independent of discrimination performance. Correlation analysis of the individual bias toward responding “yes/match,” computed from match and similar trials, revealed a significant association with the differentiation between match and similar trials in this component. Thus, a conservative criterion for classifying a probe as a match was accompanied by a relatively smaller difference between match and similar trials, and a lenient criterion by a larger difference. These associations further support the assumption that criterion-related operations are reflected in this component of neural activity.
The offset of the similarity summation component C1 at about 440 ms clearly preceded the onset of the criterion-testing component C3 at about 600 ms. Even though the divergence of matches and nonmatches may have started considerably earlier than indicated by the statistical significance of trial-averaged signals, these results suggest that similarity summation and criterion testing are largely segregated in time, lending further support to the assumption that components C1 and C3 represent 2 different processes. Hence, our data strongly support the existence of similarity summation and criterion testing as 2 independent processes of recognition at the neural level.
These processes are, however, likely not limited to the recognition of items held in working memory. Indeed, similarity summation has been conceived as the basis of the “familiarity” of items in episodic memory (McElree 2006). In long-term memory research, familiarity/similarity and recollection are thought to be reflected by 2 dissociable ERP components: a frontal familiarity component at around 300–500 ms, termed FN400, and a parietal late positive component (LPC) from about 400 to 800 ms that reflects recollection (Rugg and Curran 2007). The long-term memory study that perhaps best matches our paradigm is that of Curran (2000). He used a recognition task that required discrimination between previously studied words, similar words that changed plurality between study and test (i.e., a plural “-s” was appended), and new words. He showed that the FN400 varied with the familiarity of words (new > studied = similar), whereas the LPC was associated with the recollection of plurality (studied > similar = new). Despite differences in stimulus material and overall paradigm, the similarity summation component C1 in our study was identified in a time range and at a location quite compatible with the frontal component observed in episodic memory. Notable confirmation of both the timing and the localization of familiarity-based processing in working memory to lateral prefrontal cortex came from a study using transcranial magnetic stimulation (TMS): Feredoes and Postle (2010) perturbed activity in inferior frontal cortex at 200 and 500 ms after probe onset in a typical delayed match-to-sample task. TMS affected performance only when applied early (200 ms), and this effect was highly selective for negative probes that were familiar (as they had been samples on the previous trial) but now irrelevant.
Together, different strands of evidence from behavioral, electro- and magnetoencephalographic, TMS, and fMRI studies point to a role for left lateral prefrontal cortex in familiarity processing in both episodic and working memory. However, to the best of our knowledge, direct evidence for this claim remains to be established. Also, without a formal source localization of the MEG components observed in this study, their putative correspondence to the brain regions identified in fMRI studies (see also below) should be treated with caution.
In contrast to the close correspondence between our results and the familiarity-related FN400, we did not observe a parietal component in the time range designated for “recollection” (except for component C2; see below). Instead, we found categorization-related activity at right prefrontal sensors, which is best conceived as criterion testing and may thus represent an evaluation of the results of memory retrieval/matching rather than a mnemonic operation itself. Following the assumption, implicit in many long-term memory studies, that the cognitive process of “recollection” corresponds to parietal activity, these results may suggest that recollection need not be engaged in working memory recognition. Rather, recollection may come into play only in more specific circumstances, for example, when subjects have to disambiguate a probe's match status on the basis of episodic/contextual information. Consistent with this, LPC-compatible activation in working memory was observed by Danker et al. (2008). In their paradigm, samples were encoded serially, which may have produced higher levels of interitem interference (i.e., masking and efforts to counteract masking) and made each sample attributable to a specific temporal event/episode; both factors may have led to a stronger involvement of recollection-related processes.
In addition, we observed a right lateral parietal component, C2, that was temporally situated between the similarity summation component C1 and the criterion-testing component C3. This component showed the strongest signal for dissimilar probes, an intermediate amplitude for matches, and the weakest response to similar probes. This pattern suggests a close relationship to the ease or difficulty of memory matching, as dissimilar probes constituted the easiest and similar probes the most difficult trials. However, we did not observe any relationship to behavioral discrimination performance or to criterion setting as indexed by response bias. At the same time, C2 did not seem to reflect global workload but rather the effort imposed by sample-probe similarity. This suggestion is based on 2 observations: First, memory load, which is known to increase the demands on a widely distributed set of frontoparietal brain regions (Postle 2006; Linden 2007), did not exert any noticeable influence on component C2 (see also Fig. 2e, second row). Second, the analysis of correlations between the ERF components revealed that the activity of this component was substantially related to the expression of the criterion-testing component C3, which occurred later in the time course of recognition. Specifically, a strong differentiation between matches and similar probes in C2 was associated with a strong signal difference between match and similar trials in the criterion-testing component C3. Consequently, we speculate that this activity reflects the ease or difficulty of matching rather than a completely process-unspecific difficulty signal.
This interpretation is in line with recent studies on perceptual decision making showing that difficulty-related neuronal activity can be reflected by signals that are separable from discrimination-related activity (Grinband et al. 2006; Philiastides et al. 2006). For example, Philiastides et al. (2006) identified 2 ERP components (“early” and “late”) reflecting the accuracy of a perceptual discrimination. As in our study, a third component was located in time between the 2 discrimination components and reflected difficulty.
During the recognition phase, we also found increased MEG signals when a probe was compared with 3 rather than 1 retained sample. At a liberal statistical threshold (see Materials and Methods), load-dependent activity was observed between about 200 and 600 ms after probe onset over frontal, temporal, and particularly parietooccipital sensors. However, significance after FDR correction was attained only for a very confined time range and number of sensors. The timing and spatial distribution of component C4 partly overlapped with those of the sample-probe similarity–related, putatively difficulty-associated component C2 (see their F-value maps in Fig. 2e). Whereas there is some evidence that C2 relates to the difficulty of matching, C4 may instead reflect global attentional task demands due to increased memory load. In support of this assumption, local maxima of C4 were observed over posterior parietal areas known to relate to the number of items held in working memory (Todd and Marois 2004; Xu and Chun 2005; Mitchell and Cusack 2007). In line with our MEG findings, electroencephalography studies have reported recognition-related changes in activity with memory load that lasted from 300 to 600 ms and were observed at parietal and frontal sites (Wijers et al. 1989; Strayer and Kramer 1990; Talsma et al. 2001; Bledowski et al. 2006; Morgan et al. 2008). Yet, the exact cognitive processes responsible for the recognition-related memory load effect remain unspecified. There are several potential origins for these effects, such as 1) differences in the neuronal representation of the sample stimuli under high load due to influences at encoding and/or during retention and 2) processes at recognition, for example, reduced deployment of attentional resources to the probe, reduced or slowed probe encoding, or demands on minimizing interference between samples and probe.
The present study was not designed to distinguish between these potential sources, and further investigation is warranted to elucidate their cognitive and neural origins. Nevertheless, the independent variation of memory load and sample-probe similarity allowed us to rule out that the memory load effect was explained by an uncontrolled influence of sample-probe similarity. Given a set of stimuli that vary along a limited number of dimensions, for example, faces or colors, the more samples are randomly selected from this set, the higher the probability that a randomly drawn probe will be similar to one of the samples. Thus, with random sampling, load effects at recognition are confounded with sample-probe similarity. We eliminated this confound by ensuring that all samples were dissimilar to each other and that the probe was similar to at most one of the samples, independent of load.
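The confound and its remedy can be illustrated with a short sketch. Under the simplifying assumption that samples are drawn independently and each is similar to a random probe with probability p, the chance that a probe is similar to at least one of k samples is 1 − (1 − p)^k, which grows with load. The constrained trial generator below shows one way to enforce the constraints described above on a hue circle; all parameter values and function names are hypothetical and not the study's actual stimulus settings.

```python
import random


def p_any_similar(p_single, load):
    """Probability that a random probe is similar to at least one of `load`
    independently drawn samples, if each sample alone is similar to the
    probe with probability p_single."""
    return 1.0 - (1.0 - p_single) ** load


def hue_distance(a, b):
    """Shortest angular distance between two hues in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


# Hypothetical design parameters (degrees on a 360-degree hue circle)
MIN_SAMPLE_SEP = 90.0  # samples are mutually dissimilar
SIMILAR_OFFSET = 20.0  # a similar probe is shifted from exactly one sample
SIMILAR_RADIUS = 30.0  # "similar" means within this distance
MIN_DISSIMILAR = 45.0  # a dissimilar probe is at least this far from all samples


def make_trial(load, probe_type, rng):
    """Draw `load` mutually dissimilar samples and one probe of the given type."""
    while True:  # rejection sampling until all sample pairs are dissimilar
        samples = [rng.uniform(0.0, 360.0) for _ in range(load)]
        if all(hue_distance(a, b) >= MIN_SAMPLE_SEP
               for i, a in enumerate(samples) for b in samples[i + 1:]):
            break
    target = rng.choice(samples)
    if probe_type == "match":
        probe = target
    elif probe_type == "similar":
        probe = (target + rng.choice((-1.0, 1.0)) * SIMILAR_OFFSET) % 360.0
    elif probe_type == "dissimilar":
        probe = rng.uniform(0.0, 360.0)
        while any(hue_distance(probe, s) < MIN_DISSIMILAR for s in samples):
            probe = rng.uniform(0.0, 360.0)
    else:
        raise ValueError(f"unknown probe type: {probe_type!r}")
    return samples, probe


print(round(p_any_similar(0.1, 1), 3), round(p_any_similar(0.1, 3), 3))  # 0.1 0.271
```

Because the minimum sample separation exceeds the similar offset plus the similarity radius, a similar probe can be close to only one sample, so the number of similar samples per probe stays the same at load 1 and load 3.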
In conclusion, the present study provides neurophysiological evidence that similarity summation and criterion testing are implemented at the neural level and represent distinct subprocesses of working memory recognition. These processes have been proposed to be shared between working and episodic memory, may partly overlap with perceptual decision making, and hence may be ubiquitously involved in many domains of behavior.
This study was supported by the Frankfurt Medical Faculty intramural young investigator program to C.B.
Conflict of Interest: None declared.