The capacity of visual working memory for faces is extremely limited, but the reasons for these limitations remain unknown. We employed event-related brain potential measures to demonstrate that individual faces have to be focally attended in order to be maintained in working memory, and that attention is allocated to only a single face at a time. When 2 faces have to be memorized simultaneously in a face identity-matching task, the focus of spatial attention during encoding predicts which of these faces can be successfully maintained in working memory and matched to a subsequent test face. We also show that memory representations of attended faces are maintained in a position-dependent fashion. These findings demonstrate that the limited capacity of face memory is directly linked to capacity limits of spatial attention during the encoding and maintenance of individual face representations. We suggest that the capacity and distribution of selective spatial attention is a dynamic resource that constrains the capacity and fidelity of working memory for faces.
Our ability to maintain individual faces in working memory is surprisingly limited. While 3 or 4 simple objects such as colored squares can be simultaneously held in memory (Luck and Vogel 1997), working memory capacity is lower for more complex objects (Alvarez and Cavanagh 2004; Eng et al. 2005), and in particular for faces. When observers have to memorize a set of briefly presented individual faces, only a single face can be successfully maintained on most trials (Eng et al. 2005; Curby and Gauthier 2007). What is responsible for these extreme capacity limitations of visual face memory? In the current study, we demonstrate that the difficulty of simultaneously maintaining multiple faces in working memory is directly linked to the limited capacity of selective attention.
Current models of working memory postulate that visual objects are stored in the same posterior cortical areas that are also involved in the visual processing of these objects [the “sensory recruitment” hypothesis; see Postle (2006), D'Esposito (2007), Harrison and Tong (2009), and Sreenivasan et al. (2014)]. Attentional mechanisms are critically involved not only in the initial selection of visual objects for encoding into working memory, but also in their subsequent short-term storage. This selective retention of an object in sensory-perceptual areas that are recruited for working memory is assumed to be mediated by the allocation and maintenance of focal spatial attention [Awh et al. 2000, 2006; see also Chun et al. (2011)]. Because working memory depends on attention, the capacity limitations of visual face memory could be caused by an attentional bottleneck. The successful encoding and retention of an individual face representation in working memory may require a single undivided focus of attention that can only be allocated to one particular face at a time. When 2 or more faces have to be encoded and maintained simultaneously, they will compete for focal attentional processing, and only the winner of this competition can be successfully retained for subsequent recall.
This attentional competition account of the limited capacity of visual face memory is in line with emerging ideas about sensory recruitment and attentional selectivity in working memory. However, there is so far no direct evidence that the focus of spatial attention during the encoding and short-term retention of individual faces determines whether a particular face can be successfully maintained. In fact, Awh et al. (2007) have argued that the performance deficits which are usually interpreted as evidence of the limited memory capacity for complex objects such as faces do not arise during encoding and maintenance, but instead at a later stage where working memory representations are compared with test items. These authors demonstrated that when sample and test stimuli were perceptually dissimilar, thereby minimizing the probability of comparison errors, working memory capacity estimates were equivalent for simple features and complex objects. According to Awh et al. (2007), the number of objects that can be simultaneously represented in working memory is independent of their complexity, but the resolution of these representations decreases with increasing memory load. As a result, sample-test comparison errors are more frequent for more complex objects.
The aim of the present study was to find out whether the limited capacity of visual working memory for individual faces is caused by attentional limitations during their encoding and retention, or is generated at a later memory comparison stage. We recorded event-related potentials while participants performed a face identity-matching task, where they had to report whether a face in a memory display (S1) was repeated in a subsequent test display (S2). Participants had to press one response button when a face repetition was detected, and a different button when the S2 face was not present in the S1 display. Memory displays contained 2 objects on opposite sides, and test displays always contained a single face at fixation (Fig. 1). In the Load One condition, memory displays showed a task-relevant face and an irrelevant distractor object (a house). On identity repetition trials, the S1 face was repeated in the test display. On identity change trials, a different face appeared as S2. In the critical Load Two condition, the memory display contained 2 different faces. Both of these faces had to be memorized, because either of them was equally likely to reappear as S2 on identity repetition trials. To minimize the time demands of face memory maintenance, the interval between the memory and test displays was very brief (200 ms).
Participants were expected to detect the presence or absence of a face repetition on almost all trials in the Load One condition. In this condition, attention can be immediately allocated to the single face in the memory display, which will then be selectively encoded and maintained, and successfully compared with the test face. If maintaining an individual face in working memory requires the full allocation of focal attention to this face, identity-matching performance should be strongly impaired in the Load Two condition. In this condition, the 2 faces in the memory displays will compete for attentional processing, resulting in the allocation of focal attention to only one of them. This attended face will be encoded into working memory and can be successfully detected if it reappears at test. In contrast, the repetition of the other (unattended) face in the memory display is likely to go undetected, which should result in an overall poor identity-matching performance in the Load Two condition.
To track the allocation of spatial attention and the spatially selective activation of visual face memory in the interval after a memory display has been presented, we measured the N2pc component and the contralateral delay activity (CDA) in response to these displays in the Load One and Two conditions. The N2pc is an enhanced negativity that is elicited around 200 ms after stimulus onset at posterior electrodes contralateral to task-relevant visual objects, and reflects their spatial selection in ventral visual cortex (e.g., Luck and Hillyard 1994; Eimer 1996; Hopf et al. 2000). The CDA is a sustained posterior negativity that emerges approximately 300 ms after the presentation of a memory display over extrastriate visual cortex contralateral to the side where memorized items have been presented (Vogel and Machizawa 2004). The CDA is sensitive to the number of memorized objects and to individual differences in working memory capacity (e.g., Anderson et al. 2011), suggesting that this component reflects the recruitment of visual–perceptual brain areas for the short-term storage of visual objects (Vogel and Machizawa 2004) that is mediated by focal spatial attention (LaRocque et al. 2013).
For the face/house memory displays in the Load One condition, the allocation of spatial attention to the face and its subsequent encoding into working memory should be reflected by distinct contralateral N2pc and CDA components. Critically, we employed the same 2 components to track the focus of spatial attention and the spatially selective activation of visual face memory in response to the face/face memory displays in the Load Two condition. If the focus of attention determines which of these 2 faces will be encoded and retained in working memory, the polarity of N2pc and CDA components should predict the success or failure of the face matching process on specific Load Two identity repetition trials. On trials where attention is allocated to the “wrong” (i.e., non-repeated) face in the memory display, this face will be encoded and retained in working memory, at the expense of the other (repeated) face. Therefore, N2pc and CDA components should emerge contralateral to the non-repeated face in the memory display on trials where observers fail to report a face identity repetition. In contrast, when attention is directed to the “correct” face (i.e., the face that later reappears as S2) in the memory display, as reflected by N2pc and CDA components over the hemisphere contralateral to this face, a face repetition should be correctly reported. Variations in the degree to which attention is selectively allocated to the repeated face in Load Two memory displays may affect the degree to which a corresponding working memory representation is selectively activated, and thus the efficiency of the subsequent face identity-matching process on individual trials. To test this hypothesis, we compared N2pc and CDA components when a face repetition was correctly reported between trials with fast or slow response times (RTs). 
If focal attention determines the activation level of visual face memory representations, these components should be larger on trials with fast correct identity repetition responses. In contrast, if 2 faces can be simultaneously represented in working memory, and if capacity limitations of visual face memory only arise during the later sample-test comparison process (Awh et al. 2007), there should be no lateralized N2pc and CDA components in response to Load Two memory displays, as both face representations will be activated concurrently.
In addition to tracking the allocation of attention during the encoding and retention of memory displays, we also studied how the spatially selective activation of visual face memory affects the subsequent processing of centrally presented test faces. If working memory representations of individual faces depend on focal attention, these representations should be position-dependent, because attention operates in a space-based fashion. The selective allocation of attention to one face in the memory display will activate a corresponding visual face representation in the contralateral hemisphere (as reflected by the polarity of CDA components to these displays). To investigate how the perceptual and identity-related processing of test faces was affected by the represented location of an attended face in the preceding memory display, we measured N170 and N250r components to test faces. The face-sensitive N170 component is generated during the perceptual structural encoding of faces (e.g., Eimer 2011; Rossion and Jacques 2011). When 2 faces appear in rapid succession, N170 amplitudes to the second face are reduced (e.g., Jacques and Rossion 2004, 2006; Eimer et al. 2010), because both faces activate overlapping neural populations. Importantly, such N170 adaptation effects are position-dependent (Kovács et al. 2005). If working memory representations of attended faces are stored in the contralateral hemisphere, spatially specific N170 adaptation effects should be found to test faces in the present study, in spite of the fact that these faces always appeared at fixation. More specifically, N170 amplitudes should be attenuated over the hemisphere contralateral to the attended (i.e., memorized) face, reflecting a selective reduction of the sensory response to a test face in this hemisphere.
The activation of position-dependent working memory representations of attended memory display faces in the contralateral hemisphere may also affect the face identity-matching process itself. To test this hypothesis, we measured N250r components on trials where an identity repetition was successfully detected. The N250r component is an enhanced posterior negativity elicited by repetitions of the same individual face relative to face identity changes (e.g., Schweinberger et al. 2002, 2004), and reflects the match between a perceived face and a stored working memory representation of the same face. If this identity-matching process is sensitive to the represented location of an attended memorized face, N250r components to test faces should be larger over the hemisphere contralateral to the side where the repeated face appeared in the preceding memory display.
Materials and Methods
Sixteen paid volunteers (6 females; mean age 30.8 years; 1 left-handed) were tested. All had normal or corrected-to-normal vision, and gave written and verbal informed consent prior to testing.
Stimuli and Procedure
The stimulus set consisted of 10 unfamiliar Caucasian male faces and 10 images of houses. Faces were obtained from the PUT Face Database (Kasinski et al. 2008), and house images were selected from Google Images. All images were converted to grayscale, and were edited using Adobe Photoshop to homogenize overall luminance, and (for faces) skin tone and hair. Distinguishing characteristics (e.g., piercings or blemishes) were removed from the face images. All stimuli were presented on a CRT monitor against a dark gray background (0.4 cd/m²) at a viewing distance of 100 cm. They occupied a visual angle of 5.8° × 8°, and their average luminance was 21 cd/m².
Stimulus presentation, timing, and response recording were controlled by the Cogent 2000 toolbox (http://www.vislab.ucl.ac.uk/cogent.php) for MATLAB (Mathworks). On each trial, a bilateral stimulus display (S1) was followed in rapid succession by a second display (S2) that contained a single face image at fixation (Fig. 1). In the Load One condition, the S1 display contained one face image and one house image that were presented simultaneously for 200 ms to the left and right of fixation at an eccentricity of 4° (measured relative to the center of each image). Face and house images were presented with equal probability and unpredictably in the left visual field (LVF) and right visual field (RVF), or vice versa. Each S1 display was followed after a 200-ms interstimulus interval by an S2 display (200 ms duration) that contained a single face image at fixation. The intertrial interval between the offset of S2 and the onset of S1 on the next trial was 1500 ms. The Load Two condition was identical to Load One, except that S1 displays contained images of 2 different faces on opposite sides.
In the Load One condition, participants' task was to decide whether the face that was presented in the S1 display together with a house was repeated as S2. Ten successive blocks were run, with 40 trials per block. On 20 trials, the S1 face was repeated as S2 (identity repetition trials). On the other 20 trials, the S1 and S2 faces showed 2 different individuals (identity change trials). There were 10 trials for each of the 4 possible combinations of S1 face location (LVF and RVF) and trial type (identity repetition and identity change). In the Load Two condition, participants' task was to decide whether the S2 face matched 1 of the 2 faces that appeared in the preceding S1 display. They were explicitly instructed to attend to both faces in the memory display, because either of them was equally likely to reappear as S2. Ten successive blocks with 30 trials per block were run. On 10 trials, the S2 face matched neither of the 2 S1 faces (identity change trials). On 10 trials, the S2 face matched the S1 face that was presented in the LVF, and on another 10 trials, it matched the S1 face in the RVF (LVF and RVF identity repetition trials, respectively).
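The block composition described above can be sketched in Python as follows. This is only an illustration of the trial counts, not the authors' Cogent/MATLAB script: the function name, the dictionary fields, and the within-block shuffling are assumptions, and the assignment of specific face and house images is omitted.

```python
import random


def make_block(load, seed=None):
    """Return a shuffled list of trial descriptions for one block.

    Load One: 40 trials, 10 per combination of S1 face side and trial type.
    Load Two: 30 trials, 10 identity change, 10 LVF repetition, 10 RVF repetition.
    Field names are hypothetical; shuffling within a block is an assumption.
    """
    rng = random.Random(seed)
    trials = []
    if load == 1:
        for side in ("LVF", "RVF"):
            for trial_type in ("repetition", "change"):
                trials += [
                    {"load": 1, "face_side": side, "type": trial_type}
                    for _ in range(10)
                ]
    elif load == 2:
        # On identity change trials neither S1 face is repeated.
        trials += [
            {"load": 2, "repeated_side": None, "type": "change"}
            for _ in range(10)
        ]
        for side in ("LVF", "RVF"):
            trials += [
                {"load": 2, "repeated_side": side, "type": "repetition"}
                for _ in range(10)
            ]
    else:
        raise ValueError("load must be 1 or 2")
    rng.shuffle(trials)
    return trials
```

Note that in Load Two, two-thirds of the trials in a block are identity repetition trials, whereas repetition and change trials are equiprobable in Load One.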
Participants were instructed to maintain central fixation throughout each trial, and to press a response button with the index finger of one hand when they detected an identity repetition, and another button with the middle finger of the same hand when there was no identity repetition. Response hand was counterbalanced across participants, as was the order of the 2 load conditions.
EEG Recording and Data Analysis
EEG was DC-recorded with a BrainAmps DC amplifier (upper cut-off frequency 40 Hz, 500 Hz sampling rate) and Ag–AgCl electrodes mounted on an elastic cap from 25 scalp sites (Fpz, F7, F3, Fz, F4, F8, FC5, FC6, T7, C3, Cz, C4, T8, CP5, CP6, P7, P9, P3, Pz, P4, P8, PO7, PO8, P10, and Oz, according to the extended international 10–20 system). Bipolar horizontal electrooculogram (HEOG) was recorded from the outer canthi of both eyes. An electrode placed on the left earlobe served as reference for online recording, and EEG was re-referenced offline to the common average of all scalp electrodes. Electrode impedances were kept below 5 kΩ. No additional offline filters were applied. To obtain event-related brain potentials (ERPs) to memory displays (S1), EEG was segmented offline from 100 ms before to 500 ms after S1 onset, relative to a 100-ms pre-stimulus baseline. ERPs in response to test face displays (S2) were computed on the basis of EEG epochs obtained between 50 ms before and 500 ms after stimulus onset, relative to a 100-ms baseline from 50 ms before to 50 ms after S2 onset. Epochs with activity exceeding ±30 μV in the HEOG channel (reflecting horizontal eye movements) or ±60 μV at Fpz (indicating eye blinks or vertical eye movements) were excluded from all analyses, as were epochs with voltages exceeding ±80 μV at any other electrode.
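The artifact rejection criteria above amount to a per-channel threshold test on each epoch. A minimal Python sketch follows. It assumes that "exceeding ±X μV" means the absolute baseline-corrected voltage at any sample exceeds X; the data layout (a dictionary from channel name to a list of voltages) and the function name are hypothetical.

```python
HEOG_LIMIT_UV = 30.0   # horizontal eye movements
FPZ_LIMIT_UV = 60.0    # eye blinks or vertical eye movements
OTHER_LIMIT_UV = 80.0  # any other electrode


def epoch_is_clean(epoch):
    """Return True if no channel exceeds its rejection threshold.

    epoch: dict mapping channel name -> list of baseline-corrected
    voltages in microvolts (hypothetical layout).
    """
    for channel, samples in epoch.items():
        if channel == "HEOG":
            limit = HEOG_LIMIT_UV
        elif channel == "Fpz":
            limit = FPZ_LIMIT_UV
        else:
            limit = OTHER_LIMIT_UV
        # Assumption: rejection is based on absolute voltage at any sample.
        if any(abs(v) > limit for v in samples):
            return False
    return True
```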
Following artifact rejection, EEG waveforms were averaged separately for memory and test displays (S1 and S2). In the Load One condition, ERPs were obtained for trials with correct responses only. For S1 displays, separate ERPs were computed for trials where the face appeared in the LVF or RVF. ERPs to S2 displays were computed separately for the 4 combinations of trial type (identity repetition and identity change) and S1 face location (LVF and RVF). For the Load Two condition, ERPs on trials with correct responses were computed separately for trials with fast and slow reaction times, based on RT median splits performed for each individual participant. Mean RTs for trials with fast versus slow responses (averaged across all participants) were 467 versus 658 ms (identity repetition trials), and 530 versus 702 ms (identity change trials), respectively. ERPs were also obtained for identity repetition trials with incorrect responses (i.e., trials where a face identity repetition was missed). ERPs to S1 and S2 displays in the Load Two condition were computed for each combination of 3 types of identity repetition trials (repetition detected with fast RT, repetition detected with slow RT, and repetition undetected) and the location of the repeated face in the memory display (LVF and RVF). For S2 displays in the Load Two condition, ERPs were also computed for identity change trials with fast and slow correct responses, separately for trials where the S1 face appeared in the LVF or RVF. ERP mean amplitudes were measured at 3 lateral posterior electrode sites over the left hemisphere (P7, PO7, and P9), and at the corresponding electrodes over the right hemisphere (P8, PO8, and P10), and were averaged across these 3 electrode locations on either side.
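The per-participant RT median split and the pooling across the 3 electrode sites on each side can be illustrated with a short Python sketch. The function names are hypothetical, and the handling of trials that fall exactly on the median (assigned to the slow set here) is an assumption, as the text does not specify it.

```python
from statistics import median

LEFT_SITES = ("P7", "PO7", "P9")
RIGHT_SITES = ("P8", "PO8", "P10")


def median_split(rts):
    """Split one participant's correct-response RTs (ms) into fast and slow sets."""
    cutoff = median(rts)
    fast = [rt for rt in rts if rt < cutoff]
    # Assumption: RTs equal to the median are treated as slow.
    slow = [rt for rt in rts if rt >= cutoff]
    return fast, slow


def pooled_amplitude(amplitudes, sites):
    """Average mean amplitudes (microvolts) across the given electrode sites."""
    return sum(amplitudes[site] for site in sites) / len(sites)
```

For example, `median_split([400, 500, 600, 700])` uses a cutoff of 550 ms and returns the two fastest and the two slowest trials.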
For ERPs to S1 displays, mean amplitudes were analyzed with repeated-measures analyses of variance (ANOVAs) for 3 post-stimulus time intervals, which correspond to the latencies of the N170 (130–190 ms), N2pc (190–290 ms), and CDA (300–500 ms) components. For ERPs to S2 displays, ANOVAs were conducted for ERP mean amplitudes obtained within post-stimulus time windows centered on the N170 (130–190 ms) and N250r (250–320 ms) components. Additional ANOVAs were conducted for RTs and error rates. For both performance and ERP measures, paired t-tests were employed for specific comparisons between experimental conditions.
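To make the mean amplitude measure concrete, the following Python sketch converts the S1 measurement windows into sample indices at the 500 Hz sampling rate and averages the voltages within a window. The function name and the treatment of each window as half-open (start included, end excluded) are assumptions; the epoch start of 100 ms before S1 onset is taken from the recording description.

```python
SAMPLING_RATE_HZ = 500
S1_EPOCH_START_MS = -100  # S1 epochs begin 100 ms before display onset

S1_WINDOWS_MS = {"N170": (130, 190), "N2pc": (190, 290), "CDA": (300, 500)}


def mean_amplitude(waveform, window_ms,
                   epoch_start_ms=S1_EPOCH_START_MS,
                   rate_hz=SAMPLING_RATE_HZ):
    """Mean voltage of an averaged waveform within a post-stimulus window.

    waveform: list of voltages (microvolts), one per sample, starting at
    epoch_start_ms. The window is treated as half-open (an assumption).
    """
    ms_per_sample = 1000 / rate_hz
    first = int(round((window_ms[0] - epoch_start_ms) / ms_per_sample))
    last = int(round((window_ms[1] - epoch_start_ms) / ms_per_sample))
    segment = waveform[first:last]
    if not segment:
        raise ValueError("window lies outside the epoch")
    return sum(segment) / len(segment)
```

At 500 Hz each sample spans 2 ms, so the N2pc window (190 to 290 ms) covers 50 samples of a 300-sample S1 epoch under this convention.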
RTs on trials with correct responses were faster in Load One relative to Load Two (482 vs. 573 ms), and faster on identity repetition when compared with identity change trials (501 vs. 554 ms). An ANOVA conducted on RTs for the factors load (One vs. Two) and trial type (identity repetition vs. change) revealed main effects of load, F1,15 = 105.8, P < 0.001, and trial type, F1,15 = 13.7, P < 0.002. There was no interaction between these 2 factors, F < 1. RTs on identity repetition trials did not differ between trials where the repeated face appeared on the left or right visual side in the S1 display, and this was the case both for Load One and Load Two, both t < 1. Errors were more frequent with Load Two relative to Load One (22.1% vs. 3.4%), resulting in a main effect of load, F1,15 = 336.4, P < 0.001, in the ANOVA conducted for error rates. There was also an interaction between load and trial type, F1,15 = 31.9, P < 0.001. For Load One, error rates did not differ between identity repetition and identity change trials (3.1% vs. 3.7%; t < 1). In Load Two, failures to report a face repetition were more frequent than incorrect reports of a repetition on identity change trials [30.4% vs. 13.8%; t(15) = 4.71, P < 0.001]. The position of a repeated face on the left or right side of S1 displays did not affect the probability that this repetition was missed in either Load One or Load Two, both t < 1. Face working memory capacity in Load Two, as determined by Cowan's formula (memory capacity K = [hit rate + correct rejection rate − 1] × memory set size; Cowan 2001), yielded a value for K of 1.12, demonstrating that only 1 of the 2 faces in the memory display was successfully maintained on most Load Two trials.
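As a worked check of the capacity estimate, the Python sketch below applies Cowan's formula to the group-level Load Two error rates reported above. It assumes that the hit rate is 1 minus the rate of missed repetitions (30.4%) and that the correct rejection rate is 1 minus the rate of incorrect repetition reports on identity change trials (13.8%). K is computed here from rounded group means, whereas the reported value may have been derived from individual participants' data.

```python
def cowan_k(hit_rate, correct_rejection_rate, set_size):
    """Cowan's K = (hit rate + correct rejection rate - 1) * memory set size."""
    return (hit_rate + correct_rejection_rate - 1) * set_size


# Load Two group means taken from the error rates reported in the text.
miss_rate = 0.304         # identity repetitions that went unreported
false_alarm_rate = 0.138  # repetitions reported on identity change trials

k = cowan_k(1 - miss_rate, 1 - false_alarm_rate, set_size=2)
print(round(k, 2))  # 1.12
```

Under these assumptions the group means reproduce the reported K of 1.12.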
Lateralized ERP Components to Bilateral Memory Displays
Figure 2 shows ERPs triggered at lateral posterior electrodes in response to memory displays in the Load One and Load Two conditions. For bilateral face/house S1 memory displays in Load One, ERPs are shown for electrodes contralateral to the face and electrodes contralateral to the house in the memory displays, collapsed across identity repetition and identity change trials. As can be seen in the difference wave generated by subtracting these 2 ERP waveforms (Fig. 2, top right panel), the face-sensitive N170 component was larger over the hemisphere contralateral to the face, and was followed by a contralateral N2pc and a CDA. An ANOVA was conducted for N170 mean amplitudes (measured during a 130- to 190-ms post-stimulus window) for the factors laterality (electrodes contralateral vs. ipsilateral to the side of the face in the memory display) and hemisphere (electrodes over the left vs. right hemisphere). This analysis showed that the N170 was enhanced at electrodes contralateral to the side of the face relative to electrodes contralateral to the house, as reflected by a main effect of laterality, F1,15 = 38.52, P < 0.001. This confirms previous observations that N170 components elicited in response to bilateral face/nonface displays are confined to the contralateral hemisphere (Towler and Eimer 2015). Significant main effects of laterality were also obtained in corresponding ANOVAs for the 2 subsequent time windows centered on the N2pc and CDA components (190–290 and 300–500 ms; F1,15 = 14.8, and F1,15 = 45.3, respectively, both P < 0.001), demonstrating that attention was rapidly allocated to the face, and that this face was then encoded into working memory in a position-dependent fashion. N2pc and CDA amplitudes did not differ between the left and right hemispheres, both F < 1.
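The contralateral and ipsilateral waveforms and their difference wave can be derived from hemisphere-by-side averages as sketched below in Python. The function and argument names are hypothetical, and the unweighted averaging across the two face locations is an assumption. A face in the LVF is contralateral to the right hemisphere, and a face in the RVF is contralateral to the left hemisphere.

```python
def contra_ipsi_difference(left_hemi_lvf, right_hemi_lvf,
                           left_hemi_rvf, right_hemi_rvf):
    """Return (contralateral, ipsilateral, difference) waveforms.

    Each argument is a list of voltages (microvolts) for one hemisphere
    (pooled lateral posterior electrodes) and one face location.
    """
    lengths = {len(left_hemi_lvf), len(right_hemi_lvf),
               len(left_hemi_rvf), len(right_hemi_rvf)}
    if len(lengths) != 1:
        raise ValueError("all waveforms must have the same length")
    # Contralateral: right hemisphere for LVF faces, left hemisphere for RVF faces.
    contra = [(r + l) / 2 for r, l in zip(right_hemi_lvf, left_hemi_rvf)]
    # Ipsilateral: left hemisphere for LVF faces, right hemisphere for RVF faces.
    ipsi = [(l + r) / 2 for l, r in zip(left_hemi_lvf, right_hemi_rvf)]
    # Negative values indicate an enhanced contralateral negativity.
    diff = [c - ip for c, ip in zip(contra, ipsi)]
    return contra, ipsi, diff
```

In the Load Two analyses, the same computation applies with the side of the repeated face taking the place of the side of the face, so that a positive difference in the N2pc or CDA window indicates that the non-repeated face was selected.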
Figure 2 (bottom panel) shows ERPs elicited by memory displays on identity repetition trials in the Load Two condition at electrodes contralateral to the face that would later be repeated as S2 and at electrodes contralateral to the other (non-repeated) face. Separate ERPs are shown for trials with fast and correct identity repetition responses, incorrect response trials where participants failed to report an identity repetition, and trials with slow correct responses. These results demonstrate that the spatial focus of attention during encoding and working memory retention determined participants' performance in the face identity-matching task. On trials with fast correct responses, N2pc and CDA components were elicited over the hemisphere contralateral to the S1 face that would later reappear as S2. On trials where participants failed to detect an identity repetition, these components were present over the opposite hemisphere, that is, contralateral to the non-repeated S1 face. For the statistical analyses of Load Two trials, the factor laterality was defined relative to the side of the repeated face in the memory display. The ANOVA for trials with incorrect responses confirmed the presence of reliable N2pc and CDA components at electrodes contralateral to the side of the face that was not repeated in the S2 display, F1,15 = 18.93, and F1,15 = 15.02, P < 0.001, respectively, demonstrating that attention was allocated to the “wrong” face on these trials. Analyses of identity repetition trials with correct responses included the additional factor response speed (fast vs. slow). For the CDA component, a main effect of laterality, F1,15 = 24.85, P < 0.001, indicated a strong tendency toward encoding the “correct” face into working memory on these trials. 
Importantly, there was an interaction between laterality and response speed, F1,15 = 11.11, P = 0.005, due to the fact that the CDA was much larger on trials where an identity repetition was reported rapidly than on trials with slow RTs. Analyses conducted separately for identity repetition trials with fast or slow correct responses revealed that a significant CDA component was elicited contralateral to the side of the repeated face on trials with fast correct responses, t(15) = 4.8, P < 0.001, whereas no reliable CDA was present on trials with slow RTs, t < 1.2. A similar pattern was observed for the N2pc component that preceded the CDA. There was a main effect of laterality, F1,15 = 6.96, P < 0.05, but also, critically, an interaction between laterality and response speed, F1,15 = 6.16, P < 0.05. A significant N2pc was elicited contralateral to the S1 face that was repeated as S2 on trials with fast correct responses, t(15) = 2.97, P < 0.01, but not on trials with slow RTs, t < 1. No significant N2pc and CDA amplitude differences between the left and right hemisphere were obtained in any of these analyses conducted for the Load Two condition.
ERPs to Centrally Presented Test Faces
Figure 3 (top panel) shows N170 components triggered by test face displays in the Load One condition (collapsed across identity repetition and change trials). N170 amplitudes to test faces were strongly attenuated at electrodes contralateral to the visual field of the face in the preceding face/house memory display. This location-specific N170 adaptation effect was confirmed in an ANOVA by a main effect of laterality, F1,15 = 17.07, P < 0.001, on N170 mean amplitudes, demonstrating that the spatially selective maintenance of a face representation in one hemisphere reduced the neural response to a centrally presented test face in the same hemisphere. Figure 3 (bottom panel) shows N170 components to test face displays on identity repetition trials in the Load Two condition, at electrodes contralateral to the repeated S1 face and at electrodes contralateral to the other non-repeated S1 face. N170 components are shown separately for trials with fast or slow correct responses and for trials with incorrect responses. N170 adaptation effects reflected which of the 2 faces in the S1 displays was selectively attended. On trials where participants failed to detect a face identity repetition, the N170 component to test faces was attenuated at electrodes contralateral to the non-repeated S1 face, F1,15 = 21.66, P < 0.001, which was the face that was selectively maintained in working memory on these trials (see above). For trials where an identity repetition was correctly reported, an ANOVA was conducted for the factors laterality and response speed. There was a main effect of laterality, F1,15 = 9.52, P < 0.01, and a significant interaction between laterality and response speed, F1,15 = 11.67, P < 0.004. On trials with fast correct responses, N170 amplitude was reliably reduced at electrodes contralateral to the repeated S1 face, t(15) = 3.82, P = 0.002, which was the face that was selectively retained on these trials (see above).
In contrast, there was no lateralized N170 adaptation effect on trials with slow correct responses, t < 1.
Figure 4 shows N250r components triggered by test faces on identity repetition when compared with identity change trials in the Load One and Load Two conditions. Face identity-matching processes, as reflected by the N250r, were affected by the side where a repeated face was encountered in the S1 memory display, with larger N250r amplitudes over the contralateral hemisphere. For the Load One condition, an analysis of N250r mean amplitudes (measured in the 250- to 320-ms post-stimulus time window) revealed a main effect of trial type (identity repetition vs. identity change), F1,15 = 21.94, P < 0.001, confirming the presence of a reliable N250r component. Critically, a significant interaction between trial type and laterality, F1,15 = 7.43, P < 0.02, confirmed that this N250r was larger at electrodes contralateral to the visual field of the S1 face. Follow-up analyses showed that an N250r was reliably present not only at contralateral electrodes, F1,15 = 24.40, P < 0.001, but also ipsilaterally, F1,15 = 19.17, P < 0.001. Figure 4 (middle panel) shows Load Two ERPs on identity repetition trials contralateral and ipsilateral to the repeated S1 face and on identity change trials, separately for trials with fast and slow correct responses. Because both S1 faces differed from the S2 face on identity change trials, test face ERPs for these trials cannot be classified as ipsilateral versus contralateral. The same identity change ERP waveforms were therefore compared with contralateral and ipsilateral ERPs on identity repetition trials for Load Two. Contralateral and ipsilateral N250r components were assessed statistically in 2 separate ANOVAs with the factors trial type (identity repetition vs. identity change) and response speed (fast vs. slow).
There were main effects of trial type at contralateral electrodes, F1,15 = 36.50, P < 0.001, as well as ipsilateral electrodes, F1,15 = 12.83, P < 0.01, confirming that N250r components were present both contralaterally and ipsilaterally. Interactions between trial type and response speed at both contralateral and ipsilateral electrodes, F1,15 = 4.77, P < 0.05, and F1,15 = 6.61, P < 0.05, reflected the fact that N250r amplitudes were larger on trials with fast RTs. To confirm that N250r components in the Load Two condition were larger over the hemisphere contralateral to the side where the matching face had appeared in the memory display, N250r mean amplitudes measured on identity repetition trials only were analyzed with the factors laterality (contralateral vs. ipsilateral to the repeated memory display face) and response speed. There was a main effect of response speed, F1,15 = 9.68, P < 0.01, again demonstrating that N250r amplitudes were larger on trials with fast identity-matching responses. Importantly, a main effect of laterality, F1,15 = 7.24, P < 0.02, confirmed that the N250r component was larger contralaterally. There was no interaction between laterality and response speed, F < 1.5. The contralateral dominance of the N250r component in both Load conditions (as illustrated in the contralateral–ipsilateral difference waveforms in Fig. 4, bottom panel) demonstrates that face identity-matching processes are sensitive to the represented location of an individual face in visual working memory.
The current study has provided real-time electrophysiological evidence that the limited capacity of visual memory for individual faces directly reflects the capacity limitations of focal spatial attention. Our results show that focal attention is critical for the successful encoding and maintenance of individual faces, and that attention can only be allocated to one face at a time. In the Load One condition, N2pc and CDA components were elicited contralateral to the side of the face in the face/house memory displays, indicating that attention was rapidly allocated to this face, which was then encoded in working memory. As a result, this face was successfully matched to the centrally presented test face on almost all Load One trials. In the Load Two condition, where 2 different faces had to be simultaneously encoded and retained, face identity-matching performance was strongly impaired, in spite of the fact that the interval between memory and test displays was very brief (200 ms). The estimated working memory capacity in Load Two was close to 1, which implies that only 1 of the 2 memory display faces could be selectively maintained on most trials. This is in line with an attentional competition scenario, where these 2 faces compete for focal attention, and only the winner of this competition is successfully encoded and retained in working memory.
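To make the capacity estimate concrete, the sketch below assumes Cowan's K, a common estimator for single-probe matching tasks: K = N × (hit rate − false alarm rate), where N is the number of memorized items. This section does not name the estimator that was used, so the formula is an assumption, as is the function name `cowan_k`. The 14% false alarm rate is the Load Two value reported later in this discussion; the hit rate of 66% is a hypothetical value chosen only to show how an estimate close to 1 can arise.

```python
def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Cowan's K capacity estimate: set size times (hits minus false alarms)."""
    return set_size * (hit_rate - false_alarm_rate)


# Load Two: 2 faces in the memory display.
# false_alarm_rate = 0.14 is taken from the text; hit_rate = 0.66 is hypothetical.
k_load_two = cowan_k(set_size=2, hit_rate=0.66, false_alarm_rate=0.14)
print(f"Estimated capacity (Load Two): {k_load_two:.2f}")
```

Under these assumed values the estimate is slightly above 1, which is the pattern expected if one face is retained on most trials and both faces are occasionally retained.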
The ERP results observed for Load Two memory displays provide strong support for this attentional competition account. They demonstrate that the success or failure of detecting identity repetitions was determined by the allocation of spatial attention to 1 of the 2 faces in the memory display. On trials where a face repetition went undetected, ERP markers of attentional object selection (N2pc) and working memory maintenance (CDA) revealed that the “wrong” non-repeated face was selectively attended and encoded into working memory. In other words, participants were unable to report that an S1 face reappeared as S2 when this face had failed to attract focal attention during encoding, and no memory representation of this face was available for an identity match with the subsequent test face. On Load Two trials where an identity repetition was rapidly detected (i.e., trials with fast correct responses), N2pc and CDA components were elicited contralateral to the repeated face, demonstrating that the “correct” face had been attentionally selected and encoded into working memory. These results are incompatible with the hypothesis that performance costs in working memory tasks observed for complex objects such as faces do not reflect limitations in the number of objects that can be simultaneously retained, but only arise during the comparison between sample and test items (Awh et al. 2007). If this were the case, participants should have been able to simultaneously select and maintain both faces in Load Two memory displays, and no lateralized N2pc and CDA components should have been elicited in response to these displays.
Interestingly, there were no reliable N2pc and CDA components in response to Load Two memory displays on identity repetition trials with slow correct responses. This suggests that attention was not selectively focused on either of the 2 memory display faces on these trials, in spite of the fact that participants were still able to detect an identity repetition. While this may seem inconsistent with the hypothesis that focal attention is necessary for the successful short-term retention of an individual face representation, 2 factors are likely to be jointly responsible for the absence of lateralized ERP components on these trials. The fact that face memory capacity was slightly above 1 in the Load Two condition suggests that a face identity repetition could sometimes be reported when spatial attention was divided between both faces in the memory display, although not as rapidly as with fully focused attention. In addition, the relatively high false alarm rate on identity change trials in the Load Two condition (14%) shows that a considerable number of all face repetition responses were merely guesses that were made when no matching spatially focused visual face memory representation was available.
There was no evidence for any hemispheric asymmetries in the attentional selection and subsequent encoding of faces in the Load One and Load Two conditions. Identity-matching performance did not differ between trials where the repeated face appeared on the left or right side of a memory display. N2pc and CDA components to these memory displays were equally large over the left and right hemisphere in both Load conditions, suggesting that there was no bias toward one hemisphere during the attention-based maintenance of a particular face for a subsequent memory match.
The allocation of spatial attention to one face in the memory display, and the resulting activation of a position-dependent working memory representation in the contralateral hemisphere, also affected the subsequent perceptual and identity-related processing of test faces, in spite of the fact that these faces always appeared at central fixation. Face-sensitive N170 components to test faces were attenuated contralateral to the side of the attended face in the memory display. In the Load One condition, such N170 adaptation effects were observed contralateral to the face in the face/house memory displays. In the Load Two condition, N170 adaptation effects reflected the focus of attention on 1 of the 2 faces in the preceding memory displays. On Load Two identity repetition trials with fast correct responses, N170 amplitude was reduced contralateral to the face that was then repeated as S2. On Load Two trials where an identity repetition was missed, N170 adaptation was observed contralateral to the other non-repeated face. On identity repetition trials with slow correct RTs, where spatial attention was not selectively focused (see above), no lateralized N170 adaptation was found. This very systematic pattern of position-specific N170 adaptation effects demonstrates that working memory representations of an attended face were maintained in the hemisphere contralateral to the side where this face was encountered during encoding, and that the presence of an active face memory representation in this hemisphere attenuated the sensory response to a subsequent centrally presented face in the same hemisphere.
In addition to modulating perceptual face processing, as reflected by N170 adaptation effects, the attentional activation of face memory representations in the contralateral hemisphere also affected subsequent face identity-matching processes. In both Load conditions, N250r components triggered by face repetitions were larger over the hemisphere contralateral to the side of the repeated face in the preceding memory display. The N250r reflects a match between a seen face and a stored visual representation of this face (e.g., Schweinberger et al. 2002). This match could result in an enhanced activation of online perceptual representations, of working memory representations, or both. The contralateral dominance of N250r components shows that position-dependent visual face memory representations were selectively activated by a face identity match. The presence of a reliable N250r over the ipsilateral hemisphere is likely to reflect the additional match-induced activation of perceptual representations of test faces. Interestingly, N250r components to face repetitions in the Load Two condition were larger on trials where these repetitions were detected rapidly (and a working memory representation of the repeated face was selectively activated, see above) relative to trials with slow responses (and spatial attention was divided across both memory display faces). This observation supports the hypothesis that focal attention determines the activation level of individual face representations. The neural response triggered by a match between a stored face representation and a perceptual representation of the same face, as reflected by N250r amplitudes, is likely to be modulated by the degree to which the matching face memory representation is selectively activated through the allocation of spatial attention. 
When 2 face representations are activated in parallel, the process of matching one of them to a test face is impaired, resulting in smaller N250r components and delayed identity-matching responses. In line with earlier findings by Awh et al. (2007), this demonstrates that attentional capacity limitations during the encoding and retention of complex objects also affect the subsequent comparison between memorized and test objects.
In previous ERP research, identity-sensitive N250r components were found to be strongly reduced when a competitor face was simultaneously present during face encoding, and this was interpreted as evidence for face-specific attentional resource limitations that allow only one face to be processed at a time (e.g., Neumann and Schweinberger 2009). The results from the present study confirm and extend these observations by demonstrating that the encoding and retention of individual faces depend on focal attention, and that only a single face representation can be maintained through the selective allocation of spatial attention. In this context, it is important to note that the participants in the present study were explicitly instructed to focus their attention on both faces in the Load Two memory display, because either of them was equally likely to be repeated as S2. Instead of dividing attentional resources equally between both faces, the pattern of N2pc and CDA components observed in response to Load Two memory displays demonstrated that participants selectively directed attention to one of these faces on the majority of all trials. This choice is likely to directly reflect the attentional capacity limitations of visual face memory: Participants opted to attend to one particular face rather than both faces because focal attention was necessary to successfully encode and retain at least 1 of the 2 faces in the memory display.
The current findings also have implications for whether working memory should be conceptualized as being composed of discrete fixed slots or as a flexible dynamic resource where precision of stored items is variable (e.g., Luck and Vogel 2013; Ma et al. 2014). The observation that participants selected and maintained a single face on the majority of Load Two trials is consistent with the idea that face representations occupy discrete slots in working memory, and that only a single slot is available for the retention of individual faces. However, we also found that attention was divided between both memory display faces on some Load Two trials. Discrete slot-based accounts acknowledge that the number of slots may vary from trial to trial, but also assume that when an object representation occupies one of these slots, its precision will be uniformly high. However, our results show that encoding 2 faces simultaneously incurs a substantial cost. Identity-matching responses were delayed and N250r components were attenuated on trials where attention was divided between 2 faces, suggesting that the resolution of face memory representations was lower than on trials where a single face was focally attended. In line with variable precision models of working memory (Ma et al. 2014), this trade-off between the number of items represented and the precision of working memory representations suggests that the distribution of selective spatial attention is a dynamic resource that constrains both the capacity and the resolution of visual working memory.
The limited capacity of visual face memory could be a direct consequence of the position-dependence of visual face representations. According to the sensory recruitment account of working memory, visual objects are maintained in posterior visual areas that are also responsible for the perceptual processing of these objects (e.g., Postle 2006). Because object-selective visual cortex is organized topographically (e.g., Kravitz et al. 2013), working memory representations in these areas should be position-dependent, particularly if their maintenance is mediated by selective spatial attention. The limited capacity of working memory reflects a general limitation in the ability to simultaneously select and maintain spatially distinct object representations in topographic visual cortical maps, and these limitations are particularly pronounced for more complex objects (Franconeri et al. 2013). Maintaining multiple representations of individual faces is especially challenging because faces are complex, and because faces of different individuals share visual features and are similar in terms of their global spatial configurations. This may be the principal reason why only a single individuated representation of a particular face can be attentionally selected, encoded, and retained in working memory at any given time.
This research was supported by grant ES/K002457/1 from the Economic and Social Research Council (ESRC), UK.
Our thanks to Sue Nicholas for help with programming this experiment, and Fintan Nagle for MATLAB assistance. Conflict of Interest: None declared.