To recognize visual objects, our sensory perceptions are transformed through dynamic neural interactions into meaningful representations of the world, but exactly how visual inputs invoke object meaning remains unclear. To address this issue, we apply a regression approach to magnetoencephalography data, modeling perceptual and conceptual variables. Key conceptual measures were derived from semantic feature–based models claiming that shared features (e.g., has eyes) provide broad category information, while distinctive features (e.g., has a hump) are additionally required for more specific object identification. Our results show initial perceptual effects in visual cortex that are rapidly followed by semantic feature effects throughout ventral temporal cortex within the first 120 ms. Moreover, these early semantic effects reflect shared semantic feature information supporting coarse category-type distinctions. After 200 ms, we observed effects along the extent of ventral temporal cortex for both shared and distinctive features, which together allow for conceptual differentiation and object identification. By relating spatiotemporal neural activity to statistical feature–based measures of semantic knowledge, we demonstrate that qualitatively different kinds of perceptual and semantic information are extracted from visual objects over time, with rapid activation of shared object features followed by concomitant activation of distinctive features that together enable meaningful visual object recognition.
Recognizing visual objects is an effortless and subjectively instantaneous cognitive ability, the details of which are poorly understood. Identifying an object requires some degree of stimulus-based visual processing before the emerging representation becomes increasingly abstract and semantic over time. However, little is known about how meaningful semantic information is extracted from perceptual inputs. Responses sensitive to coarse-grained category-level information (e.g., knowing an object is an animal or vehicle) have been observed at latencies within 150 ms (VanRullen and Thorpe 2001; Kirchner and Thorpe 2006; Liu et al. 2009), suggesting that coarse semantic information is rapidly accessed. More fine-grained semantic information, such as that required to identify an animal as a dog (known as basic-level recognition), is associated with additional processes, which take place after 150 ms (Martinovic et al. 2007; Schendan and Maher 2009; Clarke et al. 2011). While these findings suggest that increasingly detailed semantic information rapidly emerges across time, core aspects of this process remain unclear. To understand how meaningful object representations emerge from visual percepts, it is necessary to determine the kind of information available in neural signals over time and the brain regions that process this information. The aim of the current study is to address these fundamental issues. Specifically, we investigate the nature of the semantic information that drives the transition from rapid coarse-grained representations to more detailed semantic representations, and the neuroanatomical regions supporting this transition. We address these issues using magnetoencephalography (MEG), which enables us to track the time course of perceptual and conceptual processes during the recognition of meaningful objects.
Visual object recognition is known to rely on a hierarchically organized processing stream through occipital and ventral temporal cortices, where increasingly complex information is processed in progressively more anterior regions (Ungerleider and Mishkin 1982; Felleman and Van Essen 1991; Bussey et al. 2005). The posterior aspects of the ventral temporal lobes process both perceptual and category-level semantic information about visual objects (Haxby et al. 2001; Vuilleumier et al. 2002; Kriegeskorte et al. 2008), with the anteromedial temporal cortex, at the endpoint of the visual hierarchy, supporting the most fine-grained semantic processes (Tyler et al. 2004; Moss, Rodd, et al. 2005; Taylor et al. 2006; Barense et al. 2007). Temporally, visual object processing is hypothesized to progress from a coarse- to a fine-grained analysis of object identity across time (Hochstein and Ahissar 2002; Hegdé 2008), a cognitive feat underpinned by both feedforward and recurrent processing mechanisms (Bar et al. 2006; Clarke et al. 2011). Within the first 100 ms, the cortical responses generated within the visual cortex reflect perceptual stimulus–based properties of the image, including the complexity of the visual image, object color, texture, and natural image statistics (Tarkiainen et al. 2002; Martinovic et al. 2008; Scholte et al. 2009). These initial responses propagate anteriorly along the ventral axis of the temporal lobe in a feedforward manner, characterized as the initial feedforward sweep (Lamme and Roelfsema 2000; Bullier 2001), during which information is integrated and accumulated to support coarse category-type decisions. For example, in both human and nonhuman primates, neural responses related to the category of the visual object have been recorded in ventral temporal and prefrontal cortices with a latency of 100–150 ms (Freedman et al. 2001; Liu et al. 2009), and electroencephalography (EEG) measurements show category-related activity after 150 ms (Thorpe et al. 1996; VanRullen and Thorpe 2001). Furthermore, this information appears to be behaviorally relevant, as category-based decisions can be performed within 100–150 ms of picture onset as measured by eye movement latencies (Kirchner and Thorpe 2006; Crouzet et al. 2010). The implication of such studies is that during this initial feedforward sweep, category-related information is rapidly extracted from the visual percept, and this information is reflected in responses throughout ventral temporal and prefrontal cortices (e.g., Liu et al. 2009).
Extracting a more detailed representation of an object, as in the case of basic-level recognition, requires additional fine-grained analyses supported by more anterior temporal regions (Tyler et al. 2004; Moss, Rodd, et al. 2005) and recurrent processing mechanisms (Schendan and Maher 2009; Clarke et al. 2011). For example, evidence that recurrent processes support the formation of more detailed semantic representations comes from Clarke et al. (2011), who showed that recurrent activity between anterior and posterior sites in the ventral temporal cortex increased from 150 to 250 ms as a function of the need to form detailed semantic representations. This time frame is consistent with observations by Martinovic et al. (2007), who reported that neural activity between 200 and 300 ms covaried with the time required to determine the specific name of visual objects. Taken together, these studies suggest that recurrent processes in the ventral stream within the first 300 ms of stimulus presentation support the rapid emergence of detailed semantic knowledge about objects.
Beyond this coarse-to-fine trajectory underpinning such emerging semantic representations, it remains unclear what kinds of semantic information become available at different latencies, and which neural regions support them. The investigation of these questions requires a cognitive account of semantic knowledge that incorporates various kinds of semantic information about objects. Here, we focus on a feature-based account of semantic knowledge that claims that the meaning of a concept is composed of its constituent semantic features (e.g., <has ears>, <is small>, and <is played>; e.g., McRae et al. 1997; Tyler and Moss 2001; Moss et al. 2007). The statistical regularities derived from such semantic features have been shown to predict behavioral performance on conceptual tasks (McRae et al. 1997; Randall et al. 2004; Cree et al. 2006; Taylor et al. 2008), while recent research shows that statistical semantic feature data correlate with brain activity (Chang et al. 2011). Here, we aim to determine the extent to which the spatiotemporal neural activity measured with MEG is related to the statistical properties of semantic features, which capture different aspects of object meaning.
Two key statistical measures that influence conceptual processing are feature distinctiveness and the extent to which features are correlated (McRae et al. 1997; Randall et al. 2004; Moss et al. 2007; Taylor et al. 2011). Feature distinctiveness measures the degree to which a specific semantic feature is shared across concepts (e.g., has ears) or is more distinctive for a particular concept (e.g., has a hump). Shared features tend to be distributed across many different category or domain members (e.g., many animals have ears) and so provide coarse information about what type of thing the concept is likely to be. Identifying an object (and so differentiating between similar objects, such as a horse and a cow) requires access to more fine-grained semantic information, which is provided by distinctive features. Moreover, according to one feature-based model of semantics, the conceptual structure account (Tyler and Moss 2001; Moss et al. 2007; Taylor et al. 2007), distinctive features are ultimately informative for basic-level identification only in combination with shared features. Concepts that share many features generate conceptual ambiguity, in which many concepts are activated. This ambiguity can be resolved by information about the distinctive features of a concept, which serve to disambiguate the concept from its semantic competitors. For example, a distinctive feature of a camel is that it has a hump. Knowledge of the feature has a hump in isolation may not be informative about the identity of the concept; instead, this information must be combined with the concept's shared features (e.g., has eyes, has ears, has 4 legs) in order to identify the concept as a camel. Thus, identifying objects at the basic level requires the integration of distinctive and shared information.
Given that coarse-grained or categorical information emerges before fine-grained information, we hypothesize that the effects of shared semantic information will be apparent within the first 200 ms, while effects of distinctive features will occur after 200 ms. Moreover, we predict that the early processing of shared feature information will be associated with more posterior ventral temporal regions, whereas the later processing of shared combined with distinctive feature information will be associated with the anterior temporal lobes.
In addition to feature distinctiveness, correlational strength, the extent to which a concept's features tend to co-occur, is claimed to be a crucial factor in accessing conceptual meaning (McRae et al. 1997; Taylor et al. 2008). Correlation between a concept's features is hypothesized to strengthen the links between those features, speeding their coactivation within a distributed semantic network and thereby the integration of semantic information (McRae et al. 1997; Randall et al. 2004). This account predicts that the effects of highly correlated features will occur rapidly, while effects associated with the processing of weakly correlated features will occur during later stages of conceptual processing (i.e., after 200 ms). Moreover, because weakly correlated features do not benefit from mutual coactivation, concepts with weakly correlated features require more effortful processing to activate and integrate their features. This effect may be underpinned by the increased involvement of inferior frontal lobe structures associated with accessing conceptual information (Thompson-Schill et al. 1997; Badre and Wagner 2002; Moss, Abdallah, et al. 2005). In sum, feature distinctiveness captures how shared or distinctive a concept's semantic features are, while correlational strength captures the relationship between those features.
The aim of the current study was to directly investigate how the meaning of an object emerges over time by charting the temporal relationship between the perceptual and conceptual processes that underlie visual object recognition. As our primary interest was to investigate the rapid emergence of meaningful information from visual inputs, our analyses focus on the first 300 ms. To provide an optimal analytical approach to this issue, we related single-trial MEG responses to concept-specific perceptual and semantic feature–based measures. An increasing number of studies have used a regression approach to analyze M/EEG data (Dien et al. 2003; Hauk et al. 2006; Rousselet et al. 2008), which enables the characterization of how multiple variables influence neural activity within the same data set. In the current study, we apply the linear regression approach of Hauk et al. (2006) to examine the extent to which a variety of perceptual and semantic feature–based statistical measures are reflected in neural activity during the basic-level identification of objects, before estimating the cortical underpinnings of these effects (Fig. 1). Integrating cognitive accounts of semantic knowledge and the neurobiology of visual object processing, we predict that neural signals recorded with MEG will show a rapid progression from perceptual stimulus–based information in the visual cortex to more semantically based variables across time. We predict that early semantic effects within the first 200 ms will be related to shared semantic features and be associated with more posterior occipitotemporal regions. Critically, these effects are predicted to occur prior to those associated with the combined effects of shared and distinctive features required for basic-level concept identification, which we predict will engage more anterior regions of the ventral stream.
Finally, MEG responses to concepts with strongly correlated features are predicted to occur before responses to concepts with weakly correlated features, whereby the latter concepts may additionally be associated with more effortful semantic access processes involving the inferior frontal lobe. To test these predictions, MEG signals were recorded during the basic-level naming of pictures depicting concepts in the McRae et al. (2005) feature production norms.
Materials and Methods
Eleven healthy participants (9 males, 2 females) took part in the study. All were right handed and had normal or corrected-to-normal vision. The average age was 23.2 years (range 19–31 years). All participants gave informed consent, and the study was approved by the Cambridge Psychology Research Ethics Committee.
The study used images of meaningful objects that represented concepts taken from a large property generation study conducted by McRae et al. (2005). Since these norms were collected from North American English speakers, the concept and semantic feature data were modified for use with native British English speakers, resulting in an anglicized version of the norms (Taylor et al. 2011). We selected colored images for 350 concepts that could be represented as single objects independent of context, and 50 meaningless sculpture images as filler items that were not analyzed. All images (from various sources, including internet searches) were presented in isolation on a white background. For each concept, we obtained 13 measures that captured visual attributes of the picture and general conceptual properties (such as familiarity and exemplarity), as well as feature-based statistical measures derived from the anglicized property norms.
Perceptual and Conceptual Variables
As objective measures of image complexity, the number of nonwhite pixels in the image and the jpg file size (Székely and Bates 2000) were calculated from the pictures used in the study. Before calculating these measures, all images were saved at a resolution of 72 pixels per inch and were copied onto a plain white background of equal size. Concept familiarity and picture exemplarity ratings (7-point scale), reflecting how familiar the concept is and how good an example the picture is of the intended concept, respectively, were collected from an independent group of 17 healthy participants who did not participate in the MEG study.
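The nonwhite-pixel measure can be sketched in a few lines. The snippet below is an illustration, not the authors' code: it assumes each picture has already been pasted onto a white canvas and loaded as an 8-bit RGB NumPy array, and it treats only pure white (255, 255, 255) as background, which is an assumption about how "nonwhite" was defined. The function name is hypothetical.

```python
import numpy as np


def count_nonwhite_pixels(rgb: np.ndarray, white_level: int = 255) -> int:
    """Count pixels that are not pure white in an H x W x 3 RGB image.

    Assumption: the background is exactly (white_level,) * 3; any pixel that
    differs in at least one channel counts as part of the object.
    """
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB array")
    is_white = np.all(rgb == white_level, axis=2)  # True where all 3 channels are white
    return int(np.count_nonzero(~is_white))


# Usage: a 4 x 4 white canvas containing a 2 x 2 black square.
canvas = np.full((4, 4, 3), 255, dtype=np.uint8)
canvas[1:3, 1:3] = 0
print(count_nonwhite_pixels(canvas))  # 4
```

The jpg file-size measure would additionally require encoding the same canvas with an image library and taking the byte length of the result; it is omitted here because the value depends on encoder and quality settings that are not reported.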
Semantic feature–based variables were calculated from the anglicized version of the McRae norms (Taylor et al. 2011). We obtained the number of features (NoF) associated with each concept, which indexes how much semantic information is associated with the concept. The proportion of visual features was calculated as [the number of visual features]/[the total number of features], where features were classified as “visual” if they related to visual information in the feature norms (Cree and McRae 2003; McRae et al. 2005). As mentioned in the Introduction, semantic features vary in the extent to which they are shared by many concepts or are distinctive to a particular concept. Feature distinctiveness was estimated as [1/number of concepts the feature occurs in], and 3 concept-specific measures captured how much shared or distinctive information was associated with each concept: the relative proportion of shared to distinctive features (where shared features occur in 3 or more concepts and distinctive features occur in 1 or 2 concepts; Randall et al. 2004), the mean distinctiveness of all features within a concept, and the skew of the distribution of feature distinctiveness values within a concept, where a positive skew indicates relatively more shared than distinctive features and a negative skew relatively more distinctive than shared features.
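A minimal sketch of the three concept-level distinctiveness measures follows, assuming the norms are available as a mapping from each feature to the number of concepts that list it. The function name and data layout are hypothetical. The "relative proportion of shared to distinctive features" is computed here as the proportion of a concept's features that are shared, which is one plausible reading rather than the documented formula, and skew is the population (moment-based) skewness.

```python
import numpy as np


def distinctiveness_measures(concept_features, feature_counts):
    """Concept-level distinctiveness summaries.

    concept_features: list of feature names listed for one concept.
    feature_counts: dict mapping each feature to the number of concepts
        in the norms that list it (hypothetical input format).
    Returns (proportion_shared, mean_distinctiveness, skew).
    """
    counts = np.array([feature_counts[f] for f in concept_features], dtype=float)
    d = 1.0 / counts                          # distinctiveness = 1 / no. of concepts
    shared = counts >= 3                      # shared: occurs in 3 or more concepts
    proportion_shared = float(shared.mean())  # assumed reading of the ratio measure
    mean_d = float(d.mean())
    sd = d.std()                              # population SD
    # Moment-based skewness: a long tail of high (distinctive) values gives a
    # positive skew, i.e. the concept has relatively more shared features.
    skew = float(np.mean((d - mean_d) ** 3) / sd ** 3) if sd > 0 else 0.0
    return proportion_shared, mean_d, skew


# Usage with made-up counts for a camel-like concept.
counts = {"has eyes": 50, "has ears": 40, "has 4 legs": 30, "has a hump": 1}
print(distinctiveness_measures(list(counts), counts))
```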
Correlational strength measures the regularity with which 2 features co-occur (for details, see Randall et al. 2004; Taylor et al. 2008) and is calculated between each feature and all other features. The mean correlational strength value for a concept was calculated as the mean of the correlational strength values of all features in that concept. Four correlational strength variables were calculated. First, the mean correlational strength of all the shared features within the concept (mean correlational strength − S features/within concept) provides a measure of how correlated the concept's shared features are, and includes only correlations between features associated with that concept. Second, the mean correlational strength of all the distinctive features within the concept (mean correlational strength − D features/within concept) reflects how correlated the concept's distinctive features are. Since it is assumed that semantic knowledge is represented in a distributed semantic system and that a given feature will strongly activate all associated features (regardless of whether they occur in the same concept), corresponding correlational strength measures were also calculated using all features (i.e., mean correlational strength − S features/across concept and mean correlational strength − D features/across concept).
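The within- and across-concept measures can be illustrated on a toy binary concept-by-feature matrix. This sketch simplifies correlational strength to the plain Pearson correlation between feature columns; the cited papers (Randall et al. 2004; Taylor et al. 2008) describe additional details, such as restrictions on which feature pairs enter the calculation, that are not reproduced here. All names and data are hypothetical.

```python
import numpy as np


def feature_correlations(M):
    """Pearson correlations between feature columns of a binary
    concept-by-feature matrix M (rows = concepts, columns = features)."""
    with np.errstate(invalid="ignore", divide="ignore"):
        R = np.corrcoef(M, rowvar=False)
    R = np.nan_to_num(R)      # constant columns yield NaN; treat as uncorrelated
    np.fill_diagonal(R, 0.0)  # a feature's correlation with itself is not used
    return R


def mean_cs(R, features, partners):
    """For each feature in `features`, average its correlation with every other
    feature in `partners`; then average those values over `features`."""
    per_feature = []
    for f in features:
        others = [p for p in partners if p != f]
        if others:
            per_feature.append(R[f, others].mean())
    return float(np.mean(per_feature)) if per_feature else 0.0


# Toy norms: 4 concepts x 4 features (hypothetical data).
M = np.array([[1, 1, 1, 0],
              [1, 1, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 0, 1]])
counts = M.sum(axis=0)                     # concepts per feature
R = feature_correlations(M)
feats = np.flatnonzero(M[0])               # features of concept 0
shared = [f for f in feats if counts[f] >= 3]
within = mean_cs(R, shared, shared)              # S features / within concept
across = mean_cs(R, shared, range(M.shape[1]))   # S features / across concept
print(within, across)
```

The distinctive-feature versions follow by selecting features with `counts[f] < 3` instead.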
Since many of these 13 variables are highly correlated, we performed a principal components analysis (PCA) to orthogonalize the variables while reducing them to an analytically manageable set.
Principal Components Analysis
A PCA was performed using data from 412 concepts in the anglicized norms. The PCA was conducted on a larger range of items than presented to participants (350) so that the resulting components would be representative of the structure given by the largest data set possible. The PCA used varimax rotation and resulted in 6 orthogonal components accounting for 85.8% of the overall variance (Supplementary Table S1). The resulting components were interpreted as follows.
The first component, relative distinctiveness, incorporated variables reflecting how much distinctive information was associated with the concept, the relative amount of shared and distinctive information, and the correlational strength of the distinctive features. Therefore, it primarily captured whether a concept has relatively more shared or more distinctive features. The second component, image complexity, incorporated visual complexity (objectively measured by the jpg file size) and the number of nonwhite pixels in the image. The correlational strength component reflected the correlational strength of shared features both within and across all concepts; it therefore captured how correlated (likely to co-occur) a concept's shared features were. The component termed familiarity loaded primarily on concept familiarity and picture exemplarity, while the visual features component reflected the proportion of the concept's features that could be visually depicted. Finally, the NoF component encompassed the number of features associated with a concept (Table 1).
|Component|Description|
|Image complexity|Complexity and size of the image|
|Relative distinctiveness|Captures the relative degree of shared and distinctive features associated with the concept and the correlation of the distinctive features|
|Correlational strength|Correlational strength of a concept's shared features|
|NoF|Total number of semantic features for a concept|
|Visual features|Proportion of the concept's features that were visual features (e.g., is round)|
|Familiarity|Concept familiarity and picture exemplarity|
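The PCA with varimax rotation can be sketched with NumPy as below. This is an illustration on synthetic data with the same shape as the analysis (412 concepts by 13 variables), not the authors' pipeline: it extracts components from the correlation matrix, rotates the retained loadings with the standard SVD-based varimax algorithm without Kaiser normalization (statistical packages often apply Kaiser normalization by default), and computes standardized component scores. Function names are hypothetical.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """SVD-based varimax rotation of a p x k loading matrix (no Kaiser normalization)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        L = loadings @ rotation
        target = L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                       # orthogonal by construction
        previous, criterion = criterion, np.sum(s)
        if previous != 0 and criterion / previous < 1 + tol:
            break
    return loadings @ rotation


def pca_varimax(X, n_components):
    """X: concepts x variables. Returns rotated loadings, standardized
    component scores, and the fraction of variance the components explain."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z-score each variable
    C = np.corrcoef(Z, rowvar=False)
    evals, evecs = np.linalg.eigh(C)                   # ascending order
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    loadings = evecs[:, :n_components] * np.sqrt(evals[:n_components])
    rotated = varimax(loadings)
    # Component scores Z B (B'B)^-1; for rotated principal components these
    # have zero mean, unit variance, and are mutually uncorrelated.
    scores = Z @ rotated @ np.linalg.inv(rotated.T @ rotated)
    explained = evals[:n_components].sum() / evals.sum()
    return rotated, scores, explained


# Usage on synthetic data shaped like the analysis (412 concepts x 13 variables).
rng = np.random.default_rng(0)
X = rng.normal(size=(412, 13)) @ rng.normal(size=(13, 13))  # correlated variables
rotated, scores, explained = pca_varimax(X, n_components=6)
print(rotated.shape, scores.shape, round(float(explained), 3))
```

Because varimax is an orthogonal rotation, the total variance explained by the retained components is unchanged, and the scores remain zero-mean, unit-variance, and uncorrelated, which is the property relied on when they are entered as regression predictors.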
Each trial consisted of a centrally presented black fixation cross on a white background for 600 ms, followed by a picture lasting 500 ms, then a blank white screen lasting between 2400 and 2700 ms. The participants' task was to overtly name each object at the basic level (e.g., “tiger”) and to respond with “object” if they were unsure of its identity. Basic-level naming was used because it requires access to detailed conceptual representations. Participants were instructed to name the object as accurately as possible while keeping movements to a minimum to prevent excessive muscular artifacts in the MEG recordings. The order of stimuli was fixed such that consecutive stimuli were neither semantically nor phonologically related. Semantic relatedness was defined as membership in the same object category (e.g., animals), while phonological relatedness referred to object names sharing an initial phoneme. The stimuli were presented in 5 blocks, counterbalanced across subjects, with a short rest period between each block. Each block contained 80 items and lasted approximately 5 min. The presentation and timing of stimuli were controlled using E-Prime version 1 (Psychology Software Tools, Pittsburgh, PA). Naming accuracy was recorded by the experimenter during data acquisition.
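The ordering constraint can be checked mechanically. In the sketch below, category labels are assumed to be available for each item, and the initial letter of each name stands in for its initial phoneme; this is a hypothetical simplification, since letters and phonemes do not map one to one (e.g., "knife" and "nail" share an initial phoneme but not an initial letter).

```python
from typing import List, Sequence


def order_violations(names: Sequence[str], categories: Sequence[str]) -> List[int]:
    """Return positions i where items i and i + 1 break the ordering rule.

    Semantic relation: same category label. Phonological relation is
    approximated (hypothetically) by a shared initial letter.
    """
    violations = []
    for i in range(len(names) - 1):
        same_category = categories[i] == categories[i + 1]
        same_onset = names[i][0].lower() == names[i + 1][0].lower()
        if same_category or same_onset:
            violations.append(i)
    return violations


print(order_violations(["tiger", "hammer", "lion", "apple"],
                       ["animal", "tool", "animal", "fruit"]))  # []
print(order_violations(["tiger", "lion", "ladder"],
                       ["animal", "animal", "tool"]))           # [0, 1]
```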
MEG/Magnetic Resonance Imaging Recording
Continuous MEG data were recorded using a whole-head 306-channel (102 magnetometers, 204 planar gradiometers) Vector-view system (Elekta Neuromag, Helsinki, Finland) located at the MRC Cognition and Brain Sciences Unit, Cambridge, UK. Eye movements and blinks were monitored with electrooculogram (EOG) electrodes placed around the eyes, and 4 head-position indicator (HPI) coils were used to record the head position (every 200 ms) within the MEG helmet. The participants' head shape was digitally recorded using a 3D digitizer (Fastrak Polhemus Inc., Colchester, VT), along with the positions of the EOG electrodes, HPI coils, and fiducial points (nasion and left and right preauricular points). MEG signals were recorded at a sampling rate of 1000 Hz, with a band-pass filter from 0.03 to 125 Hz. To facilitate source reconstruction, high-resolution (i.e., 1 × 1 × 1 mm) T1-weighted magnetization prepared rapid gradient echo scans were acquired during a separate session with a Siemens 3-T Tim Trio scanner (Siemens Medical Solutions, Camberley, UK) located at the MRC Cognition and Brain Sciences Unit, Cambridge, UK.
Initial processing of the raw data used MaxFilter (Elekta Neuromag) to detect static bad channels, which were subsequently reconstructed by interpolating neighboring channels. The data were also visually inspected to identify bad channels containing long periods of high-amplitude or noisy signals, which were likewise reconstructed through interpolation. Head movement compensation (using data from the HPI coils) was performed, and head position was transformed into a common head position to facilitate group sensor analyses. The temporal extension of the signal-space separation technique (Taulu et al. 2005) was applied to the data every 4 s in order to segregate signals originating from within the participants' heads from those generated by external sources of noise. The cleaned MEG data were low-pass filtered at 40 Hz and epoched from −100 to 300 ms with respect to picture onset. Baseline correction was applied using the 100-ms prestimulus interval.
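The epoching and baseline-correction steps can be sketched as follows, assuming a continuous channels-by-samples array that has already been cleaned and low-pass filtered, and a list of picture-onset sample indices (both hypothetical inputs). At 1000 Hz, the −100 to 300 ms window spans 401 samples, and the baseline is the mean of the 100 prestimulus samples.

```python
import numpy as np

FS = 1000  # sampling rate in Hz, as reported above


def epoch_and_baseline(data, onsets, tmin=-0.1, tmax=0.3, fs=FS):
    """Cut epochs from continuous data and subtract the prestimulus mean.

    data: channels x samples array (assumed already cleaned and 40 Hz low-passed).
    onsets: sample indices of picture onset (hypothetical input).
    Returns trials x channels x times, with times from tmin to tmax inclusive.
    """
    start = int(round(tmin * fs))          # -100 samples
    stop = int(round(tmax * fs))           # +300 samples
    if min(onsets) + start < 0 or max(onsets) + stop >= data.shape[1]:
        raise ValueError("an epoch extends beyond the recording")
    epochs = np.stack([data[:, o + start:o + stop + 1] for o in onsets])
    n_baseline = -start                    # the 100 samples before onset
    baseline = epochs[:, :, :n_baseline].mean(axis=2, keepdims=True)
    return epochs - baseline


# Usage with random data: 3 channels, 2 s of recording, 2 onsets.
rng = np.random.default_rng(1)
data = rng.normal(size=(3, 2000)) + 5.0   # constant offset to be removed
epochs = epoch_and_baseline(data, onsets=[200, 800])
print(epochs.shape)  # (2, 3, 401)
```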
Naming responses were considered incorrect if the name given by the participant did not exactly match the name in the anglicized version of the McRae norms. In addition, pictures with less than 70% name agreement, as determined by an independent group of 20 healthy individuals, were excluded, as were items that were incorrectly named by more than 50% of participants (213 items remained). These criteria ensured that the objects were maximally related to the intended concepts, and therefore to the conceptual variables. Finally, trials were excluded if they elicited an EOG amplitude exceeding 200 μV or if the value on any gradiometer channel exceeded 2000 fT/cm. All further analyses were conducted on the remaining items (mean per participant: 177 items, range: 154–192 items).
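The item- and trial-level exclusion criteria can be expressed as boolean masks. This sketch assumes name agreement and error rates are given as proportions, EOG data in volts and gradiometer data in T/m (2000 fT/cm = 2 × 10⁻¹⁰ T/m), and it interprets "amplitude" as the maximum absolute value within the epoch; a peak-to-peak criterion would be an equally plausible reading. Function names are hypothetical.

```python
import numpy as np

EOG_LIMIT = 200e-6       # 200 microvolts, in volts
GRAD_LIMIT = 2000e-13    # 2000 fT/cm expressed in T/m (1 fT/cm = 1e-13 T/m)


def keep_items(name_agreement, error_rate):
    """Item-level mask: keep pictures with >= 70% name agreement that were
    misnamed by no more than 50% of participants (inputs are proportions)."""
    name_agreement = np.asarray(name_agreement)
    error_rate = np.asarray(error_rate)
    return (name_agreement >= 0.70) & (error_rate <= 0.50)


def keep_trials(eog, grad, eog_limit=EOG_LIMIT, grad_limit=GRAD_LIMIT):
    """Trial-level mask from trials x channels x times arrays.

    Assumption: 'amplitude' is the maximum absolute value in the epoch.
    """
    eog_bad = np.abs(eog).max(axis=(1, 2)) > eog_limit
    grad_bad = np.abs(grad).max(axis=(1, 2)) > grad_limit
    return ~(eog_bad | grad_bad)


# Usage: three trials; trial 1 has an EOG spike, trial 2 a gradiometer spike.
eog = np.zeros((3, 2, 401))
grad = np.zeros((3, 204, 401))
eog[1, 0, 50] = 250e-6
grad[2, 10, 10] = 3e-10
print(keep_trials(eog, grad))  # [ True False False]
```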
In a departure from conventional MEG analyses, we used a multiple linear regression approach following methods described by Hauk et al. (2006). The multiple linear regression approach constructs an evoked waveform from regression coefficients rather than from averaged data points, and this waveform reflects the extent to which each variable of interest modulates the MEG signal over time and space.
At each MEG sensor and time point (s,t), multiple linear regressions were performed using a robust regression approach, in which the recorded MEG signals for all items formed the outcome vector (Y) and the items' scores (from the PCA) on each of the n components were entered as predictor vectors (X), with associated coefficients (b), as in

$$Y = C + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + e$$
The length of each vector (Y and X) is equal to the number of items entered into the regression, while e is the error term and C is the constant (in this case, the constant equals the mean, as the component scores have a mean of zero and unit standard deviation). The regression coefficient for each component (b_c) can be considered a summary value that captures the relationship between a particular variable and the recorded MEG signal across items. Positive coefficients indicate a positive relationship between the values recorded at that sensor/time point and the component scores, while negative coefficients indicate a negative relationship. Coefficients near zero indicate no consistent relationship between the MEG signals and the component scores. The regression coefficients (b) were calculated at each time point between −100 and 300 ms and at each of the 102 magnetometer sensors. The resulting coefficients, termed event-related regression coefficients (ERRCs), are summary values and can be treated in the same way as evoked data in typical MEG analyses, including source localization (Hauk et al. 2006).
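Because the same design matrix is used at every sensor and time point, all of the regressions for one participant can be solved in a single least-squares call. The sketch below uses ordinary least squares for simplicity, whereas the study used a robust regression approach (which iteratively downweights outlying items), so the coefficients would differ somewhat on real data. With centered predictors, the fitted constant equals the mean response across items, as stated above. The function name and simulated data are hypothetical.

```python
import numpy as np


def errc(Y, X):
    """Event-related regression coefficients by ordinary least squares.

    Y: items x sensors x times MEG data for one participant.
    X: items x n_components PCA component scores.
    Returns (1 + n_components) x sensors x times; index 0 holds the
    constant C and index c holds b_c for component c.
    """
    n_items = Y.shape[0]
    design = np.column_stack([np.ones(n_items), X])   # items x (1 + n)
    flat = Y.reshape(n_items, -1)                     # one column per (s, t)
    coefs, *_ = np.linalg.lstsq(design, flat, rcond=None)
    return coefs.reshape((design.shape[1],) + Y.shape[1:])


# Usage with simulated data: 177 items, 102 magnetometers, 41 time points.
rng = np.random.default_rng(2)
X = rng.normal(size=(177, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)             # zero mean, unit SD
Y = rng.normal(size=(177, 102, 41))
B = errc(Y, X)
print(B.shape)                            # (7, 102, 41)
print(np.allclose(B[0], Y.mean(axis=0)))  # True: constant = item mean
```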
MEG Sensor Analysis
Only the magnetometers were used for the ERRC analysis. To test whether any of the ERRCs (one for each component) show consistent effects across participants, a 3D (topography × time) sensor SPM mass-univariate analysis was conducted using SPM5 (Wellcome Institute of Cognitive Neurology, London, UK) across space and time. The topographic distribution of the magnetometer sensor data was transformed into a 2D space by linear interpolation to a 32 × 32 pixel grid, which extended through time. The 3D space-time data were written out as an image, entered into a one-way analysis of variance and tested against zero (zero signifying no effect of a variable) using a one-sample t-test. The resulting t-statistic images were thresholded at a “pixelwise” level of P < 0.005 and a cluster extent of P < 0.05, using random field theory. This procedure reveals significant effects of each component on the magnetometer data.
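The core of the sensor-level test can be sketched with NumPy and SciPy: each participant's sensor-by-time ERRCs are linearly interpolated onto a 32 × 32 grid, and a one-sample t-test against zero is computed across participants at every pixel and time point. This is an illustration, not SPM5's implementation: it assumes 2D sensor positions are already available, uses two-sided p-values, and applies only the uncorrected pixelwise threshold; the random field theory cluster-extent correction is not reproduced. Function names and data are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.interpolate import griddata


def to_grid(pos2d, values, n=32):
    """Linearly interpolate sensors x times values onto an n x n x times grid.

    pos2d: sensors x 2 array of projected sensor positions (assumed given).
    Pixels outside the sensor array's convex hull are NaN.
    """
    xs = np.linspace(pos2d[:, 0].min(), pos2d[:, 0].max(), n)
    ys = np.linspace(pos2d[:, 1].min(), pos2d[:, 1].max(), n)
    gx, gy = np.meshgrid(xs, ys)
    frames = [griddata(pos2d, values[:, t], (gx, gy), method="linear")
              for t in range(values.shape[1])]
    return np.stack(frames, axis=-1)


def group_t_map(images, alpha=0.005):
    """One-sample t-test against zero across participants (axis 0).

    Returns the t-map and the uncorrected pixelwise mask p < alpha
    (two-sided); no cluster-level correction is applied.
    """
    t, p = stats.ttest_1samp(images, popmean=0.0, axis=0)
    return t, p < alpha


# Usage: 11 simulated participants, 102 sensors, 10 time points.
rng = np.random.default_rng(3)
pos = rng.uniform(-1, 1, size=(102, 2))
images = np.stack([to_grid(pos, rng.normal(size=(102, 10))) for _ in range(11)])
t_map, mask = group_t_map(images)
print(images.shape, int(mask.sum()))
```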
To interpret the directionality of significant effects, the ERRC topographies were visually compared with the topography of the grand-mean data (Hauk et al. 2009). For example, if significant positive ERRC values spatially coincide with a positive peak in the grand-mean topography, then the ERRC effect can be interpreted as showing that increasing values of the variable are associated with an increasing magnitude of the peak response in the grand-mean (a positive relationship; the same interpretation holds when the ERRC values and the grand-mean peak are both negative). Alternatively, if significant positive ERRC values coincide with a negative peak in the grand-mean topography, then the ERRC effect can be interpreted as showing that increasing values of the variable are associated with a decreasing magnitude of the peak response in the grand-mean (a negative relationship; the same interpretation holds when the ERRC values are negative and the grand-mean peak is positive). However, this approach assumes that the same underlying neural sources produce both the grand-mean and the ERRC topographies. Therefore, the relationship between the mean activity underlying the MEG responses and the variable can only be inferred when there is a spatial correspondence between the ERRC and grand-mean topographies.
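The sign logic described above reduces to a product of signs, valid only when the ERRC effect and the grand-mean peak spatially coincide. The hypothetical helper below encodes that rule; the correspondence judgment itself was made by visual inspection in the study and is passed in as a flag.

```python
def errc_relationship(errc_sign: int, grand_mean_sign: int, corresponds: bool) -> str:
    """Direction of a variable's effect on response magnitude.

    errc_sign, grand_mean_sign: +1 or -1 at the location of the effect.
    corresponds: whether the ERRC effect spatially coincides with a
    grand-mean peak (judged visually in the study).
    """
    if errc_sign not in (1, -1) or grand_mean_sign not in (1, -1):
        raise ValueError("signs must be +1 or -1")
    if not corresponds:
        return "undetermined"  # different sources may underlie the two maps
    return "positive" if errc_sign * grand_mean_sign > 0 else "negative"


print(errc_relationship(-1, -1, True))   # positive (negative ERRC over negative peak)
print(errc_relationship(+1, -1, True))   # negative
print(errc_relationship(+1, +1, False))  # undetermined
```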
MEG Source Analysis
Source localization of the ERRCs was performed using data from the magnetometer sensors. Magnetic resonance imaging (MRI) images were segmented and spatially normalized to a Montreal Neurological Institute (MNI) template brain in Talairach space using SPM5. A template cortical mesh with 7004 vertices was inverse normalized to the individual's specific MRI space (Mattout et al. 2007). Each vertex location corresponded to a dipole with a fixed orientation perpendicular to the surface, with a mean intervertex spacing of ∼5 mm. MEG sensor locations were coregistered to the subject-specific MRI space using the digitized head points and aligning the fiducial points obtained during acquisition. Brainstorm was used to fit a boundary element model (Mosher et al. 1999) to the inner-skull mesh and to calculate the lead fields for the sources. The data were inverted to estimate activity at each cortical source using a multiple sparse priors approach (Friston et al. 2008) and the default options in SPM5 (with the exception that a Hanning window was not used). The estimated cortical activity was averaged across a time window (statistically identified using the sensor SPM analysis described above) and written out as an intensity image in MNI space. Images were smoothed with a 12 mm FWHM Gaussian kernel before the resulting ERRC source images were averaged across participants. The resulting maps thus show the locations of greatest activity associated with each ERRC, and hence the locations of the neural sources contributing to the effects. Results are displayed on an inflated cortical surface created with FreeSurfer (Dale and Sereno 1993; Dale et al. 1999; Fischl et al. 1999).
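Two numerical details of this step can be made concrete. A Gaussian kernel with full width at half maximum (FWHM) w has standard deviation σ = w / (2√(2 ln 2)), so the 12 mm kernel has σ ≈ 5.1 mm. The sketch below converts this to voxel units and smooths a volume with SciPy, and averages a vertex-by-time source estimate over a time window; the 2 mm voxel size, the array layouts, and the function names are assumptions, not details reported for this study.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given full width at half maximum."""
    return fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def smooth_volume(volume, fwhm_mm=12.0, voxel_mm=2.0):
    """Isotropic Gaussian smoothing; voxel_mm = 2 is an assumed voxel size."""
    return gaussian_filter(volume, sigma=fwhm_to_sigma(fwhm_mm) / voxel_mm)


def window_mean(source, times_ms, t0, t1):
    """Average a vertices x times source estimate over t0 <= t <= t1 (ms)."""
    times_ms = np.asarray(times_ms)
    in_window = (times_ms >= t0) & (times_ms <= t1)
    return source[:, in_window].mean(axis=1)


print(round(float(fwhm_to_sigma(12.0)), 2))  # 5.1 (mm)
```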
The current study tested 4 central predictions concerning the time course and location of perceptual and semantic effects during object processing. First, we predicted that neural signals would show a rapid progression from perceptual information in the visual cortex to more semantically based variables across time. Second, we predicted that early semantic effects would relate to shared semantic features and be associated with more posterior ventral temporal regions. Third, we predicted that effects of shared features would occur prior to those associated with the combined effects of shared and distinctive features required for basic-level concept identification, and that the latter would engage more anterior regions of the ventral stream. Finally, we predicted that effects of weakly correlated features would engage inferior frontal regions, supporting the activation and integration of features that benefit less from mutual coactivation. To test these predictions, an ERRC (Hauk et al. 2006) analysis was performed at the sensor level to determine whether neural processing is significantly modulated by the perceptual and semantic factors, and when different types of information are expressed in the MEG signals (Table 2 and Fig. 2). Significant sensor-level effects were then localized in the cortex (Fig. 3).
[Table 2: sensor-level ERRC effects; columns: Time window, P (corrected), Peak time]
The earliest significant effects of image complexity occurred between 74 and 116 ms. The close correspondence between the ERRC topography for these effects and the topography of the grand-mean data indicates that inferences can be made about the direction in which image complexity covaries with the mean data. As shown in Figure 2, the peak magnetometer effect after 74 ms shows a positive covariation between the MEG signals indexed by the grand mean and increasing values of image complexity indexed by the ERRC. Increasing image complexity was therefore associated with an increasing magnitude of MEG signals (for both positive and negative polarities), revealing a positive relationship between image complexity and the magnitude of magnetometer signals beginning after 74 ms. Source localization placed these initial effects of image complexity in bilateral occipital cortex (Fig. 3).
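The directional inference used above can be written as a one-line rule: where the ERRC and grand-mean topographies correspond, an ERRC with the same sign as the grand-mean peak means that response magnitude grows as the predictor increases, and opposite signs mean it shrinks. The helper below is a hypothetical sketch of that rule; the numbers in the example are invented and only mirror the sign patterns described in the text.

```python
import numpy as np


def magnitude_effect(errc, grand_mean):
    """Direction of a predictor's effect on response magnitude (hypothetical helper).

    Returns +1 where the response magnitude grows as the predictor increases,
    -1 where it shrinks, and 0 where either value is zero. Only meaningful at
    sensors where the ERRC and grand-mean topographies correspond.
    """
    # same sign: the predictor pushes the signal further from zero (larger magnitude)
    # opposite signs: the predictor pulls the signal toward zero (smaller magnitude)
    return np.sign(errc) * np.sign(grand_mean)


# Sign patterns described in the text (values are invented):
#   positive ERRC over a positive peak (image complexity, 74 ms)
#   negative ERRC over a positive peak (relative distinctiveness, 84-120 ms)
#   positive ERRC over a negative peak (correlational strength, 224 ms)
examples = magnitude_effect(np.array([0.8, -0.5, 0.3]), np.array([2.0, 1.5, -1.0]))
```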
Later effects of image complexity, beginning after 180 and 234 ms, were also localized primarily to bilateral occipital cortex. Significant positive ERRCs were observed after 180 ms over right posterior sensors; however, this ERRC effect did not spatially correspond to a discernible peak in the grand-mean data. This lack of correspondence between the ERRC topography and the grand mean suggests that different neural sources produced the 2 topographic distributions, so the mean data cannot be used to infer the direction of this ERRC effect. A third effect of image complexity, after 234 ms, showed negative ERRCs located over a negative peak in the grand-mean data, such that images with greater complexity values produced more negative values in the grand mean (i.e., larger magnitudes). Together, these findings show the recurring influence of image complexity on neural activity generated within early visual regions, and a general pattern whereby increasingly complex visual images give rise to larger MEG responses.
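One way to quantify the correspondence between an ERRC topography and the grand-mean topography, offered here as an assumption rather than the procedure used in this study, is the Pearson correlation across sensors. A strong positive correlation implies that the predictor scales up the grand-mean pattern, a strong negative one that it scales it down, and a weak correlation, as for the 180-ms effect, leaves the direction undetermined. The 0.5 cutoff and the sensor values below are hypothetical.

```python
import numpy as np


def topography_correspondence(errc_topo, gm_topo, min_abs_r=0.5):
    """Spatial Pearson correlation between an ERRC and a grand-mean topography.

    Returns (r, direction): direction is +1 if the predictor scales the
    grand-mean pattern up, -1 if it scales it down, and None if |r| falls
    below the hypothetical cutoff `min_abs_r`, in which case the mean data
    cannot be used to infer the direction of the effect.
    """
    r = float(np.corrcoef(errc_topo, gm_topo)[0, 1])
    if abs(r) < min_abs_r:
        return r, None
    return r, (1 if r > 0 else -1)


gm = np.array([1.0, -1.0, 1.0, -1.0])  # invented 4-sensor grand-mean topography
print(topography_correspondence(2 * gm, gm))  # ERRC pattern is a scaled copy
print(topography_correspondence(np.array([1.0, 1.0, -1.0, -1.0]), gm))  # unrelated pattern
```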
We observed rapid semantic effects of relative distinctiveness, a measure that primarily captures whether a concept has relatively more shared or more distinctive features. Significant negative ERRC values from 84 to 120 ms coincided with a positive peak in the grand-mean data, indicating that decreasing values of relative distinctiveness (i.e., more shared relative to distinctive information) resulted in increasing MEG signals. This rapid semantic effect was localized along the extent of the left ventral temporal cortex, extending into the anterior temporal lobe. It shows that general, shared semantic information is rapidly extracted from the visual input, with an onset shortly after the initial visual effects. Furthermore, this early effect of relative distinctiveness is underpinned by cortical regions at higher levels of the visual hierarchy than the more posterior regions underlying the initial visual effects. As such, the representations rapidly evoked before 150 ms by the initial feedforward pass of activity along ventral temporal cortex reflect both perceptual and shared semantic information, which together provide coarse information sufficient for category-level (e.g., animals and vehicles) and domain-level (i.e., living or nonliving) decisions. These effects are consistent with our first 2 predictions: that initial effects are associated with visual processing in occipital regions, and that shared feature information becomes available early, rapidly after the onset of perceptual analyses.
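The exact formulation of relative distinctiveness is not restated in this section, so the sketch below uses a simpler stand-in that we assume for illustration: each feature's distinctiveness is 1 divided by the number of concepts that have it, and a concept's score is the mean distinctiveness of its features, so concepts dominated by shared features score low. The concepts, features, and helper names are hypothetical toy data, not the property norms or measure used in the study.

```python
import numpy as np


def feature_distinctiveness(cf):
    """1 / (number of concepts with the feature); unused features get 0.

    cf: binary concept x feature matrix (rows: concepts, columns: features).
    """
    counts = cf.sum(axis=0).astype(float)
    return np.divide(1.0, counts, out=np.zeros_like(counts), where=counts > 0)


def mean_distinctiveness(cf):
    """Assumed stand-in for relative distinctiveness: mean distinctiveness of
    each concept's features (every concept must have at least one feature)."""
    d = feature_distinctiveness(cf)
    return (cf * d).sum(axis=1) / cf.sum(axis=1)


# Hypothetical toy norms. Columns: has_eyes, has_4_legs, has_hump, has_stripes
norms = np.array([
    [1, 1, 1, 0],  # camel: shared features plus a distinctive hump
    [1, 1, 0, 1],  # tiger: shared features plus distinctive stripes
    [1, 1, 0, 0],  # cow: only shared features in this toy set
])
scores = mean_distinctiveness(norms)
```

In this toy set the cow, whose listed features are all shared, receives the lowest score, which is the end of the scale associated with larger early MEG responses in the results above.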
Two subsequent effects of relative distinctiveness, between 170–210 and 240–300 ms, also reflected increasing MEG signals for concepts with relatively more shared information, as indicated by significant negative ERRC values coinciding with positive peaks in the grand-mean data; these effects were localized within the left ventral stream. Importantly, a further effect of relative distinctiveness was found between 240 and 300 ms in which negative ERRC values coincided with a negative peak in the grand-mean data (see Fig. 2), indicating that concepts with more distinctive relative to shared information were associated with increasing MEG responses. Thus, after 240 ms, neural activity distributed along the extent of the left ventral temporal cortex was sensitive to both the shared and the distinctive aspects of a concept's meaning. These effects of shared and distinctive semantic information, whose integration enables basic-level identification, further show that more fine-grained semantic processes occur between 200 and 300 ms, and they support our third prediction that combined effects of shared and distinctive features would occur after the initial effects of shared features.
The final measure to show an effect was correlational strength, which measures the degree to which a concept's features co-occur with other features. After 224 ms, positive ERRC values coincided with a negative peak in the grand-mean data. The opposite signs of the topographic distributions across the posterior sensors in the ERRC and grand-mean data indicate that increasing correlational strength leads to decreased MEG responses. Therefore, concepts with more weakly correlated features were associated with increasing MEG responses, which were localized to right ventral and anterior temporal regions as well as to bilateral prefrontal cortex.
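Similarly, correlational strength can be sketched under an assumed definition: correlate every pair of features across concepts (Pearson r over the binary concept-by-feature matrix) and average the correlations among each concept's own features. This omits refinements such as restricting the average to significantly correlated pairs, so it should be read as an illustration of the idea rather than the measure computed in this study; the helper name and toy matrix are hypothetical.

```python
import numpy as np


def correlational_strength(cf):
    """Assumed sketch: mean pairwise correlation among each concept's features.

    cf: binary concept x feature matrix. Features present in every concept
    (or in none) have zero variance and would yield NaN correlations, so the
    toy data avoid them. Concepts with fewer than 2 features get NaN.
    """
    r = np.corrcoef(cf, rowvar=False)  # feature x feature correlation matrix
    out = np.full(cf.shape[0], np.nan)
    for i, row in enumerate(cf):
        idx = np.flatnonzero(row)  # features this concept has
        if idx.size < 2:
            continue
        sub = r[np.ix_(idx, idx)]
        upper = np.triu_indices(idx.size, k=1)  # each feature pair once
        out[i] = sub[upper].mean()
    return out


# Hypothetical toy matrix: rows are concepts A-D, columns are features f0-f3
toy = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
])
cs = correlational_strength(toy)
```

Here concept A, whose two features tend to co-occur across concepts, gets the highest score, while concept D, whose features rarely co-occur, gets a negative one; on the results above, concepts like D would be expected to evoke the larger responses after 224 ms.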
These results suggest that activity increases in ventral and anterior temporal as well as bilateral prefrontal cortices when the semantic information to be integrated does not benefit from mutual coactivation (conferred through strongly correlated features), and additional processing is required to coactivate and integrate features. This effect partly supports our fourth prediction that effects relating to strongly correlated features occur before effects of weakly correlated features: we find effects of weakly correlated features but not the preceding effects of strongly correlated features. In addition, the effect of weakly correlated features reported here was localized within ventral temporal, anterior temporal, and prefrontal cortices, again consistent with our prediction. There were no significant effects of the proportion of visual features, familiarity, or NoF measures.
The current study aimed to determine the time course of perceptual and semantic effects associated with the rapid formation of detailed, meaningful visual object representations. Using a linear regression approach to analyze MEG data (Hauk et al. 2006), we determined the extent to which selected perceptual and semantic feature–based statistical variables modulated neural activity during the early stages of object recognition. We predicted that neural signals would show a rapid progression from initial perceptual, stimulus-based effects to responses reflecting more semantically based information over time. We also predicted that early semantic information would be related to measures associated with shared semantic features and that these effects would be reflected primarily by responses within the ventral stream. Critically, these effects were predicted to occur before those associated with fine-grained semantic processes that require information about both shared and distinctive features. Finally, we predicted that effects of weakly correlated features might occur later and additionally engage inferior frontal regions to aid the coactivation and integration of features that benefit less from mutual coactivation.
Early effects (pre-200 ms)
The first cortical signatures of visual processing are known to arise from early visual cortex before neural activity propagates through the ventral temporal cortex (Lamme and Roelfsema 2000; Bullier 2001). Accordingly, our results showed that the initial effects, starting at 74 ms, were driven by the complexity of the images and were localized to bilateral occipital cortex. While corroborating the known neural dynamics during visual object processing, this result further replicates previous findings that initial activity over the occipital lobe is highly correlated with purely visual measures (Tarkiainen et al. 2002; Martinovic et al. 2008).
We observed rapid semantic effects between 84 and 120 ms along the extent of the left ventral temporal cortex into the anterior temporal lobes. Analyses of the sensor data revealed that the magnitude of MEG signals increased with the proportion of shared relative to distinctive features, reflecting more general, shared information about the concept (e.g., has eyes, has ears, and has 4 legs are general features shared by many animals). This rapid semantic effect occurred within the time frame of the initial feedforward sweep, extended along the entire ventral temporal cortex, and involved more anterior regions than the initial perceptual effects. These results show that the initial transition from perceptual to semantic processing occurs very rapidly and emerges as neural activity propagates along the ventral temporal cortex into the anterior temporal lobes. Furthermore, early semantic processing reflects more shared semantic properties, suggesting that the representation established during this initial feedforward sweep is informed by both perceptual and shared semantic factors, which are sufficient to support coarse-grained or categorical distinctions but not a more differentiated representation of the object.
The notion that object representations established within the initial feedforward sweep are based on both perceptual and semantic information suggests that effects reported in ultrarapid visual categorization tasks rely on more than stimulus-based visual information. Ultrarapid visual categorization tasks consistently show that coarse or categorical distinctions can be made within the first 100–150 ms of neural activity, presumably supported by predominantly feedforward activity (Thorpe et al. 1996; VanRullen and Thorpe 2001; Kirchner and Thorpe 2006; Liu et al. 2009; Crouzet et al. 2010). The results of the current study are consistent with the conjecture of VanRullen and Thorpe (2001) that such rapid distinctions are based on more than low-level visual properties of the stimulus, and suggest that this additional information consists of more abstract, semantic information capturing the type of thing the object is. Here, such representations were revealed using feature-based statistical measures capturing information about shared semantic features.
The rapid effect of shared semantic features was prominent throughout the left ventral temporal cortex extending into the anterior temporal lobes. The anterior temporal lobes are hypothesized to integrate more complex semantic information (Tyler et al. 2004; Moss, Rodd, et al. 2005; Taylor et al. 2006; Barense et al. 2007). Thus, this rapid effect of shared features in the anterior temporal lobe may reflect the engagement of more complex processing required for concepts whose many shared features render them more semantically confusable or ambiguous. However, the fast responses in the anterior temporal lobes may also be a consequence of the automatic initial feedforward sweep of neural responses through occipital and ventral temporal cortices into the anterior temporal lobes (Felleman and Van Essen 1991), as opposed to heightened semantic integration demands per se. Neural representations accumulated through the automatic predominantly feedforward processing mechanism may reflect nonspecific semantic information that is true of many similar exemplars. For example, Liu et al. (2009) reported that neural responses during the initial feedforward sweep (i.e., after ca. 100 ms) in both the posterior and the anterior temporal lobes were equally reflective of object category, indicating that coarse information was coded throughout the stream at this time point. In any case, the current results demonstrate that during the initial feedforward sweep through occipital and ventral temporal cortices, neural responses appear to become increasingly abstracted from a perceptual to a perceptual–semantic representation suited to supporting coarse categorical distinctions.
Such coarse, rapidly formed representations cannot support more differentiated representations, which require additional processing (Fei-Fei et al. 2007; Mace et al. 2009; Clarke et al. 2011). Beyond 150 ms, dynamic long-range recurrent processing mechanisms are claimed to support more complex visual object processing (Lamme and Roelfsema 2000; Hochstein and Ahissar 2002; Bar et al. 2006; Schendan and Maher 2009; Clarke et al. 2011). In the current study, we found recurring effects of image complexity and of increased sharedness of semantic features between 150 and 200 ms poststimulus onset, which appear to reflect an additional phase of processing for objects that are more visually complex and more semantically ambiguous, that is, objects with a greater proportion of shared semantic features. Specifically, basic-level identification of concepts with more shared relative to distinctive features was associated with greater posterior and middle ventral stream activity than basic-level identification of concepts with more distinctive relative to shared features. This increased processing may be required to disambiguate concepts with many overlapping (i.e., shared) features. The progression from coarse semantic processing during the initial feedforward sweep to recurrent processing of more visually complex and semantically ambiguous objects is consistent with the notion that feedforward processing along ventral temporal cortex supports vision at a glance, while feedback in the reverse direction supports vision with scrutiny (Hochstein and Ahissar 2002). It is also consistent with more iterative, recurrent accounts claiming that recurrent processing supports the formation of increasingly complex semantic representations (Schendan and Maher 2009; Clarke et al. 2011).
Fine-grained effects (200–300 ms)
The 200–300 ms time frame is claimed to be critical for the formation of higher level meaningful object representations (Bar et al. 2006; Martinovic et al. 2007). In agreement with such claims, we find temporally and spatially overlapping semantic feature–based effects between 200 and 300 ms concerning the correlation of semantic features, shared semantic features, and distinctive semantic features, whose combined information is essential for more fine-grained differentiation and identification.
Effects of the feature-based statistical measure of correlational strength were observed beginning after 200 ms, overlapping with effects for both the sharedness and the distinctiveness of concepts' features. Specifically, MEG signals showed greater responses for concepts whose features were less highly correlated, and these were localized along the extent of the ventral temporal lobe into the anterior temporal cortex and in bilateral prefrontal cortices. These results show increased activity in ventral and anterior temporal as well as bilateral prefrontal cortices when the semantic information to be integrated does not benefit from mutual coactivation (conferred through strongly correlated features), and so the integration of weakly correlated features into an emerging conceptual representation will require increased processing by virtue of the decreased correlation between features. The association of this effect with bilateral prefrontal cortices is consistent with previous studies showing that activity in inferior frontal structures is sensitive to semantic retrieval and selection demands. Increases in left prefrontal cortex activity are observed during semantic decisions about associated items (Thompson-Schill et al. 1997; Badre and Wagner 2002; Moss, Abdallah, et al. 2005), and prefrontal activity becomes increasingly bilateral when retrieval demands increase (Wagner et al. 2001). Similarly, we suggest that the selection and retrieval of weakly correlated semantic information places greater demands on the conceptual system, driving the bilateral prefrontal cortex responses between 224 and 260 ms. This suggests that increased activity in prefrontal cortex for concepts with more weakly correlated semantic features may reflect the increased involvement of controlled semantic retrieval mechanisms that may only be weakly engaged by concepts with strong intrinsic feature correlations.
Within the same time frame, the MEG signals were also sensitive to relative distinctiveness, reflecting dual effects: increased responses for concepts with a greater degree of shared feature information, and a separate increase in responses for concepts with a greater degree of distinctive feature information. Source localization estimated that the effects of relative distinctiveness were generated in the left ventral temporal cortex. Taken together, these results show that, beginning after 200 ms, processing increases for weakly correlated semantic features, and that shared and distinctive semantic feature information is processed in parallel; together, this information supports the fine-grained recognition of an object as a meaningful thing. These results highlight a transition from early processing of primarily shared information to later effects of weakly correlated features along with shared general and distinctive object-specific information. Assimilating distinctive and shared features into the emerging representation, which is initially based on shared features, allows for the conceptual differentiation that supports basic-level identification.
During this time frame, continuing recurrent processing supports the processing of the fine-grained details required for basic-level recognition. Schendan and Maher (2009) propose that recurrent processes after 200 ms support more fine-grained object-specific knowledge, and recurrent activity has also been shown to be modulated by the degree of semantic integration required for recognition (Clarke et al. 2011). Recurrent interactions between the anterior temporal lobe and more posterior fusiform cortex may underpin this semantic integration, supporting fine-grained differentiation (Clarke et al. 2011). The anterior temporal lobes, specifically the perirhinal cortex, are hypothesized to support the processing of confusable and ambiguous visual objects, that is, those with many shared features, and have been shown to support the fine-grained semantic processing of such objects (Tyler et al. 2004; Moss, Rodd, et al. 2005; Taylor et al. 2006). Here, we find semantic feature effects pertaining to the processing of weakly correlated features as well as shared and distinctive features in the anterior and more posterior temporal lobes. These findings support the notion that ongoing recurrent processes support semantic differentiation and that recurrence increases when the demands to integrate semantic information increase.
Our results showed effects of relative distinctiveness lateralized to the left ventral stream and effects of correlational strength lateralized to the right ventral stream. These lateralized effects are consistent with a model of object recognition that posits that the left hemisphere is better suited for processing feature information and the right hemisphere for processing feature configurations (Marsolek 1999; Dien 2009). The relative distinctiveness measure employed here captures the degree to which a concept's features are more shared or more distinctive, and as such is a semantic measure reflecting the characteristics of individual features. In contrast, correlational strength captures the degree to which a concept's features are likely to co-occur and thus captures the relationship between features (i.e., a property of their configuration). Specifically, our results show increased right hemisphere ventral temporal responses when a concept's features are more weakly correlated, that is, concepts that require additional processing of feature relationships because the automatic coactivation of their features is reduced compared with concepts with strongly correlated features. This further suggests that highly correlated features may in fact be coded as unitary features by virtue of their high co-occurrence and therefore require less configural processing of feature relationships supported by the right hemisphere. The parallel, but lateralized, effects we find between 200 and 300 ms may therefore reflect processing in the 2 hemispheres that is differentially sensitive to different aspects of conceptual representations, although it is also likely that both hemispheres are able to support these aspects of conceptual processing.
One final note concerns the degree to which our observed effects of relative distinctiveness and correlational strength truly reflect semantic processes rather than visual characteristics of the objects. Our feature-based statistical measures were calculated using both visual and nonvisual semantic feature information, on the assumption that both types of semantic information are rapidly activated by perceptual information. This position is consistent with the hierarchical interactive theory of visual object processing (Humphreys and Forde 2001), which predicts a cascade-like sequence of effects in which perceptual processing rapidly activates the associated (semantic) information related to the object. In this manner, some semantic information about the object, including nonvisual information, is rapidly accessed and in turn interacts with ongoing perceptual processes. Further experimental evidence for the rapid activation of semantic information comes from an EEG study employing a picture-word interference paradigm. Dell'Acqua et al. (2010) compared EEG signals for object images with semantically related words written on them against object images with semantically unrelated words, and found an early effect of semantic relatedness peaking at 106 ms. Because this relatedness effect depends on semantic processing of both the picture and the word, it indicates that more abstract, semantic aspects of objects are processed rapidly. The present findings are consistent with both the hierarchical interactive account and this EEG evidence, but importantly provide a more elaborate account of the earliest stages of meaningful object recognition by identifying the nature of the rapidly accessed semantic information.
Our analysis captures the evoked, phase-locked aspects of meaningful visual object recognition but not the induced effects. Some aspects of recurrent processing may not be phase locked; however, previous MEG studies highlighting early top-down and recurrent processes show that such effects can be captured with evoked analyses (Bar et al. 2006; Clarke et al. 2011). Nevertheless, although our analyses may capture many aspects of recurrent activity, additional high-frequency and non-phase-locked aspects of the MEG signals may not have been captured.
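The distinction between evoked and induced activity can be illustrated with a small simulation: averaging across trials preserves a phase-locked oscillation but cancels one whose phase varies randomly from trial to trial, even though single-trial power is unchanged. The sketch below is a simplified time-domain decomposition (total power = evoked power + across-trial variance) with invented signal parameters and a hypothetical helper name; actual induced-response analyses are normally carried out per frequency band in the time-frequency domain.

```python
import numpy as np


def evoked_induced_power(trials):
    """Split per-sample power into evoked and induced parts (simplified sketch).

    trials: (n_trials, n_times) single-trial signals.
    """
    evoked = trials.mean(axis=0)         # phase-locked average, as in evoked analyses
    total = (trials ** 2).mean(axis=0)   # mean single-trial power
    evoked_power = evoked ** 2
    induced_power = total - evoked_power  # equals the across-trial variance (ddof=0)
    return evoked_power, induced_power


rng = np.random.default_rng(1)
fs = 1000.0                       # invented sampling rate (Hz)
t = np.arange(500) / fs           # 0.5 s of samples
n_trials = 200
# 10-Hz oscillation with identical phase on every trial (phase locked)
locked = np.tile(np.sin(2 * np.pi * 10 * t), (n_trials, 1))
# 40-Hz oscillation with a random phase on each trial (not phase locked)
phases = rng.uniform(0, 2 * np.pi, size=(n_trials, 1))
unlocked = np.sin(2 * np.pi * 40 * t + phases)

ep_locked, ip_locked = evoked_induced_power(locked)
ep_unlocked, ip_unlocked = evoked_induced_power(unlocked)
```

In the phase-locked case nearly all power survives averaging, whereas in the random-phase case almost all of it ends up in the induced term, which is the component an evoked ERRC analysis would not see.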
Our results show that the statistical regularities of our semantic knowledge are reflected in the neural processes underlying the basic-level identification of visual objects. Moreover, we have been able to go beyond previous accounts by identifying the nature of the semantic information that is rapidly accessed (for similar findings using visual words, see Hauk et al. 2006), while relating these findings to the known neurobiological mechanisms that support visual object processing. Critically, we have shown that the dynamic neural responses underpinning visual object recognition are related to various forms of semantic knowledge, and that these relationships can be characterized within the framework of feature-based statistics, which allows different forms of semantic knowledge to be operationalized and quantified. In particular, our results show that feature-based statistical measures incorporating the sharedness and distinctiveness of features, and the correlation between features, are crucial factors underpinning the conceptual processing of objects.
The current study is one of a growing number that highlight the advantage of regression approaches for analyzing M/EEG data, which enable the characterization of how multiple variables influence neural activity within the same data set. The results reported here show a rapid transition from perceptual to conceptual processing as activity spreads along the ventral temporal lobe. The rapid semantic effects were related to shared semantic features, which are informative about what type of thing the object is. In contrast, responses beginning after 200 ms throughout the ventral stream and into inferior frontal regions were associated with weakly correlated features and with both shared and distinctive features, suggesting that the emerging representation becomes more fine-grained as it incorporates the more distinctive semantic attributes of the object needed for basic-level recognition. Relating the current findings to neurobiological processing mechanisms suggests that initial coarse representations based on perceptual and shared semantic information are predominantly underpinned by initial feedforward processing, while recurrent activity, largely involving the anterior and posterior temporal lobes, is associated with integrating the concept's more distinctive features. These findings support a feature-based account of meaningful object representations, as well as an account in which perceptual and conceptual processes continually interact while the emerging conceptual knowledge evolves from a coarse-grained to a fine-grained representation.
This work was supported by funding from the European Research Council under the European Community's Seventh Framework Programme (FP7/2007–2013)/ERC Grant agreement no. 249640 to L.K.T., a Medical Research Council (UK) programme grant (G0500842 to L.K.T.), a Swiss National Science Foundation Ambizione Fellowship to K.I.T., and a Medical Research Council (UK) Doctoral Training Grant (G0700025 to A.C.).
We would like to thank Dr Olaf Hauk for his help and advice during data analysis. Conflict of Interest: None declared.