## Abstract

The modular variation of organismal form during evolution or development can be viewed as the result of dissociated local developmental processes. Whereas studies of modularity often are experimental, morphological integration is a more descriptive concept applying to groups of correlated phenotypic characters. Using simple path models, this paper shows that the classic underlying assumption of modularity (high correlations within modules, lower correlations between modules) is met only when local developmental factors have high effects on the traits relative to all factors' variations of effect (i.e., allometry). Accordingly, many classic approaches to morphological integration are meaningful only when local as well as common growth factors are nearly isometric. We show that morphometric modules might instead be defined more generally as sets of variables with non-zero within-module covariances, even when the covariances due to common factors have been removed, so that the residual between-module covariances are all near zero. Although it is still inherently unreliable to identify modules from phenotypic covariances in nonexperimental data, patterns of integration can sometimes be estimated based on prior identification of the modules themselves. We outline a simple approach for this case using Wright-style factor analysis and demonstrate the relation of its algebra to the more familiar approach via partial least squares.

Modularity is a property of complex structures or processes that are experimentally or conceptually separable into several “near-decomposable” (Simon, 1962, 1969) modules. This notion of modularity is entertained in a wide range of disciplines (e.g., biology, psychology, computer science, economics), in which diverse operational criteria for what constitutes a module have been devised. Some authors also reasoned that it is useful to construe the general concept of modularity as the decomposition of the explanatory task itself (e.g., von Dassow and Munro, 1999; Callebaut, 2005). For biological modules, Bolker (2000:773) outlined three general requisites: (1) a module is a biological entity (a structure, a process, or a pathway) characterized by more internal than external integration; (2) modules are biological individuals that can be delineated from their context and whose behavior or function reflects the integration of their parts; and (3) a module can be delineated from other entities with which it interacts in some way. Modules of a biological system consequently exhibit some degree of independence as they vary over development, evolution, populations, or experimentally induced conditions. Further definitions and examples of biological modules can be found in Raff (1996), Wagner (1996), von Dassow and Munro (1999), Winther (2001), Schlosser and Wagner (2004), Brandon (2005), and Callebaut and Rasskin-Gutman (2005). Related concepts are the notion of “quasi-independent” characters by Lewontin (1978) and the idea of “gene nets” proposed by Bonner (1988).

There is wide agreement among biologists that modularity is a prerequisite for the hierarchical phenotypic organization of higher organisms as well as for the appearance of complex adaptations. Quantitative genetic models show that traits serving a common functional role tend to evolve a common genetic basis that is more or less independent from other gene nets (Riedl, 1978; Cheverud, 1994, 1996; Wagner, 1996; Wagner and Altenberg, 1996). Conversely, genetic and developmental interdependencies specifically constrain the direction of evolution because such correlated traits frequently evolve together (Lande, 1979; Cheverud, 1994). Wagner and Altenberg thus underscored the importance of a *modular genotype-phenotype map* to increase evolvability—the speed and capacity of evolution.

The notion of modularity—recognized as the *dissociability* of developmental processes—was already implicit at the founding of experimental embryology by Wilhelm Roux in the late 19th century (Raff, 1996). In his attempt to unveil “developmental mechanics,” Roux (1894) perturbed individual components or particular processes in an embryo, expecting that other parts of the embryo would proceed in their development. The first explicit definition of dissociability was provided by Joseph Needham in 1933 in the first extended treatise on this topic.

In the development of an animal embryo, proceeding normally under optimal conditions, the fundamental processes are seen as constituting a perfectly integrated whole. They fit in each other in such a way that the final product comes into being by means of a precise co-operation of reactions and events. But it seems to be a very important, if perhaps insufficiently appreciated, fact, that these fundamental processes are not separable only in thought; that on the contrary they can be dissociated experimentally or thrown out of gear with one another. This conception of out-of-gearishness still lacks a satisfactory name, but in the absence of better words, dissociability or disengagement will be used in what follows. (Needham, 1933:180.)

Needham, as a “chemical embryologist,” provided a series of examples about the experimental dissociation of processes like growth, cell division and differentiation, hormone activities, or metabolic processes. Heterochrony, too, is a frequently cited example of dissociation and refers to the *temporal dissociation* of growth and development or of particular developmental processes (Gould, 1977; Alberch et al., 1979; Zelditch, 2001; Mitteroecker et al., 2005). Ernst Haeckel (1866), who invented heterochrony as a class of exceptions to his theory of recapitulation, contrasted it to heterotopy, which can be regarded as a form of *regional dissociation* (Gould, 1977; Zelditch and Fink, 1996; Mitteroecker et al., 2004a).

Needham regarded it as important—even surprising—that the fundamental developmental processes in an organism that are usually “seen as constituting a perfectly integrated whole” are in fact separable. Evolutionary biologists and paleontologists, in contrast, often emphasize variations among, and differences in, patterns of change (Winther, 2001). For instance, cladistics, as a central method in systematics, formally assumes that all measured phenotypic traits are uncorrelated and that they provide independent evidence for phylogenetic history. In the 1950s Everett Olson and Robert Miller, two paleontologists, therefore devoted a series of papers and one book to yet another “surprise”:

It seemed evident to the writers, as it has to others, that character changes occurring in evolution of species could not be considered to be independent of each other and that studies which did not consider this dependency ignored a significant aspect of change. Not only should the interrelationships of changing characters be a primary point of interest, but the nature and intensity of the relationships should remain evident at all stages of their study. (Olson and Miller, 1958:1.)

Since then, the morphometric study of those interrelationships among characters is usually referred to as *morphological integration* and is based on the comparison of covariances or correlations among morphological measurements from specimens of a single or several distinct natural populations. The fundamental idea is that within the set of measured variables there exist subsets (“ρ-sets” in Olson and Miller's terminology) exhibiting relatively high mutual phenotypic correlations, whereas correlations among variables pertaining to different sets are comparatively low. Elements of those subsets are often assumed to share functional, developmental, or evolutionary properties.

Even though this concept of morphological integration is usually attributed to Olson and Miller, the very same idea had already been formulated more than 20 years earlier in a German article in *Biometrika* by the Russian Paul Terentjev (1931:26):

Nehmen wir ein Objekt als eine Gesamtheit von Merkmalen und berücksichtigen wir alle denkbaren Kombinationen ihrer korrelativen Zusammenhänge. Schon die oberflächliche Beobachtung zeigt, dass die Gleichmässigkeit solcher Korrelation der Merkmale einen ziemlich seltenen Fall bildet. In der Regel gruppieren sich die Merkmale in einigen “Gesellschaften”, die “Korrelationsplejaden” genannt werden können. Die Hin- und Herbezüge zwischen den Gliedern der verschiedenen Plejaden zeigen die Neigung zum Mindestmass. [Let us take an object as a totality of characteristics and consider all their mutual correlations. Even a superficial observation shows that the evenness of those correlations would be a quite rare case. Typically, traits are grouped into some “societies” that can be called “correlation pleiades”. The relationships among elements from different pleiades show a tendency to be minimal.]

Terentjev argued that traits are usually grouped according to their correlations in “correlation pleiades,” where correlations among elements of *different* pleiades are low. In his 1931 paper, he proceeded by applying a battery of statistical and graphical techniques to identify such correlation pleiades in measurements on frogs. In contrast to Terentjev, Olson and Miller (1951) explicitly related such sets of correlated variables to biological hypotheses—that is, to a grouping of the variables (F-sets) according to their developmental or functional relationships. Chernoff and Magwene (1999:319) consequently defined morphological integration as “the correspondence of patterns of covariation among traits to a priori or a posteriori biological hypotheses.” Yet, in their 1958 book, Olson and Miller discarded the a priori construction of F-sets to derive hypotheses a posteriori from ρ-sets only. This reformulation of the original approach has not gone uncriticized (see Chernoff and Magwene, 1999). Also, the original statistical maneuvers used by Olson and Miller have been regarded as complicated and partially inadequate (e.g., Bock, 1960; Van Valen, 1965).

Since Olson and Miller's seminal treatise, a series of statistically more elaborate methods to study morphological integration has been suggested (for reviews see, e.g., Chernoff and Magwene, 1999; Mitteroecker, 2007). These methods vary in their purpose and strategy. Whereas several techniques are concerned with empirically *testing* a priori formulated models of integration, often based on specific *measures of integration*, other methods are designed to *identify* modules in an exploratory fashion. Another group of techniques *compares* integration patterns among different taxa or patterns attributable to different causes. Finally, some approaches, intended to *model* integration and modularity of several anatomical parts, usually incorporate both a priori knowledge and empirical morphometric data. All these developments seem to have taken place without recognition of Terentjev's work. Still, the concept of correlation pleiades has been taken up by several Russian biologists, particularly botanists (Berg, 1960; Koloslava, 1971; Rostava, 1985), and it has influenced research on the modularity of plants (for a recent review see Murren, 2002). Recently, some authors also addressed morphological integration in the context of phylogenetic inference (e.g., O'Keefe and Wagner, 2001; Strait, 2001; Athreya and Glantz, 2003).

Modularity and morphological integration appear as complementary concepts, and the current literature frequently equates groups of correlated variables to modules. When considered in more detail, however, the two concepts exhibit practical and conceptual differences. In biology, modularity typically denotes the (experimental) dissociability of developmental, functional, or evolutionary processes. Morphological integration, in contrast, is about the decomposition of morphological traits according to their (phenotypic) covariance structure compared across adults and/or subadults of a single taxon or across several taxa. Hence, modularity is typically regarded as a property of processes, whereas integration is a specific pattern of variation and covariation among parts that these processes brought about. Berg (1959) wrote: “The presence of correlations between, for instance, the dimensions of some parts of an organism and the absence of correlations between the dimensions of these and the other parts of the same organism indicate the independence of certain developmental processes with respect to other processes within the organism.” The fundamental assumption in most studies of morphological integration is hence that ρ-groups or correlation pleiades match or reflect the dissociation of local developmental processes.

In the present paper we attempt to formalize the intuitive relationship between morphological integration and modularity by a heuristic use of path models simulating particular developmental systems. We will show that groups of correlated characters *do not* reflect developmental modules unless certain conditions are met. Yet we demonstrate that under given notions of modularity, the *pattern* of morphological integration can be estimated. Furthermore, we will explore statistical properties of several published methods to assess morphological integration, and we will discuss the conditions under which they may reflect biological modularity. We will thereby focus on methods for testing, identifying, and measuring integration; the comparison of integration patterns will be the subject of a forthcoming paper. The following three sections present these arguments in an algebraic form, whereas the subsequent examples illustrate most of them via modeled covariance structures. The final discussion summarizes the arguments and their consequences for the morphometric assessment of modularity.

## Path Models of Developmental Processes

In the early 20th century, Sewall Wright (1918, 1921, 1934) introduced the formalism of path models to quantify causal relationships among observed variables (for a recent review on path models and causality, see Pearl, 2000). Frequently applied in the social sciences, path models are less often encountered nowadays in biology and morphometrics. They allow for a conceptualization and computation of the effects of different (measured) factors on several response variables.

Consider first the simple path model of Figure 1, where it is assumed that the variable *A* affects *B* and *C*. Without loss of generality, take *A* to have a mean of 0 and a variance of 1. The path coefficient from *A* to *B*, i.e., the causal effect of *A* upon *B*, is *b* and the path from *A* to *C* is *c*. Furthermore, to both variables *B* and *C* are attributed error variables representing unexplained variance that have means of zero and are uncorrelated with all other variables. Algebraically, this model may be expressed as *B* = *bA*+*e*_{1} and *C* = *cA*+*e*_{2}, where the *e*'s are “errors” around the two predictions. The variance of *B* is then *b*^{2}+Var(*e*_{1}) and that for *C* is *c*^{2}+Var(*e*_{2}). As the error terms are uncorrelated among themselves and with *B* and *C* and as Var(*A*) = 1, the covariance between *B* and *C* is simply *bc*. The correlation between *B* and *C* is

Corr(*B*, *C*) = *bc* / √[(*b*^{2}+Var(*e*_{1}))(*c*^{2}+Var(*e*_{2}))].   (1)

In Sewall Wright's classic expositions, the deviation of a variable from its mean is expressed as a multiple of its standard deviation. Instead of covariances, then, he addresses correlation coefficients, and path coefficients describe the explained variance for the standardized response variables. In his approach, the three variables *A*, *B*, and *C* in Figure 1 would all have unit variance and the *correlation* (instead of the covariance) is *bc*. This requires that Var(*e*_{1}) = 1−*b*^{2} and Var(*e*_{2}) = 1−*c*^{2} so that the denominator in Equation (1) cancels out.
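The algebra of this path model is easy to check by simulation. The sketch below assumes arbitrary illustrative values for *b*, *c*, and the error variances; none of these numbers come from the text.

```python
# Simulation of the path model of Figure 1: A affects B and C.
# b = 0.8, c = 0.5, and the error standard deviations are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
b, c = 0.8, 0.5

A = rng.standard_normal(n)            # mean 0, Var(A) = 1
e1 = rng.standard_normal(n) * 0.6     # errors, uncorrelated with A and each other
e2 = rng.standard_normal(n) * 0.9
B = b * A + e1
C = c * A + e2

# Cov(B, C) = bc because the error terms are uncorrelated with everything else
print(np.cov(B, C)[0, 1])             # close to 0.40

# Corr(B, C) = bc / sqrt((b^2 + Var(e1)) (c^2 + Var(e2)))
expected_corr = b * c / np.sqrt((b**2 + 0.6**2) * (c**2 + 0.9**2))
print(np.corrcoef(B, C)[0, 1], expected_corr)
```

With standardized variables, as in Wright's convention, the covariance and the correlation coincide.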

Accordingly, both correlations and covariances can be interpreted in the light of path models. In traditional morphometrics, which is frequently applied to variables having different units, analyses are often based on correlations (see, e.g., Blackith and Reyment, 1971; Marcus, 1990). Similarly, most published methods for studying morphological integration rely on correlations. In geometric morphometrics, instead, the variables all possess the same units and analysis proceeds via covariance matrices (Bookstein, 1991). Therefore, most of the following discussion is about covariances instead of correlations—variables are left unstandardized. Also, the implicit error model when working with covariances is more consistent with other morphometric models than that of correlations. Nonetheless, the arguments of this paper apply to correlations without substantial change, and hence to the published methods on integration, whenever the conventional error model is assumed.

### Common and Local Factors

The morphology of an organism is determined by a variety of factors (sensu lato) that unfold their morphogenetic effects during ontogeny. In defining different kinds of integration, Cheverud (1996) distinguished among developmental, functional, genetic, and evolutionary causes, all of which lead to statistical associations among morphometric variables. Others are particularly concerned to differentiate between genetic developmental factors and epigenetic factors (e.g., Müller and Olsson, 2003) such as tissue interactions or environmental factors. In the following approach, these diverse kinds of developmental factors will not be specifically distinguished—they are assumed simply to affect the measured morphological variables in an additive way. Empirical identification of the sources of these effects can be achieved only by appropriate experimental designs. Yet, certainly many developmental factors do not affect the organism in a quantitative and additive way. We know, for example, that developmental events are often threshold-dependent (such as tissue induction) and may cause the emergence of entirely new structures instead of solely quantitative variation. Studies of morphological integration, however, are typically based on the comparison of adult or postnatal specimens from closely related species. The observed interindividual *differences* are assumed to be due to differences in such quantitative and additive factors, insofar as more fundamental developmental processes were identical in all compared organisms.

Fundamental for the following arguments will be the distinction between common and local factors. Global or common factors (such as circulating growth hormones, early developmental events, or environmental influences) are those affecting all the variables under study. In contrast, local factors affect only a subset of the given variables. Many genes, for example, are expressed only in specific tissues or organs; the effect of many tissue interactions might be primarily local. Also, structural or topographical separation of anatomical parts will contribute to the isolation from developmental events at other sites. In the following models, in order to keep the arguments as clear as possible, we assume that local factors are non-overlapping and not nested. However, it is straightforward to extend these models to more complex and realistic factor structures—the crucial properties and the ensuing conclusions remain the same.

Let us assume that nine variables *X*_{1}, …, *X*_{9} are influenced by one common factor *A* and two local factors *M* and *N* affecting four and five variables, respectively (Fig. 2). The factors are assumed to be mutually uncorrelated, as indicated by the absence of any direct paths relating the factors in the path model. The covariance between two variables *X*_{i} and *X*_{j} due to the common factor is *a*_{i}*a*_{j}, so that the contribution of *A* to the covariance matrix Σ is the outer product **aa**^{t}, where **a** is a column vector containing all path coefficients *a*_{1}, *a*_{2}, …, *a*_{9} from *A* to the variables *X*_{1} through *X*_{9}. The local factors contribute only to the respective parts of the covariance matrix **Σ**^{I} = Σ_{1…4,1…4} and **Σ**^{II} = Σ_{5…9,5…9}. The additional covariances due to the local factors are **mm**^{t} and **nn**^{t}, where **m**^{t} = (*m*_{1}, *m*_{2}, *m*_{3}, *m*_{4}, 0, 0, 0, 0, 0) and **n**^{t} = (0, 0, 0, 0, *n*_{1}, *n*_{2}, *n*_{3}, *n*_{4}, *n*_{5}), so that

Σ = **aa**^{t} + **mm**^{t} + **nn**^{t} + Ψ,   (2)

where Ψ is the diagonal matrix of error variances.

In the present approach, the factor structure of the path model shall represent known or hypothesized developmental processes that cause a particular covariance structure among the phenotypic variables in a sample of organisms. The subsets of variables that are separately influenced by local factors (*X*_{1}, …, *X*_{4} and *X*_{5}, …, *X*_{9} in Fig. 2) will be called *morphometric modules* or simply *modules* in the remaining text. They are subject to separate and dissociated developmental control and thus may vary to some degree independently. The “near-decomposable” character of the modules can be characterized effectively by the residual covariance structure when the effect of the common factor(s) is removed from the raw covariances:

Σ − **aa**^{t} = ( **Σ**^{*I} **0** ; **0** **Σ**^{*II} ),   (3)

where **Σ**^{*I} = (**mm**^{t})_{1…4,1…4} plus the corresponding error variances on the diagonal, **Σ**^{*II} = (**nn**^{t})_{5…9,5…9} likewise, and **0** is a matrix of zeros. That is, the within-module variables exhibit a non-zero covariance structure even when the common factors have been eliminated, so that the cross-block covariance matrix **Σ**^{I,II} becomes a zero matrix. Modular shape variation, therefore, is the residual variation after common factors have been correctly accounted for (see also Hansen, 2003). Modules are those blocks of variables that exhibit a zero cross-block covariance structure after adjustment for global factors. (For an observed sample, of course, these covariances are not exactly zero and (3) is only an approximate equality.) After accounting for common factors, the variables within one module remain correlated but are uncorrelated with those from other modules. Note that there exist several different approaches to accounting for factors; see the next section for more details.
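This covariance decomposition can be sketched numerically. All path coefficients and error variances below are arbitrary illustrative values, not taken from the text.

```python
# The nine-variable path model of Figure 2: one common factor A and two
# local factors M (variables 1-4) and N (variables 5-9).
import numpy as np

a = np.array([0.7, 0.6, 0.8, 0.5, 0.6, 0.7, 0.5, 0.8, 0.6])  # common factor
m = np.array([0.5, 0.4, 0.6, 0.5, 0., 0., 0., 0., 0.])       # local factor M
n = np.array([0., 0., 0., 0., 0.4, 0.5, 0.6, 0.4, 0.5])      # local factor N
psi = np.diag(np.full(9, 0.1))                               # error variances

# Sigma = aa' + mm' + nn' + Psi
sigma = np.outer(a, a) + np.outer(m, m) + np.outer(n, n) + psi

# Removing the common factor leaves a block-diagonal residual structure:
residual = sigma - np.outer(a, a)
print(residual[:4, 4:])   # cross-block covariances: all exactly zero here
print(residual[:4, :4])   # within-module block: mm' plus errors, non-zero
```

In an observed sample the cross-block residuals would only be near zero, as noted above for (3).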

## Estimating Common and Local Factors

There exists an extensive literature on the statistical estimation of unmeasured factors underlying measured variables. Such techniques, usually referred to as exploratory factor analysis, were invented in the course of early 20th century attempts by Karl Pearson, Charles Spearman, and others to measure human intelligence (Johnson and Wichern, 1998). They assumed that a correlated set of measured variables *X*_{1}, *X*_{2}, …, *X*_{p} could be usefully modeled as linearly dependent on a smaller number *k* of common factors *F*_{1}, *F*_{2}, …, *F*_{k} plus *p* additional sources of variation *e*_{1}, *e*_{2}, …, *e*_{p} that are uncorrelated with each other and with the factor scores so that

**X** = **FA**^{t} + **E**,   (4)

where **X** is the *n* × *p* data matrix, **F** the *n* × *k* matrix of factor scores, **A** the *p* × *k* matrix of factor loadings, **E** the *n* × *p* matrix of residuals, and the superscript *t* denotes a matrix transpose. The residual term is usually taken to be the sum of two uncorrelated parts, the specific part and measurement error. The specific part contains that part of *X*_{i} unaccounted for by the factors and not due to measurement error (Reyment and Jöreskog, 1993). The coefficient *a*_{ij} is called the *loading* of the *i*th variable on the *j*th factor; **A** is thus called the matrix of factor loadings. The expected covariance matrix is then

Σ = **A**Φ**A**^{t} + Ψ,

where Φ is the covariance matrix of factor scores and Ψ is the diagonal covariance matrix of residuals. In order to estimate the factor loadings **A**, most approaches assume that the factors are uncorrelated and of the same unit variance, so that Φ = **I** and Σ = **AA**^{t} + Ψ.

Factor analysis provides estimates of **A** best accounting for the covariances (or correlations) among the variables under a wide range of meanings for the idea of “best.” One classic criterion for fitting the factor model to the data is to minimize the sum of squares of all the elements of **S**−**AA**^{t}−Ψ, where **S** is the sample covariance matrix. If Ψ were known, one could fit **AA**^{t} to **S**−Ψ by an eigenvalue decomposition as familiar from principal component analysis. In practice, Ψ is not known and thus is estimated from the data, e.g., via the squared multiple correlation coefficient.
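The known-Ψ case reduces to a single eigendecomposition. A minimal sketch with a synthetic one-factor covariance matrix; the loadings and uniquenesses are illustrative values chosen so that all variances equal one.

```python
# Fit AA' to S - Psi by an eigendecomposition when Psi is known exactly.
import numpy as np

a_true = np.array([0.9, 0.8, 0.7, 0.6])          # illustrative loadings
psi = np.diag([0.19, 0.36, 0.51, 0.64])          # uniquenesses: 1 - a_i^2
S = np.outer(a_true, a_true) + psi               # one-factor covariance matrix

vals, vecs = np.linalg.eigh(S - psi)             # eigh: ascending eigenvalues
a_hat = vecs[:, -1] * np.sqrt(vals[-1])          # scaled first eigenvector

print(a_hat)   # recovers a_true up to a joint sign change
```

Because S − Ψ is here exactly the rank-1 matrix **aa**^{t}, the scaled first eigenvector reproduces the loadings; with sample data the fit would only be approximate.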

The *iterated principal factor method* avoids an explicit estimation of Ψ: Start with any first guess **Â**, perhaps the first principal component of **S**, and compute Ψ̂ = diag(**S**−**ÂÂ**^{t}). The new estimate of **Â** is then **u**_{1}√λ_{1}, where **u**_{1} and λ_{1} are the first eigenvector and first eigenvalue, respectively, of **S**−Ψ̂. Update Ψ̂ and repeat these steps until convergence. This classic method has meanwhile been superseded by other, maximum likelihood–based approaches (for reviews see, e.g., Harman, 1976; Bartholomew and Knott, 1999), but for the present purpose we will stay with principal factor analysis because it closely relates to other methods in the context of morphological integration.

It is well known that when neither the factors nor the additional variation are measurable, a unique solution to Equation (4) is not possible without additional constraints. This indeterminacy is due to the fact that any set of factors may be transformed linearly into another set of factors having the same fit. In practice, an initial factor solution is rotated until some formal criterion or a better scientific interpretability of the factors is achieved. A satisfactory solution is often called a *simple structure*: According to Thurstone (1947), a variable should not depend on all factors but only on a small sublist of them. Moreover, one factor should only be connected with a small sublist of the variables. These criteria closely resemble the definition of modularity in Figure 2 and Equation (3), yet without any further constraints or scientific input such a simple structure is more a convenient representation than a biologically meaningful factor. The application of exploratory factor analysis to morphometric data (e.g., in Sokal, 1961, or Gould and Garwood, 1969) cannot identify modules from morphometric measurements alone. See also Zelditch (1987) for a discussion of exploratory and confirmatory factor analysis in that context.
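The iteration just described can be sketched as follows for a single factor. The sample size, loadings, and error scale are illustrative assumptions, and the function name is ours.

```python
# Iterated principal factor method, one-factor case: alternate between
# estimating Psi from the diagonal residuals and re-extracting the scaled
# first eigenvector of S - Psi.
import numpy as np

rng = np.random.default_rng(1)
a_true = np.array([0.8, 0.7, 0.9, 0.6, 0.7])     # illustrative loadings
n = 50_000
scores = rng.standard_normal(n)                  # factor scores, variance 1
X = np.outer(scores, a_true) + rng.standard_normal((n, 5)) * 0.5
S = np.cov(X, rowvar=False)                      # sample covariance matrix

def principal_factor(S, iters=100):
    """One-factor iterated principal factor analysis."""
    vals, vecs = np.linalg.eigh(S)
    a_hat = vecs[:, -1] * np.sqrt(vals[-1])      # initial guess: first PC
    for _ in range(iters):
        psi_hat = np.diag(np.diag(S - np.outer(a_hat, a_hat)))
        vals, vecs = np.linalg.eigh(S - psi_hat)
        a_hat = vecs[:, -1] * np.sqrt(vals[-1])
    return a_hat

a_hat = principal_factor(S)
print(a_hat)   # close to a_true, up to a joint sign change
```

At the fixed point, S − Ψ̂ is approximately **aa**^{t}, so the scaled first eigenvector reproduces the loadings up to sampling error.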

### Wright-Style Factor Analysis

Sewall Wright (1932) suggested a distinctive conceptual approach to the analysis of covariance structures, even though its computation is largely identical to principal factor analysis. Wright required that his factors be “consistent”; that is, that the loadings of a factor simultaneously be the coefficients in the corresponding path diagram of their scientific relevance. In his 1932 paper, he applied his method to six measures on Leghorn chicken bones, identifying one common factor and three local factors. As his data consisted of measured distances, he specifically interpreted his results as primary and secondary *size factors*.

Wright begins his analysis by iteratively computing a first estimate of the vector of path coefficients **a** for a common factor,

σ_{ij} = *a*_{i}*a*_{j},   (5)

where σ_{ii}, the variance of the *i*th variable, is taken as *a*_{i}^{2} (Bookstein, 1991:40). Bookstein et al. (1985) explained that, at convergence, the vector **a** is also identical to the first eigenvector (scaled by the square root of the first eigenvalue) of the correlation matrix when the respective diagonal elements are replaced by *a*_{i}^{2}. This, in turn, is the limit of convergence for the principal factor algorithm described above, so that Wright's first step in estimating the common factor is numerically identical to that approach.

Both algorithmically and conceptually, the following steps distinguish Wright's method from other factor analysis approaches. Based on the pattern of residual correlations after subtracting the effect of the common factor, he explicitly stated hypotheses about sets of variables that are separately affected by specific or secondary factors; that is, he specified sets or blocks of variables that are each responsive to one such group factor. The correlations among variables pertaining to the same group are then sequestered from the computation of an updated estimate of the common factor (as had been the diagonal elements of Σ in the step before). For his data of six variables, Wright ended up with one common size factor (estimated only from the correlations *between* the specified subsets of variables) and three local size factors (estimated from the residual correlations *within* the subsets of variables), each influencing two variables. In Bookstein's reformulation of Wright's algorithm, the vector of loadings on the first common factor **a** is the scaled first eigenvector of the appropriately sequestered correlation matrix; it follows that the products *a*_{i}*a*_{j} are a least squares fit to the elements σ_{ij} for all *i* and *j* not belonging to the same subset of variables; that is, to all between-block correlations or covariances (for a proof see Mitteroecker, 2007).

Wright's algorithm to extract and rotate factors conveniently enables the incorporation of empirical or prior knowledge about morphological modules in order to estimate common factors. In (5) or in Bookstein's algorithm, all covariances σ_{ik} *within* one module are replaced by *a*_{i}*a*_{k}; the iterative estimation of **a** is therefore based solely on the *between*-module covariances. Local factors are computed subsequently from the within-module covariances after the covariance attributable to the common factor has been subtracted, so that, e.g., **mm**^{t} is fitted to **Σ**^{I} − **a**^{I}(**a**^{I})^{t}, where **a**^{I} denotes the part of **a** corresponding to the first module. Prior knowledge of the modules thus serves to *calibrate* (Martens and Naes, 1989) otherwise vague notions of common and local factors.
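A minimal sketch of this Wright-style procedure, assuming the module memberships are given a priori. All path coefficients are illustrative values, and the function name is ours.

```python
# Wright-style common factor estimation: at each step, replace all
# within-module covariances (including the diagonal) by the current fit aa',
# so that only the between-module covariances drive the estimate.
import numpy as np

a_true = np.array([0.7, 0.6, 0.8, 0.5, 0.6, 0.7, 0.5, 0.8, 0.6])
m_true = np.array([0.5, 0.4, 0.6, 0.5, 0., 0., 0., 0., 0.])
n_true = np.array([0., 0., 0., 0., 0.4, 0.5, 0.6, 0.4, 0.5])
sigma = (np.outer(a_true, a_true) + np.outer(m_true, m_true)
         + np.outer(n_true, n_true) + 0.1 * np.eye(9))

modules = [np.arange(4), np.arange(4, 9)]        # a priori module assignment

def wright_common_factor(sigma, modules, iters=10_000):
    vals, vecs = np.linalg.eigh(sigma)
    a = vecs[:, -1] * np.sqrt(vals[-1])          # start from the first PC
    for _ in range(iters):
        work = sigma.copy()
        aa = np.outer(a, a)
        for idx in modules:                      # sequester within-module blocks
            work[np.ix_(idx, idx)] = aa[np.ix_(idx, idx)]
        vals, vecs = np.linalg.eigh(work)
        a_new = vecs[:, -1] * np.sqrt(vals[-1])
        if a_new @ a < 0:                        # stabilize the eigenvector sign
            a_new = -a_new
        if np.linalg.norm(a_new - a) < 1e-12:
            return a_new
        a = a_new
    return a

a_hat = wright_common_factor(sigma, modules)

# The between-module covariances are fitted closely, although with only two
# modules the relative scaling of the two blocks remains undetermined:
print(np.outer(a_hat[:4], a_hat[4:]) - np.outer(a_true[:4], a_true[4:]))

# A local factor from the residual within-module block (only roughly equal
# to m_true here, because of the free two-block scaling):
res = sigma[:4, :4] - np.outer(a_hat[:4], a_hat[:4]) - 0.1 * np.eye(4)
vals, vecs = np.linalg.eigh(res)
m_hat = vecs[:, -1] * np.sqrt(vals[-1])
print(np.abs(m_hat))
```

With three or more modules the scaling indeterminacy discussed below disappears, and the blocks of **a** are recovered on a common scale.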

### Partial Least Squares

For a two-block or two-module design, the variables can be ordered so that the between-module covariances are all contained in one off-diagonal block of the covariance matrix (the cross-block covariance matrix). Sampson et al. (1989) and Bookstein et al. (1996) employ a singular value decomposition (SVD) of this matrix to arrive at a two-block partial least squares (PLS) analysis (Wold, 1966). Let Σ^{I,II} be the cross-block covariance matrix between the first and second blocks of variables **X**^{I} and **X**^{II} and decompose this matrix such that

**Σ**^{I,II} = **U**Λ**V**^{t},

where **U** and **V** are the matrices of left and right singular vectors and Λ is a diagonal matrix of singular values in descending order. The first pair of these vectors, **u**_{1} and **v**_{1}, maximizes Cov(**X**^{I}**u**_{1}, **X**^{II}**v**_{1}); that is, the covariance between the scores along the left and right first singular vectors. The outer product **u**_{1}**v**_{1}^{t} scaled by the first singular value λ_{1} is the rank-1 least squares fit of **Σ**^{I,II}. It follows that

**a**^{I}(**a**^{II})^{t} = λ_{1}**u**_{1}**v**_{1}^{t},

and further that **a**^{I} = *f***u**_{1} and **a**^{II} = *g***v**_{1} with *fg* = λ_{1}, where **a**^{I} are the elements of **a** corresponding to the first block and **a**^{II} those for the second block. The first common factor, therefore, differs from the two-block PLS solution only by virtue of this separate scaling block by block. For more blocks of variables, this identity is no longer exact. There is no single generalization of the two-block PLS algorithm; instead there exists a range of generalizations that yield slightly different results (Bookstein et al., 2003). Our numerical experiments show that Wright's algorithm yields results that are very close to all these PLS generalizations.
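The two-block identity can be checked numerically. The loadings below are illustrative, and the cross-block matrix is generated from a pure common factor so that it is exactly of rank 1.

```python
# Two-block PLS of the cross-block covariance matrix via the SVD.
import numpy as np

a_I = np.array([0.7, 0.6, 0.8, 0.5])             # illustrative loadings, block I
a_II = np.array([0.6, 0.7, 0.5, 0.8, 0.6])       # block II
cross = np.outer(a_I, a_II)                      # Sigma^{I,II} under a common factor

U, lam, Vt = np.linalg.svd(cross)
u1, v1 = U[:, 0], Vt[0, :]

# the rank-1 fit lambda_1 u_1 v_1' reproduces the cross-block covariances
print(np.allclose(lam[0] * np.outer(u1, v1), cross))    # True: cross is rank 1

# u_1 and v_1 are proportional to a^I and a^II; fixing one element's scale
# recovers the loadings exactly
print(np.allclose(u1 / u1[0] * a_I[0], a_I))            # True

# the product of the two block scalings f and g equals lambda_1
print(np.isclose(lam[0], np.linalg.norm(a_I) * np.linalg.norm(a_II)))   # True
```

With empirical covariances the cross-block matrix is only approximately of rank 1, and the first singular pair supplies the least squares fit.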

PLS has recently been suggested as a method to model the pattern of integration among two or more blocks of morphometric variables (Rohlf and Corti, 2000; Bookstein et al., 2003). The numerical identity between Wright's factor approach and PLS when both are applied to a two-block design and their close resemblance for more blocks is somewhat surprising inasmuch as these approaches were invented for rather different purposes. PLS is a method for multivariate prediction or calibration optimizing the covariance between the blockwise scores. Wright's approach is a kind of factor analysis modeling correlations among the original variables. Their numerical similarity, however, sheds light on the interpretation of PLS when applied to study morphological integration and modularity. The PLS loading vectors may serve as (separately scaled) loadings on a common factor assuming a specific local factor structure (i.e., a specific notion of modularity). When they are visualized or interpreted as one joint shape deformation—when they serve as singular warps—they have to be rescaled appropriately because the *amount* of shape change depicted by one (unit-length) PLS loading vector does not necessarily correspond to the associated amount of shape change depicted by the other (unit-length) loading vector.

### Scaling of Common Factor Loadings

For the common factor **a** in the two-block case, the products *a*_{i}*a*_{j}, for all *i*,*j* from different blocks, minimize the squared deviations from σ_{ij}. There is, however, no unique criterion for scaling **a**^{I} and **a**^{II}, or equivalently for **u** and **v** in the PLS algorithm, because (*f***a**^{I})(*g***a**^{II})^{t} = **a**^{I}(**a**^{II})^{t} whenever *fg* = 1, where *f* and *g* are any two arbitrary scalars. As the least squares algorithm is not based on any within-block covariance, the separate scales of these vectors cannot be meaningfully constrained. In PLS with any number of blocks, by convention they are standardized to vector length one, while in Wright's iterative algorithm, when applied to a two-block design, the relative scaling is affected by the initial guess of **a**. The common factor loadings can thus be interpreted only relative to the other loadings of the same block but not to those of the other block(s). A joint visualization of the unit length PLS loading vectors as done in Bookstein et al. (2003) is not guaranteed to be meaningful. For PLS, this problem remains for multiple-block designs, whereas Wright's algorithm scales the parts of **a** correctly whenever more than two subsets of variables are used in the algorithm.

For three-block PLS, a simple solution to scale the loading vectors is obtainable by extending the rationale behind the SVD approach for two blocks. Regard the three PLS vectors **u**, **v**, and **w** for a three-block design as approximations of the three cross-block covariance matrices when scaled appropriately. Let *r*^{1,2}, *r*^{1,3}, and *r*^{2,3} be the regression coefficients of the cross-block covariances on their respective estimates through the PLS vectors, so that *r*^{1,2} is the regression coefficient of the elements of σ^{I,II}_{ij} on the elements of *u*_{i}*v*_{j}, and so forth. In two-block PLS, this regression coefficient is equal to the covariance between the PLS scores. This property holds true for more than two blocks and allows another computation of these coefficients. The scaling factors *d*_{u}, *d*_{v}, and *d*_{w} relate to those regression coefficients or covariances as *r*^{1,2} = *d*_{u}*d*_{v}, *r*^{1,3} = *d*_{u}*d*_{w}, *r*^{2,3} = *d*_{v}*d*_{w}, and the solution to these equations is *d*_{u} = (*r*^{1,2}*r*^{1,3}/*r*^{2,3})^{1/2}, *d*_{v} = (*r*^{1,2}*r*^{2,3}/*r*^{1,3})^{1/2}, *d*_{w} = (*r*^{1,3}*r*^{2,3}/*r*^{1,2})^{1/2}. If **u**, **v**, and **w** are vectors of shape coordinates, the composite vector (*d*_{u}**u**^{t} | *d*_{v}**v**^{t} | *d*_{w}**w**^{t}) can then be visualized and interpreted jointly as one shape deformation.
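The rescaling can be verified on a noiseless one-factor model. In the sketch below the three blockwise loading vectors are made up; because the cross-block covariances are then exact outer products, the unit-length PLS vectors are the loadings themselves, and the scaling factors recover their common scale:

```python
import numpy as np

# Hypothetical blockwise common-factor loadings (made up for illustration).
a1 = np.array([0.9, 0.7, 0.5])
a2 = np.array([0.6, 1.0, 0.4, 0.8])
a3 = np.array([1.2, 0.3])

# Unit-length PLS vectors for the noiseless model.
u = a1 / np.linalg.norm(a1)
v = a2 / np.linalg.norm(a2)
w = a3 / np.linalg.norm(a3)

def reg_coef(sigma, x, y):
    """Regression (through the origin) of cross-covariances on x_i * y_j."""
    pred = np.outer(x, y).ravel()
    return (sigma.ravel() @ pred) / (pred @ pred)

r12 = reg_coef(np.outer(a1, a2), u, v)
r13 = reg_coef(np.outer(a1, a3), u, w)
r23 = reg_coef(np.outer(a2, a3), v, w)

# Solve r12 = d_u d_v, r13 = d_u d_w, r23 = d_v d_w:
d_u = np.sqrt(r12 * r13 / r23)
d_v = np.sqrt(r12 * r23 / r13)
d_w = np.sqrt(r13 * r23 / r12)

# The rescaled PLS vectors sit on the loadings' common scale.
print(np.allclose(d_u * u, a1), np.allclose(d_v * v, a2), np.allclose(d_w * w, a3))
```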

The loadings of the first principal component of the block-wise common factor scores may serve as approximations of the scaling factors; this approach is applicable for any number of blocks. Let **S** be the *n* × *q* matrix of block-wise common factor scores (*X*^{I}*u*^{I} | *X*^{II}*u*^{II} | … | *X*^{q}*u*^{q}), where *q* is the number of blocks or modules and *u*^{i} the first singular vector for the *i*th block, and decompose it by singular value decomposition as **S** = **CΓD**^{t}, where **D** is the *q* × *q* matrix of right singular vectors or eigenvectors, **Γ** the diagonal matrix of singular values, and **C** the matrix of left singular vectors. The elements of the first column of **D**, denoted as *d*_{1} … *d*_{q}, now contain the weightings for the PLS vectors, so that the composite vector (*d*_{1}(*u*^{I})^{t} | … | *d*_{q}(*u*^{q})^{t}) can again be visualized and interpreted jointly.
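A small numerical sketch (synthetic data, made-up loadings) illustrates why this works: with a single common factor and no noise, each column of **S** is the factor times the block's loading norm, so the first right singular vector of **S** is proportional to the block scales:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical blockwise loadings and a common factor for 50 specimens.
blocks = [np.array([0.9, 0.7, 0.5]),
          np.array([0.6, 1.0, 0.4, 0.8]),
          np.array([1.2, 0.3])]
f = rng.standard_normal(50)

cols = []
for b in blocks:
    Xb = np.outer(f, b)                          # noiseless data for this block
    cols.append(Xb @ (b / np.linalg.norm(b)))    # blockwise score X^i u^i
S = np.column_stack(cols)                        # n x q score matrix

_, _, Dt = np.linalg.svd(S, full_matrices=False)
d = Dt[0]                                        # first right singular vector
if d[0] < 0:
    d = -d

# d is proportional to the block scales |a^i|.
norms = np.array([np.linalg.norm(b) for b in blocks])
print(np.allclose(d / d[0], norms / norms[0]))
```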

### Factor Scores

In his published work, Wright did not compute scores for his factors, whereas the computation of scores is implicit in the PLS algorithm. The computation of scores allows one to compare specimens or other aspects of the data such as group means or regressions. In exploratory factor analysis, scores are often estimated by a weighted least squares method or through regression, whereas PLS scores are computed as linear combinations **X**^{I}**U** and **X**^{II}**V** (and similarly for additional blocks). These linear combinations can be conceived as the result of an unweighted least squares procedure where the error variances are treated as equal and where the factors are orthogonal (Johnson and Wichern, 1998). It is thus convenient to compute scores for the common factor simply as linear combinations **Xa**^{s}, where **a**^{s} = **a**/|**a**| and |…| denotes the vector norm. As is customary in PLS studies on morphological integration, it might also be interesting to plot **X**^{I}**u** versus **X**^{II}**v**, and similarly for more dimensions or blocks.

Wright (1932) did not extract more than one common factor for his data, whereas the estimation of subsequent factors is ubiquitous in both factor analysis and PLS. Several published variants of PLS, however, differ in how dimensions are removed from the data in order to estimate subsequent dimensions. In Wright's algorithm and many other factor analysis approaches, further factors (such as the local factors) are computed from the residual covariances/correlations after the effect attributable to the first factor has been subtracted from the raw covariances/correlations (“matrix deflation”). Alternatively, factor scores or PLS latent variable scores can be regressed out from the data, or dimensions might be projected out (as in two-block PLS when performed by an SVD). Although the first approach is consistent with the idea of path models, orthogonal or uncorrelated factors are more convenient for further statistical procedures.
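The deflation step can be sketched numerically. In the following fragment the covariance matrix, path coefficients, and module boundaries are all invented for illustration; subtracting the common factor's contribution leaves a residual matrix whose between-module block is zero while the local (within-module) structure remains:

```python
import numpy as np

# Hypothetical path coefficients: one common factor across two modules of
# three variables each, plus one local factor per module.
a = np.array([0.9, 0.7, 0.5, 0.8, 0.6, 1.0])      # common factor
m = np.array([1.1, 0.8, 1.3, 0.0, 0.0, 0.0])      # local factor, module I
n = np.array([0.0, 0.0, 0.0, 0.9, 1.2, 0.7])      # local factor, module II

sigma = np.outer(a, a) + np.outer(m, m) + np.outer(n, n)

residual = sigma - np.outer(a, a)                 # deflate the common factor
between = residual[:3, 3:]                        # module I x module II block

print(np.allclose(between, 0))                    # only local structure is left
print(np.allclose(residual[:3, :3], np.outer(m[:3], m[:3])))
```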

## Clustering Variables by Observed Correlations

In order to identify correlation pleiades, Terentjev (1931) plotted the distribution of all pairwise correlation coefficients and interpreted the lower range of them as the correlations *between* pleiades and the higher correlations as those among traits *within* the same pleiade. He then graphed the higher range of correlations in a “correlation ring” that visually supports the identification of clusters of variables. In Olson and Miller's (1951, 1958) approach for exploring morphological integration, ρ-sets are constructed from the correlation matrix of morphological measures. Formally, they define ρ-groups as “all maximal classes of measures such that within each class all pairs have ρ greater than the selected level” (1958:46). Of course, Olson and Miller recognized that such groups often overlap and that some variables may have “bonds” (i.e., high correlations) to several groups.

The rationale underlying both approaches is that all within-module correlations are higher in their absolute value than correlations between modules. (Terentjev as well as Olson and Miller took all correlations as positive values.) But this is a very demanding assumption, particularly in applications to shape coordinates. In Figure 2, the modeled correlation *r*_{i,j} is *a*_{i}*a*_{j} if *i*,*j* are from different modules, and *a*_{k}*a*_{l} + *m*_{k}*m*_{l} if *k*,*l* are from the same module. Evidently, *r*_{k,l} exceeds *r*_{i,j} only if the path coefficients **m** for the local factors are non-zero and of the same sign as the common factor loadings (compare also Cheverud, 1994; Gromko, 1995; Pigliucci, 2006). Furthermore, for *all* corresponding pairs of correlations to behave this way, the path coefficients **a** of the common factor need to be relatively homogeneous as compared to the local effects. If the path coefficients of all factors are equal, then all within-module correlations are indeed higher than the ones between—but this implies that all factors induce isometric growth. For real data this is very unlikely; owing to growth allometries and the geometry of the measurements (see below), not all variables are affected to the same amount by a developmental factor. Shape coordinates, in particular, cannot all increase together. It might then happen, e.g., that two variables from different modules are relatively strongly affected by a common factor, inducing high covariance between those variables. Simultaneously, variables within the same module might be only marginally affected by common factors, so that their correlation is lower even in the presence of local factors that increase the correlation. The success of Olson and Miller's approach hence depends on the variation of all the path coefficients. The variation of these coefficients is, in fact, an estimate of the *allometry* of a growth factor: the more the coefficients vary, the more allometric is the shape or form change induced by this factor.

An approach exploiting the same classic assumptions was presented by Leamy (1977) and Cheverud (1982). They applied Q-mode cluster analysis to morphometric variables and to their loadings in multidimensional scaling or principal component analysis. Functionally related measures, so they argued, would end up in one cluster. Q-mode cluster analysis seeks to cluster variables in an *n*-dimensional space, often called dual space, based on a particular distance function. In dual space, variables are usually taken as vectors from the origin and the angle α between two vectors is one typical distance function between the two variables. The correlation *r* between two variables can be expressed in terms of this angle as *r* = cos α and, when the variables are standardized, the Euclidean distance *d* between two variables in dual space is *d* = 2 sin (α/2), so that those two distance functions are closely related. The assertion that variables within the same module might cluster in a Q-mode cluster analysis is therefore equivalent to Olson and Miller's classic assertion that variables within modules are mutually highly correlated and less correlated with variables in other modules. The loadings of the variables on a subset of principal components (scaled by the square root of the respective eigenvalue) relate to a projection of full dual space onto fewer dimensions.
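The equivalence of the two distance functions follows from *d*² = 2 − 2 cos α = 4 sin²(α/2) for unit-length variable vectors. A minimal numerical check on arbitrary synthetic variables:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two synthetic variables, centred and scaled to unit length in dual space.
x = rng.standard_normal(100)
y = rng.standard_normal(100)
x = x - x.mean(); x /= np.linalg.norm(x)
y = y - y.mean(); y /= np.linalg.norm(y)

r = x @ y                                   # correlation = cos(alpha)
alpha = np.arccos(r)
d = np.linalg.norm(x - y)                   # Euclidean distance in dual space

print(np.isclose(d, 2 * np.sin(alpha / 2)))
print(np.isclose(d, np.sqrt(2 - 2 * r)))
```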

### Partial Correlations

In Terentjev's analysis of 22 measurements on 203 living frogs (*Rana ridibunda*), he selected the snout-anus distance as being representative for overall size and computed partial correlations conditioned on this size measure. Terentjev found that the variables clustered much more clearly based on the partial correlations than on the raw correlations. Because allometric size is usually a dominant factor in morphometric data, this approach is similar to partialing out a common factor. As we set out already (Equation (3)), a modular structure of the covariance matrix is more apparent from these residual covariances or correlations than from the raw correlations. Similarly, Zelditch (1987, 1988) elaborately compared a one-factor model of allometric size with more complex models of morphological integration. Hallgrímsson et al. (2002) tested whether the correlations of traits within tentative modules are higher than expected for integration due to size alone.

For one correlation pleiade that Terentjev identified, he proceeded by searching for subpleiades within that group of traits. He partialed out another variable that had the highest average correlation to all other variables in this group—the variable that he regarded as the most effective indicator for that pleiade. The partial correlations that were conditional on both size and this indicator exhibited two subpleiades, one for the bony elements of the head and one for the bony elements of the limbs. Terentjev's method obviously relates to the factor approach that we presented above when taking his size variable and his indicator variable as rough common factor scores estimates—which they were apparently intended to be.

Magwene (2001) drew conclusions about modularity by grouping variables according to the elements of the inverted covariance matrix (and several derived statistics) that relate to partial correlations conditional on *all* other variables. As indicated by Equation (3), within-module covariances conditional on those variables *outside* the module relate to the residual covariances in (6) after the common factors have been partialed out. But when correlations are conditional also on variables from the same module, the partial correlation will be near zero because, by definition, the variables within one module are highly correlated and so even the local factors are removed. In Magwene's example, the chicken data from Wright, his approach happened to work because all modules consisted of two variables only. The pairwise correlations conditional on all variables, then, are simultaneously conditional only on the variables outside of the modules because there are no further variables in one module. But for any data with more than two variables per module this approach must yield uninterpretable results.
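The quantity Magwene worked with can be sketched as follows (synthetic data; the identity, not his conclusions, is what is checked): the partial correlation read off the inverted covariance matrix equals the correlation of the residuals after regressing both variables on all the others.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: 200 cases, 5 correlated variables (mixing matrix is made up).
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))
X = X - X.mean(axis=0)

# Partial correlation of variables 0 and 1, conditional on all others,
# from the precision (inverted covariance) matrix.
P = np.linalg.inv(np.cov(X, rowvar=False))
partial_01 = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

# Check against the regression definition for the same pair.
Z = X[:, 2:]
beta0, *_ = np.linalg.lstsq(Z, X[:, 0], rcond=None)
beta1, *_ = np.linalg.lstsq(Z, X[:, 1], rcond=None)
r0, r1 = X[:, 0] - Z @ beta0, X[:, 1] - Z @ beta1
check = (r0 @ r1) / np.sqrt((r0 @ r0) * (r1 @ r1))

print(np.isclose(partial_01, check))
```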

### Geometric Properties of Morphometric Variables

The above discussion as well as the majority of published methods for studying integration implicitly assume a null model of no integration for which the correlations among the variables are zero. Non-zero correlations are expected for integrated variables or those within one module. We have just shown that these assumptions are met only for specific cases. It turns out that the geometric properties of morphometric variables make this classic model of integration even more problematic. For *p* landmarks in *k* dimensions, shape space is of *kp*−*k*−*k*(*k*−1)/2−1 dimensions—the matrix of shape variables thus has *k*+*k*(*k*−1)/2+1 degrees of freedom fewer than it has variables. As a consequence, the *pk* shape coordinates are necessarily correlated. This is true to an even greater extent when describing shape or form by all possible linear distances. For details on the joint distribution of morphometric variables see Bookstein (1991) or Dryden and Mardia (1998).

Closely adjacent structures in an organism are likely to covary in shape and size even if they do not share a common genetic or developmental basis. Roth (1996) argued that regardless of any genetic contingency, structures that occupy adjacent regions in space must be mutually accommodating in volume (“spatial packaging”). Similarly, Enlow et al. (1969) emphasized that in a functional assemblage of bones, certain key dimensions must necessarily correspond between these bones in order to provide proper fit (the “counterpart principle”). Such an association between phenotypic correlation and spatial contiguity, which is also called “Pearson's rule of neighbourhood” (Whiteley and Pearson, 1899; Lewenz and Whiteley, 1902), has been documented for a range of organisms. Sawin et al. (1970), for example, reported an approximately linear decline in the correlation among dimensions of rabbit bones with their spatial distance.

Most statistical and morphometric maneuvers (also those suggested above) do not take into account the spatial structure of the measured variables or their geometric properties and may not be able to separate interesting signals from geometric necessities. In particular, covariances among variables from two distant parts would likely be lower than covariances among more adjacent variables within one part—but this would be true for any spatially coherent parts even in the absence of dissociated genetic control.

Conversely, completely integrated shape variables do not necessarily exhibit non-zero correlations. Imagine a configuration of four landmarks constituting a square that is sheared into a parallelogram. When these forms are superimposed and projected into tangent space, the covariances among Procrustes shape coordinates are all non-zero—as one would expect for this (geometrically) completely integrated shape change (the uniform shape change involves all landmarks). Alternatively, the same forms may be described by the four side lengths and the two diagonals. The side lengths all remain identical across the forms (and are hence mutually of covariance 0); only the diagonals change in a perfectly correlated fashion. Yet, an interpretation in terms of one module consisting of the two diagonals would be greatly misleading. In general, for any landmark configuration there exist geometrically integrated shape changes that leave some variables uncorrelated. Or, conversely, for a given shape transformation that is large scale and thus involves all parts of that shape, there exist sets of descriptors that are arbitrarily correlated. Imagine an unmeasured theoretical space of all possible shape descriptors so that the space of actual measured variables can be regarded as a projection of that. As in PCA, the full space can be rotated and projected so that all or some resulting linear combinations are uncorrelated. The pattern of observed correlation is hence also a function of the measures chosen to describe differences among particular forms (compare also Bookstein, 1991: chapter 6.3).
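The square-to-parallelogram example can be made concrete with a unit rhombus whose interior angle is varied (a sheared square with all sides kept at length 1; the parameterization here is our own illustration): all four side lengths have zero variance across the forms, while the two diagonals change in a tightly coupled, strongly negatively correlated fashion.

```python
import numpy as np

def rhombus(theta):
    """Unit rhombus with interior angle theta (theta = pi/2 gives the square)."""
    return np.array([[0.0, 0.0], [1.0, 0.0],
                     [1.0 + np.cos(theta), np.sin(theta)],
                     [np.cos(theta), np.sin(theta)]])

def side_lengths(p):
    return [np.linalg.norm(p[(i + 1) % 4] - p[i]) for i in range(4)]

def diagonals(p):
    return np.linalg.norm(p[2] - p[0]), np.linalg.norm(p[3] - p[1])

thetas = np.linspace(np.pi / 3, 2 * np.pi / 3, 7)   # deform around the square
sides = np.array([side_lengths(rhombus(t)) for t in thetas])
diags = np.array([diagonals(rhombus(t)) for t in thetas])

print(np.allclose(sides, 1.0))                      # sides: zero variance
r = np.corrcoef(diags[:, 0], diags[:, 1])[0, 1]
print(r < -0.9)                                     # diagonals: strongly coupled
```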

## Measures of Integration

As morphological integration and modularity are each a matter of degree, several authors have suggested different methods to quantify the extent to which two or more sets of variables are integrated in morphometric data. Such indices can be compared among groups or may serve to identify modules when comparing the actual index to those of other hypothesized modularizations or even random partitionings of variables. Measures of overall integration (without specifying groups of variables) are not considered here.

### Average Correlations and Covariances

Cheverud (1982) suggested comparing the average correlation or average squared correlation within the tentative modules to the average (squared) correlation between the modules. Similarly, Klingenberg and Zaklan (2000) used the average squared covariances to quantify integration among two sets of variables (see also Rohlf and Corti, 2000).

For the path model in Figure 2, the submatrix of covariances within the first module is a sum of the two outer products Σ^{I} = *a*^{I}(*a*^{I})^{t} + *m*^{I}(*m*^{I})^{t}, where *a*^{I} = (*a*_{1},*a*_{2},*a*_{3},*a*_{4})^{t} and *m*^{I} = (*m*_{1},*m*_{2},*m*_{3},*m*_{4})^{t} (for simplicity we ignore individual error variances). The mean of the elements of the outer product *a*^{I}(*a*^{I})^{t}, denoted as M(*a*_{i}*a*_{j}), equals the squared mean of the elements of **a**, denoted as M^{2}(*a*_{i}), and accordingly for **m** and **n**, so that M(σ^{I}_{ij}) = M^{2}(*a*^{I}_{i}) + M^{2}(*m*_{i}). The submatrix of covariances between the two modules, Σ^{I,II}, depends only on the common factor, so that Σ^{I,II} = *a*^{I}(*a*^{II})^{t}. The mean of these matrix elements is M(σ^{I,II}_{ij}) = M(*a*^{I}_{i})M(*a*^{II}_{i}).

The square root of the average between-module covariance is thus an estimate of the average path coefficient of the common factor. When taking M(*a*^{I}_{i}) ≈ M(*a*^{II}_{i}), the difference between the average within-module covariance and the average between-module covariance is an estimate of the average path coefficient of all local factors: [M(σ^{I}_{ij}) − M(σ^{I,II}_{ij})]^{1/2} ≈ M(*m*_{i}).
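These averaging identities can be checked numerically. In the sketch below all path coefficients are made up, the two blocks are given equal common-factor means so that the approximation M(*a*^{I}) ≈ M(*a*^{II}) holds exactly, and error variances are omitted as in the text:

```python
import numpy as np

# Hypothetical path coefficients (block means of a_I and a_II are both 1.125).
a_I = np.array([1.2, 0.8, 1.5, 1.0])        # common factor, module I
a_II = np.array([0.9, 1.3, 1.1, 1.2])       # common factor, module II
m = np.array([1.4, 1.0, 0.6, 1.2])          # local factor, module I

sigma_I = np.outer(a_I, a_I) + np.outer(m, m)     # within module I
sigma_cross = np.outer(a_I, a_II)                 # between modules

# M(Sigma^I) = M^2(a^I) + M^2(m);  M(Sigma^{I,II}) = M(a^I) M(a^II)
print(np.isclose(sigma_I.mean(), a_I.mean() ** 2 + m.mean() ** 2))
print(np.isclose(sigma_cross.mean(), a_I.mean() * a_II.mean()))
# difference of the averages estimates the squared mean local effect
print(np.isclose(np.sqrt(sigma_I.mean() - sigma_cross.mean()), m.mean()))
```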

The situation is slightly more complicated with the *squared* covariances/correlations. It turns out that the mean of the squared elements of **aa**^{t} is a function of both the mean *and* the variance of the elements of **a**: M(*a*_{i}^{2}*a*_{j}^{2}) = M^{2}(*a*_{i}^{2}) = [M^{2}(*a*_{i}) + Var(*a*_{i})]^{2}. This gives, for the first module, M((σ^{I}_{ij})^{2}) = [M^{2}(*a*^{I}_{i}) + Var(*a*^{I}_{i})]^{2} + 2M^{2}(*a*^{I}_{i}*m*_{i}) + [M^{2}(*m*_{i}) + Var(*m*_{i})]^{2} (8), similarly for the second module, and M((σ^{I,II}_{ij})^{2}) = [M^{2}(*a*^{I}_{i}) + Var(*a*^{I}_{i})][M^{2}(*a*^{II}_{i}) + Var(*a*^{II}_{i})] (9).

Equation (8) and Equation (9) show that averages of the squared covariances are a function of both the means and the variation of the path coefficients. The variation of the path coefficients is a measure of the allometry induced by this growth factor. Although average covariances or correlations can be related to average path coefficients, that is, to the average effect of factors, comparisons of average squared covariances/correlations are quite difficult to interpret.

Cheverud et al. (1989) introduced a related approach to evaluate models of morphological integration based on the element-wise matrix correlation between an empirical correlation matrix and a “theoretical matrix” or “connectivity matrix” that formalizes a priori notions of integration. A matrix correlation is the sum of the products of corresponding matrix elements, Γ_{RT} = Σ_{i=2}^{M} Σ_{j=1}^{i−1} *t*_{ij}*r*_{ij}, where *t*_{ij} and *r*_{ij} are the *i*,*j*th elements of the theoretical and the correlation matrices, respectively, and *M* is the number of variables. For the typical theoretical matrices, *t*_{ij} is 1 when the *i*th and *j*th variables both pertain to the same module or functional unit and is 0 otherwise. It follows that *r*_{ij}*t*_{ij} is *r*_{ij} within the modules and zero between them, so that the matrix correlation Γ_{RT} is the sum of all within-module correlations. Further, Cheverud and colleagues suggested testing the matrix correlation by a quadratic assignment test, also called Mantel's test (Mantel, 1967). This algorithm jointly permutes the rows and columns of one matrix, recomputing the matrix correlation each time. The proportion of permuted matrix correlations exceeding the observed one is a measure of the strength of association between the two matrices. In this application, the original matrix correlation is higher than the permuted ones only if the between-module correlations are lower on average than the ones within. This Mantel's test is therefore a measure of the difference between the average correlation within the modules and the average correlation between them. But it is not a valid significance test as it permutes the variables instead of the cases (cf. Good, 2000; Mantel's original paper compared *n* × *n* distance matrices and not *p* × *p* covariance matrices).

The above approaches assume that the average correlation or covariance among functionally or developmentally related variables—those within the same module—are higher than the average correlation between such modules, and equivalently for the squared correlations. As both within and between average covariances depend solely on the average path coefficients, this basic assumption is met only if at least the local factors have positive average path coefficients. But this assumption is not necessarily guaranteed to be the case—path coefficients can take on mixed signs. Only for measured interlandmark distances and factors that include a marked size increase will the coefficients necessarily have a positive central tendency. Certainly, this assumption is not met for shape variables and specifically not for shape coordinates.

### Trace Correlation and Correlation among Singular Warp Scores

Recently, two multivariate methods to quantify the degree of integration have been proposed. Klingenberg et al. (2003) applied the trace correlation, which was introduced by Hooper (1959) as one multivariate extension of the coefficient of determination among two sets of variables (see also Mardia et al., 1979:170–171, and Hallgrímsson et al., 2006). The trace correlation ranges from 0, when no part of the variation in one block can be explained by the other block, to 1, when everything can be explained. It is therefore a measure of the amount of variation that is attributable to the common factors: When all variation within one block can be predicted by the other block, there are no local factors and no modular residual variance (at least for that partitioning of the variables). When, in turn, no variation at all can be predicted across the blocks, the variation is completely driven by local factors—the highest degree of modularity. Another published measure of integration among two or more blocks of variables is the correlation between the first PLS scores (or singular warp scores) of those blocks (e.g., in Bookstein et al., 2003; Bastir and Rosas, 2005; Gunz and Harvati, 2007).

Both suggested measures can be directly related to the path model of common and local factors, but it turns out that they possess several undesirable properties. First, the trace correlation *r*^{2}_{T} is—unlike the univariate *R*^{2}—not symmetric; that is, *r*^{2}_{T}(**X**, **Y**) ≠ *r*^{2}_{T}(**Y**, **X**), except when both blocks have the same number of variables. Maybe most seriously, the trace correlation depends on the number of predictor variables. Similarly, the correlation between PLS scores increases with the number of variables and decreases with the number of cases (Fig. 3). When using these two methods to compare the integration among several anatomical regions, those regions should be captured by the same number of variables; when comparing integration in different taxa, these groups should have the same sample size. Yet, a significance test for the correlation among PLS scores as suggested by Rohlf and Corti (2000) is expected to be independent of the variable count.

## Example 1: Estimating Common Factor Loadings

We present two different examples to further explore some of the methods described above. In this first example, we model the covariances among 30 variables via additive global and local factors according to Equation (3). As shown in Figure 4, we assume one common factor and three non-overlapping local factors that affect modules of 10 variables each. The actual path coefficients—the ‘causal’ effects of the factors on each variable—are drawn from a normal distribution, and then specific variance of 1 is added to each variable. Because the following methods to estimate the common factor are applied to a covariance matrix without error, they are expected to estimate the true path coefficients exactly.

Figure 5a confirms that the Wright-style factor solution correctly matches the true common factor path coefficients. The loadings of the first three-block PLS vectors do not adequately reflect the true path coefficients when all three vectors are taken together. Each vector separately exhibits a linear relationship with the true common factor—they are identical up to separate scaling (Fig. 5b). When the scaling factors for the PLS solution are estimated by Equation (7), they exactly match the path coefficients of the common factor. Figure 5c compares the true path coefficients with the loadings on the first principal component that could be regarded as an approximation of a single common factor in the absence of any further local factors (i.e., when the covariance matrix is of rank one). Figure 5d presents the loadings of a Wright-style factor analysis with the wrong notion of modularity (module 1: *X*_{1}, …, *X*_{5} and *X*_{26}, …, *X*_{30}; module 2: *X*_{6},…, *X*_{15}; module 3: *X*_{16}, …, *X*_{25}). In both cases, the wrong assumptions lead to incorrect estimation of the common factor.
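A minimal sketch of a Wright-style iteration (our own simplified implementation, on made-up coefficients, not the data of the example) shows the key idea: the common-factor loadings are fit by least squares to the *between-block* covariances only, block by block, so that local covariance never contaminates the estimate.

```python
import numpy as np

# Synthetic noiseless covariance: a common-factor gradient across three
# modules of three variables, one isometric local factor per module,
# plus specific (error) variances. All values are made up.
a_true = np.linspace(0.5, 1.5, 9)
blocks = [range(0, 3), range(3, 6), range(6, 9)]
sigma = np.outer(a_true, a_true)
for b in blocks:
    m = np.zeros(9)
    m[list(b)] = 1.0
    sigma += np.outer(m, m)
sigma += np.eye(9)

a = np.ones(9)                               # initial guess
for _ in range(100):                         # block-wise least squares sweeps
    for b in blocks:
        out = np.array([j for j in range(9) if j not in b])
        for i in b:                          # fit a_i to between-block covariances
            a[i] = sigma[i, out] @ a[out] / (a[out] @ a[out])

if a @ a_true < 0:                           # resolve the arbitrary global sign
    a = -a
print(np.allclose(a, a_true, atol=1e-4))     # common factor recovered
```

With three or more blocks, the relative scaling of the blockwise loadings is determined by the data, which is why no rescaling step is needed here.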

## Example 2: Identifying Modules

Olson and Miller as well as Terentjev assumed that the absolute values of within-module correlations exceed the correlations between modules. Based on the same assumption, other authors argued that modules should be seen as clusters of morphometric variables in a Q-mode cluster analysis. As a measure of integration, several authors use the difference of the average covariance or correlation (or their mean squares) within all modules and the average covariation/correlation among the modules. If the average within-module covariance is assumed to exceed the average covariance between the modules, a Mantel's test between the covariance/correlation matrix and a design matrix specifying the modules is frequently used as a measure of modularity. To further explore these approaches we modeled three different covariance structures that all correspond to the path model in Figure 4.

Model 1: The path coefficients for the common factor as well as for the three local factors have a mean of 0 and a standard deviation of 1—the average effect of the factors is small as compared to their variation or allometry.

Model 2: Same as model 1 but the coefficients have a mean of 3. The path coefficients are higher on average than in model 1 but their variation is identical.

Model 3: The coefficients of the common factor range evenly from 2 to 7.8, modeling a markedly allometric growth gradient. This model consists of no local factors, so that no notion of modules is warranted. For the exemplary analysis the first 10 values are nonetheless treated as belonging to the first tentative module, the second 10 to the second, and the highest 10 coefficients to the third.
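The three models can be reproduced in a few lines (coefficients drawn or constructed as just described; the particular seed and error variances are our own choices, so the numbers differ from those reported below):

```python
import numpy as np

rng = np.random.default_rng(5)

def covariance(a, local_coefs):
    """Common factor a plus block-diagonal local factors plus error variance."""
    sigma = np.outer(a, a) + np.eye(30)
    for k, c in enumerate(local_coefs):
        m = np.zeros(30)
        m[10 * k:10 * k + 10] = c
        sigma += np.outer(m, m)
    return sigma

within = np.zeros((30, 30), dtype=bool)
for k in range(3):
    within[10 * k:10 * k + 10, 10 * k:10 * k + 10] = True
off = ~np.eye(30, dtype=bool)

def averages(sigma):
    """Mean within-module (off-diagonal) and between-module covariance."""
    return sigma[within & off].mean(), sigma[~within].mean()

s1 = covariance(rng.normal(0, 1, 30),                      # model 1
                [rng.normal(0, 1, 10) for _ in range(3)])
s2 = covariance(rng.normal(3, 1, 30),                      # model 2
                [rng.normal(3, 1, 10) for _ in range(3)])
s3 = covariance(np.linspace(2, 7.8, 30), [])               # model 3: gradient only

w1, b1 = averages(s1)
w2, b2 = averages(s2)
w3, b3 = averages(s3)
print(w2 - b2 > 5)       # model 2: within clearly exceeds between
print(abs(w1 - b1) < 2)  # model 1: no comparable separation
print(w3 > b3)           # model 3: apparent "modularity" without local factors
```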

Figures 6a to 6c show frequency plots of the within-module covariances (dark gray) and the between-module covariances (light gray) separately for each model. As we showed algebraically, the two distributions of covariances differ clearly in model 2, where the local path coefficients are high relative to their variation (i.e., allometry), whereas the distributions are identical in model 1. When assuming the same modules for model 3 (although this is formally incorrect because model 3 has no local factors), the distribution of within-module covariances consists of three clusters and differs from the distribution of between-module covariances.

The allometries of the factors—the variation of their effects—complicate the identification of modules from phenotypic covariances. It is therefore expected that after the removal of allometric size (the consequences of overall size on shape), estimated either as regression on overall size or as the first principal component, modules are more clearly apparent in the covariance structure. When doing so, it turns out that the distributions of within-module covariances and between-module covariances become narrower; i.e., their variation decreases, but their central tendencies remain unaltered. For model 2 there is thus nearly no overlap between the two distributions after removing allometry (Fig. 6d), and within-module covariances clearly exceed the between-module residual covariances. When removing allometry from model 1, the distributions become narrower as well but their central tendencies do not change—they remain completely overlapping. The success of removing allometry depends, of course, on the overall size increase caused by common factors and on the amount of induced shape variation relative to local factors.

Also, the average covariances fulfill the classic expectations only for model 2 due to the positive average path coefficients (average covariance within/between: 19.70/9.47; average squared covariance within/between: 436.14/113.39). In model 1 the average squared covariance within (1.44) exceeds the one between (0.70), whereas this is not the case for the average covariances (−0.12 versus −0.03). For model 3, which does not include any local factors, both the average covariance within (26.6) and squared covariance within (985.10) exceed the respective values between the assumed “modules” (22.67, 599.53). A Mantel's test (1000 permutations) identifies modules for models 2 and 3 but not for model 1 (0.826, 0.001, and 0.001 for the three models).

For models 1 and 2, Figure 7 plots the loadings of all 30 variables on the first three eigenvectors. Clusters of variables that correspond to modules would be apparent in these plots if they existed. Because the necessary separation of path coefficients from 0 is not the case, we do not see modules for model 1, whereas the variables apparently cluster corresponding to the modules in model 2. Because model 3 consists of a single factor, only one meaningful eigenvector exists, and dual space is one-dimensional. In this space the variables are ordered according to their common factor path coefficient, that is, along the growth gradient, and so they cluster “correctly” here, even though the notion of modules is formally incorrect.

From the three different models, therefore, those classic approaches can identify correctly only model 2 with positive average path coefficients. The modules in model 1 cannot be identified, and even in the absence of any local factors a modular structure is incorrectly attributed in model 3. A growth gradient or a strongly allometric common factor may induce a covariance structure that appears modular when assessed by these indices. Similar results ensue when correlations instead of covariances are analyzed for the three simulations but they depend partly on the amount of error variance added to the variables.

## Discussion

Modularity has become a key concept in the contemporary study of evolution and development. Morphological modules can be construed as parts of an organism that are separately affected by local genetic, developmental, evolutionary, or environmental factors so that they exhibit some degree of independence as they vary over ontogenetic or evolutionary time scales, individuals, or experimentally induced conditions. While developmental biologists typically intend to demonstrate the independence of organismal parts or processes under experimental conditions, comparative biologists and paleobiologists often seek to identify groups of correlated characters from adult specimens. Evidently, those two approaches differ fundamentally in their design. Although both intend to identify dissociated local developmental factors, in an experimental setting one explicitly *controls* some of those factors (e.g., through manipulations of the genotype or through direct manipulation of developmental pathways) and observes their effects. Even when factors are not under direct control, they might still be observable, such as levels of growth hormones, alleles of relevant genes, etc. Most classical studies of morphological integration, however, do not have any information on such factors, so that they must be inferred from (adult) phenotypic variables.

When the developmental factors are known, their quantitative effects can be estimated by regression analysis, and conclusions about modularity can be drawn from those results directly. Recent approaches based on the effects of quantitative trait loci (QTL) appear promising in that context (Cheverud et al., 1997; Mezey et al., 2000; Klingenberg et al., 2004). Hallgrímsson et al. (2004, 2006) explored morphological integration in different transgenic mice. Badyaev and Foresman (2004) compared patterns of integration in species raised under different environments. Smith (1996), Mitteroecker et al. (2004b, 2005), and Bastir et al. (2006) analyzed regional differences of developmental timing and drew conclusions about regional dissociation and modularity. When the observed factors are discrete traits, such as the presence or absence of specific anatomical features, no statistical inference is needed at all.

When factors are completely unknown, they sometimes can be inferred from phenotypic variables by factor analysis if there is some prior notion of modularity. Among the variety of published approaches to factor analysis, Sewall Wright's (1932) classic method is a simple and effective way to estimate common and local factors based on a specific a priori assumption of modularity. As we have shown, partial least squares analysis yields the same results as Wright's method when the PLS loading vectors are scaled correctly. We have outlined two methods to estimate those scaling factors in order to enable a meaningful joint visualization and interpretation of PLS loadings as common factors.
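The relation between PLS loadings and a joint common-factor estimate can be illustrated as follows. The two-block setup and coefficients are assumptions for this sketch, and the symmetric scaling shown (multiplying both unit-length singular vectors by the square root of the first singular value, so that their outer product reproduces the rank-1 cross-covariance) is one simple choice of scaling factor, not necessarily either of the two estimators discussed in the text.

```python
# Sketch: PLS of two blocks sharing one latent common factor. The first
# pair of singular vectors of the cross-block covariance matrix gives the
# PLS loadings; rescaling them jointly yields loadings whose outer product
# reconstructs the (rank-1) cross-covariance, as for a one-factor model.
import numpy as np

rng = np.random.default_rng(1)
n = 500
f = rng.normal(size=(n, 1))                            # latent common factor
X1 = f @ np.array([[0.9, 0.7, 0.5]]) + 0.2 * rng.normal(size=(n, 3))
X2 = f @ np.array([[0.8, 0.6]]) + 0.2 * rng.normal(size=(n, 2))

C12 = np.cov(np.hstack([X1, X2]), rowvar=False)[:3, 3:]  # cross-block cov
U, s, Vt = np.linalg.svd(C12)
u1, v1 = U[:, 0], Vt[0, :]          # unit-length PLS loading vectors

scale = np.sqrt(s[0])               # one symmetric choice of scaling
common_loadings = np.concatenate([u1 * scale, v1 * scale])
print(np.round(common_loadings, 2))
```

Because the model contains a single common factor, the scaled loadings recover the generating path coefficients up to noise and a joint sign flip.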

Several measures of integration among specified blocks of variables have been proposed in the literature. In general, all these measures describe different aspects of a multivariate, multiple-factor distribution of variables by one single number; it is difficult to pass from such a single measure back to any relevant biological or statistical property. Table 1 summarizes the assumptions that some of those methods require to be interpretable in terms of factor models. A meaningful comparison of average correlations or covariances between and within tentative modules assumes high average effects of local factors relative to their allometry. A matrix correlation between an empirical correlation matrix and a connectivity matrix equals the sum of all within-module correlations and thus rests on the same assumptions. A Mantel test of this matrix correlation measures the average difference between within-module and between-module correlations but is not a valid significance test. Trace correlation and the correlation among PLS scores are consistent with the factor model; i.e., they may reflect modularity, but their actual values depend on sample size and the number of variables. They should thus be used only to compare modules with the same number of variables or groups with the same sample size.

Publications | Method | Assumptions
---|---|---
Terentjev (1931), Olson and Miller (1951), Leamy (1977), Cheverud (1982) | Clustering standardized variables according to their population correlation coefficient or equivalently to their (projected) distance in dual space. | Path coefficients (i.e., effects) of the local factors must be of the same sign as the common factors and high relative to the variation of all coefficients (i.e., allometry of local and common factors).
Cheverud (1982) | Comparison of the average correlation or average squared correlation within the modules to the average (squared) correlation between the modules. | Comparing average correlations requires that the average path coefficients for the local factors differ from zero. Average squared correlations are difficult to interpret as they are functions of both the mean and the variation (allometry) of path coefficients. The same is true for average squared covariances as, e.g., in Klingenberg and Zaklan (2000).
Cheverud (1989) | Matrix correlation between a population correlation matrix and a connectivity matrix that codes the assignment of variables to modules. | The matrix correlation equals the sum of all within-module correlations and the suggested Mantel's test is a measure of the difference between the average correlation within modules and the average correlation between modules (but no significance test). It thus has the same assumptions as Cheverud (1982).
Magwene (2001) | Clustering variables according to their pairwise partial correlation coefficients conditional on all other variables (elements of the inverted correlation matrix). | Requires that all modules consist of two variables only.
Rohlf and Corti (2000), Bookstein et al. (2003) | Correlation coefficient among partial least squares scores (i.e., singular warp scores) as a measure of integration among two modules. | When the correlation between PLS scores is compared among different modules or different samples, numbers of variables as well as sample sizes need to be equal.
Klingenberg et al. (2003) | Trace correlation as a measure of integration among two modules. | When the trace correlation is compared among different modules or different samples, numbers of variables (p) as well as sample sizes (n) need to be equal and n must be larger than p.
Bookstein et al. (2003) | Joint visualization of PLS vectors (singular warps) as one single-shape deformation. | The vectors have to be scaled appropriately (see main text) before they can be visualized as one shape deformation. The same is true for Wright's (1932) algorithm when applied to two local factors only.
The current paper (based on Wright, 1932, and Bookstein et al., 1985) | Estimation of common and local factors to describe and visualize integration among modules as well as dissociated local shape variability. | Requires a reasonable a priori assumption of the actual modules. Estimation of local factors further assumes additivity of the major developmental effects on the morphological variables.

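As an illustration of the connectivity-matrix statistic of Cheverud (1989) summarized in Table 1, the following sketch computes the matrix correlation between a small correlation matrix and a 0/1 connectivity matrix over the off-diagonal elements. All numerical values are made up for illustration.

```python
# Matrix correlation between an observed correlation matrix R and a
# connectivity matrix M (1 = same module, 0 = different modules),
# computed over the off-diagonal elements only.
import numpy as np

R = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.0, 0.1],
              [0.2, 0.1, 1.0]])          # hypothetical correlations
M = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])                # variables 1 and 2 in one module

iu = np.triu_indices_from(R, k=1)        # upper off-diagonal elements
r, m = R[iu], M[iu].astype(float)
matrix_corr = np.corrcoef(r, m)[0, 1]
print(round(matrix_corr, 3))
```

Because `m` is a 0/1 indicator, this correlation is driven (up to centering and scaling) by the sum of the within-module correlations, which is why the statistic inherits the assumptions listed in the table.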

When neither the factors nor the modules are known, it is impossible to estimate factors of biological relevance with confidence, due to the indeterminacy of factor solutions. It is frequently assumed that groups of correlated characters (ρ-groups or correlation pleiades) equate to modules: groups of characters influenced by dissociated developmental processes. To test this assumption we simulated modular developmental systems by path models consisting of local dissociated factors and assessed the induced covariance structures. We found that groups of correlated variables reflect modules only when the effects of local factors on the measured variables are high relative to the variability of the effects of all factors. The variation of effects, i.e., the variation of path coefficients or factor loadings in a path model, can be interpreted as the allometry induced by that factor. The classic approaches to morphological integration are therefore meaningful only in the near vicinity of isometric growth factors. This assumption is not very realistic, and it is certainly not met for shape variables. In the presence of strongly allometric common factors, the identification of morphological modules based on observed correlations or covariances is very unreliable. For instance, a strong global growth gradient induces a covariance structure that would be regarded as modular by most current methods. It is therefore helpful to remove common factors from the data before interpreting differences in correlations among variables in terms of newly discovered modularity. Modules appear more clearly in this residual covariance structure than in the raw correlations/covariances. The customary practice of removing allometric size from the data prior to statistical analysis is one step along these lines, but other common factors may remain.
The same residual covariance structure can be arrived at by computing partial correlations/covariances that are conditional on a measure of overall size. However, partial correlations conditional on *all* other variables, that is, statistics based on the inverse correlation or covariance matrix, are meaningless except when all modules consist of two variables only. When the observed shape variation can be explained sufficiently by common factors alone (the most parsimonious explanation), no further notions of modularity appear useful.
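The removal of a common factor before inspecting covariances can be sketched as follows. The observed size-like factor, the path coefficients, and the module assignment are illustrative assumptions for this sketch.

```python
# Sketch: regress every variable on an observed common (size-like) factor
# and inspect the residual covariances. A hidden local factor shared by
# variables 0 and 1 leaves a clearly non-zero within-module residual
# covariance, while cross-module residual covariances are near zero.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
size = rng.normal(size=(n, 1))                 # observed common factor
local = rng.normal(size=(n, 1))                # hidden local factor

X = size @ np.array([[1.0, 0.8, 0.9, 0.7]])    # allometric common effects
X[:, :2] += local @ np.array([[0.6, 0.6]])     # module: variables 0 and 1
X += 0.1 * rng.normal(size=(n, 4))             # measurement error

beta = np.linalg.lstsq(size, X, rcond=None)[0] # regress each column on size
resid = X - size @ beta                        # partial out the common factor
res_cov = np.cov(resid, rowvar=False)
print(np.round(res_cov, 2))
```

In the raw covariance matrix all pairs of variables are strongly correlated through size; only after partialing it out does the module stand out.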

It follows from these considerations that modularity is a notion about the amount and pattern of residual covariance once the contribution of common factors to the covariances has been accounted for. In multivariate language, one can thus define modules as blocks of variables that exhibit non-zero within-block covariance matrices even when the covariances due to common factors have been subtracted. In contrast, the cross-block covariances (those among variables from different modules) are near zero after partialing out the common factors (Equation (3)).

To enhance the identification of modules, some authors included further geometric information in their analysis of morphological integration. Zelditch and colleagues (Zelditch et al., 1992; Zelditch and Fink, 1995) considered the spatial scale of shape deformations during ontogeny. Although this still appears to be a promising direction for further research, their actual methodology was based on partial warps (Bookstein, 1989) that, as is now commonly agreed, lack any biological meaning (Rohlf, 2002). Klingenberg and Zaklan (2000) and Klingenberg et al. (2001) applied several techniques specifically to purely asymmetric shape variation (fluctuating asymmetry [FA]; see also Mardia et al., 2000). Because FA is generated by random perturbations of developmental processes that are assumed to arise locally and independently, they argue that integrated FA in two parts of a structure can only occur if the parts share the effects of the same perturbations. This interesting approach certainly permits further insights into the diverse origins of integration but is subject to the same methodological problems as are raised in the present paper.

Cheverud (1982) incorporated the concept of morphological integration into a quantitative genetic framework and computed “G-sets” deriving from genetic correlation matrices that in turn rely on heritability estimates of the assessed traits. He compared such groups of genetically correlated traits with groups deriving from phenotypic and environmental correlations or to functional hypotheses (see also Bailey, 1956; Leamy, 1977; Atchley et al., 1981; Klingenberg and Leamy, 2001). Cheverud's work led to a series of novel theoretical and empirical insights, yet the discussed problems of identifying modules remain the same whether applied to phenotypic or to genetic correlations.

An important issue that most published studies on morphological integration (including ours) tend to neglect is the geometric properties of morphometric variables. Usually, neither landmark coordinates nor interlandmark distances can be uncorrelated, even when the underlying structures are under wholly independent biological regulation. Conversely, for any geometrically integrated shape deformation, some variables will not change their value at all. Although those properties do not matter when modeling or estimating patterns of integration, they massively complicate the morphometric identification of modules and the measurement of integration. A promising direction for further research would thus be to account for the specific geometric properties of morphometric variables in the methodology of morphological integration and modularity.

## Acknowledgments

We are grateful to Philipp Gunz, Christian Huber, Johann Kim, Dennis Slice, and James Rohlf for thoughtful comments on earlier versions of the manuscript and to Benedikt Hallgrímsson, Norman McLeod, Roderic Page, and one anonymous reviewer for their constructive suggestions during the review process. We also thank Katrin Schaefer, Horst Seidler, and Gerhard Weber for their support. This study was supported by grant GZ 200.033/1-VI/I/2004 of the Austrian Council for Science and Technology to Horst Seidler, EU FP6 Marie Curie Research Training Network MRTN-CT-019564, and a fellowship from the Konrad Lorenz Institute for Evolution and Cognition Research to P.M.
