Beyond waist-hip ratio: experimental multivariate evidence that average women's torsos are most attractive

One of the most iconic findings in human behavioral ecology is the fact that women with waist-hip ratios (WHRs) of approximately 0.7 are most attractive and that this ratio indicates maximum fecundity and reproductive value. However, the effects of WHR and of other indices of body shape and size on attractiveness are far from fully resolved. We adopt a recently developed method that combines multivariate manipulation of experimental stimuli with evolutionary selection analysis to test the linear and nonlinear effects of waist, hip, and shoulder width and the interactions between these traits on the attractiveness of 200 line-drawn models to 100 men. There was no general support that WHR or body mass (expressed as perimeter-area ratio) significantly influences attractiveness. There was, however, strong preference for average values of all 3 traits indicating that attractiveness is due to the tight integration of these 3 traits. We plot the mean waist and hip sizes of 8 samples of women on our response surface, including Playboy centerfolds, models from the 1920s and 1990s, Australian escorts, and Australian women in 4 different age categories (collectively we refer to this latter group as the "regular women"). The regular women in the 25- to 44-year age-group were closest to the peak attractiveness value on our response surface. Our results highlight the strong integration of and interrelationships among different parts of the body as determinants of attractiveness. Copyright 2009, Oxford University Press.

The evolutionary and cultural significance of human attractiveness is a topic of both scientific and popular concern (Buss 1994). Recent studies of the relationship between human female body shape and attractiveness have sought to identify attractive cues and to test the prediction that such cues could act as honest signals of mate quality (reviewed by Weeden and Sabini 2005). Early findings that a waist-hip ratio (WHR) of approximately 0.7 confers maximum attractiveness (Singh 1993a, 1993b; Henss 1995) stimulated an explosion in research interest. These early findings have been widely but not universally replicated, with female attractiveness optimized at values ranging from 0.6 to 0.9 in a variety of age-groups and populations (Henss 1995; Furnham et al. 2005; Rozmus-Wrzesinska and Pawlowski 2005). This range of female WHR values is also associated with good health and fecundity (Rimm et al. 1988; Zaadstra et al. 1993; Weeden and Sabini 2005), supporting the prediction from the sexual selection literature (Andersson 1994) that attractiveness may indicate fecundity and genetic quality.
Several studies have argued that the relationship between WHR and attractiveness may be a spurious consequence of the effects of unmeasured correlates of WHR such as body mass (Tassinary and Hansen 1998; Tovee et al. 1998) and cues of femininity. Although Singh (1993a, 1993b) considered the effect of body mass by using "underweight," "normal weight," and "overweight" line drawings, this manipulation is confounded with, rather than independent of, WHR (Tassinary and Hansen 1998). Subsequent correlative and experimental evidence has shown that other commonly used health indicators, particularly body mass index (BMI = weight/height²) and its 2D area proxy, perimeter-area ratio (Tovee et al. 2002), are also associated with female attractiveness. Overall, studies of WHR and BMI in industrialized countries show that WHR in the lower typical range of approximately 0.7 and perceived BMI in the lower normal range of 18.5-20 are both correlated with high attractiveness (Weeden and Sabini 2005).
Manipulating phenotypic indices of health and fertility such as BMI and WHR in studies of attractiveness may constrain researchers from fully characterizing the physical basis of attractiveness (e.g., if attractiveness is only weakly correlated with the dimensions studied) and, ironically, provide only limited support for the hypothesis that attractiveness indicates health and fertility. The strongest test of whether particular composite measures such as WHR or BMI are critical for attractiveness judgments would be to examine a broad range of trait combinations within the range of the traits' natural distribution, characterize selection on these traits, and assess whether these independently converge on the composite measure of interest.
The last 15 years have seen significant refinements in the use of factorial experimental designs to test hypotheses regarding physical attractiveness. In particular, Tassinary and Hansen (1998) used a multifactorial design to show that preference for WHR is influenced by absolute waist and hip size and by weight (but see Streeter and McBurney 2003). This result suggests a need to study the contributions of waist, hip, and body size to attractiveness as continuous variables to understand the linear, nonlinear, and interactive effects of these traits on attractiveness. Here, we use a method developed in evolutionary studies (Brooks et al. 2005) in which we sample trait values from the existing known distribution for each trait and eliminate the correlations between them to estimate the effects of selection on the traits individually and in combination. This method has already been applied successfully to resolving nonlinear selection on cricket calls (Brooks et al. 2005; Bentsen et al. 2006) and the effect of this selection on genetic variance (Hunt et al. 2007). We created 201 line-drawn figures (i.e., models) of the human torso on which we independently manipulated shoulder, waist, and hip width and then measured the attractiveness of these drawn figures in a sample of 100 men. These traits were chosen for a number of reasons: they are sexually dimorphic in humans, and, combined, they determine the shape of the female torso. Each influences one or more composite measures that have been implicated as a target of selection (Pawlowski and Grabarczyk 2003; Weeden and Sabini 2005), with waist and hip width directly affecting WHR, and shoulders, waist, and hips all affecting the perceived BMI/perimeter-area ratio (PAR) of a figure (Tovee et al. 2002).
We use established statistical methods from evolutionary biology to estimate (Lande and Arnold 1983; Phillips and Arnold 1989) and visualize (Phillips and Arnold 1989; Blows and Brooks 2003) the nonlinear response surface that describes the relationship between the manipulated traits and attractiveness. We predict that if WHR of 0.7 (or any other value) is most attractive, then there will be correlational selection between waist and hip width and a ridge of high attractiveness when waist and hip size are plotted against one another. By contrast, preferences for low BMI should favor low values of all 3 traits. We then compare these response surfaces with the linear and nonlinear effects of WHR and PAR themselves on attractiveness of our models. Finally, to test the prediction that highly attractive real women will fall on or close to the area (peak or ridge) of highest attractiveness on our response surfaces, we superimpose on our response surfaces the means of a variety of samples of women, including normal Australian women of various ages ("regular women") and 4 samples of "superattractive" women: Playboy centerfolds, models from the 1920s and the 1990s, and contemporary Australian escorts.

METHODS
We obtained the following mean and standard deviation (SD) values for Australian women (19-45 years old, McLennan and Podger 1998): height (163.1 ± 5.2 cm), waist circumference (77.1 ± 9.6 cm), hip circumference (101.0 ± 9.3 cm), and shoulder width (43.3 ± 2.3 cm). We constructed our control model (with mean values for all these traits, Figure 1) in Adobe Creative Suite 3 (Illustrator and Photoshop), rotating the hips, waist, and shoulders away from the viewer by 30°, 40°, and 50°, respectively, to present models in an apparently dynamic stance and to convey some depth. We then converted this model to a vector image using the Live Trace function in Adobe Illustrator, allowing the width of shoulders, waist, and hips to be altered and a consistent interpolation process to be applied in altering the model.
We drew 200 sets of 3 random numbers between 0 and 1. We used the NORMSINV function in MS Excel to obtain the inverse of the standard normal cumulative distribution corresponding to each random number. The resulting number, z, is the number of SDs above or below the mean that the observation would take. We then multiplied the value by the measured SD for the relevant trait and added the measured mean (to turn the z score into a measure in the original units). The result was 200 sets of shoulder width, waist circumference, and hip circumference that conformed to the appropriate univariate distributions but were uncorrelated with one another. We then altered our model to the 3 new trait values and saved it as a JPEG file.
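The sampling procedure described above can be sketched in Python. This is an illustration under stated assumptions, not the spreadsheet used in the study: `statistics.NormalDist().inv_cdf` stands in for Excel's NORMSINV, the means and SDs are the values reported for Australian women, and the function name, dictionary layout, and seed are hypothetical.

```python
import random
from statistics import NormalDist

# Means and SDs (cm) as reported in the text for Australian women.
TRAITS = {
    "shoulder_width": (43.3, 2.3),
    "waist_circumference": (77.1, 9.6),
    "hip_circumference": (101.0, 9.3),
}


def sample_models(n_models=200, seed=1):
    """Draw independent trait values for n_models stimulus figures."""
    rng = random.Random(seed)  # seed is an arbitrary choice for reproducibility
    std_normal = NormalDist()  # standard normal: mean 0, SD 1
    models = []
    for _ in range(n_models):
        model = {}
        for trait, (mean, sd) in TRAITS.items():
            u = rng.random()              # uniform random number in [0, 1)
            while u == 0.0:               # inv_cdf is undefined at exactly 0
                u = rng.random()
            z = std_normal.inv_cdf(u)     # number of SDs above or below the mean
            model[trait] = mean + z * sd  # convert the z score to original units
        models.append(model)
    return models


models = sample_models()
```

Because each trait is drawn from its own random number, the three traits are uncorrelated in expectation, although any finite sample of 200 will show small chance correlations.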
We recruited 100 male students (ages 17-34 years) enrolled at the University of New South Wales (UNSW), Sydney, Australia, to participate in an anonymous survey that would take less than 10 min in return for 2 vouchers for a free beverage from a campus cafe. Participants undertook the survey alone in a quiet room, and all procedures were approved by the UNSW Human Research Ethics Advisory Panel (approval number 073007).
We drew 100 sets of 10 stimulus models at random but with the proviso that each of the 200 models was in 5 stimulus sets. Each participant, therefore, saw a unique set of 10 models as well as the control model. We presented stimulus sets and recorded responses using MediaLab software (v2006, Empirisoft). Before responses to the stimulus set were recorded, the MediaLab file showed each participant the same set of 16 models for 1 s each. To provide participants with a sense of the overall range of the models, these 16 models included 8 that deviated very little from the control and 8 that were among those that deviated most (in units of summed absolute standard deviates). We always used the same 16 models, but they were presented in a new random order to each participant. Participants were then asked to rate each of the following models using a 6-point scale with a forced choice (i.e., no neutral middle option): 6 = extremely attractive, 5 = attractive, 4 = somewhat attractive, 3 = somewhat unattractive, 2 = unattractive, and 1 = extremely unattractive. They were encouraged to use all 6 values on the scale. We presented the control model, then the 10 random stimulus models in random order, then the control model again, followed by the 10 stimulus models in a different order, and finally the control model one last time.
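One way to satisfy the constraint that each of the 200 models appears in exactly 5 of the 100 stimulus sets is sketched below. The text does not describe how the sets were actually drawn, so the round-based scheme, the function name, and the seed are assumptions made for illustration.

```python
import random


def build_stimulus_sets(n_models=200, set_size=10, appearances=5, seed=1):
    """Return lists of model indices; each model occurs `appearances` times."""
    if n_models % set_size != 0:
        raise ValueError("n_models must be divisible by set_size")
    rng = random.Random(seed)  # arbitrary seed for reproducibility
    sets = []
    for _ in range(appearances):
        order = list(range(n_models))
        rng.shuffle(order)  # one random permutation per round
        # Cutting the permutation into blocks uses every model exactly once
        # per round, so no set can contain the same model twice.
        for start in range(0, n_models, set_size):
            sets.append(order[start:start + set_size])
    rng.shuffle(sets)  # so that set order does not reveal the round
    return sets


stimulus_sets = build_stimulus_sets()
```

With the defaults this yields 5 rounds of 20 sets, i.e., 100 sets of 10 models. Two identical sets are possible in principle under this scheme, though extremely unlikely; a stricter version could check for duplicates and redraw.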
To obtain a single score of the attractiveness of a given model, as rated by the 5 participants who evaluated that model, we subtracted the mean score for the respondent (respondents differed significantly in mean score: F(99, 860) = 4.20, P < 0.001) from each observation and then obtained the means of this residual score for each model. We refer to this as the residual attractiveness score (RAS).
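The RAS calculation can be illustrated with a short Python sketch. The tuple layout, function name, and toy ratings are hypothetical, and whether ratings of the control model entered each respondent's mean is not stated in the text, so the sketch simply uses whatever ratings it is given.

```python
from collections import defaultdict
from statistics import mean


def residual_attractiveness(ratings):
    """ratings: iterable of (respondent_id, model_id, score) tuples."""
    ratings = list(ratings)

    # Step 1: mean score given by each respondent.
    by_respondent = defaultdict(list)
    for respondent, _, score in ratings:
        by_respondent[respondent].append(score)
    respondent_mean = {r: mean(s) for r, s in by_respondent.items()}

    # Step 2: subtract the respondent's mean from each observation,
    # then average the residuals for each model.
    residuals = defaultdict(list)
    for respondent, model, score in ratings:
        residuals[model].append(score - respondent_mean[respondent])
    return {m: mean(v) for m, v in residuals.items()}


# Toy data (hypothetical): rater "A" scores high overall, rater "B" low.
toy = [("A", 1, 6), ("A", 2, 4), ("B", 1, 3), ("B", 2, 1)]
ras = residual_attractiveness(toy)
```

In the toy data both raters place model 1 one point above their own mean and model 2 one point below it, so the two models receive residual scores of 1 and -1 even though the raw scores of the two raters barely overlap.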
We used the RAS as our response variable in multiple regression-based response surface analysis (Lande and Arnold 1983; Phillips and Arnold 1989). We estimated directional preferences in a regression model that included only the 3 linear terms. We then fitted the full nonlinear response surface, comprising the 3 linear terms, the quadratic terms for the 3 traits, and the 3 cross product terms. We doubled the quadratic gradients that were provided by SPSS software because regression quadratic coefficients equal one-half the quadratic selection gradient as conceived by Lande and Arnold (1983) (Stinchcombe et al. 2008). Quadratic and cross product terms sometimes do not give the most powerful test for nonlinear selection (Phillips and Arnold 1989; Blows and Brooks 2003), and we therefore used canonical rotation of the response surface to identify the major axes of the nonlinear response surface. This approach identifies the linear combinations of traits (eigenvectors, mₙ) along which selection is strongest and often serves to reduce the number of axes along which selection needs to be visualized (Blows and Brooks 2003). We visualize selection on the original trait axes using nonparametric thin-plate splines (using the fields package in R) and along the single significant canonical axis of selection using a cubic spline (Schluter 1988). Finally, we plot the mean values for 8 samples of women, including 4 samples predicted to be superattractive: Playboy centerfolds, fashion models from the 1920s and 1990s, Australian metropolitan escorts advertising their services on Web sites, and 4 samples of regular Australian women sampled from the general population in different age classes (19-24, 25-44, 45-64, and 65+ years old). Playboy centerfold means come from Seifert's (2005) paper reporting height, waist circumference, hip circumference, and WHR for 559 centerfolds. Fashion model data come from a paper reporting these values from 1920 to 1990 (Byrd-Bredbenner et al. 2005).
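The regression steps of the response surface analysis (linear gradients, doubled quadratic gradients, and canonical rotation) can be sketched with NumPy. This is not the SPSS analysis used in the study: the function name and the synthetic data are assumptions, traits are assumed to be standardized to mean 0 and SD 1, and significance testing and spline visualization are omitted.

```python
import numpy as np


def response_surface(X, y):
    """Estimate beta, gamma, and the canonical axes of a response surface.

    X: (n, k) array of standardized traits, k >= 2; y: (n,) response.
    """
    n, k = X.shape
    ones = np.ones((n, 1))

    # Directional gradients come from a model with the linear terms only.
    beta = np.linalg.lstsq(np.hstack([ones, X]), y, rcond=None)[0][1:]

    # Full model: intercept, linear, squared, and cross-product terms.
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    squares = X ** 2
    crosses = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    design = np.hstack([ones, X, squares, crosses])
    coef = np.linalg.lstsq(design, y, rcond=None)[0]

    gamma = np.zeros((k, k))
    # Regression coefficients of squared terms are doubled to give the
    # quadratic selection gradients on the diagonal.
    gamma[np.diag_indices(k)] = 2.0 * coef[1 + k:1 + 2 * k]
    for (i, j), c in zip(pairs, coef[1 + 2 * k:]):
        gamma[i, j] = gamma[j, i] = c  # cross-product terms are not doubled

    # Canonical rotation: the eigenvectors of gamma are the major axes of
    # the surface; a negative eigenvalue indicates a concave (peaked) axis.
    eigenvalues, eigenvectors = np.linalg.eigh(gamma)
    return beta, gamma, eigenvalues, eigenvectors


# Synthetic check (hypothetical data): a noiseless peak at the trait means.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = -0.5 * (X ** 2).sum(axis=1)
beta, gamma, eigenvalues, eigenvectors = response_surface(X, y)
```

For this synthetic surface the fitted gamma matrix is the negative identity matrix, so every canonical axis has an eigenvalue of -1. In real data the eigenvalues differ, and only axes with significant eigenvalues would be interpreted.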
To obtain data on professional Australian female escorts, we reviewed 2 Web-based escort directories: Australian Escorts (http://www.australian-escorts.com.au/) and Escort Pages (http://www.escortpages.com.au/). These directories were chosen because they list profiles of female escorts who are not represented by an agency. We systematically looked at every profile (n = 164) and included a woman's measurements only if her profile met the following criteria: a photo was present, and the profile reported age, height, chest circumference, waist circumference, and hip circumference. A total of n = 44 profiles matched our criteria. We cannot verify the accuracy of these measures, but to the extent that they are incorrect, they should be biased toward measures that the women believe to be ideal or desirable.
The Australian anthropometric data on which we based the distribution we used to create model stimuli (McLennan and Podger 1998) reported mean values for height, waist circumference, hip circumference, and WHR grouped by age. We used these data to plot Australian women on the waist-hip response surface.

RESULTS
The major feature of the response surface was stabilizing selection, with a well-characterized peak near the mean of all 3 traits (Table 1, Figure 2). Canonical rotation revealed that there is only one major axis of this surface (m₃), heavily weighted to all 3 traits (Table 2, Figure 3) but especially hips. The contrasting signs of the coefficients for shoulders (negative) compared with waist and hips (positive) suggest that large-shouldered models with small waist and especially hips and small-shouldered models with large waist and hips are particularly unattractive combinations. WHR ranged from 0.52 to 1.15 (mean = 0.78, SD = 0.12), allowing considerable scope to test for an intermediate optimum value. The peak attractiveness is indeed intermediate, at a WHR of 0.75, close to the figure of 0.7 suggested by Singh (1993a, 1993b) and within the range of 0.6-0.9 suggested by Weeden and Sabini (2005). However, if such a ratio were generally attractive, we would expect to see a ridge of high attractiveness scores along the 0.7 or 0.75 ratio line. The fact that the surface is a peak and larger or smaller combinations with the same WHR are less attractive indicates that the attractiveness of this ratio is not general. Indeed, the fact that intermediate shoulder widths are also strongly preferred over extremes indicates that the attractiveness of the female torso to males is a consequence of the interaction between all 3 dimensions.

Table 1
Vectors of standardized linear selection gradients (β) and the matrix of standardized quadratic and correlational selection gradients (γ)

Figure 2
Nonparametric thin-plate spline visualizations of the response surface for each pair of traits. The unbroken line gives the WHR that intersects the fitness peak (0.75), and the broken line is the ratio of 0.7 that previous studies have shown to be most attractive.
Analysis of the effect of WHR on the attractiveness of models revealed no linear (β = 0.04, P = 0.856) or quadratic (γ = 1.61, P = 0.226) effect. This analysis is consistent with our multivariate response surface in which we found no general support for a WHR that maximizes attractiveness. We used the sum of hip and waist circumference and twice the shoulder width as an index of overall size (and, because stimuli were of constant height, relative size). There was no linear (β = 0.01, P = 0.722) or quadratic (γ = 0.00, P = 0.101) effect, suggesting no preference for generally large-, small-, or intermediate-sized models.
Our analysis of waist and hip widths of 8 samples of women found a range of mean WHR values from 0.7 to 0.85. Interestingly, the 4 samples of women that reflect superattractive groupings (i.e., Playboy centerfolds, 1920s models, 1990s models, and contemporary Australian escorts) fell furthest from the peak of our response surface (Figure 4), refuting our prediction that they would fall closest to peak attractiveness. These women had smaller waists and hips than the most attractive combinations in our study, and the low attractiveness of small-waisted and -hipped drawings in our study may be due to the fact that few, if any, appeared with correspondingly small shoulders. The samples that best represent the distribution of women in Australia (regular women) fell closest to the peak, with the mean for Australian women in the 25- to 44-year age-group directly on the peak. There is a clear increase in mean WHR with age, and Australian women older than 44 years exceed the peak values of both waist and hip width.

DISCUSSION
When shoulder, waist, and hip widths were varied independently, men found intermediate values of all 3 traits most attractive. Moreover, multivariate analysis of the fitness surface indicates that there is a single major axis of stabilizing selection on this combination of traits with the mean for normal Australian women most attractive. Although the peak of the response surface had a WHR of 0.75, the prediction of a ridge of high attractiveness at or near to a WHR of 0.7 was not upheld. Further, neither WHR nor the model's total area (our approximation of body mass) was associated directly with attractiveness. Thus, our results add to the body of work (Tassinary and Hansen 1998) that questions whether WHR is a general determinant of female attractiveness. Tassinary and Hansen (1998) provided evidence that the preference for WHR in earlier studies (Henss 1995; Singh 1993a, 1993b; Singh and Luis 1994) is a consequence of independent preferences acting on overall weight and on hip size. This finding has in turn been contested by a more extensive factorial manipulation of hip, waist, and chest size (Streeter and McBurney 2003). In general, findings regarding the importance of WHR to attractiveness of line drawings or computer-generated images are inconsistent between stimulus sets (Furnham et al. 2006). Part of this inconsistency might be due to the fact that even line-drawn bodies are complex stimuli. Manipulating even a single component of that stimulus alters the overall proportions of the model, such that both the absolute size and a host of proportional relationships are altered. This leads to uncertainty whether it is the absolute size or the proportions that cause any observed effects.
Alternatively, other studies maintain some proportional relationships, clouding interpretation as to the contribution of specific traits. This dilemma is a special case of the debate over what a "trait" is in evolutionary multivariate selection analyses (see Blows 2007 and associated commentaries). The approach that we adopt here allows the simultaneous estimation of the linear and quadratic effects of single traits and of the interactions between them (correlational gradients). Both the original and the rotated response surface show that interactions between all 3 traits have a strong influence on the attractiveness of models, with models in which 1 trait was inconsistent with the other 2 probably appearing poorly integrated and therefore unattractive to our subjects. We predict that when more than 3 dimensions are varied, researchers will find an even stronger pattern of stabilizing selection favoring a more tightly integrated phenotype.

Table 2 note: The eigenvalue represents the strength of nonlinear selection along each eigenvector, and the coefficients represent the contribution of each original trait to each of the 3 new eigenvectors.
Our results suggest that men prefer the bivariate waist-hip trait distribution of reproductively active Australian women over supposedly superattractive samples such as models, escorts, and centerfolds. This may be for one or a combination of 3 reasons. First, the bodies of these women may actually be less attractive. The 4 superattractive samples all had lower absolute hip and waist circumferences than the samples of regular women. Hip circumferences and BMI of Playboy centerfolds and models have decreased over the past 4 decades (Katzmarzyk and Davis 2001; Byrd-Bredbenner et al. 2005), and it is possible that this divergence may have driven a departure from the most attractive female phenotype. This seems unlikely, but without measuring the attractiveness of the models, escorts, or centerfolds, as well as randomly sampled women, it is impossible to know whether these supposedly attractive women represent a true departure from peak attractiveness. A second alternative is that the tight integration that we document between shoulders, waist, and hips in determining attractiveness, combined with our method of manipulating the 3 traits independently, meant that small-waisted and -hipped stimuli were seldom paired with shoulder widths and unmanipulated dimensions that were well integrated, resulting in artificially depressed attractiveness in this and other extreme regions of the surface. Last, other traits that have evolved in concert with hip and waist size (e.g., more prominent breasts or facial cues of age) may act to make the figures of women from the superattractive samples more desirable in dimensions that we did not consider in our study.
The strong integration of 3 body dimensions as determinants of the attractiveness of a simple line-drawn model illustrates the strong interrelationships between different parts of the body as determinants of attractiveness. These findings are strongly consistent with the cue compatibility model of Johnson, Tassinary, and colleagues (Johnson and Tassinary 2005, 2007) in which various body shape and motion parameters combine as tightly integrated cues of sex, gender, and sexual orientation.

FUNDING
Australian Research Council fellowship (to R.C.B.).
Many thanks to Richard Ronay for advice and assistance using MediaLab and to Matthew Hall, Felix Zajitschek, and Simon Lailvaux for their help with analyses and comments on an earlier draft.