Phenotypic characters of rice landraces reveal independent lineages of short-grain aromatic indica rice

Crop domestication is a remarkable example of the evolution of wild plants into cultivable forms through human selection. Following the domestication of rice almost 10,000 years ago, ancient farmers selected many rice lineages for diverse agronomic and cultural traits, such as grain size, shape and colour, awn length, pest resistance and aroma. In this study, examining phenotypic traits of a large collection of Indian rice landraces (all obtained from Vrihi, a rice seed bank, www.cintdis.org/vrihi), we characterize their extensive phenotypic diversity and find that a few grain, panicle and leaf traits are the major drivers of this diversity. We also demonstrate the existence of short-grain aromatic landraces in which the aroma trait may have evolved independently, unlike the introgression from japonica into the indica group evidenced in Basmati-type long-grain varieties. An independent origin of aroma in indica rice is of particular interest because it bears on lesser-known aspects of indica rice domestication and diversification.


Introduction
Crop domestication is a complex process mediated by a series of phenotypic changes that modify a wild species so that it is amenable to cultivation, harvest and consumption. Continuous selection of desirable phenotypic traits from the wild species Oryza rufipogon gradually gave rise to the domesticated O. sativa, which now feeds billions of people globally (Khush 1997; Kovach et al. 2007). Based on several morphological and genetic markers, O. sativa is broadly divided into two varietal groups, japonica and indica, which are further subdivided into five distinct subpopulations (Garris et al. 2005). In addition, considerable morphological, ecological and physiological variation exists within each varietal subpopulation owing to selection for adaptation to different agro-climatic conditions (Khush 1997).
Several independent domestication events might have occurred to establish cultivated rice in China, India and Southeast Asia (Konishi et al. 2006). Rice landraces are lineages that originated and evolved in the field over millennia through selective breeding by generations of farmers, who selected spontaneous mutants and gene combinations in domesticated rice for better yield, grain size and other agronomic or cultural values (Deb 2005). Selective breeding, random mutation and frequent hybridization between the landraces and their wild relatives over a long period led to the accumulation of high phenotypic and genetic diversity (Fukuoka et al. 2006; Sanni et al. 2008). Retaining this immense genetic diversity is significant not only for the evolutionary potential to withstand diverse selection regimes, but also for rice breeding, as a source of new genes for crop improvement, e.g. for abiotic stress tolerance or pest or disease resistance (Frankel and Soule 1981). However, since the 1960s, a large number of these landraces have been replaced by modern varieties (Heal et al. 2004; Deb 2009).
Indica landraces can be identified based on several morphological features, e.g. plant height, leaf length and width, grain weight, size and colour, and the presence or absence of awns and aroma (Deb 2005). In most cases, morphological traits were selected for certain inherent qualities, e.g. adaptation to marginal environmental conditions (such as flood, drought and soil salinity) or specific agronomic and cultural traits (early maturity, aroma, medicinal properties and high grain yield). As India is a centre of rice diversity, these landraces have presumably accumulated enormous variation from recurrent mutations and from their wild ancestors over ages of agricultural practice. However, owing to the predominant use of modern high-yielding varieties (HYV), a large proportion of the indigenous rice genetic diversity has already disappeared from farmers' fields (Chang 1984). Therefore, assessment, documentation, analysis and conservation of the extant genetic diversity are essential prerequisites for mining useful genes to develop new, adaptive cultivars (Chang 1984). Importantly, these landraces also represent intermediate forms that are genetically well differentiated from wild relatives but have not yet been exploited in modern breeding for cultivar development. They therefore presumably retain ancient signals of domestication, e.g. specific allelic combinations that may be extremely valuable for gaining insights into early rice domestication events.
In this paper, we evaluate a suite of 29 phenotypic characters from 414 rice landraces (both aromatic and non-aromatic) to investigate the major determinants of phenotypic diversity. We further investigate the morphological distinction between aromatic and non-aromatic landraces, as well as that within aromatic landraces.

Seed collection and conservation
Folk rice varieties were collected from farmers' fields in various Indian states (Deb 2005). Seeds of these farmer landraces constitute the accessions of Vrihi (www.cintdis.org/vrihi), the country's largest non-governmental seed bank, which conserves 920 folk rice varieties. All these varieties have been grown every year over the past 17 years on Vrihi's conservation farms, located in the districts of Bankura (West Bengal, India) and Rayagada (Odisha, India) (www.cintdis.org/basudha). The genetic purity of each landrace is maintained by periodic roguing of 'off-types', and cross-pollination between varieties grown on neighbouring plots is avoided by the flowering asynchrony method (Deb 2006). Of the 920 farmer landraces in the collection, 414 were selected for this study.
The germplasm of most of the rice landraces is publicly available from Vrihi (www.cintdis.org/vrihi) upon request for scientific study, except for commercial research affiliated with the corporate sector.

Experiments and measurements of agronomic traits
Each rice landrace was grown every year on a 2 m × 2 m plot, from which 10 hills were sampled for characterization. Morphological and agronomic traits of each landrace were recorded every year. For this study, 29 phenotypic characters were measured at different growth stages in the 414 selected landraces, following the guidelines of the International Rice Research Institute (IRRI 1980) and the International Network for Genetic Evaluation of Rice (INGER 1996). Parts of this documentation have been published elsewhere (Deb 2005). The characters selected for this study, their units of measurement and their abbreviations are listed in Table S1 [see Supporting Information - Table S1].

Data analysis
We examined the relative locations of the landraces in morpho-space by means of principal coordinate analysis (PCoA) using all 29 variables in PAST (Hammer et al. 2001). Next, we generated frequency distribution plots for all continuous variables among the 29 characters. We followed this with a principal component analysis (PCA) of the continuous variables to identify the overall pattern of phenotypic diversity and the contribution of each variable to it. We initially performed PCA with the log-transformed values of 14 continuous variables and finally reduced these to eight variables based on tests of suitability for PCA in SPSS ver. 19.0 (SPSS, Inc., Chicago, IL, USA): the anti-image correlation matrix, the Kaiser-Meyer-Olkin (KMO) measure with Bartlett's test of sphericity, and communalities. In the next step, we divided all the landraces into two a priori groups (aromatic and non-aromatic) based on the broad trend observed in the PCA and attempted to find the variable(s) that best discriminate the groups. To identify the combination of variables that best explains the grouping, we performed a forward stepwise discriminant function analysis (DFA) with the eight characters retained from the PCA in STATISTICA ver. 10 (StatSoft, Inc., Tulsa, OK, USA). We set F to enter at 0.01, F to remove at 0.0 and minimum tolerance at the default value of 0.01. We also obtained classification functions, which summarize the grouping and can be used for further cross-validation. Following this, we tested whether the group means differ significantly using the Welch two-sample t-test in R 2.15.2 (R Development Core Team 2011).
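The PCA-suitability checks named above (anti-image correlations and the KMO measure) were run in SPSS. The sketch below is a hedged, pure-Python illustration of the textbook KMO formula, not SPSS's own implementation. It inverts a correlation matrix, derives the anti-image (partial) correlations and compares their squared sum with that of the ordinary correlations. The function names `invert` and `kmo` are hypothetical, and Bartlett's test of sphericity is not included.

```python
"""Kaiser-Meyer-Olkin (KMO) sampling adequacy from a correlation matrix (sketch)."""
from math import sqrt


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment [A | I] and reduce the left half to the identity matrix.
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo(corr):
    """Return (overall KMO, per-variable MSA values) for a correlation matrix."""
    n = len(corr)
    inv = invert(corr)
    # Anti-image (partial) correlations: -inv[i][j] / sqrt(inv[i][i] * inv[j][j]).
    partial = [[-inv[i][j] / sqrt(inv[i][i] * inv[j][j]) for j in range(n)]
               for i in range(n)]
    r2_total = p2_total = 0.0
    msa = []
    for i in range(n):
        r2 = sum(corr[i][j] ** 2 for j in range(n) if j != i)
        p2 = sum(partial[i][j] ** 2 for j in range(n) if j != i)
        msa.append(r2 / (r2 + p2))
        r2_total += r2
        p2_total += p2
    return r2_total / (r2_total + p2_total), msa
```

With only two variables the overall KMO is always 0.5, whatever their correlation. A commonly quoted rule of thumb treats values below 0.5 as inadequate. The per-variable MSA values correspond to the diagonal of SPSS's anti-image correlation matrix.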
Finally, we re-examined the morphological distinction within the aromatic group, between traditional Basmati and non-Basmati aromatic landraces, through a 3-D scatterplot and neighbour-joining cluster analysis in PAST (Hammer et al. 2001). For this analysis, we combined our data with a published dataset (Takano-Kai et al. 2009) of grain length, width and weight for several global accessions of O. sativa. The rationale was to check whether the distinction between the two aromatic groups, i.e. Basmati and non-Basmati, persists when a major fraction of the global variation in grain characters is added to the analysis.
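Neighbour-joining (Saitou and Nei) repeatedly joins the pair of clusters that minimizes the Q criterion and estimates branch lengths from the distance matrix. The sketch below is a minimal pure-Python version, not PAST's implementation, and the function name `neighbour_joining` is hypothetical. The text does not say which distance was computed from grain length, width and weight. The input matrix is therefore an assumption, for example Euclidean distances between standardized trait values. Ties in Q are broken by taking the first pair found. The final edge is assigned wholly to one side, so the Newick output should be read as unrooted.

```python
"""Saitou-Nei neighbour-joining on a distance matrix, returning a Newick string."""


def neighbour_joining(labels, dist):
    """Build an unrooted NJ tree; branch lengths are formatted to 4 significant digits."""
    names = list(labels)  # Newick fragments for the current clusters
    d = [[float(x) for x in row] for row in dist]
    while len(names) > 2:
        n = len(names)
        totals = [sum(row) for row in d]
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node u to clusters i and j.
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2.0 * (n - 2))
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from u to each remaining cluster k.
        to_u = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [to_u[idx]] for idx, a in enumerate(keep)]
        d.append(to_u + [0.0])
        names = [names[k] for k in keep] + [f"({names[i]}:{li:.4g},{names[j]}:{lj:.4g})"]
    # The last two clusters are joined by a single edge.
    return f"({names[0]}:{d[0][1]:.4g},{names[1]}:0);"
```

On an additive four-taxon matrix the exact branch lengths are recovered. For example, `neighbour_joining(["A", "B", "C", "D"], [[0, 3, 8, 9], [3, 0, 9, 10], [8, 9, 0, 9], [9, 10, 9, 0]])` returns `"((A:1,B:2):3,(C:4,D:5):0);"`. With non-additive morphological distances, estimated branch lengths can become negative.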

Phenotypic variation
Univariate statistics. The landraces show wide diversity in morphological characters. Specifically, some characters, such as SW, GL, GW, DL, DW, PD, LL, %ST, PW and HT, show varying degrees of polymorphism, whereas F, AL, LA, […] 0.71, respectively), and HT, LL and PD are related to component 3 (r² = 0.84, 0.78 and 0.73, respectively) (Fig. 1A and B). The loadings and the biplot suggest that a few morphological characters are the major determinants of phenotypic diversity, among which grain length, weight and width, leaf length, panicle density and plant height play a pivotal role. The remaining characters appear to contribute minimally to variability.
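The component structure described above can be reproduced, in outline, by an unrotated PCA of the correlation matrix of log-transformed traits. The pure-Python sketch below uses cyclic Jacobi rotations for the eigendecomposition. It is an illustration, not the SPSS procedure, and the function names are hypothetical. Each loading is the correlation between a variable and a component (eigenvector entry × √eigenvalue). We assume the r² values quoted in the text are squared loadings of this kind, which the text does not state explicitly.

```python
"""PCA of log-transformed traits via the correlation matrix (pure-Python sketch)."""
from math import log, sqrt


def log_correlation_matrix(rows):
    """Pearson correlation matrix of log-transformed columns (all values must be > 0)."""
    n = len(rows)
    z = []
    for col in zip(*rows):
        logs = [log(v) for v in col]
        mean = sum(logs) / n
        sd = sqrt(sum((x - mean) ** 2 for x in logs) / (n - 1))
        z.append([(x - mean) / sd for x in logs])  # standardized log values
    p = len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]


def jacobi_eigen(matrix, sweeps=50, tol=1e-18):
    """Eigenpairs of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [list(row) for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + sqrt(theta * theta + 1.0))
                c = 1.0 / sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J and V <- V J (columns p, q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # A <- J^T A (rows p, q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    return [(a[i][i], [v[k][i] for k in range(n)]) for i in order]


def pca_loadings(rows, n_components=3):
    """Eigenvalues and loadings (variable-component correlations) of the first components."""
    pairs = jacobi_eigen(log_correlation_matrix(rows))
    return [(lam, [x * sqrt(max(lam, 0.0)) for x in vec])
            for lam, vec in pairs[:n_components]]
```

The eigenvalues of a p-variable correlation matrix sum to p. The squared loadings of one variable, summed over all components, equal 1. Both identities make convenient sanity checks when the sketch is run on real trait data.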

Difference between aromatic and non-aromatic landraces
The scatter plot of the first two PCA components effectively separates most of the aromatic landraces from the non-aromatic group; most aromatic landraces have smaller and lighter grains. However, the exceptionally long-grain Basmati and Dehradun gandheswari do not fall into this group and lie far from the major cluster of aromatics in morpho-space. A few landraces lie between the extremes (e.g. Radhashree and Parmaisal).
Nevertheless, DFA with SW, GL, HT and GW effectively validated the a priori grouping of aromatic and non-aromatic landraces. A model consisting of these four variables, out of the eight selected from the PCA (Table 1), best explained the grouping. Discrimination between the two groups was not absolute; 88.4 % of cases were correctly classified. The discriminant function is statistically significant, with a moderate canonical correlation (R = 0.541) and group separation (eigenvalue, 0.415542; canonical R, 0.541809; Wilks' lambda, 0.706443; cumulative probability, 100 %; P < 0.01). SW appears to best explain the grouping of aromatic and non-aromatic landraces: in the model, it has the lowest partial lambda and the highest standardized coefficient and F-to-remove value. By itself, SW correctly classifies 82 % of the landraces into their groups. The contribution of GL to the model is also substantial, as shown by a tolerance as high as that of SW and by moderate values of the other parameters. Furthermore, Basmati and Dehradun gandheswari were misclassified into the non-aromatic group. The classification functions correctly classify 77 % of the aromatic and 89.7 % of the non-aromatic varieties. They can be further used for cross-validation and for the assignment of unknown samples. A Welch two-sample t-test showed that the group means differ significantly (t = 8.2154, df = 47.765, P < 10⁻⁹). The non-aromatic landraces have heavier grains (mean GW 1.24 g) than the aromatic landraces in general (mean GW 0.98 g). A Hotelling's T² test including both SW and GL also showed significantly different group means for both variables (data not shown).
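The test statistics in this paragraph follow standard formulas, sketched below in pure Python; the function names are hypothetical. `welch_t` computes Welch's t and the Welch-Satterthwaite degrees of freedom. The p-value additionally needs the t-distribution CDF, e.g. from R's `t.test` (which uses the Welch form by default) or SciPy's `scipy.stats.ttest_ind(..., equal_var=False)`. `dfa_summary` applies the relations that hold when two groups yield a single discriminant function: Wilks' lambda = 1/(1 + eigenvalue) and canonical R = √(eigenvalue/(1 + eigenvalue)).

```python
"""Welch's t statistic and single-function DFA summaries (pure-Python sketch)."""
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def dfa_summary(eigenvalue):
    """Wilks' lambda and canonical R when there is a single discriminant function."""
    wilks = 1.0 / (1.0 + eigenvalue)
    canonical_r = sqrt(eigenvalue / (1.0 + eigenvalue))
    return wilks, canonical_r
```

`dfa_summary(0.415542)` uses the reported eigenvalue. It gives a Wilks' lambda of about 0.706443 and a canonical R of about 0.541809, matching the values reported above to six decimal places.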

Differences in grain dimensions among aromatic landraces
Adding landraces from globally diverse O. sativa accessions to our own data set revealed a conspicuous separation within the aromatic group in the 3-D scatter plot and cluster diagram. The separation was based on major variation in grain length, width and weight (Fig. 2) [see Supporting Information - Fig. S3]. Traditional Basmati-type and non-Basmati aromatic landraces, both among the global O. sativa accessions and among our own landraces, were clearly separated into two distinct groups.

Discussion
Rice landraces have evolved from their wild progenitor mostly through anthropogenic and natural selection (Zong et al. 2007), yet they retain huge genetic diversity (Frankel and Soule 1981; Nguyen 2002). These indigenous farmer landraces can tolerate a wide range of environmental stresses, providing highly stable, intermediate yields in low-input agricultural systems (Huang et al. 2010), and can substantially enrich the gene pool of advanced cultivars (Chang 1984; Fukuoka et al. 2006). India is one of the major centres of rice diversity, but few initiatives have been taken to conserve and assess landrace diversity. Instead, an aggressive campaign to promote a handful of modern HYV and hybrids has caused a rapid erosion of indigenous rice genetic diversity, resulting in the disappearance of thousands of landraces from farm fields (Jackson 1994, 1995; Deb 2005, 2009; Yamasaki et al. 2005). Detailed morphological descriptions and analyses are rare for indigenous rice landraces. To date, the only available documentation of a wide range of morphological and culturally important characters of indica rice landraces is Deb (2005); however, that work did not analyse the characters further. Previous studies of rice genetic diversity have mostly dealt with small sets of landraces, e.g. aromatic rice (Ray Choudhury et al. 2001), lowland rice of Assam (Bhuyan et al. 2007), stress-tolerant varieties (Reddy et al. 2009) and tall landraces (Neeraja et al. 2005). The present work is a first attempt to investigate the general pattern of phenotypic variation in a set of traits across a large number of rare landraces, some of which are not held in international and national gene banks.
The current study validates the effective segregation of aromatic and non-aromatic landraces based on grain characters chosen through multivariate analyses. It further argues for an independent origin and evolution of the aromatic lineages in indica and sheds light on the history of rice domestication, the finer details of which can be elucidated by molecular genetic analyses.
Our analysis indicates that the quantitative grain characteristics (grain length, grain width, grain weight, decorticated grain length and decorticated grain width), along with panicle density, leaf length and plant height, contribute predominantly to the observed range of phenotypic diversity. This finding agrees with earlier work. Sanni et al. (2008) showed prominent contributions of grain length, grain weight, panicle length, leaf width and tillering ability to phenotypic diversity in landraces from Côte d'Ivoire. The phenotypic variability of African traditional rice is primarily driven by panicle length, grain weight, grain length, grain width, flag leaf width, tiller number and days to heading (Meizan 1985). Li et al. (2010) identified grain weight as one of the most important characters contributing to variability. Our study also shows that leaf length and plant height contribute significantly to variability in rice landraces, a finding not reported in previous studies.
The prominent contribution of grain length, grain weight and panicle density to rice phenotypic diversity is perhaps because grain and panicle characters are agronomically the most important traits. These traits were subjected to strong directional and diversifying selection by farmers over generations (McCouch 2004). However, artificial (directional) selection during early domestication would have caused a drastic loss of diversity, which cannot account for the high degree of variation observed here. Unlike in other cereals such as maize and wheat, selection in rice has not always operated unidirectionally towards a uniform increase in grain size. Rather, diverse grain sizes and colours were selected at various local geographic scales for specific regional cultural and utilitarian requirements. As a result, grain length in indica landraces spans a wide range (from 5.7 to 11.3 mm), whereas a narrower range prevails in their wild ancestors (e.g. 7.32 to 9.95 mm) (Takano-Kai et al. 2009). Similar selection processes enhanced cooking quality, amylose content and degree of aroma in a large number of landraces, in combination with variation in grain size and colour. Divergent local preferences for several morphological traits have preserved a wide range of grain width, colour, plant stature, flag leaf angle and various culinary qualities, and have led to the conservation of enormous allelic diversity. Therefore, we surmise that early domestication might have eroded a relatively small amount of this diversity, while recurrent mutation and farmers' selection over centuries might also have magnified a certain portion of it. This argument contrasts with the prevailing hypothesis of a severe loss of diversity mediated by artificial selection during domestication.
It has been postulated that artificial selection in crops operated in two steps, each causing a loss of diversity (Yamasaki et al. 2005). The first loss occurred during domestication, when wild progenitors gave rise to landraces adapted to a wider range of environmental conditions. The second occurred when modern agriculture generated almost homogeneous inbred lines from the landraces through strong selection for agriculturally important traits. We argue that the first step also involved the enhancement and preservation of allelic diversity in numerous morphological traits, through various local selections and the preservation of novel mutants. It is the second step (homogenization) that caused an enormous decline in crop genetic diversity, a decline that continues with the introduction of modern varieties.
Our analyses of all morphological characters and of selected quantitative characters distinguish most of the aromatic varieties from the non-aromatic landraces in the scatterplots. The segregation of aromatic landraces in morpho-space may indicate that some aromatic landraces evolved independently into close-knit lineages. This conjecture is further supported by the discriminant analysis, which clearly segregates most aromatic and non-aromatic landraces into two distinct groups based on grain dimensions, although it misclassified the long-grained Basmati and Dehradun gandheswari into the non-aromatic group. Additionally, the explicit morphological separation of traditional Basmati and non-Basmati aromatic landraces in the PCA scatterplot is another significant outcome of our analysis. This corroborates the geographical distinction of small- and medium-grain non-Basmati aromatics from Basmati-type long-grain aromatic rice (Sharma et al. 2000). It is also confirmed by our extended analysis of grain characters of additional O. sativa accessions from across the world (Fig. 2) [see Supporting Information - Fig. S3].
Our interpretation is based entirely on phenotypic data, which have been routinely used to delimit species, populations and varietal groups in micro- and macroevolutionary studies (Armbruster et al. 2004; Eble 2004). While genetic analysis is necessary to validate our interpretation, phenotypic data may often suffice to correlate species diversification rates with phenotypic divergence (Ricklefs 2004, 2006). Furthermore, the morphological distinction between traditional Basmati and non-Basmati aromatic landraces is robust even at the scale of global variation in grain characters, and is not limited to a particular geographic region. The prominent difference between long- and short-grain landraces constitutes important evidence of their independent evolution and raises an intriguing question about the origin of aroma. Aroma in Basmati is presumed to have been introgressed from the japonica varietal group into the indica group (Garris et al. 2005; Kovach et al. 2007, 2009). The grain length of Basmati-like landraces is likewise presumed to have been introgressed from the japonica varietal group (Takano-Kai et al. 2009). Therefore, if short-grain non-Basmati aromatic indica landraces independently evolved into a divergent lineage, it seems highly likely that an alternative or additional QTL for aroma exists in the indica group of rice (Fitzgerald et al. 2008). Likewise, the grain-elongation QTL may have been contributed by O. rufipogon and/or O. nivara. Two findings lend support to our hypothesis that aroma and Basmati-type grain length in indica rice were inherited from ancestral South Asian species rather than from O. sativa japonica. These are the discovery of a distinct lineage of annual species in peninsular India derived from O. nivara (Sharma et al. 2000) and the discovery of rare O. nivara alleles shared by Basmati 370 (Sarla et al. 2003).

Conclusions
Our study highlights the immense phenotypic diversity of conserved landraces. Among the morphological characteristics, grain weight, length and width; decorticated grain length and width; leaf length; panicle density; and plant height contributed most to overall variability among indica rice landraces. The wide range of phenotypic traits indicates that landraces were (and continue to be) selected by farmers for diverse cultural and local ecological needs. This has led to the preservation and, in some cases, the enhancement of allelic diversity in rice. This inference contrasts with the widely held view that domestication entailed the loss of immense genetic diversity.
Our study further emphasizes the distinct morphological difference between most aromatic and non-aromatic landraces. It also shows a clear separation between long-grain Basmati and short-grain non-Basmati aromatic varieties, which indicates a possible origin of additional aromatic lineages within the indica group. Moreover, the two distinct groups of landraces based on grain size appear to distinguish the aromatic from the non-aromatic landraces. This feature is not predictable from the biochemical basis of rice aroma (imparted by 2-acetyl-1-pyrroline; Brahmachary 1996).
AoB PLANTS www.aobplants.oxfordjournals.org
These findings seem to challenge the conjectures that at least some genes or QTLs 'for' aroma (Garris et al. 2005; Kovach et al. 2007, 2009) and genes 'for' grain length (Takano-Kai et al. 2009) were introgressed from japonica into indica landraces during early migrations of people and their selection of rice lines. The distinct separation of aromatic short-grain from non-aromatic long-grain landraces may indicate that two separate gene clusters, controlling aroma and the short-grain trait, are tightly linked. Alternatively, a single gene cluster may have pleiotropic effects on aroma and short grain length, so that long-grain non-Basmati landraces are not aromatic. Our interpretations nevertheless warrant further molecular analyses, building on these initial conclusions, to gain fresh insights into the history of rice domestication.

Sources of Funding
The authors received no financial support for this study.

Contributions by the Authors
A.R. and D.D. conceived the idea and designed the experiments; D.D. collected morphological data; A.R., R.R. and B.C. analysed the data; A.R. and D.D. wrote the manuscript.

Conflicts of Interest Statement
None declared.

Acknowledgements
[…] farmers of Vrihi for providing all rice samples for our study.

Supporting Information
The following files are available in the online version of this article: File 1. Table S1: A list of the phenotypic characters included in this study, with their units of measurement and the abbreviations used in the text.
File 2. Figure S1: Frequency distribution of 29 phenotypic characters measured in 414 landraces. Details of abbreviations and units of measurement are summarized in Table S1.
File 3. Figure S2: Principal coordinate analysis with all characters of 414 landraces showing separation among landraces and within aromatic landraces. The non-Basmati aromatic landraces, non-aromatic landraces and Basmati along with Dehradun gandheswari are shown as red, black and blue circles, respectively.
File 4. Figure S3: Cluster diagram with grain weight, length and width of global accessions of O. sativa as well as 414 landraces showing separation between Basmati and non-Basmati aromatic landraces. The non-Basmati aromatic landraces, non-aromatic landraces, Basmati along with Dehradun gandheswari (Basmati-type), and global O. sativa accessions are shown in red, black, blue and green fonts, respectively [trop. japonica = tropical japonica, temp. japonica = temperate japonica].