a Facultad de Agronomia, Universidad de la Republica Oriental del Uruguay, Garzon 780, Montevideo, Uruguay
b International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, 06600 Mexico DF, Mexico
* Corresponding author (j.crossa{at}cgiar.org)
| ABSTRACT |
|---|
Abbreviations: MLM, Modified Location Model
| INTRODUCTION |
|---|
The genetic resources of maize from Latin America have been intensively studied. The first subspecific classification of maize was produced by Sturtevant (1899) (quoted by Brandolini, 1970), who divided individuals into six groups according to grain texture, shape, and color. Anderson and Cutler (1942) criticized this classification and proposed a new one, the racial classification, which was based on the work of Kulesov (1929) on phenotypic diversity, as well as on genetic and archeological information. The concept of maize race was defined as "a group of maize plants which share characteristics that allow them to be identified as a group" (Anderson and Cutler, 1942). Some methodologies for obtaining racial classifications have been criticized, mainly because of the type of traits used to produce them (Hallauer and Miranda, 1988). The racial classification should not be based on single-gene characteristics (Anderson and Cutler, 1942), such as the difference between floury and flint. Racial classification should be based on traits that primarily reflect genetic differences among accessions, rather than environmental or genotype × environment interaction differences.
Beginning in the 1980s, numerical taxonomy gained importance, facilitated by advances in computer technology that allowed the use of multivariate statistical methods for the classification of genetic resources.
There are many classification methods, including the two-stage, Ward-MLM method, proposed by Franco et al. (1998). In the first stage, initial groups are defined by a hierarchical method known as the Ward method. The second stage consists of improving these groups by means of the Modified Location Model (MLM). This classification strategy has the following advantages: (i) the process responds to the optimization of two related objective functions, in the first stage the sum of squares within groups and in the second stage the likelihood function of the observations; (ii) it is linked to a method for defining the optimum number of groups; (iii) it allows calculation of a measure of the group's precision (quality) because it assigns to each observation a probability of membership to the group; and (iv) it uses all the available information, that is, the continuous variables as well as the categorical variables. The use of all the information produces better classification results than the use of only part of the data. The distances between groups are maximized and better differentiated, and more compact groups are obtained (Franco et al., 1998).
Uruguay's maize landrace collection consists of 852 landraces collected in 1978 from farmers' fields by the College of Agriculture of the University of Uruguay, with the support of the International Board for Plant Genetic Resources (IBPGR). At present, one copy of this collection is conserved in Uruguay in the Germplasm Bank of the National Agriculture Research Institute (INIA) and another copy in the International Maize and Wheat Improvement Center (CIMMYT) maize germplasm bank in Mexico. An evaluation of this collection has demonstrated its potential for use in maize breeding programs (De María et al., 1979; Abadie et al., 1997; Salhuana et al., 1998). Adequate classification of these types of collections is critical for future breeding. The Uruguay collection was preliminarily classified into 10 races through visual assessment of ear and grain characters. This racial classification was developed as a starting point and as a basis for more definitive studies.
This study is part of an effort to improve the classification of the maize landrace collection from Uruguay. The objective of this research was to compare the preliminary racial classification obtained through a visual assessment with a numerical classification, and to understand further the advantages and disadvantages of the numerical classification.
| MATERIALS AND METHODS |
|---|
The characteristics evaluated were divided into two groups: (i) primary characteristics such as grain yield, forage yield, and lodging, and (ii) accessory traits such as cycle, height of plant and ear, number of tillers and ears per plant, and kernel and ear description characteristics. The database obtained was published in the Catálogo de Recursos Genéticos de Maíz de Sudamérica-Uruguay (Catalog of Genetic Resources of South America-Uruguay) (Fernández et al., 1983).
The database represents a total of 27 variables. The continuous variables were days to anthesis, days to silking, plant height, ear height, prolificacy index, tiller index, lodging, grain yield, ratio biomass-ear, residual biomass, ear length, ear diameter, row number, grain thickness, grain length, grain width, weight of 100 kernels, grain weight per ear, ear weight, and percentage of ear weight due to grain. The categorical variables included were ear shape, grain shape, grain texture, grain color, aleurone color, endosperm color, and ear color.
Racial Classification Based on Visual Assessment
The accessions were classified by De María et al. (1979) into 10 preliminary races on the basis of visual assessment of grain and ear morphology, following the general guidelines given by Paterniani and Goodman (1977). Ten races were defined: Dente Branco (dent-semident) (R1), Moroti (floury) (R2), Dente Riograndense (dent-semident) (R3), Semidentado Riograndense (flint-semiflint) (R4), Canario de Ocho (flint-semiflint) (R5), Cateto Sulino (flint-semiflint) (R6), Cuarentino (flint-semiflint) (R7), Cateto Sulino Grosso (flint-semiflint) (R8), Cristal (flint-semiflint) (R10), and Pisingallo (popcorn) (R9) (probably the same as Pisinkalla described by Goodman and Brown, 1988).
Numerical Classification
The Ward-MLM method (Franco et al., 1998) was used to perform the numerical classification. This methodology used a two-stage classification strategy: (i) accessions were initially grouped by a hierarchical method (Ward, 1963) and an approximate number of groups was determined by the Mojena index (Mojena, 1977), and the likelihood profile, related to the maximum likelihood ratio test (Mardia et al., 1979); (ii) then the initial groups were improved by an iterative process that maximized the likelihood function (Franco et al., 1998).
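The stopping rule used in the first stage can be illustrated with a minimal sketch of Mojena's criterion, which flags the first fusion level that exceeds the mean of all fusion levels by more than k standard deviations. The function name, the multiplier `k = 1.25`, and the fusion levels below are assumptions for illustration only; the text does not state which multiplier was used, and the likelihood profile is not covered here.

```python
from statistics import mean, stdev

def mojena_group_count(fusion_levels, k=1.25):
    """Suggest a number of groups from hierarchical fusion levels.

    fusion_levels: the n-1 merge heights of a hierarchical clustering of
    n items, in the order the merges happened (ascending for Ward).
    k: Mojena's multiplier (assumed value, not taken from the article).
    """
    n_items = len(fusion_levels) + 1
    threshold = mean(fusion_levels) + k * stdev(fusion_levels)
    for merges_done, level in enumerate(fusion_levels):
        if level > threshold:
            # Stop before this merge: n_items - merges_done groups remain.
            return n_items - merges_done
    return 1  # no fusion level stands out, so the rule suggests no cut

# Hypothetical fusion levels for 10 items: the last merge is clearly larger.
levels = [1, 1, 1, 1, 1, 1, 1, 1, 10]
print(mojena_group_count(levels))
```

In the two-stage strategy described above, a count obtained this way would only be an approximate starting point for the second stage.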
Data were analyzed from a total of 24 variables (Table 1, 17 continuous and seven categorical). Row number, despite its apparent continuity, had few classes, so it was transformed into a categorical variable with two classes separated by the mean: 0 when the value was less than 13.4, and 1 when it was greater than 13.4. The variable ear diameter was also transformed to a categorical variable of the same type: 0, when the value was less than 4.1 and 1 when it was greater than 4.1. The variables ear shape and ear color were not included in the analysis because almost all individuals belonged to only one class. For the analysis, 840 out of the 852 existing accessions were used. Twelve accessions were eliminated due to missing data.
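The recoding of row number and ear diameter can be sketched as follows. The cutoffs 13.4 and 4.1 come from the text; the function name, the example accession, and the handling of a value exactly equal to the cutoff (which the text does not cover) are assumptions.

```python
ROW_NUMBER_CUTOFF = 13.4    # cutoff for row number reported in the text
EAR_DIAMETER_CUTOFF = 4.1   # cutoff for ear diameter reported in the text

def dichotomize(value, cutoff):
    """Return 1 when value is above the cutoff, otherwise 0.

    A value exactly equal to the cutoff is not covered by the text;
    this sketch assigns it to class 0 (an assumption).
    """
    return 1 if value > cutoff else 0

# Hypothetical accession record.
accession = {"row_number": 14, "ear_diameter": 3.9}
coded = {
    "row_number": dichotomize(accession["row_number"], ROW_NUMBER_CUTOFF),
    "ear_diameter": dichotomize(accession["ear_diameter"], EAR_DIAMETER_CUTOFF),
}
print(coded)
```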
Variable Information and Its Importance for Forming the Groups
To determine the variables that better discriminated the groups or races, a stepwise discriminant approach (Klecka, 1980) was implemented. SAS PROC STEPDISC (SAS Institute Inc., 1996), with the FORWARD selection method and a 15% significance level ($P \le 0.15$) (Franco et al., 1997a), was used. The importance of categorical variables was determined with the $\chi^2$ statistic (adjusted by its degrees of freedom), which tested the independence between each categorical variable and the classification.
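As an illustration of the categorical-variable criterion, the sketch below computes a Pearson chi-square statistic for the cross-tabulation of one categorical variable against a classification and divides it by its degrees of freedom. The function name and the toy data are hypothetical, and reading "adjusted by its degrees of freedom" as a simple division is an assumption (it matches the later wording "divided by its degrees of freedom").

```python
from collections import Counter

def chi_square_per_df(categories, groups):
    """Pearson chi-square of independence divided by its degrees of freedom."""
    n = len(categories)
    row_totals = Counter(categories)
    col_totals = Counter(groups)
    observed = Counter(zip(categories, groups))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df == 0:
        raise ValueError("need at least two classes in each variable")
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return chi2 / df

# Hypothetical data: grain texture perfectly aligned with two groups.
texture = ["flint"] * 10 + ["dent"] * 10
group = ["G1"] * 10 + ["G2"] * 10
print(chi_square_per_df(texture, group))
```

Larger values indicate a stronger association between the variable and the classification, which is how the two classifications are compared later in the text.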
Distance between Groups
To determine the best classification strategy, three distance measures were studied: (i) the Mahalanobis (1930) distance between groups for continuous variables, defined as $D^2 = (\mu_i - \mu_j)' \Sigma^{-1} (\mu_i - \mu_j)$, where $\mu_i$ and $\mu_j$ are the vectors of means of the two groups, assuming a common variance-covariance matrix $\Sigma$; (ii) the square root of the product of the relative frequencies over the categorical variables (Dd); and (iii) the Krzanowski (1983) distance (Md) between groups, using a mixture of continuous and categorical variables.
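The Mahalanobis distance in (i) can be computed without forming an explicit matrix inverse, by solving a linear system with the common variance-covariance matrix. The following is a minimal standard-library sketch; the helper names and the numeric inputs are hypothetical, and the covariance matrix is assumed to be already estimated and nonsingular.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mahalanobis_d2(mean_i, mean_j, cov):
    """Squared Mahalanobis distance between two group mean vectors."""
    diff = [a - b for a, b in zip(mean_i, mean_j)]
    z = solve(cov, diff)  # z equals cov^{-1} * diff
    return sum(d * v for d, v in zip(diff, z))

# Hypothetical group means and common covariance matrix for two traits.
print(mahalanobis_d2([2.0, 0.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]]))
```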
Multivariate Statistics
Four multivariate statistics, used in multivariate analysis of variance to assess the differentiation among group mean vectors, were computed: Wilks' Lambda, $\Lambda = |W|/|W+B|$; Pillai's Trace, $V = \mathrm{tr}(B(B+W)^{-1})$; Hotelling-Lawley Trace, $U = \mathrm{tr}(W^{-1}B)$; and Roy's maximum characteristic root, $\max\{\lambda_i\} = \lambda_1$ (maximum characteristic root of the matrix $W^{-1}B$). Smaller $\Lambda$ values and larger $V$, $U$, and $\lambda_1$ values indicated a greater difference between the means of the groups (Franco et al., 1997b). Matrix $B$ is the sum of squares and products between groups, and $W$ is the sum of squares and products within groups in the multivariate analysis of variance.
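It may help to see why the four statistics move together. Writing $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_s$ for the nonzero characteristic roots of $W^{-1}B$, the definitions above reduce to the following standard identities (given here as background, not taken from the article):

```latex
\Lambda = \prod_{i=1}^{s} \frac{1}{1+\lambda_i}, \qquad
V = \sum_{i=1}^{s} \frac{\lambda_i}{1+\lambda_i}, \qquad
U = \sum_{i=1}^{s} \lambda_i, \qquad
\lambda_1 = \max_i \lambda_i .
```

Larger roots, that is, more between-group variation relative to within-group variation, therefore lower Wilks' Lambda and raise the other three statistics.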
These four statistics can be approximated by an F statistic for comparing the variability between groups with variability within groups and are used with the same purpose as the F statistics of Wright (1951) or Cockerham (1969), which quantify the effect of inbreeding in population substructures using allele frequency data. By just estimating the value of F for each of the groups formed by the classification method, it is possible to compare the magnitude of these F values. A better classification is that in which greater F values are obtained. However, confidence intervals and hypothesis testing using these F values can only be done by numerical resampling.
Recovery of the Phenotypic Diversity
The fourth criterion used was the recovery of the phenotypic true variance through a process of sampling the groups. This criterion is very important because one of the aims of the classification is to conduct sampling that could be representative of the collection. To measure how many times the true phenotypic variance was under or over estimated by the sample in each classification, a simulated sampling process was used. One hundred random samples, stratified proportionally to the group size, were obtained for each classification. First, 20% of the individuals of each group (or race) were sampled. Second, the mean and variance of the sample of each variable were calculated. Third, the process was repeated randomly 100 times, so 100 different samples were obtained. Finally, the differences between the estimated and the true variances (for each continuous variable) were calculated and a frequency analysis was conducted. Estimates were considered correct when the estimated value was within a 10% interval around the true value.
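The simulated sampling described above can be sketched for a single continuous variable as follows. Several details are assumptions, because the text does not fix them: the "true" variance is taken as the sample variance of the whole collection, "within a 10% interval" is read as within plus or minus 10% of the true value, and the data, seed, and function name are hypothetical.

```python
import random
from statistics import variance

def variance_recovery(values, groups, fraction=0.20, n_samples=100,
                      tolerance=0.10, seed=0):
    """Count how often a proportional stratified sample recovers the variance."""
    rng = random.Random(seed)
    true_var = variance(values)  # assumed definition of the "true" variance
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    counts = {"under": 0, "correct": 0, "over": 0}
    for _ in range(n_samples):
        sample = []
        for members in by_group.values():
            # Sample the same fraction from every group (proportional allocation).
            k = max(1, round(fraction * len(members)))
            sample.extend(rng.sample(members, k))
        rel_diff = (variance(sample) - true_var) / true_var
        if rel_diff < -tolerance:
            counts["under"] += 1
        elif rel_diff > tolerance:
            counts["over"] += 1
        else:
            counts["correct"] += 1
    return counts

# Hypothetical trait values for two groups of 60 and 40 accessions.
data_rng = random.Random(1)
values = [data_rng.gauss(10, 2) for _ in range(60)] + \
         [data_rng.gauss(20, 3) for _ in range(40)]
groups = ["G1"] * 60 + ["G2"] * 40
counts = variance_recovery(values, groups)
print(counts)
```

Running the same function with the racial labels and with the Ward-MLM labels as `groups` would give the kind of frequency comparison the criterion calls for.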
Graphical Representation of a Classification
A graphic representation of the classification was produced by means of a canonical analysis, with PROC CANDISC (SAS Institute, Inc., 1996) (Taba et al., 1997). The canonical analysis of the data set structured in g groups allowed us to show the differences between groups using a two- or three-dimensional graph (Mardia et al., 1979). The loss of information was considered minimal when the canonical variables accounted for at least 70% of the between-within variance relation of the groups (Franco et al., 1997b). The correlation coefficients between the original variables and the canonical variables provided a biological interpretation of the canonical variables, allowing for a characterization of the groups as well as interpretations of the graphs (Franco et al., 1997a).
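The 70% rule can be checked from the eigenvalues associated with the canonical variables. The sketch below assumes that the share attributed to each canonical variable is its eigenvalue divided by the sum of all eigenvalues; the function name and the eigenvalues are hypothetical.

```python
def canonical_variables_needed(eigenvalues, target=0.70):
    """Return (count, cumulative share) for the fewest leading canonical
    variables whose eigenvalue share reaches the target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return count, running / total
    return len(ordered), 1.0

# Hypothetical eigenvalues of four canonical variables.
count, share = canonical_variables_needed([3.0, 1.0, 0.5, 0.5])
print(count, share)
```

When the returned count is two or three, a two- or three-dimensional graph meets the threshold stated above.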
Quality of the Classification
The average probability of each observation belonging to each group was calculated as follows:
[Equation for the average membership probability, T; image not recovered.]
The term indexed by ij in this equation is the probability of each subject (j) belonging to each group (i). The membership probability is a measure of the classification quality; higher T values indicate a better classification.
| RESULTS AND DISCUSSION |
|---|
Comparing Racial and Numerical Classifications
The comparison of the numerical and racial classifications, based on four criteria, is presented below.
Number of Relevant Variables in the Classification
The stepwise discriminant analysis indicated that all of the continuous variables contributed to the differentiation of groups and races ($P \le 0.15$), but there were more continuous variables with high F values in the numerical classification (FG) than in the racial classification (FR) (Table 1). The most important continuous variables in the racial classification were weight of 100 kernels, tiller index, ear height, ear length, ear weight, and grain thickness. Twelve variables contributed most to the numerical classification: grain width, ratio biomass-ear, tiller index, days to anthesis, weight of 100 kernels, ear length, ear height, lodging, prolificacy index, residual biomass, grain thickness, and grain weight per ear. The racial and numerical classifications shared five out of six relevant variables: weight of 100 kernels, tiller index, ear height, ear length, and grain thickness (Table 1).
On the basis of the chi-squared values (divided by their degrees of freedom), all categorical variables contributed significantly to the racial and numerical classifications ($P \le 0.001$), but the numerical classification gave greater chi-squared values than the preliminary racial classification (Table 1). The most important categorical variables that defined racial classifications and Ward-MLM groups were grain texture, endosperm color, and aleurone color.
Table 2 shows, for all continuous variables, the ratio of between-group to within-group variability, given by the F value. For all the variables, the within-group variability of the numerical classification was smaller than that of the racial classification (the F values were greater in the numerical classification than in the racial classification). These results indicate that the numerical methodology produces a more parsimonious and cohesive classification than the preliminary racial classification. The frequencies of categorical variables for the 10 races and the seven groups are shown in Table 3.
Since all the variables were used when producing the numerical classification, while only the variables involving grain and ear morphology were used to produce the preliminary racial classification, it was logical to expect that the discriminant analysis would include more variables for explaining the numerical classification than for the racial classification. Among the six most important continuous variables used in the racial classification, five were also used by the numerical classification, and the latter added grain width, which was also the most discriminative variable for this classification. In addition, the numerical classification formed better groups with respect to all the categorical variables, which are all related to grain and ear morphology.
It is generally accepted that the validity of the classification of genetic resources strongly depends on the heritability of the traits. It is also accepted that the environment has less of an effect on grain and ear morphology traits than on agronomic traits. Thus, the numerical classification has the disadvantage of adding traits that are highly influenced by the environment. There are, however, two traits added to the classification that can be considered very informative: grain width and days to anthesis. In addition, tiller index, a trait that has high environmental influence, is highly discriminative in both classifications. This indicates that despite the low heritability of tiller index, it is very informative due to the large differences between groups in both classifications. The diverse genetic background of the accessions, which range from indigenous to recent commercial races (Paterniani and Goodman, 1977), is surely the reason behind the diversity for this trait.
Distances between Groups ($D^2$)
The average Mahalanobis distance ($D^2$) for the numerical classification was 49% greater than for the preliminary racial classification (Table 4). This showed that the Ward-MLM groups were better separated for continuous variables than the preliminary races. For the discrete variables, the average distance among races (Dd) was 2% greater than in the Ward-MLM groups. For a mixture of variables, the average distance (Md) of the Ward-MLM groups was 2% greater than among the races (Table 4).
The multivariate statistics were Wilks' Lambda, $\Lambda$ = 0.059 (F = 18.05) and $\Lambda$ = 0.015 (F = 49.53); Pillai's Trace, V = 2.039 (F = 14.16) and V = 2.765 (F = 41.32); Hotelling-Lawley Trace, U = 4.227 (F = 22.44) and U = 7.141 (F = 57.08); and Roy's greatest root, $\lambda_1$ = 1.799 (F = 86.98) and $\lambda_1$ = 2.922 (F = 141.28), in the racial and numerical classifications, respectively. As in the univariate case, these values show that the numerical classification gives a greater ratio of between-group to within-group variability; that is, the numerical classification obtains groups that are more homogeneous within and more heterogeneous between.
Recovering the True Phenotypic Variance
A sampling process was used to assess the frequency of correct estimation of the true phenotypic variance on the racial and Ward-MLM classification. Sampling produced similar performances, with a slight advantage for racial classification (Table 5). The averages of underestimation were 15 and 16%, and those of overestimation were 23 and 25%, for racial and numerical classification, respectively (data not shown). These similar performances were expected because, in the long term, the proportional sampling will produce the same estimations for any classification method.
The goal of numerical classification is to obtain the best groups, given the available data. The predictability of the classification is given by the adequate choice of the data and the variables. In this study, we did not use grain yield, because of its low heritability, but we did use variables that are affected by the environment. These variables, however, were not relevant discriminators of the groups.
It should be noted that the same data were used for both producing and verifying the classifications. This approach can be described as a postdictive verification and has been used by several authors when analyzing genetic resources (Crossa et al., 1995; Franco et al., 1997a, 1997b, 1998, 1999; Malosetti and Abadie, 2001). The predictability of this verification approach can be affected by the low heritability of some of the traits included in the numerical classification.
Canonical Analysis
The centroids of the Dente Branco (R1), Moroti (R2), Cuarentino (R7), Pisingallo (R10), and Cristal (R9) races were separated by the first canonical variable (Fig. 1a). These results are similar to those obtained by Malosetti and Abadie (2001) using a principal component analysis.
The second canonical variable was widely correlated to tiller index (r = 0.576) and maturity variables, days to silking (r = 0.355), and days to anthesis (r = 0.352). Tiller index was one of the continuous variables that contributed most to racial differentiation.
The centroids of groups G1-G7 in the graph of the first two canonical axes are shown in Fig. 1b. For the groups formed by Ward-MLM, the correlations between the original variables and the first two canonical variables were similar to those for the groups formed by the preliminary racial classification. The variables correlated to canonical variable 1 were grain width (r = 0.638), grain length (r = 0.541), and weight of 100 kernels (r = 0.536). The second canonical variable was correlated to ratio biomass-ear (r = -0.561), as well as tiller index (r = 0.517), days to anthesis (r = 0.465), and days to silking (r = 0.434).
Classification Quality
The numerical classification showed, on average, a high probability (0.966) of membership. Only 5% of the classified observations had a probability of membership <0.75, and there was only one observation classified with a probability <0.5. Groups G3, G4, and G6 had, on average, low probabilities of membership, and several observations with probabilities <0.75.
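Summaries of this kind can be derived from a table of membership probabilities. The sketch below assumes that each accession's membership probability is its probability for the group to which it is assigned, taken here as the largest value in its row; this reading, the function name, and the probabilities are assumptions, since the original equation is not available.

```python
def summarize_membership(prob_rows, low=0.75):
    """Return (mean assigned-group probability, share of accessions below `low`).

    prob_rows: one list of group-membership probabilities per accession.
    """
    assigned = [max(row) for row in prob_rows]  # probability of the assigned group
    mean_prob = sum(assigned) / len(assigned)
    share_low = sum(p < low for p in assigned) / len(assigned)
    return mean_prob, share_low

# Hypothetical probabilities for four accessions and three groups.
rows = [
    [0.98, 0.02, 0.00],
    [0.60, 0.30, 0.10],
    [0.05, 0.95, 0.00],
    [0.01, 0.00, 0.99],
]
mean_prob, share_low = summarize_membership(rows)
print(mean_prob, share_low)
```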
Implications for a New Classification
The races were almost entirely maintained by the numerical classification. Groups G1, G2, G5, and G7, obtained by the numerical classification method, were formed mainly of individuals belonging to Dente Branco (dent, 88%), Moroti (floury, 76%), Cuarentino (flint-semiflint, 62%), and Pisingallo (popcorn, 96%), respectively (data not shown). The remaining Ward-MLM groups contained different combinations of flint-semiflint races. Groups G3 and G6 were mostly constituted by individuals from the Cateto Sulino race, and G4 by a mixture of accessions of different flint-semiflint races (Tables 2 and 3). Most of the accessions from the Cristal race (flint-semiflint) were included in group G2 together with Moroti, which is not a surprise, since the latter is supposed to be its ancestor (Paterniani and Goodman, 1977).
The redistribution of the Cateto Sulino race into groups G3 and G6 deserves further discussion since this is a highly important race because of its potential for breeding (Salhuana et al., 1998) and because it constitutes 53% of the collection. These two groups were mainly separated by the categorical variable grain shape: one with round, flat, small grains (G6), the other one with round, flat, large grains (G3) (Table 3). The continuous variables that most differentiated the G3 and G6 groups were grain width (GWI), grain length (GLE), and days to silk (DS) (Fig. 2). The individuals with greater grain length and width and earliness were grouped in G3. The variance and coefficient of variation for the continuous variables were lower in G3 and G6 than in the Cateto Sulino race. Their means were approximately similar to those of the Cateto Sulino race; one of the groups was slightly above and the other below (Fig. 2, and Table 2).
Grain texture is a highly discriminant variable in both classifications, being the most important one in the numerical classification. This was also observed for several other collections of the region (Paterniani and Goodman, 1977; Abadie et al., 1999). Grain texture (pop, flint, floury, and dent) is a simple morphological trait associated with different steps in the domestication of maize (Goodman, 1976), and with different uses of the crop and the cultural preferences of the farmers. This has been shown in the classification of Brazilian maize landraces and adjacent areas proposed by Paterniani and Goodman (1977), in which racial groups are confounded with different grain types. An association between grain texture and other variables could have occurred during the process of maize domestication, explaining the importance of this single characteristic in maize classification.
This study helped to obtain an updated and more adequate classification of the maize genetic resources of Uruguay. The numerical classification obtained in this study can be considered a refinement and more detailed analysis of the relationships between Uruguayan maize landraces, as compared with the preliminary racial classification. It is an advance, not only because it creates more homogeneous groups, but also because it uses more traits, some of which are important for breeding purposes. In the future, a more precise selection of the variables to be used in numerical classification would be advisable as part of the process of producing the final racial classification.
| ACKNOWLEDGMENTS |
|---|
Received for publication February 5, 2002.