a Dep. of Genetics, ESALQ/Universidade de Sao Paulo, Cx. P. 83, 13400-970, Piracicaba, Sao Paulo, Brazil
b Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Apdo. Postal 6-641, México DF, México
* Corresponding author (j.crossa{at}cgiar.org).
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
In both in situ and ex situ genetic resources conservation, the notion of representativeness is important and its quantification under different practical situations is relevant. In many cases, however, computing Ne may be complex, especially if the reference population has a natural structure or is subdivided into smaller populations related to each other at different hierarchical levels and with several degrees of genetic differentiation. In ex situ genetic conservation, it is necessary to distinguish between activities related to germplasm collection and those aimed at accession regeneration. Nevertheless, the same theoretical basis is applied in both cases, but reference populations are often quite different, thus requiring the use of different theoretical models that appropriately fit the diverse practical situations. When collecting germplasm, the reference population can be considered of infinite size, whereas in accession regeneration, the reference population is of finite size. Nevertheless, Ne is a basic parameter that largely determines allelic retention, preservation, and conservation over generations.
For plant breeding programs, it is also necessary to maintain the genetic base of the population as widely as possible to avoid bottlenecks that are an important cause of reduced genetic variability. Likewise, it is important to recall that fixation of favorable alleles by recurrent selection does not depend exclusively on heritability, gene action, initial allelic frequencies, and selection intensity; it is significantly determined by the Ne maintained throughout selection cycles (Comstock, 1996). On the other hand, in production systems at the farmer's level, cultivars may have been derived from a limited number of genotypes or closely related genotypes. The same issue arises in parent selection for crossing and further selection. In these situations, precautions related to the underlying genetic base are also important.
This review includes a short outline concerning types of Ne. Rather than focusing on their derivation, we emphasize their interpretation and utilization and provide information on how to compute them, as well as their limitations and adequacy. We underscore how genetic marker information can be incorporated into measurements of representativeness that are useful for genetic conservation activities. The term representativeness is used here to indicate the correspondence between a sample of individuals and its reference population with respect to changes in allelic frequencies because of drift.
| MODELS, PARAMETERS, AND ESTIMATES |
|---|
Two additional alternatives are also considered: (i) when the information about the genetic structure of one or more populations is obtained from neutral and codominant genetic markers and (ii) when the reproduction systems of the populations are previously known and measured by the natural rate of self-fertilization (s).
Effective size (Ne) defined on the basis of genetic drift due to sampling (Crow and Kimura, 1970; Crow and Denniston, 1988) and its adaptations (Crossa and Vencovsky, 1994, 1997; Vencovsky and Crossa, 1999) is used. Effective size, measured as half the inverse of the average coancestry among individuals of a specific set, including individuals with themselves ("status" number, as proposed by Lindgren et al., 1996, 1997), is also considered as effective number involving allele capture probability in a specific sample. The parameters of representativeness given here always refer to events that occurred in one generation only. Cumulative effects of sampling over a sequence of generations are briefly discussed. Species with separate sexes or crossing systems in which monoecious or hermaphroditic plants are manipulated to simulate separate sexes are excluded, but diploid segregation is considered. In the case of random sampling of seeds, it is assumed that the number of seeds contributed per pollinated plant follows a binomial distribution (except in specific situations as indicated).
The following symbols are used:
Ne, effective population size (or number).
N, number of individuals of the reference population in the previous generation (when finite).
n, number of individuals or seeds sampled from the offspring generation for which Ne is required.
P, number of seed parents sampled that generated the total of n seeds. For variable numbers, $n = \sum_i n_i$ (i = 1,2,...,P).
M, number of functional pollen parents.
u = P/N and v = M/N, fractions of functional female and male parents, respectively.
S, number of local populations sampled from the metapopulation.
S*, actual number of populations comprising a metapopulation.
s, natural (or artificial) rate of self-fertilization; assumed equal for all plants; in natural populations under inbreeding equilibrium, $f = s/(2 - s)$, where f is the inbreeding coefficient.
$\theta_m$, coancestry coefficient of individuals within maternal progenies. Under inbreeding equilibrium and reproduction through a mixture of selfing and random mating, $\theta_m = (1 + s)^2/[4(2 - s)]$ (Cockerham and Weir, 1984).
C, coefficient of variation of the number of sampled individuals. Generally used when individuals are sampled from sub or local populations, such that ni refers to the ith population (i = 1,2,...,S) and $n = \sum_i n_i$, $\bar{n} = n/S$, $\sigma_n^2 = (1/S)\sum_i (n_i - \bar{n})^2$, and $C^2 = \sigma_n^2/\bar{n}^2$. When ni refers to a given family within a population and under random sampling, it is assumed that this number follows a binomial distribution and $\sigma_n^2 = n(1/P)(1 - 1/P)$, with i = 1,2,...,P female parents.
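The squared coefficient of variation C² can be computed directly from the sample counts. The following is a minimal sketch, assuming the variance of ni uses the divisor S (or P) and that, under random sampling, ni is binomial with n trials and probability 1/P; the function names are illustrative and not part of the original text.

```python
def squared_cv(counts):
    """C^2 of the sample counts n_i: variance (divisor = number of groups) over squared mean."""
    groups = len(counts)
    mean = sum(counts) / groups
    variance = sum((ni - mean) ** 2 for ni in counts) / groups
    return variance / mean ** 2


def expected_binomial_c2(n, P):
    """Expected C^2 when n seeds are drawn at random from P seed parents.

    Assumes n_i ~ Binomial(n, 1/P), so var = n(1/P)(1 - 1/P) and mean = n/P.
    """
    variance = n * (1 / P) * (1 - 1 / P)
    mean = n / P
    return variance / mean ** 2  # algebraically (P - 1) / n


if __name__ == "__main__":
    print(squared_cv([10, 10, 10, 10]))   # equal counts give C^2 = 0
    print(squared_cv([5, 15]))
    print(expected_binomial_c2(100, 10))
```

Equal counts per group give C² = 0, which corresponds to the female gametic control case discussed later in the text.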
| F AND θ PARAMETERS |
|---|
For a system of subpopulations (metapopulation), the appropriate nested random linear model corresponding to the levels of the hierarchy is then
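$$Y_{ijk} = p + P_i + I_{ij} + G_{ijk} \qquad [1]$$
where $P_i$, $I_{ij}$, and $G_{ijk}$ are the random effects of subpopulations, individuals within subpopulations, and genes within individuals, respectively, and p is the allele frequency. The total variance is
$$\sigma_T^2 = \sigma_P^2 + \sigma_I^2 + \sigma_G^2 = p(1 - p)$$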
and following Weir (1996)
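$F_{ST} = \theta_P = \sigma_P^2/\sigma_T^2$ (among populations),
$F_{IT} = F = (\sigma_P^2 + \sigma_I^2)/\sigma_T^2$ (total), and
$F_{IS} = f = \sigma_I^2/(\sigma_I^2 + \sigma_G^2)$ (within populations).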
Here, as defined by Weir (1996), FST is the correlation of alleles of different individuals in the same subpopulation, FIT is the correlation of alleles within individuals over all subpopulations, and FIS is the correlation of alleles within individuals within sub or local populations. Also
$\sigma_P^2 = p(1 - p)\theta_P$, $\sigma_I^2 = p(1 - p)(F - \theta_P)$, and $\sigma_G^2 = p(1 - p)(1 - F)$ (Weir, 1996). Here, measures of $\theta_P$, F, and f (Cockerham, 1969) are the analogs of Wright's F statistics. Several alleles may exist at different loci, and each will provide an estimate of the parameters when neutrality is assumed. Coefficient FST can also be interpreted as a measurement of allelic diversity among subpopulations, while FIS is related to deviations from Hardy-Weinberg frequencies within subpopulations. Parameter FIT quantifies these deviations for the whole set of individuals, ignoring the subpopulation structure.
Considering a set of S subpopulations, each with n diploid individuals sampled from a large reference metapopulation, the average gene frequency of the 2Sn genes is
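$\bar{p}$, the mean over the S subpopulations, the n individuals per subpopulation, and the two genes per individual, with the sampling variance being
$$\sigma^2_{\bar{p}} = \frac{\sigma_P^2}{S} + \frac{\sigma_I^2}{Sn} + \frac{\sigma_G^2}{2Sn} \qquad [2]$$
Taking
$$\sigma^2_{\bar{p}} = \frac{p(1 - p)}{2N_e}$$
Ne can be expressed as a function of the F and/or θ statistics, and of quantities S and n.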
Introducing a possible variation of the number (ni) of individuals sampled from different subpopulations is convenient for expanding Eq. [2]. It may also be appropriate to consider that the actual number of populations, in natural conditions, is not necessarily very large (or of infinite size). In this case, the expression for the sampling variance $\sigma^2_{\bar{p}}$ will require a correction for sampling from a finite set of populations. For deriving the variance given in Eq. [2], with finite number S*, it is assumed that the random subpopulation effects Pi of the linear model are such that $\sum_{i=1}^{S^*} P_i = 0$. This generates a covariance between pairs of Pi effects, namely $\mathrm{cov}(P_i, P_{i'}) = -\sigma_P^2/(S^* - 1)$. Incorporating this covariance into $\sigma^2_{\bar{p}}$ leads to $(\sigma_P^2/S)[1 - (S - 1)/(S^* - 1)]$ instead of simple $\sigma_P^2/S$ as the first term of the right-hand side of Eq. [2]. Term [(S - 1)/(S* - 1)] corrects for the finiteness of the number of subpopulations (S*) (Searle and Fawcett, 1970).
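The finite-population correction can be retraced in a few lines. The derivation below is a sketch that assumes equal sample sizes per subpopulation and a common variance $\sigma_P^2$ for all subpopulation effects; the primed index $i'$ is notation introduced here for a second, different subpopulation.

```latex
\begin{align}
0 = \operatorname{var}\Big(\sum_{i=1}^{S^*} P_i\Big)
  &= S^*\sigma_P^2 + S^*(S^*-1)\operatorname{cov}(P_i,P_{i'})
  \;\Rightarrow\;
  \operatorname{cov}(P_i,P_{i'}) = -\frac{\sigma_P^2}{S^*-1}, \\
\operatorname{var}\Big(\frac{1}{S}\sum_{i=1}^{S} P_i\Big)
  &= \frac{1}{S^2}\Big[S\sigma_P^2 + S(S-1)\operatorname{cov}(P_i,P_{i'})\Big]
   = \frac{\sigma_P^2}{S}\Big[1 - \frac{S-1}{S^*-1}\Big].
\end{align}
```

When S* is very large the bracket tends to 1, and when S = S* it vanishes, so subpopulation effects no longer contribute to the sampling variance.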
Equation [2] is applicable if gene frequencies of different populations are uncorrelated, which implies that the given metapopulation is the most inclusive stratum of the population hierarchy. Higher order hierarchical structures, however, can be conveniently accommodated. These hierarchical structures may occur when populations belong to different regions in which case the corresponding intraclass correlations are
![]() |
![]() |
and populations
, respectively, (where the subscript R symbolizes regions) (Weir, 1996).
Using Wright's notation, it can be seen that FRT = $\theta_R$ and
![]() |
Obtaining Ne expression in the case of such a three-level hierarchy is equivalent to the two-level case, equating
![]() |
![]() |
![]() |
![]() |
![]() |
The linear model given in Eq. [1] can be adapted to situations where computing Ne is desired for samples of individuals or seeds taken from random maternal families (M). The intraclass correlation or coancestry $\theta_m$ is required to account for this new source of variation relative to individuals within families and within populations. For a single population, the components of variance with a family structure are now $\sigma_M^2 = p(1 - p)\theta_m$, $\sigma_I^2 = p(1 - p)(f - \theta_m)$, and $\sigma_G^2 = p(1 - p)(1 - f)$.
Weir (1996) gives an extensive description of the estimation of these F and/or θ parameters, when data on codominant genetic markers are available.
| TYPES OF EFFECTIVE POPULATION SIZES |
|---|
The Case of Several Populations
The metapopulation is subdivided into local populations and there are no higher hierarchical levels (e.g., population structured by regions, etc.). Expressions for Ne may be adapted to include higher hierarchies by subdividing FST into FSR and FRT, etc., as already mentioned.
Given only one subdivision level, FST and FIT are sufficient, and Ne is obtained as follows:
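$$N_e = \frac{0.5}{F_{ST}\left[\dfrac{S^*}{S^* - 1}\,\dfrac{1 + C^2}{S} - \dfrac{1}{S^* - 1} - \dfrac{1}{n}\right] + \dfrac{1 + F_{IT}}{2n}} \qquad [3]$$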
![]() |
Equation [3] applies when n plants are taken from S subpopulations to represent a region containing S* subpopulations under natural conditions. Under equilibrium with respect to the mating system, this equation also applies when n refers to randomly and bulk collected seeds from plants of those S subpopulations, such that $n = \sum_i n_i$, ni being the number of seeds collected from the ith subpopulation. Equation [3] is an extension of the theory developed by Cockerham (1969).
From Eq. [3] and with n sufficiently large, the effective number is scarcely influenced by the total inbreeding (FIT). Thus, Ne becomes mostly dependent on the allelic diversity among subpopulations (FST) and the number of populations (S) studied. Number n is not necessarily the total number of individuals under investigation and genotyped. Sampling allows applying FST and FIT to the whole set of S* subpopulations, so that n may be the existing number in a specific region. In many cases, this number may be considered very large. Under these circumstances, and assuming that the possible number of subpopulations in the region is such that $1/(S^* - 1) \approx 0$ and $S^*/(S^* - 1) \approx 1$, the upper limit of the effective number is Ne = S/(2FST) given S subpopulations. If the condition Ne = 500 is set as a minimum for genetic conservation in that region and, for instance, FST = 0.05, the number of subpopulations required to obtain such representativeness is S = 50. With FST = 0.05, 5% of the total gene diversity occurs among subpopulations and 95% within subpopulations. Therefore, it would be incorrect to state that since FST is small, diversity can be preserved and maintained on the basis of only a few subpopulations. However, if it is known that S* is finite, the required number of subpopulations (S) for attaining Ne = 500 would be smaller. With S* = 20, for instance, Eq. [3] gives S = 14.5 instead of S = 50, for FST = 0.05.
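The two numerical cases above (S = 50 and S = 14.5) can be checked with a short calculation. This is a sketch only: it assumes the large-n limit of Eq. [3] with equal sample sizes, in which 1/(2Ne) = (FST/S)[1 - (S - 1)/(S* - 1)], using the finite-population correction term described earlier. The function names are illustrative.

```python
def upper_limit_ne(S, fst, s_star=None):
    """Large-n upper limit of Ne for S sampled subpopulations.

    s_star=None treats the number of existing subpopulations as infinite,
    giving Ne = S / (2 FST). Otherwise the correction [1 - (S-1)/(S*-1)] is applied.
    """
    if s_star is None:
        return S / (2 * fst)
    correction = 1 - (S - 1) / (s_star - 1)
    if correction <= 0:
        # All subpopulations sampled: the limit is no longer set by FST.
        return float("inf")
    return S / (2 * fst * correction)


def required_subpops(target_ne, fst, s_star=None):
    """Number of subpopulations S needed to reach target_ne in the large-n limit."""
    if s_star is None:
        return 2 * fst * target_ne
    # Solve (fst / S) * (s_star - S) / (s_star - 1) = 1 / (2 * target_ne) for S.
    return fst * s_star / ((s_star - 1) / (2 * target_ne) + fst)


if __name__ == "__main__":
    print(round(required_subpops(500, 0.05), 1))             # infinite S*
    print(round(required_subpops(500, 0.05, s_star=20), 1))  # finite S* = 20
```

Under these assumptions the finite case returns about 14.5 subpopulations, in agreement with the value quoted in the text.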
The estimation of parameter FST does not necessarily require the existence of well delimited and geographically isolated populations. This parameter also makes sense when the metapopulation is a continuum, as assumed under a model of isolation by distance. Parameter FST is then a function of the internal structure of the population or due to reproduction within the neighborhood, and can be estimated on the basis of the sites from which material to be genotyped has been collected. Parameter FST may be substituted for GST, since the latter is the multiple allelic version of FST and because FST is perfectly estimable with multiple alleles (Weir, 1996).
Furthermore, Eq. [3] shows that it is impossible to measure Ne appropriately if the structure of the metapopulation is unknown or, in other words, if the number of the component subpopulations (S*) and the interpopulation diversity (FST) under real conditions are unknown. In an extreme case, it may be assumed that all existing populations are sampled or covered so that S = S*.
With S = S*, Eq. [3] shows that Ne is now dominated by the total number of plants n and by the total inbreeding, FIT, especially when $C^2 \approx 0$. This makes sense because, given that S = S*, differences among populations are under control and subpopulations are no longer sampling units to be considered. In intermediate cases (S < S*), obtaining the Ne value requires knowing the total number S*.
It is important to note that parameters FIT, FIS, and FST (or equivalent GST) can actually be computed through genetic markers for any type of genetic material, for instance a group of varieties, lines, accessions, etc. Under such a fixed model, however, conditions of the variance effective size theory are not satisfied. To be applicable for estimating an Ne value, the genetic materials under investigation should share a common genetic history and should have diverged as a result of genetic drift. This condition is reasonably met in research of natural populations by means of neutral genetic markers. If materials studied belong to an arbitrary group, it is advisable to use the "status" number Ns proposed by Lindgren et al. (1996, 1997) to measure the representativeness of the group, including the coancestry between all pairs of individuals and the coancestry of individuals with themselves. Ns is half the inverse of the average coancestry of the group of individuals in question. The status number is a very general measurement because it does not require a regular pedigree structure nor does it require the conditions of a random model. It can also be applied when a reference population cannot be clearly defined.
The Case of One Population
In many cases, the main concern is to measure representativeness of a sample from a single reference population that may be of infinite or finite size. The following circumstances are relevant.
Reference Population of Infinite Size
Here the reference population consists of a single population and the aim is to verify the degree to which a sample of n individuals or seeds represents that specific population. The individuals sampled from the reference population may or may not have a family structure. Therefore, the following situations are considered.
Individuals without family structure. Now, the appropriate expression of Ne is taken from Eq. [3], considering FST = 0, such that
![]() | [4] |
Eq. [4] under inbreeding equilibrium is equal to Ne = n (1 - 0.5s). Also, given FST = 0, then FIT = FIS = f. Now, for a given n, Ne is only affected by the inbreeding of the population. Under panmixia, Ne = n and under complete self-fertilization, Ne = 0.5n. Therefore, for collection or in situ preservation, the species with the largest natural rate of self-fertilization requires more effort to compensate for the reduction effect of s (or f) on Ne.
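The effect of the selfing rate on Ne for individuals without family structure can be illustrated numerically. The sketch below assumes Eq. [4] in the form Ne = n/(1 + f), as used later in the text, together with the equilibrium relation f = s/(2 - s); the function names are illustrative.

```python
def equilibrium_f(s):
    """Inbreeding coefficient under inbreeding equilibrium for selfing rate s."""
    return s / (2 - s)


def ne_no_family(n, f):
    """Effective size of n unrelated individuals from a large population: n / (1 + f)."""
    return n / (1 + f)


if __name__ == "__main__":
    n = 100
    for s in (0.0, 0.5, 1.0):
        ne = ne_no_family(n, equilibrium_f(s))
        # Same value as n * (1 - 0.5 * s) under inbreeding equilibrium.
        print(f"s = {s:.1f}: Ne = {ne:.1f}")
```

The loop reproduces the two extremes stated above: Ne = n under panmixia and Ne = 0.5n under complete self-fertilization.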
It is relevant to verify whether a set of samples, or in situ populations, each with its own effective size, has an aggregate effective size equal to the sum of the individual Ne. This happens only if FST = 0 or is negligible. In fact, given that ni is the number of individuals of population i, and FIS = f is the species parameter under the model adopted, it is inferred, from Eq. [4], that [Ne]i = ni/(1 + f). From Eq. [3], and representing the set of individuals by $n = \sum_i n_i$, then
![]() |
![]() |
Strictly, Eq. [4] applies only when the n individuals involved are not related to each other, which may not occur under actual circumstances. It may approximately apply when random sampling is performed in large populations, restricting to a minimum the average coancestry among the sampled individuals.
Individuals with family structure. A family structure is created when seeds are collected from a given number (P) of seed parents. For a large reference population, Ne is expressed as follows, for unrelated families
$$N_e = \frac{0.5}{\theta_m\left[\dfrac{1 + C^2}{P} - \dfrac{1}{n}\right] + \dfrac{1 + f}{2n}} \qquad [5]$$
which is an adaptation of Eq. [3]. Here, f refers to the inbreeding coefficient of the offspring generation. From Eq. [5], it can be seen that an inequality of seed numbers ($C^2 > 0$) per progeny reduces Ne. In random sampling, this number is unknown, but may be replaced by the value expected under a binomial distribution, i.e., $C^2 = (P/n)(1 - 1/P)$. If the same number of seeds per parent is taken (female gametic control), then $C^2 = 0$, which increases Ne, as is well known.
To examine limiting Ne values, consider a sufficiently large n and $C^2 = 0$; the effective size then approaches $N_e = P/(2\theta_m)$, which depends only on the number of seed parents and the coancestry coefficient among maternal sibs within families. For small n, Ne values will be below this limit. When genetic marker data are not available to estimate $\theta_m$ and f, these values can be replaced by $\theta_m = (1 + s)^2/[4(2 - s)]$ and f = s/(2 - s); these are valid for populations in inbreeding equilibrium. In addition, $\theta_m$ expressed as a function of the selfing rate (s) as shown, is applicable only when reproduction is a mixture of selfing and random mating and thus excludes the possibility of biparental sibs. For noninbred maternal half sibs, f = 0 and $\theta_m = 1/8$. For completely autogamous populations, f = 1 and $\theta_m = 1$. Increasing the coancestry within progenies requires sampling of a larger number of seed parents to obtain an Ne value equivalent to that obtained in a situation of panmixia with half-sib families.
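The limiting values discussed above can be explored numerically. The sketch below assumes that the expression θm[(1 + C²)/P - 1/n] + (1 + f)/(2n) quoted with Eq. [5] equals 1/(2Ne), which is consistent with the limit Ne = P/(2θm); the mixed-mating substitutions for θm and f are the ones given in the text. Function and parameter names are illustrative.

```python
def theta_m_mixed(s):
    """Coancestry within maternal families under mixed selfing and random mating."""
    return (1 + s) ** 2 / (4 * (2 - s))


def equilibrium_f(s):
    """Inbreeding coefficient under inbreeding equilibrium."""
    return s / (2 - s)


def ne_families(n, P, theta_m, f, c2=0.0):
    """Ne for n seeds from P unrelated seed parents (assumed form of Eq. [5]).

    c2 is the squared coefficient of variation of seeds per parent;
    c2 = 0 corresponds to female gametic control.
    """
    return 0.5 / (theta_m * ((1 + c2) / P - 1 / n) + (1 + f) / (2 * n))


if __name__ == "__main__":
    P, n = 10, 200
    for s in (0.0, 0.5, 1.0):
        tm, f = theta_m_mixed(s), equilibrium_f(s)
        controlled = ne_families(n, P, tm, f)               # equal seeds per parent
        random_c2 = (P / n) * (1 - 1 / P)                   # binomial expectation
        bulk = ne_families(n, P, tm, f, c2=random_c2)       # random sampling
        print(f"s = {s:.1f}: Ne(control) = {controlled:.1f}, Ne(random) = {bulk:.1f}")
```

With one seed per parent (P = n) and noninbred half sibs, the expression returns Ne = n, and for very large n it approaches P/(2θm), as stated above.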
Equation [5] is also applicable when gametes are handled or when families are obtained with controlled pollination. For panmictic species in which full sibs from nonrelated plants are generated, f = 0 and $\theta_m = 1/4$. Likewise, for selfed families of non-inbred plants, f = 1/2 and $\theta_m = 1/2$. Again, it is important to note that in Eq. [5], these two parameters refer to the offspring and not to the parental generation. It is true that in germplasm collections under natural conditions, unlike accession regeneration, controlled pollination is not the usual practice. Therefore, Eq. [5] would be applicable for regeneration only if the reference population (accession) can be considered sufficiently large.
It may be of interest to verify the representativeness of a single maternal family, sampled from a large (infinite sized) reference population. With n seeds sampled, Eq. [5] is adequate, introducing the quantities P = 1 and $C^2 = 0$. If, in addition, it is assumed that the reference population is in inbreeding equilibrium and that the mating system is by mixed self and random mating, such that f = s/(2 - s) and $\theta_m = (1 + s)^2/[4(2 - s)]$ as already mentioned, the resulting Ne expression is
$$N_e = \frac{2(2 - s)n}{(1 + s)^2(n - 1) + 4}$$
Table 1 shows the representativeness (Ne) contained in one maternal family for different values of the rate of self-fertilization (s) and several sample sizes (n). Increasing n is effective only for smaller s values, and even for s = 0 there is hardly an advantage of taking more than n = 50 seeds per family. The total Ne value for a set of maternal families is strongly affected by the number of seed parents (P), especially when they are unrelated, a situation in which the individual Ne's can be summed up.
|
$\theta_m$ and f are estimated from codominant genetic marker data, using for instance the analysis of variance of gene frequency technique (Weir, 1996), and the genotyped individuals (usually seedlings) are kept with their family structure. In the latter case, Eq. [5] is the adequate expression for obtaining an estimate of Ne. Another weak point of the model considered here for mixed mating systems is the assumption that rate s is constant for all seed parents.
Reference Population of Finite Size
In accession regeneration, reference populations may be small. Such reference populations are the accessions themselves because maintaining the representativeness contained in the original accession in subsequent generations is a concern. This matter is most relevant in the case of accessions with high genetic value that contain a limited number of seeds or face the risk of deterioration. We exclude here monomorphic accessions such as pure lines, where the effective size concept makes no sense.
Given a population of finite size (N) in a specific generation and n seed samples collected from the next generation, Crossa and Vencovsky (1994, 1997) and Vencovsky and Crossa (1999) have developed expressions characterized by their ease of use and broad applicability. The most general formulas for random sampling of gametes and for female gametic control are
![]() | [6] |
![]() |
![]() |
Female gametic control sampling.
![]() | [7] |
![]() |
![]() |
In Equations [6] and [7], population parameters s and f refer to the parental generation. In these expressions, the magnitude of s and f will depend on the circumstances as well as on the manner in which the regeneration is done. As already observed, given inbreeding equilibrium and if natural reproduction is allowed, inbreeding will be f = s/(2 - s). If all plants of a panmictic species are self-fertilized manually, f = 0 but s = 1. Also, if all plants of a partly allogamic species are self-fertilized, s = 1 again, but f is now species specific, and will depend on its rate s and on the regeneration procedure used in previous generations. For successive self-fertilization, in generation t, $f_t = (1/2)(1 + f_{t-1})$, as is well known.
Equations [6] and [7], apart from providing Ne for specific situations, allow the examination of factors that affect representativeness through numerical evaluations. If Equations [6] and [7] are compared, for example, it is concluded that gametic control is a major factor in regeneration, even more so than in collection. Ratios u and v are also important parameters. Consider, for simplicity, that out of N original seeds from an accession, only 80% are effectively used in regeneration or multiplication, such that u = v = 0.8 and P = M = 0.8N. Consider also that the regenerated sample size (n) has been kept equal to that of the original accession to prevent germplasm bank material from increasing, i.e., that n = N. In such circumstances and if natural reproduction is allowed, Ne values given in Table 2 are to be expected.
|
In summary, these expressions allow us to assert that the requirements for a safe regeneration process are (i) to prevent accession losses greater than 20% (because of poor germination, plant rejection, or loss or any other environmental factor that may affect plant stand in the regeneration plot) and (ii) to apply gametic control. They also show that for species with any type of sexual reproduction ($0 \le s < 1$), controlled self-fertilization is a highly efficient method for producing regenerated samples, provided that gametic control is applied. Situations in which inbreeding depression is a limiting factor are naturally excluded.
In general, attaining an adequate representativeness becomes more difficult in collection activities as rate s increases. The reverse occurs in accession regeneration, provided efforts are made for keeping sample integrity at acceptable levels [$u^* \ge 4/(s^2 + 5)$] and female gametic control is practiced.
The concepts of Ne presented so far are related to random genetic drift and may be used to make inferences about deviations expected in allelic frequencies because of sampling. The "status" number (Ns) proposed by Lindgren et al. (1996) is based on inbreeding and coancestry. Therefore, these numbers provide insufficient information on the probability of allele retention in samples. Considering that sampling involves the risk of losing alleles, a different approach is required.
Let us consider a situation where, in a specific locus, allele a is rare and occurs at frequency q = 1 - p, where p is the frequency of the other allele. It is common to estimate the sample size (n) required to retain at least one allele a with a certain probability level $\gamma$. If the population to be sampled is large and obtained through random mating, this probability may be expressed as $\gamma = 1 - p^{2n_o}$, and if $\gamma$ and p are chosen, the required $n_o$ size is obtained. In a naturally inbreeding population, $\gamma = 1 - [p(p + fq)]^{n_f}$. If the same $\gamma$ is taken in both cases, given that $n_f$ is the sample size required under the inbreeding situation, and equating both expressions, this quantity can be written as $n_f = b_f n_o$, where $b_f = 2/(1 + c_f)$ and $c_f = \ln(p + fq)/\ln(p)$ (0 < p < 1). At the extremes, it can be seen that, for f = 0, $b_f = 1$ and for f = 1, $b_f = 2$. Therefore, sample size $n_f$ should be, at most, twice the necessary size required for a panmictic ($n_o$) population. Naturally, the $n_o$ value should be previously obtained as a function of a chosen $\gamma$ level (e.g., $\gamma = 0.95$) and of the frequency of the rare allele (e.g., q = 0.01).
Following the same line of thought, an alternative approach for adjusting sample size to the effect of inbreeding is to consider effective size as follows. For a large reference population, as shown earlier, Ne = n/(1 + f), such that for f = 0, Ne = n = $n_o$. Writing Ne = $n_f/(1 + f)$ for f > 0 and requiring the same Ne, then $n_f = (1 + f)n_o = b^*_f n_o$. At the extremes, for f = 0 and f = 1, this coefficient $b^*_f$ is equal to $b_f$. However, for intermediate values of f, it can be shown that $b^*_f > b_f$ for any probability level and that the difference $b^*_f - b_f$ is negligible. Then, it can be said that increasing sample size to compensate for the level of inbreeding of a reference population, by the criterion $n_f = (1 + f)n_o$, which leads to the same amount of drift as that expected under random mating (and sample size $n_o$), also prevents the loss of alleles in probabilistic terms. Actually, coefficient $b^*_f = (1 + f)$ ensures a probability of allele retention slightly greater than the chosen level, and therefore may be considered a reliable sampling criterion (Crossa and Vencovsky, 1999).
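The two sample-size adjustments can be compared numerically. The sketch below assumes that the probability of retaining at least one copy of the rare allele is 1 - p^(2n) under random mating and 1 - [p(p + fq)]^n under inbreeding, which is what the relation nf = bf no with cf = ln(p + fq)/ln(p) implies. The probability symbol and the function names are choices made here, not taken from the original.

```python
import math


def n_panmictic(prob, q):
    """Sample size needed under random mating to retain a rare allele (frequency q)
    with probability prob, assuming prob = 1 - p**(2n)."""
    p = 1 - q
    return math.log(1 - prob) / (2 * math.log(p))


def b_f(f, q):
    """Multiplier from the allele-retention argument: b_f = 2 / (1 + c_f)."""
    p = 1 - q
    c_f = math.log(p + f * q) / math.log(p)
    return 2 / (1 + c_f)


def b_star(f):
    """Multiplier from the effective-size argument: b*_f = 1 + f."""
    return 1 + f


if __name__ == "__main__":
    q, prob = 0.01, 0.95
    n_o = n_panmictic(prob, q)
    print(f"n_o = {n_o:.1f}")
    for f in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"f = {f:.2f}: b_f = {b_f(f, q):.3f}, b*_f = {b_star(f):.3f}")
```

Because b*f is never smaller than bf, sizing the sample by (1 + f) no is the more conservative of the two criteria.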
Effective Population Size in Recurrent Selection Schemes
Monitoring effective population size through selection cycles is an ongoing practice. Some of the Ne expressions shown here are also adequate for recurrent selection schemes if quantities u and v are taken as selection proportions. Alleles that are neutral to artificial selection are the indicators of the effect of sampling (drift) along cycles.
Equations [6] and [7] can be applied in schemes where the selected individuals are also the recombination units, as in mass selection or in progeny testing. In such cases, P and M are the numbers of selected seed and pollen parents, respectively, and u = P/N and v = M/N, the corresponding selection proportions, applied on both sexes. Therefore, if N is the number of plants before selection, the selected seed and pollen parents are P = Nu and M = Nv, respectively. For selection on both sexes, with equal intensity, u = v and P = M. For selection on seed parents only, such as half mass selection, u = P/N and v = 1.
Assuming a panmictic species (s = f = 0) and selection acting on both sexes, Eq. [6] can be rewritten as
![]() | [8] |
Assuming constant population size (n = N) and $1/(Nu) \approx 0$, Eq. [8] becomes $N_e \approx Nu$. Note that in this case, Ne is directly proportional to the selection proportion.
Under the same circumstances but with female gametic control, Eq. [7] becomes
![]() | [9] |
Assuming constant population size (n = N) and $1/(Nu) \approx 0$, Eq. [9] simplifies to Ne = 4Nu/(4 - u). Note that the relation between Ne and u is no longer linear.
If Eq. [8] and [9] are compared for 10% selection pressure (u = 0.10), in the absence of female gametic control, Ne = 0.100N, and with female gametic control, Ne = 0.102N. The gain in Ne through gametic control is very small as always occurs when selection is intense (small u).
Under the same circumstances but with selection acting only on the female side, Ne expressions are $N_e \approx 4Nu/(1 + 3u)$ for random sampling of offspring from selected parents (Eq. [8]) and $N_e \approx 4Nu/(1 + 2u)$ for gametic control (Eq. [9]). With u = 0.10 these quantities are Ne = 0.31N and Ne = 0.33N, respectively. As before, when u is relatively small, taking an equal number of seeds per selected parent is inefficient for increasing Ne as compared with random sampling.
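The four approximate expressions quoted in this section can be tabulated against the selection proportion. This is a sketch that simply evaluates the simplified forms given in the text (panmictic species, n = N, and 1/(Nu) close to zero); the function names are illustrative.

```python
def ne_both_sexes_random(N, u):
    """Selection on both sexes, random sampling of offspring: Ne ~ N u."""
    return N * u


def ne_both_sexes_gametic(N, u):
    """Selection on both sexes, female gametic control: Ne = 4 N u / (4 - u)."""
    return 4 * N * u / (4 - u)


def ne_female_random(N, u):
    """Selection on seed parents only, random sampling: Ne ~ 4 N u / (1 + 3 u)."""
    return 4 * N * u / (1 + 3 * u)


def ne_female_gametic(N, u):
    """Selection on seed parents only, gametic control: Ne ~ 4 N u / (1 + 2 u)."""
    return 4 * N * u / (1 + 2 * u)


if __name__ == "__main__":
    N = 1.0  # results are then expressed as fractions of N
    for u in (0.05, 0.10, 0.25, 0.50):
        row = [fn(N, u) for fn in (ne_both_sexes_random, ne_both_sexes_gametic,
                                   ne_female_random, ne_female_gametic)]
        print(f"u = {u:.2f}: " + ", ".join(f"{x:.3f}" for x in row))
```

Running the loop over several values of u shows how the benefit of gametic control grows as selection becomes less intense.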
If epistatic gene effects are negligible, it is known that, for panmictic species, expected progress for a quantitative trait from selection on one sex is half of that expected when selection is practiced on both sexes under comparable selection intensities. What partly compensates for the smaller efficiency of the former scheme is the larger Ne value it yields after each cycle. The ratio between the corresponding Ne's, however, is not constant and varies according to selection intensity. Assuming gametic control and equal selection proportion for both cases, the ratio is
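$$\frac{N_{e(\text{one sex})}}{N_{e(\text{both sexes})}} = \frac{4Nu/(1 + 2u)}{4Nu/(4 - u)} = \frac{4 - u}{1 + 2u}$$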
In inter- and intrafamily selection, where sibs are now the recombination units, effective size may be estimated on the basis of each progeny Ne value. It is therefore valid to use expressions in the section describing Ne's for individuals with progeny structure. When selecting P families and n plants per family, it is valid to obtain a global Ne taking PNe, where Ne is the effective size of each family. When n is variable, individual Ne's should be summed. This process is valid when families are not related to each other.
To measure Ne along several selection cycles, given that Ne j is the size for cycle j (j = 1,2,...,t), the mean effective size may be obtained as
![]() |
The accumulated effective population size
e
is then
![]() |
Effective Population Size under the Modified Pedigree Method of Selection: Single Seed Descent Method and the Bulk System
The single seed descent (SSD) selection method (Brim, 1966) is widely used in self-pollinated crops and offers several advantages, one being the high level of genetic variability that can be maintained among the lines of the population after successive generations of selfing. In other words, the SSD produces a better representation of the original population in advanced generations of selfing.
In this section, the variance effective population size is derived for quantifying the representativeness of the SSD method as compared with the bulk system, in which seeds are harvested from all plants of a given generation and then bulked. The sample of seeds is then taken randomly from this bulk without control of the pedigree. A potentially infinite population ($N \to \infty$) is taken as the initial reference population. This base population is usually the F2 generation of a cross between two homozygous lines, but can be any segregating population used as source of new inbred lines.
It is assumed that in this initial generation (F2), a set of n plants is taken and that n (F3) seeds are subsequently sampled after self-fertilization. In both methods, SSD and bulk, this number (n) is then maintained constant over generations. Only neutral loci are considered for measuring the amount of drift.
The inbreeding coefficient $f_i$ is here indexed to indicate the respective generation, being $f_2 = 0$, $f_3 = 1/2$, and $f_i = 1 - (1/2)^{i-2}$ for generation $F_i$. Gametic control is intrinsic to the SSD method because an equal number of seeds are taken from each seed parent, in contrast to the bulk system in which seeds are taken randomly. Consequently, for studying Ne for the bulk and SSD procedures, Eq. [6] and [7] should be utilized, respectively.
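The closed form for the inbreeding coefficient can be checked against the selfing recursion f_t = (1/2)(1 + f_{t-1}) given earlier. The sketch below starts from a noninbred F2 (f2 = 0), as assumed in this section; the function name is illustrative.

```python
def inbreeding_by_selfing(last_generation):
    """Inbreeding coefficients f_i for generations F2..F_last under repeated selfing,
    computed with the recursion f_i = 0.5 * (1 + f_{i-1}) and f_2 = 0."""
    f = {2: 0.0}
    for i in range(3, last_generation + 1):
        f[i] = 0.5 * (1 + f[i - 1])
    return f


if __name__ == "__main__":
    coefficients = inbreeding_by_selfing(7)
    for i, value in coefficients.items():
        closed_form = 1 - 0.5 ** (i - 2)
        print(f"F{i}: recursion = {value:.5f}, closed form = {closed_form:.5f}")
```

Both routes give the same values, so either can feed the parental inbreeding term when evaluating Ne generation by generation.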
Single Seed Descent (Female Gametic Control)
For obtaining Ne, it is necessary to consider two distinct steps, the first one from F2 to F3 and the second one from F3 to subsequent generations. For the first step, from F2 to F3, given that f in expression [7] refers to the parental generation, f2 = 0 and s = 1. In this situation the number of sampled plants is the same as the number of seeds that are used later and therefore P = n. Since the reference population is considered very large, P is a very small fraction of the total reference population such that $P/N = u \approx 0$. Thus, Eq. [7] reduces to

$$N_e = \frac{2n}{3}$$
For generation F3 and subsequent ones, the inbreeding in the parental generation is $f_i$ (i = 3,4,...,t - 1); the n parents always generate n seeds such that u = 1 and s = 1. Thus, Eq. [7] reduces to

$$N_e = \frac{2n}{1 - f_i}$$
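The two SSD steps can be illustrated numerically. The sketch below assumes the reduced forms Ne = 2n/3 for the step from F2 to F3 and Ne = 2n/(1 - f) for later steps, with f the inbreeding coefficient of the parental generation. These forms are an assumption here, chosen because their harmonic mean over F2 to F7 agrees, to rounding, with the mean reported in the text for n = 100. Function names are hypothetical.

```python
def inbreeding(i):
    """Inbreeding coefficient of generation F_i under continued selfing."""
    return 1.0 - 0.5 ** (i - 2)


def ne_ssd_per_generation(n, t):
    """Effective size for each step F2->F3, ..., F(t-1)->Ft under SSD.

    Assumed reduced forms: 2n/3 for the first step, 2n/(1 - f) afterwards.
    """
    sizes = [2.0 * n / 3.0]          # F2 -> F3, infinite reference population
    for i in range(3, t):            # parental generations F3 .. F(t-1)
        sizes.append(2.0 * n / (1.0 - inbreeding(i)))
    return sizes


for step, ne in enumerate(ne_ssd_per_generation(100, 7), start=2):
    print(f"F{step} -> F{step + 1}: Ne = {ne:.1f}")
```

Under these assumptions Ne doubles at each step after F3, because only heterozygous parents contribute drift and their frequency halves with every generation of selfing.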
Bulk System (Random Sampling of Gametes)
The same two cases considered for single seed descent occur in this case, the difference now being that the n seeds are selected at random (without female gametic control). For the case of generation F2 to F3, Eq. [6] simplifies to

$$N_e = \frac{2n}{5}$$

when $1/n \approx 0$.
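For generation F3 and subsequent ones, Eq. [6] reduces to

$$N_e = \frac{2n}{3 + f_i}$$

when $1/n \approx 0$, where $f_i$ is the inbreeding coefficient of the parental generation.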
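A numerical sketch of the bulk case follows. It assumes the reduced forms Ne = 2n/5 for F2 to F3 and Ne = 2n/(3 + f) for later steps, with f the parental inbreeding coefficient and terms of order 1/n neglected. These forms are assumptions, adopted because their harmonic mean for n = 100 over F2 to F7 matches the 49.8 reported in the text. Function names are hypothetical.

```python
def inbreeding(i):
    """Inbreeding coefficient of generation F_i under continued selfing."""
    return 1.0 - 0.5 ** (i - 2)


def ne_bulk_per_generation(n, t):
    """Effective size for each step F2->F3, ..., F(t-1)->Ft under bulk.

    Assumed reduced forms: 2n/5 for the first step, 2n/(3 + f) afterwards.
    """
    sizes = [2.0 * n / 5.0]          # F2 -> F3, infinite reference population
    for i in range(3, t):            # parental generations F3 .. F(t-1)
        sizes.append(2.0 * n / (3.0 + inbreeding(i)))
    return sizes


def harmonic_mean(values):
    """Harmonic mean, used here as the mean effective size over steps."""
    return len(values) / sum(1.0 / v for v in values)


sizes = ne_bulk_per_generation(100, 7)
print([round(ne, 1) for ne in sizes])
print(round(harmonic_mean(sizes), 1))
```

In contrast to SSD, the per-step values stay close to 0.5n throughout, since random sampling of seeds adds drift even when the parents are nearly homozygous.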
Table 3 shows Ne values for SSD and bulk schemes with n = 100 held constant over generations of selfing. Mean values of Ne ($\bar{N}_e$) are also shown. Clearly the SSD method, which controls the number of female gametes, is very effective in maintaining high values of effective population size and therefore exerts better control on genetic drift and maintains higher levels of genetic diversity than the bulk system.
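The values of $\bar{N}_e$ at generation t for the SSD and bulk schemes are as follows

$$\bar{N}_{e(\mathrm{SSD})} = \frac{0.5\,n\,(t-2)}{1 - (1/2)^{t-1}}$$

$$\bar{N}_{e(\mathrm{bulk})} = \frac{0.5\,n\,(t-2)}{(t-2) + (1/2)^{t-1}}$$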
Substituting the values n = 100 and t = 7 in these expressions gives $\bar{N}_{e(\mathrm{SSD})}$ = 253.8 and $\bar{N}_{e(\mathrm{bulk})}$ = 49.8, as shown in the bottom line of Table 3.
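The substitution can be checked with a short script. The closed forms below are assumptions reconstructed from the quantities quoted in the text (the term (1/2)^(t-1), the limit 0.5n for bulk, and a ratio that approaches t - 2); they are a sketch, not necessarily the exact expressions of the original. With n = 100 and t = 7 they give about 254.0 for SSD, close to the reported 253.8, and 49.8 for bulk. The small gap for SSD may reflect rounding in the original computation.

```python
def mean_ne_ssd(n, t):
    """Mean effective size from F2 to Ft under SSD (assumed closed form)."""
    return 0.5 * n * (t - 2) / (1.0 - 0.5 ** (t - 1))


def mean_ne_bulk(n, t):
    """Mean effective size from F2 to Ft under bulk (assumed closed form)."""
    return 0.5 * n * (t - 2) / ((t - 2) + 0.5 ** (t - 1))


n, t = 100, 7
print(f"SSD:  {mean_ne_ssd(n, t):.1f}")
print(f"bulk: {mean_ne_bulk(n, t):.1f}")
```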
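The ratio between $\bar{N}_{e(\mathrm{SSD})}$ and $\bar{N}_{e(\mathrm{bulk})}$ gives an estimate of the average increase in the effective population size attained with the SSD method as compared to the bulk system. This ratio is

$$\frac{\bar{N}_{e(\mathrm{SSD})}}{\bar{N}_{e(\mathrm{bulk})}} = \frac{(t-2) + (1/2)^{t-1}}{1 - (1/2)^{t-1}}$$

(t = 3,4,...) and approximates t - 2 when t increases. For n = 100 and t = 7, $\bar{N}_{e(\mathrm{SSD})}/\bar{N}_{e(\mathrm{bulk})}$ = 253.8/49.8 = 5.1.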
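With t sufficiently large, $(1/2)^{t-1}$ becomes negligible and the means approach $\bar{N}_{e(\mathrm{SSD})} \approx 0.5n(t-2)$ and $\bar{N}_{e(\mathrm{bulk})} \approx 0.5n$. For example, seeds collected in generation F5 will have an $\bar{N}_e$ value 3.3 times larger with SSD than with bulk, that is, $\bar{N}_{e(\mathrm{SSD})}/\bar{N}_{e(\mathrm{bulk})}$ = 3.3.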
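The ratios quoted for F5 and F7 can be reproduced with the sketch below. It assumes that the SSD-to-bulk ratio of mean effective sizes has the form ((t - 2) + (1/2)^(t-1)) / (1 - (1/2)^(t-1)). This form is reconstructed from the figures in the text and is an assumption, as is the function name. Under this assumption the ratio does not depend on n.

```python
def ssd_to_bulk_ratio(t):
    """Ratio of mean effective sizes, SSD over bulk, at generation Ft.

    Assumed form; approaches t - 2 as t grows.
    """
    half_power = 0.5 ** (t - 1)
    return ((t - 2) + half_power) / (1.0 - half_power)


for t in (3, 5, 7, 10):
    print(f"F{t}: ratio = {ssd_to_bulk_ratio(t):.2f}, t - 2 = {t - 2}")
```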
The SSD method maintains genetic drift at lower levels because of female gametic control. Since s = 1, this control is also exerted on the male gametes. Under practical conditions, the SSD method produces a variance of allele frequencies 3 to 5 times smaller than the bulk method when advancing to F5 or F7 generations of selfing, respectively. Clearly, the SSD method offers good protection against random loss of alleles during the selfing generations. However, it should be pointed out that with large n values the bulk method can also lead to adequate $\bar{N}_e$ values since this tends to $\bar{N}_{e(\mathrm{bulk})} \approx 0.5n$. Results indicate that the advantage of the SSD method for preventing the random loss of alleles due to genetic drift is clear under small sample sizes, such as those used in common breeding programs where several F2 generations are simultaneously advanced, a situation in which it becomes difficult to maintain large n values in each population.
| CONCLUSIONS |
|---|
Various types of Ne were studied depending on whether one or several populations were considered. Ne given by Eq. [3], for the case of several populations, showed that with n sufficiently large, the effective number is scarcely influenced by the total inbreeding (FIT). Thus, Ne becomes mostly dependent on the allelic diversity among populations (FST) and the number of subpopulations (S) studied.
The case of Ne, when one population of infinite size is considered and individuals do not have a family structure, requires modifying Eq. [3] by considering FST = 0 such that Ne = n/(1 + f), where f = FIT = FIS. Under random mating, Ne = n, and under complete self-fertilization, Ne = 0.5n. For collection, species with the largest rates of self-fertilization require more effort to compensate for the reducing effect of f on Ne. When a single population of infinite size is considered but individuals have a family structure, Eq. [3] shows that, at the limit, Ne depends only on the number of seed parents and the coancestry coefficient among maternal sibs within families.
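The relation Ne = n/(1 + f) and its consequence for collecting effort can be sketched as follows. The function names, the target of 100, and the inbreeding levels are hypothetical, chosen only to illustrate the two limits stated above.

```python
def effective_size(n, f):
    """Ne = n / (1 + f) for n individuals sampled with inbreeding f."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    return n / (1.0 + f)


def sample_size_for_target(target_ne, f):
    """Individuals needed to reach target_ne: n = Ne * (1 + f)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    return target_ne * (1.0 + f)


# Hypothetical inbreeding levels: random mating, intermediate, full selfing.
for f in (0.0, 0.5, 1.0):
    print(f, effective_size(100, f), sample_size_for_target(100, f))
```

For a fully selfing species, twice as many individuals must be sampled as for a random mating species to reach the same Ne.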
Accession regeneration is the case where the reference population is of finite size and Eq. [6] and [7] provide expressions for computing Ne for random sampling of gametes and female gametic control, respectively. Gametic control is a major factor in regeneration. Under constant accession size, the loss of 20% of seeds may be recovered in terms of Ne when female gametic control is applied. This is not attainable with random sampling.
When studying Ne in recurrent selection schemes by adapting Eq. [6] and [7] and considering that the selected individuals are also the recombinant units, as in mass selection or in progeny testing, results show that the gain in Ne through gametic control is very small when selection is intensive. When comparing effective population sizes for the SSD method versus the bulk system, results showed that SSD maintained genetic drift at a much lower level due to gametic control. The SSD method offers good protection against random loss of alleles during selfing generations, especially under small sample sizes.
Estimating parameters such as the inbreeding coefficient, the rate of natural self-fertilization, the coancestry of individuals within families, and the allelic diversity among subpopulations, through codominant genetic markers, is fundamental for obtaining reliable estimates of effective population size in many circumstances.
Received for publication June 15, 2002.