Figure 1: The 'training population' is genotyped and phenotyped to 'train' the genomic selection (GS) prediction model. Genotypic information from the breeding material is then fed into the model to calculate genomic estimated breeding values (GEBV) for these lines. From Heffner et al. 2009 Crop Sci. 49:1–12
Figure 2: Information from a majority of lines in the breeding population (the training set) is used to create the prediction model. The model is then used to predict the phenotypes of the remaining lines (the validation set), using genotypic information only. The results from the model are compared to the actual data to give the prediction accuracy. Image courtesy of Martha Hamblin, Cornell University
Genomic Selection is a new plant breeding method that uses statistical modeling to predict how a plant will perform before it is field-tested. Novel statistical models and bioinformatics tools, combined with increasingly abundant genomic information, have enabled the deployment of prediction-based breeding methods such as Genomic Selection in crop breeding programs.
To implement Genomic Selection, first a 'training population' is formed that is composed of plant lines covering all of the important material in the breeding program in question. The training population is then genotyped and phenotyped for all traits of interest to the breeding program. The resulting set of data comprises a full record of the genetic makeup of individuals in the training population, together with detailed phenotypic characteristics for multiple traits on these same lines.
A Genomic Selection 'prediction model' is subsequently created that integrates these two sets of data. The model produces a Genomic Estimated Breeding Value (GEBV) for each line in the breeding population for which genotypic information is available. A GEBV represents the overall predicted quality, or value, of a line as a potential parent for crossing. In effect, the GEBV tells a breeder how a plant is expected to perform in the field before it is field-tested.
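As a minimal sketch of how such a prediction model might turn genotypes into GEBVs, the Python example below fits a ridge regression of phenotype on marker scores and then scores unphenotyped candidate lines. Ridge regression is only one commonly used choice and is an assumption here, since the text does not name a specific model. The function names, the penalty value, the 0/1/2 marker coding and the simulated data are all hypothetical and for illustration only.

```python
import numpy as np

def fit_marker_effects(genotypes, phenotypes, ridge_penalty=1.0):
    """Estimate per-marker effects by ridge regression (an assumed stand-in model)."""
    marker_means = genotypes.mean(axis=0)
    centered = genotypes - marker_means
    n_markers = genotypes.shape[1]
    # Solve (Z'Z + penalty * I) u = Z'(y - mean(y)) for the marker effects u.
    lhs = centered.T @ centered + ridge_penalty * np.eye(n_markers)
    rhs = centered.T @ (phenotypes - phenotypes.mean())
    effects = np.linalg.solve(lhs, rhs)
    return marker_means, effects

def predict_gebv(genotypes, marker_means, effects):
    """GEBV of each line: centered marker scores weighted by estimated effects."""
    return (genotypes - marker_means) @ effects

# Simulated (not real) data: markers coded 0/1/2, one quantitative trait.
rng = np.random.default_rng(0)
n_train, n_candidates, n_markers = 200, 50, 300
true_effects = rng.normal(0.0, 0.1, n_markers)
train_geno = rng.integers(0, 3, size=(n_train, n_markers)).astype(float)
train_pheno = train_geno @ true_effects + rng.normal(0.0, 1.0, n_train)
cand_geno = rng.integers(0, 3, size=(n_candidates, n_markers)).astype(float)

# Train on the genotyped and phenotyped lines, then score genotyped-only lines.
marker_means, effects = fit_marker_effects(train_geno, train_pheno, ridge_penalty=50.0)
gebv = predict_gebv(cand_geno, marker_means, effects)

# Rank candidate lines from highest to lowest GEBV.
ranking = np.argsort(gebv)[::-1]
print("Top 5 candidate lines by GEBV:", ranking[:5])
```

In this sketch the candidate lines contribute genotypes only, which mirrors the idea that a GEBV is available before any field data exist for those lines.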
Before the prediction model can be applied to the breeding population, the accuracy of the model must be tested. For this, the majority of the training population is used to create a prediction model, which is then used to predict the GEBVs of the remaining individuals in the training population, using genotypic data only. This enables researchers to 'test' and refine the prediction model to ensure the prediction accuracy is high enough that future predictions can be relied upon. Once validated, the model can be applied to a breeding population to calculate GEBVs of lines for which genotypic, but not phenotypic, information is available.
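The validation step described above can be sketched as a k-fold cross-validation. In this hedged Python example, prediction accuracy is taken to be the Pearson correlation between predicted and observed phenotypes in the held-out lines. That is a common convention but an assumption here, as are the ridge regression model, the fold count, the penalty value and the simulated data.

```python
import numpy as np

def ridge_predict(train_x, train_y, test_x, ridge_penalty):
    """Fit ridge regression on the training lines and predict the test lines."""
    means = train_x.mean(axis=0)
    z = train_x - means
    lhs = z.T @ z + ridge_penalty * np.eye(train_x.shape[1])
    effects = np.linalg.solve(lhs, z.T @ (train_y - train_y.mean()))
    return train_y.mean() + (test_x - means) @ effects

def cross_validated_accuracy(genotypes, phenotypes, n_folds=5,
                             ridge_penalty=50.0, seed=0):
    """Mean correlation between predicted and observed values over the folds."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(phenotypes))
    accuracies = []
    for held_out in np.array_split(order, n_folds):
        # The majority of lines train the model; the rest are the validation set.
        train = np.setdiff1d(order, held_out)
        predicted = ridge_predict(genotypes[train], phenotypes[train],
                                  genotypes[held_out], ridge_penalty)
        accuracies.append(np.corrcoef(predicted, phenotypes[held_out])[0, 1])
    return float(np.mean(accuracies))

# Simulated (not real) training population: 250 lines, 300 markers coded 0/1/2.
rng = np.random.default_rng(1)
geno = rng.integers(0, 3, size=(250, 300)).astype(float)
pheno = geno @ rng.normal(0.0, 0.1, 300) + rng.normal(0.0, 1.0, 250)

accuracy = cross_validated_accuracy(geno, pheno)
print(f"Cross-validated prediction accuracy: {accuracy:.2f}")
```

Each line is held out exactly once, so every prediction is made from genotypic data alone by a model that never saw that line's phenotype.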
The significant advantage of Genomic Selection is that lines for which GEBVs have been calculated can be phenotyped after selection and crossing, while the breeding cycle advances on the basis of the GEBVs alone. This allows more breeding cycles, and therefore greater genetic gain, per unit time than phenotypic selection. Genomic Selection does not, however, eliminate the need for phenotyping. On the contrary, the training population must be regularly updated with phenotypic and genotypic data from new lines in the breeding population to maintain or increase accuracy as the breeding population evolves.
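The link between cycle length and genetic gain per unit time is often summarized with the standard breeder's equation. It is not given in the text above and is shown here only as a reference point. The symbols are introduced for illustration: i is the selection intensity, r the selection accuracy, sigma_A the additive genetic standard deviation, and L the length of one breeding cycle in years.

```latex
\Delta G_{\text{year}} = \frac{i \, r \, \sigma_A}{L}
```

Under this formulation, selecting on GEBVs raises gain per year mainly by reducing L, provided the accuracy r of the genomic predictions stays reasonably close to that of phenotypic selection.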
In conventional cassava breeding, new breeding material resulting from a cross is assessed by breeders over several growing seasons and in multiple locations before they select superior clones for the next cycle of germplasm improvement. Cassava is vegetatively propagated, and limited numbers (5-10) of propagules can be obtained from one plant. To obtain sufficient propagules to perform replicated and multi-locational trials for meaningful phenotypic characterization, several cycles of propagation are thus required.
Cassava breeding cycles, from seedling to multi-location field trials, can therefore take more than five years, limiting the rate of variety improvement and breeders' ability to respond to new challenges. Shifting from phenotype-based selection to genotype-based selection has the potential to overcome lengthy cassava breeding cycles. Using Genomic Selection, breeders can select superior clones using GEBVs after only two years, which more than doubles the number of breeding cycles feasible per unit of time.
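The arithmetic behind "more than doubles" can be checked with a short Python sketch. The 10-year horizon is an arbitrary assumption, and the conventional cycle is taken as exactly five years even though the text gives "more than five years", so the conventional count is an upper bound.

```python
def cycles_completed(horizon_years, cycle_length_years):
    """Number of whole breeding cycles that fit in the given time horizon."""
    return horizon_years // cycle_length_years

horizon = 10  # assumed planning horizon in years, chosen for illustration

conventional_cycles = cycles_completed(horizon, 5)  # phenotype-based selection
genomic_cycles = cycles_completed(horizon, 2)       # selection on GEBVs

ratio = genomic_cycles / conventional_cycles
print(conventional_cycles, genomic_cycles, ratio)  # 2 5 2.5
```

Over the same decade, the two-year cycle completes five rounds of selection against at most two for the conventional scheme.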
In addition to shortening the cassava breeding cycle, Genomic Selection also promises to increase the accuracy of selection in cassava breeding. It is not feasible to advance many thousands of cassava seedlings through to adult clonal evaluation, so in conventional breeding schemes breeders discard many clones at the seedling stage, a point at which important cassava traits can only be poorly predicted. As a result, valuable clones may be lost. Genomic Selection allows breeders to use GEBVs to predict adult performance at the seedling stage, which significantly increases the number of seedlings that can be accurately evaluated.
Genomic Selection promises to revolutionize cassava breeding for the future. Giving breeders the ability to select based on predictions rather than observations will result in much improved genetic gains and efficiency in cassava breeding. This dramatic increase in efficiency will allow cassava breeders to meet the unprecedented demands of crop improvement.