Evaluation of association mapping and genomic prediction in diverse barley and cauliflower breeding material

Thorwarth, Patrick

Doctoral Thesis

2018

dissertation_abgabeversion_p_thorwarth.pdf (374.12 KB)

Abstract (English)

Due to the advent of new sequencing technologies and high-throughput phenotyping, an almost unlimited amount of data is available. In combination with statistical methods such as genome-wide association mapping (GWAM) and genomic prediction (GP), this information can provide valuable insight into the genetic potential of individuals and support selection and crossing decisions in a breeding program. In this thesis we focused on the evaluation of these methods in diverse barley (Hordeum vulgare L.) and cauliflower (Brassica oleracea var. botrytis) populations consisting of elite material and genetic resources. We concentrated on dissecting the influence of specific parameters, such as marker type, statistical model, population structure, and kinship, on the performance of GWAM and GP. For parts of this thesis, we additionally used simulated data to support findings based on empirical data. First, we compared four GWAM methods that use either single markers or haplotypes to detect quantitative trait loci (QTLs) in a barley population. To determine the population size and marker density required to detect QTLs of varying effect size, we performed a simulation study based on parameter estimates from the empirical population. We demonstrated that QTLs with a large effect can be detected even in small populations of about 100 individuals, and that at least 500 individuals are necessary to detect QTLs with an effect < 10%. Furthermore, we demonstrated an increased power of haplotype-based methods in the detection of QTLs with very small effects. In a second study we used a barley population consisting of 750 individuals as a training set to compare different GP models that are currently used by scientists and plant breeders. From the training set, 33 offspring families with a total of 750 individuals were derived. This enabled us to assess the prediction ability not only by cross-validation but also in a large offspring population with varying degrees of relatedness to the training population. We investigated the effects of linkage disequilibrium and linkage phase, population structure, and relatedness of individuals on the prediction ability. We demonstrated a strong effect of population structure on the prediction ability and showed that about 11,203 evenly spaced SNP markers are necessary to predict even genetically distant populations. This implies that at the current marker density, prediction ability is based on the relatedness of the individuals. In a third study we focused on the evaluation of GWAM and GP in cauliflower. We evaluated genotyping-by-sequencing and compared the influence of imputation methods on the prediction ability and the number of significant associations. We obtained a total of 120,693 SNPs in a random collection of 174 cauliflower genebank accessions. We demonstrated that imputation did not increase prediction ability and that the number of detected QTLs differed only slightly between the imputed and the unimputed data sets. GP performed well even in such a diverse genebank sample, but population structure again influenced the prediction ability. We demonstrated the usefulness and limitations of genome-wide association mapping and genomic prediction in two species. Even though extensive research in statistical genetics has provided valuable insight, genomic prediction should still be applied with care and only as a supporting tool for classical breeding methods.

Abstract (German, translated)

The introduction of new sequencing technologies, which make thousands of genetic markers available, together with high-throughput phenotyping, has made an almost unlimited amount of data available. In combination with statistical methods such as genome-wide association mapping (GWA) and genomic prediction (GP), useful insights into the genetic potential of individuals can be obtained. In this doctoral thesis we focused on the evaluation of these methods in diverse barley (Hordeum vulgare L.) and cauliflower (Brassica oleracea var. botrytis) populations consisting of elite material and genetic resources. We analyzed the influence of various parameters on the results of GWA and GP. For parts of the thesis we used simulated data to support our findings. First, we compared four GWA methods. These use single markers or haplotypes to detect candidate regions of a quantitative trait (QTL). To determine the population size and marker density required to detect QTLs of differing effect size, we used a simulation study based on parameter estimates from the empirical data of a barley population. We showed that populations of 100 individuals are sufficient to detect QTLs with a large effect and that at least 500 individuals are necessary to reveal QTLs with an effect of < 10%. Furthermore, we showed that statistical power can be increased by using haplotype-based methods for QTL detection. In a second study we used a barley population of 750 individuals as a training set to compare different GP methods. From the training population, 33 families comprising 750 individuals in total were developed. This allowed us to determine the prediction accuracy not only by cross-validation but also in a large offspring population with varying degrees of relatedness to the training set. Among other things, we investigated the influence of linkage disequilibrium and population structure on the prediction accuracy. We were able to show that population structure has a strong effect on prediction accuracy and that 11,203 SNP markers are necessary to predict genetically distant populations. In a third study we focused on the evaluation of GWAM and GP in cauliflower. Here we examined the influence of genotyping-by-sequencing (GBS) and of methods for imputing missing values, as well as their effect on prediction accuracy and the number of significant associations. Using imputation methods did not increase prediction accuracy, and the number of QTLs detected differed only slightly between the imputed and unimputed data sets. GP worked well in this diverse genebank material, but population structure had a strong influence on prediction accuracy. In this thesis we demonstrated the benefits and limitations of GWA and GP using barley and cauliflower. Although the many research efforts in statistical genetics have yielded important insights, the methods used here should be applied with caution and should currently be regarded only as supporting measures alongside classical breeding methods.

Publication license

CC BY-NC-ND 3.0

Faculty

Faculty of Agricultural Sciences

Institute

Institute of Plant Breeding, Seed Science and Population Genetics

Examination date

2018-02-21

Supervisor

Schmid, Karl J.

Cite this publication

Thorwarth, P. (2018). Evaluation of association mapping and genomic prediction in diverse barley and cauliflower breeding material. https://hohpublica.uni-hohenheim.de/handle/123456789/6267

Identification

https://hohpublica.uni-hohenheim.de/handle/123456789/6267

Language

English

Classification (DDC)

630 Agriculture

Collections

Institut für Pflanzenzüchtung, Saatgutforschung und Populationsgenetik

Free keywords

Genomic prediction; Genome wide association mapping; Barley; Cauliflower; Genomische Vorhersage; Genomweite Assoziationskartierung; Gerste; Blumenkohl

Standardized keywords (GND)

Gerste; Blumenkohl

BibTeX

@phdthesis{Thorwarth2018,
url = {https://hohpublica.uni-hohenheim.de/handle/123456789/6267},
author = {Thorwarth, Patrick},
title = {Evaluation of association mapping and genomic prediction in diverse barley and cauliflower breeding material},
year = {2018},
school = {Universität Hohenheim},
}
