Genomic prediction using machine learning: a comparison of the performance of regularized regression, ensemble, instance-based and deep learning methods on synthetic and empirical data

dc.contributor.authorLourenço, Vanda M.
dc.contributor.authorOgutu, Joseph O.
dc.contributor.authorRodrigues, Rui A.P.
dc.contributor.authorPosekany, Alexandra
dc.contributor.authorPiepho, Hans-Peter
dc.date.accessioned2026-01-30T08:28:44Z
dc.date.available2026-01-30T08:28:44Z
dc.date.issued2024
dc.date.updated2025-11-04T18:09:16Z
dc.description.abstractBackground: The accurate prediction of genomic breeding values is central to genomic selection in both plant and animal breeding studies. Genomic prediction involves the use of thousands of molecular markers spanning the entire genome and therefore requires methods able to efficiently handle high dimensional data. Not surprisingly, machine learning methods are becoming widely advocated for and used in genomic prediction studies. These methods encompass different groups of supervised and unsupervised learning methods. Although several studies have compared the predictive performances of individual methods, studies comparing the predictive performance of different groups of methods are rare. However, such studies are crucial for identifying (i) groups of methods with superior genomic predictive performance and assessing (ii) the merits and demerits of such groups of methods relative to each other and to the established classical methods. Here, we comparatively evaluate the genomic predictive performance and informally assess the computational cost of several groups of supervised machine learning methods, specifically, regularized regression methods, deep, ensemble and instance-based learning algorithms, using one simulated animal breeding dataset and three empirical maize breeding datasets obtained from a commercial breeding program. Results: Our results show that the relative predictive performance and computational expense of the groups of machine learning methods depend upon both the data and target traits and that for classical regularized methods, increasing model complexity can incur huge computational costs but does not necessarily always improve predictive accuracy. Thus, despite their greater complexity and computational burden, neither the adaptive nor the group regularized methods clearly improved upon the results of their simple regularized counterparts.
This rules out selection of one procedure among machine learning methods for routine use in genomic prediction. The results also show that, because of their competitive predictive performance, computational efficiency, simplicity and therefore relatively few tuning parameters, the classical linear mixed model and regularized regression methods are likely to remain strong contenders for genomic prediction. Conclusions: The dependence of predictive performance and computational burden on target datasets and traits calls for increasing investments in enhancing the computational efficiency of machine learning algorithms and computing resources.
dc.description.sponsorshipOpen Access funding enabled and organized by Projekt DEAL.
dc.description.sponsorshipFundação para a Ciência e a Tecnologia
dc.description.sponsorshipGerman Federal Ministry of Education and Research
dc.description.sponsorshipDeutsche Forschungsgemeinschaft http://dx.doi.org/10.13039/501100001659
dc.description.sponsorshipUniversität Hohenheim (3153)
dc.identifier.urihttps://doi.org/10.1186/s12864-023-09933-x
dc.identifier.urihttps://hohpublica.uni-hohenheim.de/handle/123456789/18407
dc.language.isoeng
dc.rights.licensecc_by
dc.subjectGenomic prediction
dc.subjectGenomic selection
dc.subjectBreeding value
dc.subjectPredictive accuracy
dc.subjectPredictive ability
dc.subjectHigh-dimensional data
dc.subjectSupervised machine learning methods
dc.subject.ddc630
dc.titleGenomic prediction using machine learning: a comparison of the performance of regularized regression, ensemble, instance-based and deep learning methods on synthetic and empirical data
dc.type.diniArticle
dcterms.bibliographicCitationBMC genomics, 25 (2024), 1, 152. https://doi.org/10.1186/s12864-023-09933-x. ISSN: 1471-2164 London : BioMed Central
dcterms.bibliographicCitation.articlenumber152
dcterms.bibliographicCitation.issn1471-2164
dcterms.bibliographicCitation.issue1
dcterms.bibliographicCitation.journaltitleBMC genomics
dcterms.bibliographicCitation.originalpublishernameBioMed Central
dcterms.bibliographicCitation.originalpublisherplaceLondon
dcterms.bibliographicCitation.volume25
local.export.bibtex@article{Lourenço2024, doi = {10.1186/s12864-023-09933-x}, author = {Lourenço, Vanda M. and Ogutu, Joseph O. and Rodrigues, Rui A.P. and Posekany, Alexandra and Piepho, Hans-Peter}, title = {Genomic prediction using machine learning: a comparison of the performance of regularized regression, ensemble, instance-based and deep learning methods on synthetic and empirical data}, journal = {BMC Genomics}, year = {2024}, volume = {25}, number = {1} }
local.subject.sdg2
local.title.fullGenomic prediction using machine learning: a comparison of the performance of regularized regression, ensemble, instance-based and deep learning methods on synthetic and empirical data
local.university.bibliographyhttps://hohcampus.verw.uni-hohenheim.de/qisserver/a/fs.res.frontend/pub/view/44072

Files

Original bundle

Name:
12864_2024_Article_9933.pdf
Size:
10.95 MB
Format:
Adobe Portable Document Format
Name:
supp.zip
Size:
60.78 MB
Format:
Unknown data format

License bundle

Name:
license.txt
Size:
7.85 KB
Format:
Item-specific license agreed to upon submission