A machine-learning heuristic to improve gene score prediction of polygenic traits

Bibliographic Details
Journal Title:	Scientific Reports
Persons and Corporate Bodies:	Paré, Guillaume; Mao, Shihong; Deng, Wei Q.
In:	Scientific Reports, 7, 2017, 1
Format:	E-Article
Language:	English
Published:	Springer Science and Business Media LLC
Subjects:	Multidisciplinary

author_facet	Paré, Guillaume; Mao, Shihong; Deng, Wei Q.
author	Paré, Guillaume; Mao, Shihong; Deng, Wei Q.
spellingShingle	Paré, Guillaume Mao, Shihong Deng, Wei Q. Scientific Reports A machine-learning heuristic to improve gene score prediction of polygenic traits Multidisciplinary
author_sort	paré, guillaume
doi_str_mv	10.1038/s41598-017-13056-1
facet_avail	Online Free
format	ElectronicArticle
fullrecord	blob:ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTAzOC9zNDE1OTgtMDE3LTEzMDU2LTE
id	ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTAzOC9zNDE1OTgtMDE3LTEzMDU2LTE
institution	DE-Zi4 DE-Gla1 DE-15 DE-Pl11 DE-Rs1 DE-14 DE-105 DE-Ch1 DE-L229 DE-D275 DE-Bn3 DE-Brt1 DE-Zwi2 DE-D161
imprint	Springer Science and Business Media LLC, 2017
imprint_str_mv	Springer Science and Business Media LLC, 2017
issn	2045-2322
issn_str_mv	2045-2322
language	English
mega_collection	Springer Science and Business Media LLC (CrossRef)
match_str	pare2017amachinelearningheuristictoimprovegenescorepredictionofpolygenictraits
publishDateSort	2017
publisher	Springer Science and Business Media LLC
recordtype	ai
record_format	ai
series	Scientific Reports
source_id	49
title	A machine-learning heuristic to improve gene score prediction of polygenic traits
title_unstemmed	A machine-learning heuristic to improve gene score prediction of polygenic traits
title_full	A machine-learning heuristic to improve gene score prediction of polygenic traits
title_fullStr	A machine-learning heuristic to improve gene score prediction of polygenic traits
title_full_unstemmed	A machine-learning heuristic to improve gene score prediction of polygenic traits
title_short	A machine-learning heuristic to improve gene score prediction of polygenic traits
title_sort	a machine-learning heuristic to improve gene score prediction of polygenic traits
topic	Multidisciplinary
url	http://dx.doi.org/10.1038/s41598-017-13056-1
publishDate	2017
physical
description	Abstract: Machine-learning techniques have helped solve a broad range of prediction problems, yet are not widely used to build polygenic risk scores for the prediction of complex traits. We propose a novel heuristic based on machine-learning techniques (GraBLD) to boost the predictive performance of polygenic risk scores. Gradient boosted regression trees were first used to optimize the weights of SNPs included in the score, followed by a novel regional adjustment for linkage disequilibrium. A calibration set with sample size of ~200 individuals was sufficient for optimal performance. GraBLD yielded prediction R² of 0.239 and 0.082 using GIANT summary association statistics for height and BMI in the UK Biobank study (N = 130 K; 1.98 M SNPs), explaining 46.9% and 32.7% of the overall polygenic variance, respectively. For diabetes status, the area under the receiver operating characteristic curve was 0.602 in the UK Biobank study using summary-level association statistics from the DIAGRAM consortium. GraBLD outperformed other polygenic score heuristics for the prediction of height (p < 2.2 × 10⁻¹⁶) and BMI (p < 1.57 × 10⁻⁴), and was equivalent to LDpred for diabetes. Results were independently validated in the Health and Retirement Study (N = 8,292; 688,398 SNPs). Our report demonstrates the use of machine-learning techniques, coupled with summary-level data from large genome-wide meta-analyses, to improve the prediction of polygenic traits.
container_issue	1
container_start_page	0
container_title	Scientific Reports
container_volume	7
format_de105	Article, E-Article
format_de14	Article, E-Article
format_de15	Article, E-Article
format_de520	Article, E-Article
format_de540	Article, E-Article
format_dech1	Article, E-Article
format_ded117	Article, E-Article
format_degla1	E-Article
format_del152	Book
format_del189	Article, E-Article
format_dezi4	Article
format_dezwi2	Article, E-Article
format_finc	Article, E-Article
format_nrw	Article, E-Article
_version_	1792347064217108486
geogr_code	not assigned
last_indexed	2024-03-01T17:49:21.171Z
geogr_code_person	not assigned
openURL	url_ver=Z39.88-2004&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fvufind.svn.sourceforge.net%3Agenerator&rft.title=A+machine-learning+heuristic+to+improve+gene+score+prediction+of+polygenic+traits&rft.date=2017-10-04&genre=article&issn=2045-2322&volume=7&issue=1&jtitle=Scientific+Reports&atitle=A+machine-learning+heuristic+to+improve+gene+score+prediction+of+polygenic+traits&aulast=Deng&aufirst=Wei+Q.&rft_id=info%3Adoi%2F10.1038%2Fs41598-017-13056-1&rft.language%5B0%5D=eng