author_facet Paré, Guillaume
Mao, Shihong
Deng, Wei Q.
author Paré, Guillaume
Mao, Shihong
Deng, Wei Q.
spellingShingle Paré, Guillaume
Mao, Shihong
Deng, Wei Q.
Scientific Reports
A machine-learning heuristic to improve gene score prediction of polygenic traits
Multidisciplinary
author_sort paré, guillaume
doi_str_mv 10.1038/s41598-017-13056-1
facet_avail Online
Free
format ElectronicArticle
fullrecord blob:ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTAzOC9zNDE1OTgtMDE3LTEzMDU2LTE
id ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTAzOC9zNDE1OTgtMDE3LTEzMDU2LTE
institution DE-Zi4
DE-Gla1
DE-15
DE-Pl11
DE-Rs1
DE-14
DE-105
DE-Ch1
DE-L229
DE-D275
DE-Bn3
DE-Brt1
DE-Zwi2
DE-D161
imprint Springer Science and Business Media LLC, 2017
imprint_str_mv Springer Science and Business Media LLC, 2017
issn 2045-2322
issn_str_mv 2045-2322
language English
mega_collection Springer Science and Business Media LLC (CrossRef)
match_str pare2017amachinelearningheuristictoimprovegenescorepredictionofpolygenictraits
publishDateSort 2017
publisher Springer Science and Business Media LLC
recordtype ai
record_format ai
series Scientific Reports
source_id 49
title A machine-learning heuristic to improve gene score prediction of polygenic traits
title_sort a machine-learning heuristic to improve gene score prediction of polygenic traits
topic Multidisciplinary
url http://dx.doi.org/10.1038/s41598-017-13056-1
publishDate 2017
physical
description Abstract: Machine-learning techniques have helped solve a broad range of prediction problems, yet are not widely used to build polygenic risk scores for the prediction of complex traits. We propose a novel heuristic based on machine-learning techniques (GraBLD) to boost the predictive performance of polygenic risk scores. Gradient boosted regression trees were first used to optimize the weights of SNPs included in the score, followed by a novel regional adjustment for linkage disequilibrium. A calibration set with sample size of ~200 individuals was sufficient for optimal performance. GraBLD yielded prediction R² of 0.239 and 0.082 using GIANT summary association statistics for height and BMI in the UK Biobank study (N = 130 K; 1.98 M SNPs), explaining 46.9% and 32.7% of the overall polygenic variance, respectively. For diabetes status, the area under the receiver operating characteristic curve was 0.602 in the UK Biobank study using summary-level association statistics from the DIAGRAM consortium. GraBLD outperformed other polygenic score heuristics for the prediction of height (p < 2.2 × 10⁻¹⁶) and BMI (p < 1.57 × 10⁻⁴), and was equivalent to LDpred for diabetes. Results were independently validated in the Health and Retirement Study (N = 8,292; 688,398 SNPs). Our report demonstrates the use of machine-learning techniques, coupled with summary-level data from large genome-wide meta-analyses to improve the prediction of polygenic traits.
container_issue 1
container_start_page 0
container_title Scientific Reports
container_volume 7
format_de105 Article, E-Article
format_de14 Article, E-Article
format_de15 Article, E-Article
format_de520 Article, E-Article
format_de540 Article, E-Article
format_dech1 Article, E-Article
format_ded117 Article, E-Article
format_degla1 E-Article
format_del152 Book
format_del189 Article, E-Article
format_dezi4 Article
format_dezwi2 Article, E-Article
format_finc Article, E-Article
format_nrw Article, E-Article
_version_ 1792347064217108486
geogr_code not assigned
last_indexed 2024-03-01T17:49:21.171Z
geogr_code_person not assigned
openURL url_ver=Z39.88-2004&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fvufind.svn.sourceforge.net%3Agenerator&rft.title=A+machine-learning+heuristic+to+improve+gene+score+prediction+of+polygenic+traits&rft.date=2017-10-04&genre=article&issn=2045-2322&volume=7&issue=1&jtitle=Scientific+Reports&atitle=A+machine-learning+heuristic+to+improve+gene+score+prediction+of+polygenic+traits&aulast=Deng&aufirst=Wei+Q.&rft_id=info%3Adoi%2F10.1038%2Fs41598-017-13056-1&rft.language%5B0%5D=eng