A machine-learning heuristic to improve gene score prediction of polygenic traits
Journal title: | Scientific Reports |
---|---|
Persons and corporate bodies: | Paré, Guillaume; Mao, Shihong; Deng, Wei Q. |
In: | Scientific Reports, 7, 2017, 1 |
Format: | E-Article |
Language: | English |
Published: | Springer Science and Business Media LLC |
Subjects: | Multidisciplinary |
openURL | url_ver=Z39.88-2004&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fvufind.svn.sourceforge.net%3Agenerator&rft.title=A+machine-learning+heuristic+to+improve+gene+score+prediction+of+polygenic+traits&rft.date=2017-10-04&genre=article&issn=2045-2322&volume=7&issue=1&jtitle=Scientific+Reports&atitle=A+machine-learning+heuristic+to+improve+gene+score+prediction+of+polygenic+traits&aulast=Deng&aufirst=Wei+Q.&rft_id=info%3Adoi%2F10.1038%2Fs41598-017-13056-1&rft.language%5B0%5D=eng |
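The `openURL` and `id` fields of this record can be unpacked with Python standard-library calls. The sketch below is illustrative only and is not the catalogue's own code: `parse_openurl` and `decode_record_id` are hypothetical helper names, and reading `id` as `<prefix>-<source_id>-<unpadded URL-safe Base64 of the source URL>` is an assumption inferred from this single record.

```python
import base64
from urllib.parse import parse_qs

# OpenURL query string copied from the record's openURL field.
OPENURL = (
    "url_ver=Z39.88-2004&ctx_ver=Z39.88-2004"
    "&ctx_enc=info%3Aofi%2Fenc%3AUTF-8"
    "&rfr_id=info%3Asid%2Fvufind.svn.sourceforge.net%3Agenerator"
    "&rft.title=A+machine-learning+heuristic+to+improve+gene+score"
    "+prediction+of+polygenic+traits"
    "&rft.date=2017-10-04&genre=article&issn=2045-2322&volume=7&issue=1"
    "&jtitle=Scientific+Reports"
    "&atitle=A+machine-learning+heuristic+to+improve+gene+score"
    "+prediction+of+polygenic+traits"
    "&aulast=Deng&aufirst=Wei+Q."
    "&rft_id=info%3Adoi%2F10.1038%2Fs41598-017-13056-1"
    "&rft.language%5B0%5D=eng"
)

# Record identifier copied from the record's id field.
RECORD_ID = "ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTAzOC9zNDE1OTgtMDE3LTEzMDU2LTE"


def parse_openurl(query: str) -> dict[str, str]:
    """Decode a key/value OpenURL query string into a flat dict.

    parse_qs returns a list per key; every key occurs once here,
    so only the first value is kept.
    """
    return {key: values[0] for key, values in parse_qs(query).items()}


def decode_record_id(record_id: str) -> tuple[str, str]:
    """Split the id and Base64-decode its last part.

    Assumption (not documented in the record): the id has the form
    '<prefix>-<source_id>-<payload>' with an unpadded URL-safe Base64 payload.
    """
    _prefix, source_id, payload = record_id.split("-", 2)
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return source_id, base64.urlsafe_b64decode(padded).decode("utf-8")


if __name__ == "__main__":
    fields = parse_openurl(OPENURL)
    print(fields["rft.date"], fields["rft_id"])
    print(decode_record_id(RECORD_ID))
```

For this record, the decoded payload should match the `url` field and the second id component should match `source_id`. The `rft.date` key carries a full publication date, which the year-only `publishDate` field does not.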
SOLR | |
_version_ | 1792347064217108486 |
author | Paré, Guillaume; Mao, Shihong; Deng, Wei Q. |
author_facet | Paré, Guillaume; Mao, Shihong; Deng, Wei Q. |
author_sort | paré, guillaume |
container_issue | 1 |
container_start_page | 0 |
container_title | Scientific Reports |
container_volume | 7 |
description | Abstract: Machine-learning techniques have helped solve a broad range of prediction problems, yet are not widely used to build polygenic risk scores for the prediction of complex traits. We propose a novel heuristic based on machine-learning techniques (GraBLD) to boost the predictive performance of polygenic risk scores. Gradient boosted regression trees were first used to optimize the weights of SNPs included in the score, followed by a novel regional adjustment for linkage disequilibrium. A calibration set with sample size of ~200 individuals was sufficient for optimal performance. GraBLD yielded prediction R² of 0.239 and 0.082 using GIANT summary association statistics for height and BMI in the UK Biobank study (N = 130 K; 1.98 M SNPs), explaining 46.9% and 32.7% of the overall polygenic variance, respectively. For diabetes status, the area under the receiver operating characteristic curve was 0.602 in the UK Biobank study using summary-level association statistics from the DIAGRAM consortium. GraBLD outperformed other polygenic score heuristics for the prediction of height (p < 2.2 × 10⁻¹⁶) and BMI (p < 1.57 × 10⁻⁴), and was equivalent to LDpred for diabetes. Results were independently validated in the Health and Retirement Study (N = 8,292; 688,398 SNPs). Our report demonstrates the use of machine-learning techniques, coupled with summary-level data from large genome-wide meta-analyses to improve the prediction of polygenic traits. |
doi_str_mv | 10.1038/s41598-017-13056-1 |
facet_avail | Online, Free |
format | ElectronicArticle |
format_de105 | Article, E-Article |
format_de14 | Article, E-Article |
format_de15 | Article, E-Article |
format_de520 | Article, E-Article |
format_de540 | Article, E-Article |
format_dech1 | Article, E-Article |
format_ded117 | Article, E-Article |
format_degla1 | E-Article |
format_del152 | Book |
format_del189 | Article, E-Article |
format_dezi4 | Article |
format_dezwi2 | Article, E-Article |
format_finc | Article, E-Article |
format_nrw | Article, E-Article |
geogr_code | not assigned |
geogr_code_person | not assigned |
id | ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTAzOC9zNDE1OTgtMDE3LTEzMDU2LTE |
imprint | Springer Science and Business Media LLC, 2017 |
imprint_str_mv | Springer Science and Business Media LLC, 2017 |
institution | DE-Zi4, DE-Gla1, DE-15, DE-Pl11, DE-Rs1, DE-14, DE-105, DE-Ch1, DE-L229, DE-D275, DE-Bn3, DE-Brt1, DE-Zwi2, DE-D161 |
issn | 2045-2322 |
issn_str_mv | 2045-2322 |
language | English |
last_indexed | 2024-03-01T17:49:21.171Z |
match_str | pare2017amachinelearningheuristictoimprovegenescorepredictionofpolygenictraits |
mega_collection | Springer Science and Business Media LLC (CrossRef) |
physical | |
publishDate | 2017 |
publishDateSort | 2017 |
publisher | Springer Science and Business Media LLC |
record_format | ai |
recordtype | ai |
series | Scientific Reports |
source_id | 49 |
spellingShingle | Paré, Guillaume, Mao, Shihong, Deng, Wei Q., Scientific Reports, A machine-learning heuristic to improve gene score prediction of polygenic traits, Multidisciplinary |
title | A machine-learning heuristic to improve gene score prediction of polygenic traits |
title_full | A machine-learning heuristic to improve gene score prediction of polygenic traits |
title_fullStr | A machine-learning heuristic to improve gene score prediction of polygenic traits |
title_full_unstemmed | A machine-learning heuristic to improve gene score prediction of polygenic traits |
title_short | A machine-learning heuristic to improve gene score prediction of polygenic traits |
title_sort | a machine-learning heuristic to improve gene score prediction of polygenic traits |
title_unstemmed | A machine-learning heuristic to improve gene score prediction of polygenic traits |
topic | Multidisciplinary |
url | http://dx.doi.org/10.1038/s41598-017-13056-1 |
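The abstract in the `description` field reports both a prediction R² and the share of the overall polygenic variance that the score explains. As a minimal worked example, and assuming the share is simply R² divided by the total polygenic variance (the record itself does not state the formula), the implied total can be recovered by division. The function name `implied_polygenic_variance` is hypothetical.

```python
# Figures quoted in the abstract: prediction R^2 and the reported share
# of the overall polygenic variance explained by the score.
REPORTED = {
    "height": {"r2": 0.239, "share": 0.469},
    "BMI": {"r2": 0.082, "share": 0.327},
}


def implied_polygenic_variance(r2: float, share: float) -> float:
    """Return r2 / share.

    Assumption: share = r2 / (total polygenic variance), so dividing
    r2 by the share recovers the total. This reading is not confirmed
    by the catalogue record.
    """
    if not 0.0 < share <= 1.0:
        raise ValueError("share must lie in (0, 1]")
    return r2 / share


implied = {
    trait: implied_polygenic_variance(**values)
    for trait, values in REPORTED.items()
}

if __name__ == "__main__":
    for trait, value in implied.items():
        print(f"{trait}: implied polygenic variance ~ {value:.3f}")
```

Under this reading, the two traits differ mainly in how much polygenic variance is available to predict, not only in how well the score performs.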