flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding

Bibliographic Details
Journal Title:	Bioinformatics
Authors and Corporations:	Ge, Yongchao; Sealfon, Stuart C.
In:	Bioinformatics, 28, 2012, 15, pp. 2052-2058
Format:	E-Article
Language:	English
Published:	Oxford University Press (OUP)
Subjects:	Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

author_facet	Ge, Yongchao Sealfon, Stuart C. Ge, Yongchao Sealfon, Stuart C.
author	Ge, Yongchao Sealfon, Stuart C.
spellingShingle	Ge, Yongchao Sealfon, Stuart C. Bioinformatics flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding Computational Mathematics Computational Theory and Mathematics Computer Science Applications Molecular Biology Biochemistry Statistics and Probability
author_sort	ge, yongchao
spelling	Ge, Yongchao Sealfon, Stuart C. 1367-4811 1367-4803 Oxford University Press (OUP) Computational Mathematics Computational Theory and Mathematics Computer Science Applications Molecular Biology Biochemistry Statistics and Probability http://dx.doi.org/10.1093/bioinformatics/bts300 <jats:title>Abstract</jats:title> <jats:p>Motivation: For flow cytometry data, there are two common approaches to the unsupervised clustering problem: one is based on the finite mixture model and the other on spatial exploration of the histograms. The former is computationally slow and has difficulty to identify clusters of irregular shapes. The latter approach cannot be applied directly to high-dimensional data as the computational time and memory become unmanageable and the estimated histogram is unreliable. An algorithm without these two problems would be very useful.</jats:p> <jats:p>Results: In this article, we combine ideas from the finite mixture model and histogram spatial exploration. This new algorithm, which we call flowPeaks, can be applied directly to high-dimensional data and identify irregular shape clusters. The algorithm first uses K-means algorithm with a large K to partition the cell population into many small clusters. These partitioned data allow the generation of a smoothed density function using the finite mixture model. All local peaks are exhaustively searched by exploring the density function and the cells are clustered by the associated local peak. The algorithm flowPeaks is automatic, fast and reliable and robust to cluster shape and outliers. This algorithm has been applied to flow cytometry data and it has been compared with state of the art algorithms, including Misty Mountain, FLOCK, flowMeans, flowMerge and FLAME.</jats:p> <jats:p>Availability: The R package flowPeaks is available at https://github.com/yongchao/flowPeaks.</jats:p> <jats:p>Contact: yongchao.ge@mssm.edu</jats:p> <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online</jats:p> flowPeaks: a fast unsupervised clustering for flow cytometry data via <i>K</i>-means and density peak finding Bioinformatics
doi_str_mv	10.1093/bioinformatics/bts300
facet_avail	Online Free
finc_class_facet	Chemie und Pharmazie Mathematik Informatik Biologie
format	ElectronicArticle
fullrecord	blob:ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTA5My9iaW9pbmZvcm1hdGljcy9idHMzMDA
id	ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTA5My9iaW9pbmZvcm1hdGljcy9idHMzMDA
institution	DE-Gla1 DE-Zi4 DE-15 DE-Pl11 DE-Rs1 DE-105 DE-14 DE-Ch1 DE-L229 DE-D275 DE-Bn3 DE-Brt1 DE-Zwi2 DE-D161
imprint	Oxford University Press (OUP), 2012
imprint_str_mv	Oxford University Press (OUP), 2012
issn	1367-4811 1367-4803
issn_str_mv	1367-4811 1367-4803
language	English
mega_collection	Oxford University Press (OUP) (CrossRef)
match_str	ge2012flowpeaksafastunsupervisedclusteringforflowcytometrydataviakmeansanddensitypeakfinding
publishDateSort	2012
publisher	Oxford University Press (OUP)
recordtype	ai
record_format	ai
series	Bioinformatics
source_id	49
title	flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding
title_unstemmed	flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding
title_full	flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding
title_fullStr	flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding
title_full_unstemmed	flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding
title_short	flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding
title_sort	flowpeaks: a fast unsupervised clustering for flow cytometry data via <i>k</i>-means and density peak finding
topic	Computational Mathematics Computational Theory and Mathematics Computer Science Applications Molecular Biology Biochemistry Statistics and Probability
url	http://dx.doi.org/10.1093/bioinformatics/bts300
publishDate	2012
physical	2052-2058
description	<jats:title>Abstract</jats:title> <jats:p>Motivation: For flow cytometry data, there are two common approaches to the unsupervised clustering problem: one is based on the finite mixture model and the other on spatial exploration of the histograms. The former is computationally slow and has difficulty to identify clusters of irregular shapes. The latter approach cannot be applied directly to high-dimensional data as the computational time and memory become unmanageable and the estimated histogram is unreliable. An algorithm without these two problems would be very useful.</jats:p> <jats:p>Results: In this article, we combine ideas from the finite mixture model and histogram spatial exploration. This new algorithm, which we call flowPeaks, can be applied directly to high-dimensional data and identify irregular shape clusters. The algorithm first uses K-means algorithm with a large K to partition the cell population into many small clusters. These partitioned data allow the generation of a smoothed density function using the finite mixture model. All local peaks are exhaustively searched by exploring the density function and the cells are clustered by the associated local peak. The algorithm flowPeaks is automatic, fast and reliable and robust to cluster shape and outliers. This algorithm has been applied to flow cytometry data and it has been compared with state of the art algorithms, including Misty Mountain, FLOCK, flowMeans, flowMerge and FLAME.</jats:p> <jats:p>Availability: The R package flowPeaks is available at https://github.com/yongchao/flowPeaks.</jats:p> <jats:p>Contact: yongchao.ge@mssm.edu</jats:p> <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online</jats:p>
container_issue	15
container_start_page	2052
container_title	Bioinformatics
container_volume	28
format_de105	Article, E-Article
format_de14	Article, E-Article
format_de15	Article, E-Article
format_de520	Article, E-Article
format_de540	Article, E-Article
format_dech1	Article, E-Article
format_ded117	Article, E-Article
format_degla1	E-Article
format_del152	Buch
format_del189	Article, E-Article
format_dezi4	Article
format_dezwi2	Article, E-Article
format_finc	Article, E-Article
format_nrw	Article, E-Article
_version_	1792346157724205057
geogr_code	not assigned
last_indexed	2024-03-01T17:34:54.083Z
geogr_code_person	not assigned
openURL	url_ver=Z39.88-2004&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fvufind.svn.sourceforge.net%3Agenerator&rft.title=flowPeaks%3A+a+fast+unsupervised+clustering+for+flow+cytometry+data+via+K-means+and+density+peak+finding&rft.date=2012-08-01&genre=article&issn=1367-4803&volume=28&issue=15&spage=2052&epage=2058&pages=2052-2058&jtitle=Bioinformatics&atitle=flowPeaks%3A+a+fast+unsupervised+clustering+for+flow+cytometry+data+via+%3Ci%3EK%3C%2Fi%3E-means+and+density+peak+finding&aulast=Sealfon&aufirst=Stuart+C.&rft_id=info%3Adoi%2F10.1093%2Fbioinformatics%2Fbts300&rft.language%5B0%5D=eng
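
Illustrative note (not part of the catalog record): the three steps named in the abstract (K-means with a large K, a smoothed mixture density built from the partition, and grouping by local density peak) can be sketched in a few lines of Python. This is a toy sketch only, not the authors' flowPeaks R package. The 1-D restriction, the quantile initialisation, the added bandwidth term, the fixed-step hill climbing, the merge tolerance and all function names are assumptions made for this illustration.

```python
import math
import random


def kmeans_1d(points, k, iters=50):
    """Plain Lloyd's K-means on 1-D data; returns (centers, assignment)."""
    ordered = sorted(points)
    n = len(ordered)
    # Deterministic start (assumption): evenly spaced quantiles of the data.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: abs(x - centers[j])) for x in points]
        for j in range(k):
            members = [x for x, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = sum(members) / len(members)
    return centers, assign


def mixture_density(points, assign, k):
    """Build f(x) = sum_k w_k N(x; mu_k, var_k) from the K-means partition."""
    n = len(points)
    mean = sum(points) / n
    total_var = sum((x - mean) ** 2 for x in points) / n
    bandwidth2 = total_var / k ** 2  # assumed smoothing term for this sketch
    comps = []
    for j in range(k):
        members = [x for x, a in zip(points, assign) if a == j]
        if not members:
            continue
        mu = sum(members) / len(members)
        var = sum((x - mu) ** 2 for x in members) / len(members) + bandwidth2
        comps.append((len(members) / n, mu, var))

    def density(x):
        return sum(w * math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
                   for w, mu, var in comps)

    return density


def climb(density, x, step=0.05, max_steps=10000):
    """Fixed-step hill climbing to a local maximum of the density."""
    for _ in range(max_steps):
        here = density(x)
        if density(x + step) > here:
            x += step
        elif density(x - step) > here:
            x -= step
        else:
            break
    return x


def flowpeaks_sketch(points, k=10, merge_tol=0.5):
    """Return (label per point, number of peak-based clusters)."""
    centers, assign = kmeans_1d(points, k)
    density = mixture_density(points, assign, k)
    peaks = [climb(density, c) for c in centers]
    # K-means clusters whose climbs end within merge_tol share one peak.
    order = sorted(range(k), key=lambda j: peaks[j])
    group, groups = 0, [0] * k
    for prev, cur in zip(order, order[1:]):
        if peaks[cur] - peaks[prev] > merge_tol:
            group += 1
        groups[cur] = group
    return [groups[a] for a in assign], group + 1


rng = random.Random(0)
left = [rng.gauss(0.0, 1.0) for _ in range(200)]
right = [rng.gauss(20.0, 1.0) for _ in range(200)]
labels, n_clusters = flowpeaks_sketch(left + right, k=10)
print(n_clusters)
```

The two synthetic populations are far enough apart that no hill climb can cross the valley between them, so points from the left and right groups never share a label. How many peaks are found inside each group depends on the assumed bandwidth term.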