author_facet Ge, Yongchao
Sealfon, Stuart C.
author Ge, Yongchao
Sealfon, Stuart C.
spellingShingle Ge, Yongchao
Sealfon, Stuart C.
Bioinformatics
flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding
Computational Mathematics
Computational Theory and Mathematics
Computer Science Applications
Molecular Biology
Biochemistry
Statistics and Probability
author_sort ge, yongchao
spelling Ge, Yongchao Sealfon, Stuart C. 1367-4811 1367-4803 Oxford University Press (OUP) Computational Mathematics Computational Theory and Mathematics Computer Science Applications Molecular Biology Biochemistry Statistics and Probability http://dx.doi.org/10.1093/bioinformatics/bts300 Abstract. Motivation: For flow cytometry data, there are two common approaches to the unsupervised clustering problem: one is based on the finite mixture model and the other on spatial exploration of the histograms. The former is computationally slow and has difficulty identifying clusters of irregular shapes. The latter cannot be applied directly to high-dimensional data because the computational time and memory become unmanageable and the estimated histogram is unreliable. An algorithm without these two problems would be very useful. Results: In this article, we combine ideas from the finite mixture model and histogram spatial exploration. This new algorithm, which we call flowPeaks, can be applied directly to high-dimensional data and can identify irregularly shaped clusters. The algorithm first uses the K-means algorithm with a large K to partition the cell population into many small clusters. These partitioned data allow the generation of a smoothed density function using the finite mixture model. All local peaks are found by exhaustively exploring the density function, and each cell is assigned to the cluster of its associated local peak. flowPeaks is automatic, fast, reliable and robust to cluster shape and outliers. The algorithm has been applied to flow cytometry data and compared with state-of-the-art algorithms, including Misty Mountain, FLOCK, flowMeans, flowMerge and FLAME. Availability: The R package flowPeaks is available at https://github.com/yongchao/flowPeaks. Contact: yongchao.ge@mssm.edu Supplementary information: Supplementary data are available at Bioinformatics online. flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding Bioinformatics
doi_str_mv 10.1093/bioinformatics/bts300
facet_avail Online
Free
finc_class_facet Chemistry and Pharmacy
Mathematics
Computer Science
Biology
format ElectronicArticle
fullrecord blob:ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTA5My9iaW9pbmZvcm1hdGljcy9idHMzMDA
id ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTA5My9iaW9pbmZvcm1hdGljcy9idHMzMDA
institution DE-Gla1
DE-Zi4
DE-15
DE-Pl11
DE-Rs1
DE-105
DE-14
DE-Ch1
DE-L229
DE-D275
DE-Bn3
DE-Brt1
DE-Zwi2
DE-D161
imprint Oxford University Press (OUP), 2012
imprint_str_mv Oxford University Press (OUP), 2012
issn 1367-4811
1367-4803
issn_str_mv 1367-4811
1367-4803
language English
mega_collection Oxford University Press (OUP) (CrossRef)
match_str ge2012flowpeaksafastunsupervisedclusteringforflowcytometrydataviakmeansanddensitypeakfinding
publishDateSort 2012
publisher Oxford University Press (OUP)
recordtype ai
record_format ai
series Bioinformatics
source_id 49
title flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding
title_unstemmed flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding
title_full flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding
title_fullStr flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding
title_full_unstemmed flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding
title_short flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding
title_sort flowpeaks: a fast unsupervised clustering for flow cytometry data via k-means and density peak finding
topic Computational Mathematics
Computational Theory and Mathematics
Computer Science Applications
Molecular Biology
Biochemistry
Statistics and Probability
url http://dx.doi.org/10.1093/bioinformatics/bts300
publishDate 2012
physical 2052-2058
description Abstract. Motivation: For flow cytometry data, there are two common approaches to the unsupervised clustering problem: one is based on the finite mixture model and the other on spatial exploration of the histograms. The former is computationally slow and has difficulty identifying clusters of irregular shapes. The latter cannot be applied directly to high-dimensional data because the computational time and memory become unmanageable and the estimated histogram is unreliable. An algorithm without these two problems would be very useful. Results: In this article, we combine ideas from the finite mixture model and histogram spatial exploration. This new algorithm, which we call flowPeaks, can be applied directly to high-dimensional data and can identify irregularly shaped clusters. The algorithm first uses the K-means algorithm with a large K to partition the cell population into many small clusters. These partitioned data allow the generation of a smoothed density function using the finite mixture model. All local peaks are found by exhaustively exploring the density function, and each cell is assigned to the cluster of its associated local peak. flowPeaks is automatic, fast, reliable and robust to cluster shape and outliers. The algorithm has been applied to flow cytometry data and compared with state-of-the-art algorithms, including Misty Mountain, FLOCK, flowMeans, flowMerge and FLAME. Availability: The R package flowPeaks is available at https://github.com/yongchao/flowPeaks. Contact: yongchao.ge@mssm.edu Supplementary information: Supplementary data are available at Bioinformatics online.
container_issue 15
container_start_page 2052
container_title Bioinformatics
container_volume 28
format_de105 Article, E-Article
format_de14 Article, E-Article
format_de15 Article, E-Article
format_de520 Article, E-Article
format_de540 Article, E-Article
format_dech1 Article, E-Article
format_ded117 Article, E-Article
format_degla1 E-Article
format_del152 Book
format_del189 Article, E-Article
format_dezi4 Article
format_dezwi2 Article, E-Article
format_finc Article, E-Article
format_nrw Article, E-Article
_version_ 1792346157724205057
geogr_code not assigned
last_indexed 2024-03-01T17:34:54.083Z
geogr_code_person not assigned
openURL url_ver=Z39.88-2004&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fvufind.svn.sourceforge.net%3Agenerator&rft.title=flowPeaks%3A+a+fast+unsupervised+clustering+for+flow+cytometry+data+via+K-means+and+density+peak+finding&rft.date=2012-08-01&genre=article&issn=1367-4803&volume=28&issue=15&spage=2052&epage=2058&pages=2052-2058&jtitle=Bioinformatics&atitle=flowPeaks%3A+a+fast+unsupervised+clustering+for+flow+cytometry+data+via+%3Ci%3EK%3C%2Fi%3E-means+and+density+peak+finding&aulast=Sealfon&aufirst=Stuart+C.&rft_id=info%3Adoi%2F10.1093%2Fbioinformatics%2Fbts300&rft.language%5B0%5D=eng