Journal of Biomedical Engineering (生物医学工程学杂志)

Automatic clustering method for flow cytometry data based on the t-distributed stochastic neighbor embedding algorithm

In flow cytometry, multi-parameter data are traditionally clustered by manual gating: an analyst uses specialized software to draw gates around the target cell populations. The process is complex and demands considerable expertise. To address this, this paper proposes clustering multi-parameter flow cytometry data with the t-distributed stochastic neighbor embedding (t-SNE) algorithm. The algorithm converts the Euclidean distances between samples in the high-dimensional space into conditional probabilities that represent similarity, and uses them to map the data into a low-dimensional space. Stained human peripheral blood cells were measured on a flow cytometer, and the exported data served as the experimental samples. The data were reduced in dimension with t-SNE and, for comparison, with kernel principal component analysis (KPCA); in each case the K-means algorithm was then used to classify the reduced-dimension data. The results show that t-SNE clusters cell populations with asymmetric, tailed distributions well, with a clustering accuracy of up to 92.55%, and may help automate the analysis of multi-color, multi-parameter flow cytometry data.
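The distance-to-probability step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard SNE form of the conditional probability, p(j|i) = exp(-||x_i - x_j||^2 / 2σ^2) / Σ_{k≠i} exp(-||x_i - x_k||^2 / 2σ^2), and, as a simplification, a single fixed bandwidth `sigma` for all points; t-SNE normally tunes a per-point bandwidth to a target perplexity. The function name `conditional_probabilities` and the toy `events` values are hypothetical and are not taken from the paper.

```python
import math


def conditional_probabilities(points, sigma=1.0):
    """Return a matrix P where P[i][j] approximates p(j|i).

    Assumption: one fixed Gaussian bandwidth `sigma` is shared by all points.
    Real t-SNE searches for a per-point bandwidth that matches a perplexity.
    """
    n = len(points)
    P = []
    for i in range(n):
        weights = []
        for j in range(n):
            if i == j:
                # A point is never treated as its own neighbor.
                weights.append(0.0)
            else:
                # Squared Euclidean distance in the original parameter space.
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        total = sum(weights)
        # Normalize so that each row is a probability distribution.
        P.append([w / total for w in weights])
    return P


# Toy three-parameter "events" (made-up values): two close, one far away.
events = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 3.0, 3.0)]
P = conditional_probabilities(events, sigma=1.0)
```

Each row of `P` sums to 1, and nearby events receive a higher probability than distant ones, which is the sense in which the conditional probability represents similarity. The sketch covers only this first step; the low-dimensional embedding and the K-means classification are not shown.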

Key words: biomedicine; cell clustering; t-distributed stochastic neighbor embedding; kernel principal component analysis; K-means
