In this video we recommend flowSOM as the most suitable algorithm for clustering in high dimensional flow cytometry analysis.
We discuss the research which underpins our recommendation and use Tercen to generate a Minimum Spanning Tree for visualising flowSOM clusters.
Table of Contents:
00:00 - Introduction
00:28 - Comparison of clustering algorithms
01:39 - Introduction to flowSOM
02:31 - Using flowSOM in workflows
03:06 - flowSOM Minimum Spanning Tree
03:44 - Using flowSOM with dimension reduction
Using Tercen for Clustering in High Dimensional Flow Cytometry Analysis
Today we’ll take a look at how Tercen uses algorithms to group cell populations into clusters.
Seeing phenotypic populations and identifying patterns in their relationships is what flow cytometry is all about.
Traditionally, populations are identified through bi-axial (manual) gating, and many algorithms have been developed to automate this.
However, for high dimensional flow cytometry analysis we recommend the statistical technique called flowSOM.
One of the main reasons we recommend flowSOM is the work done by Lukas Weber while at the Mark Robinson Lab at the University of Zurich.
Lukas compared 18 clustering methods on 6 flow cytometry datasets with different characteristics.
Four were CyTOF (mass cytometry, high dimensional) datasets and two were conventional flow cytometry datasets.
For illustration we are showing only one of the six datasets, called Levine 32.
We have Lukas’ datasets on our GitHub if you want to recreate the experiment yourself.
He used the F1 score to rate how closely each algorithm's clusters matched the manually gated populations.
On this graph you can see that flowSOM achieves the highest accuracy, with an F1 score of 0.76.
The green and red bars reflect false positives and false negatives, and flowSOM performs well on these too.
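To make the F1 score concrete, here is a minimal sketch of how a cluster can be scored against a manually gated population. For a given pair, true positives are the cells they share, precision penalises false positives and recall penalises false negatives, and F1 is their harmonic mean. The function names and the "best match per population" rule are our own simplifications for illustration; the published benchmark may match clusters to populations differently (for example one-to-one).

```python
from collections import Counter

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def best_match_f1(manual: list, clusters: list) -> dict:
    """For each manually gated population, the F1 of its best-matching cluster.

    Simplification (hypothetical): each population picks its best cluster
    independently, so one cluster may be matched to several populations.
    """
    pop_sizes = Counter(manual)
    cluster_sizes = Counter(clusters)
    overlap = Counter(zip(manual, clusters))  # cells shared by (population, cluster)
    scores = {}
    for pop, n_pop in pop_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            tp = overlap[(pop, clu)]   # true positives
            precision = tp / n_clu     # false positives = n_clu - tp
            recall = tp / n_pop        # false negatives = n_pop - tp
            best = max(best, f1(precision, recall))
        scores[pop] = best
    return scores

# Toy labels: six cells, manual gates versus algorithmic clusters.
scores = best_match_f1(["T", "T", "T", "B", "B", "NK"], [1, 1, 2, 2, 2, 3])
print({pop: round(s, 2) for pop, s in scores.items()})  # {'T': 0.8, 'B': 0.8, 'NK': 1.0}
```

Averaging the per-population scores gives a single summary number per algorithm and dataset, which is the kind of value plotted on the accuracy graph.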
This graph shows how long each algorithm took to compute, and flowSOM is among the fastest.
So, because it was the most accurate algorithm and also one of the fastest, we recommend flowSOM for high dimensional flow cytometry analysis.
High Dimensional Flow Cytometry Analysis: What is flowSOM?
FlowSOM is based on a self-organising map (SOM), a simple type of artificial neural network.
SOMs are very good at grouping similar items, for example clustering the pixels of a JPEG image for face recognition.
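The core idea of a self-organising map can be sketched in a few lines. This is a toy, pure-Python illustration under our own assumptions (a 3 x 3 grid, linear decay of the learning rate, a Gaussian neighbourhood); it is not the flowSOM implementation, and all names are hypothetical. Each node on a small grid holds a weight vector; for every sampled cell, the closest node (the best-matching unit) and its grid neighbours are pulled towards that cell.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def train_som(data, rows=3, cols=3, n_iter=2000, lr0=0.5, sigma0=1.5, seed=42):
    """Train a toy self-organising map on a list of equal-length vectors.

    Returns (grid, codebook): node i sits at grid position grid[i]
    and carries the weight vector codebook[i].
    """
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Start every node at a randomly chosen cell.
    codebook = [list(rng.choice(data)) for _ in grid]
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1 - frac)                      # learning rate decays linearly
        sigma = 0.5 + (sigma0 - 0.5) * (1 - frac)  # neighbourhood radius shrinks
        x = rng.choice(data)
        # Best-matching unit (BMU): the node whose weights are closest to x.
        bmu = min(range(len(grid)), key=lambda i: sq_dist(codebook[i], x))
        br, bc = grid[bmu]
        for i, (r, c) in enumerate(grid):
            g2 = (r - br) ** 2 + (c - bc) ** 2     # squared distance on the map grid
            h = math.exp(-g2 / (2 * sigma ** 2))   # Gaussian neighbourhood weight
            w = codebook[i]
            for k in range(len(w)):
                w[k] += lr * h * (x[k] - w[k])     # pull the node towards x
    return grid, codebook

def map_cells(data, codebook):
    """Assign each cell to the index of its nearest SOM node."""
    return [min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], x)) for x in data]

# Two well-separated toy "populations" in two channels.
rng = random.Random(0)
blob_a = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(50)]
blob_b = [(rng.gauss(8, 0.3), rng.gauss(8, 0.3)) for _ in range(50)]
grid, codebook = train_som(blob_a + blob_b)
nodes = map_cells(blob_a + blob_b, codebook)
```

After training, cells from the two toy populations land on different nodes, which is the grouping behaviour the text describes.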
What makes flowSOM especially useful for high dimensional flow cytometry is that it can show how populations relate to each other, in a structure called a Minimum Spanning Tree.
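A Minimum Spanning Tree connects every node using the shortest possible total edge length, with no loops. As a sketch, the function below builds one with Prim's algorithm over a list of node vectors (for flowSOM, these would be the weight vectors of the SOM nodes). Prim's algorithm is one standard way to compute an MST; we are not claiming it is the method flowSOM uses internally, and the function name is our own.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete graph of points with Euclidean edge weights.

    Returns a list of (i, j, distance) edges that connect all points.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight linking each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        # Update the cheapest connections for the remaining points.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges

print(minimum_spanning_tree([(0, 0), (1, 0), (2, 0), (2, 1)]))
# [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
```

For n nodes the tree always has n - 1 edges, so closely related clusters end up as direct neighbours.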
FlowSOM's calculations are stochastic: its randomisation is driven by a number called a seed.
Our operator lets you set this seed manually when you need to reproduce exact results, for example for a publication.
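Why a fixed seed gives reproducible results can be shown with a small sketch. Here the random choice of starting node weights (a hypothetical initialisation step, named by us) uses a private generator created from the seed, so the same seed always produces the same starting point, and therefore the same downstream calculation.

```python
import random

def init_codebook(data, n_nodes, seed):
    """Pick starting node weights at random; the seed makes the choice repeatable."""
    rng = random.Random(seed)  # private generator, unaffected by global random state
    return [list(rng.choice(data)) for _ in range(n_nodes)]

data = [(i, i * 2) for i in range(100)]
first = init_codebook(data, 5, seed=7)
second = init_codebook(data, 5, seed=7)
print(first == second)  # True
```

Using a dedicated generator rather than the global one means other code that draws random numbers cannot silently change the result.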
FlowSOM can estimate the number of clusters in your dataset automatically, but we don’t recommend this because it tends to under-cluster, and you could miss an important population.
We recommend setting the number of clusters manually for your high dimensional flow cytometry analysis, and I will show how you can decide that number in our next video.
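As we understand it, flowSOM reaches a user-chosen number of clusters by grouping its SOM nodes into metaclusters (the R package uses consensus hierarchical clustering for this). The sketch below substitutes plain average-linkage agglomerative clustering to show the idea: start with one group per node and repeatedly merge the two closest groups until exactly k remain. Names and the linkage choice are our own assumptions.

```python
import math
from itertools import combinations

def agglomerate(points, k):
    """Average-linkage agglomerative clustering of points into k groups.

    Returns one label in 0..k-1 per point.
    """
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        # Mean pairwise distance between the members of two groups.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the closest pair of groups (x < y, so deleting y is safe).
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: avg_dist(clusters[pair[0]], clusters[pair[1]]))
        clusters[x].extend(clusters[y])
        del clusters[y]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

nodes = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
print(agglomerate(nodes, k=3))  # [0, 0, 1, 1, 2]
```

Because k is an explicit input, a small population that sits apart from the others keeps its own group instead of being absorbed into a larger one.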
Let’s take a look at a flowSOM workflow which we built in Tercen using the PBMC dataset from the earlier video.
You can see there is an asinh transformation, like before, and we have run flowSOM on the transformed data.
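We take the transformation step here to be the arcsinh (asinh) transform that is standard for cytometry data. A minimal sketch: each value is divided by a cofactor and passed through asinh, which behaves roughly linearly near zero and logarithmically for large values. The cofactor is an assumption; 5 is commonly used for mass cytometry, and values around 150 for conventional fluorescence data.

```python
import math

def asinh_transform(values, cofactor=5.0):
    """Arcsinh transform: approximately linear near zero, logarithmic for large values.

    The cofactor is an assumed, instrument-dependent choice.
    """
    return [math.asinh(v / cofactor) for v in values]

print(asinh_transform([0.0, 5.0, -5.0]))
```

Unlike a plain logarithm, asinh is defined for zero and negative values, which occur in compensated or background-subtracted cytometry data.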
We graphed all of the channels for this illustration, but in a real experiment you would probably analyse a subset.
Here we manually set the number of clusters to 7.
In this data step we projected a visualisation of the flowSOM output and coloured it by group to see how flowSOM has classified the cells.
This data step has a flowSOM Shiny operator attached.
High Dimensional Flow Cytometry Analysis: Shiny Operators
Shiny operators contain more sophisticated code, which lets them generate specialised graphs and display them in Tercen’s Operator tab.
This operator has created a Minimum Spanning Tree which shows how the clusters are related to each other.
For example, the yellow group is related to the green group through the cyan group.
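In a tree there is exactly one path between any two nodes, so "related through" means "on the path between". The sketch below finds that path with a breadth-first search. Only the yellow, cyan and green link comes from the example above; the extra blue branch is hypothetical, added to show that unrelated branches are skipped.

```python
from collections import deque

def tree_path(edges, start, end):
    """Return the unique path between two nodes of a tree given as (a, b) edges."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    previous = {start: None}  # breadth-first search, remembering how we reached each node
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    if end not in previous:
        return None  # the two nodes are not connected
    path = []
    node = end
    while node is not None:  # walk back from end to start
        path.append(node)
        node = previous[node]
    return path[::-1]

mst = [("yellow", "cyan"), ("cyan", "green"), ("cyan", "blue")]  # "blue" is hypothetical
print(tree_path(mst, "yellow", "green"))  # ['yellow', 'cyan', 'green']
```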
If we zoom in, we can see the density of each channel that makes up a population.
It is also possible to apply Dimension Reduction techniques, like UMAP, either before or after the flowSOM calculation.
We have some workflows to illustrate these scenarios.
In this example, the dimension reduction is done after flowSOM and we have coloured by cluster to appraise them.
In this workflow we performed dimension reduction before flowSOM, so flowSOM is applied to the UMAP components rather than to the channels.
This approach has the advantage of producing more clearly separated groups.
However, as we explained in the previous video, dimension reduction produces a compressed representation of your data by discarding redundancy.
So be careful with this approach, because you could lose small, rare populations.
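The risk to rare populations can be illustrated with a toy example. UMAP is non-linear and does not simply drop channels, so this is only an analogy for the general risk: we use a crude reduction (keep the single channel with the largest variance) on hypothetical two-channel data. A rare population of 10 cells that differs only in the low-variance channel contributes almost nothing to the total variance, so it disappears inside a large population once that channel is discarded.

```python
import random
import statistics

def make_data(seed=0):
    """Hypothetical data: two large populations split along channel 0,
    plus a rare population that differs only in channel 1."""
    rng = random.Random(seed)
    cells, labels = [], []
    for _ in range(495):
        cells.append((rng.gauss(0, 1), rng.gauss(0, 1)))
        labels.append("A")
        cells.append((rng.gauss(10, 1), rng.gauss(0, 1)))
        labels.append("B")
    for _ in range(10):
        cells.append((rng.gauss(0, 1), rng.gauss(6, 0.3)))
        labels.append("rare")
    return cells, labels

def highest_variance_channel(cells):
    """Crude 1-D 'reduction': index of the channel with the largest variance."""
    n_channels = len(cells[0])
    variances = [statistics.pvariance([c[k] for c in cells]) for k in range(n_channels)]
    return max(range(n_channels), key=lambda k: variances[k])

def channel_mean(cells, labels, label, channel):
    """Mean value of one channel for the cells carrying a given label."""
    return statistics.fmean(c[channel] for c, lab in zip(cells, labels) if lab == label)

cells, labels = make_data()
kept = highest_variance_channel(cells)  # channel 0 dominates the variance
print(kept, channel_mean(cells, labels, "rare", kept), channel_mean(cells, labels, "A", kept))
```

On the kept channel the rare cells sit on top of population A, while on the discarded channel they were clearly separate.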
So, that’s a quick overview of why we recommend flowSOM as the statistical technique for clustering in high dimensional flow cytometry analysis.
Many thanks to Lukas for the comparison research, and to Sofie Van Gassen, who designed the brilliant flowSOM algorithm that we use in Tercen.