Clustree (0.5.0): a handy tool for exploring clustering hierarchies!
Foreword
Clustree is an R package for visualizing clustering trees, which show how cells (or other samples) move between clusters as the clustering resolution changes. It gives researchers a simple, intuitive way to explore the results of a cluster analysis visually.

How to install the clustree package?
Run the following commands in R or RStudio to install and load the clustree package:
# install the package from CRAN
install.packages("clustree")
# load the package
library(clustree)
# check the installed version
packageVersion("clustree")
Sample data
In this example we analyze the peripheral blood mononuclear cell (PBMC) single-cell data set from 10X Genomics, which contains 2,700 single cells sequenced on an Illumina NextSeq 500. The raw data can be downloaded here.
In the earlier article "Seurat V4.9.9 – A powerful R suite for single-cell analysis", we covered the basic steps of single-cell analysis, summarized in the following commands:
# load packages
library(Seurat)
library(dplyr)
library(patchwork)
# load the PBMC data set; use the correct path on your computer
# (on Windows, replace "\" with "/" in the path)
pbmc.data <- Read10X(data.dir = "C:/Users/Administrator/Desktop/hg19")
# create the Seurat object
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
# percentage of reads mapping to mitochondrial genes
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
# visualize the QC metrics with violin plots
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
# QC filtering
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
# normalization
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
# select highly variable features
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
# scale all genes
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
# PCA dimensional reduction
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
# choose how many PCs to keep
ElbowPlot(pbmc)
# build the neighbor graph and cluster the cells
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = 0.5)
# UMAP and tSNE embeddings for visualization
pbmc <- RunUMAP(pbmc, dims = 1:10)
pbmc <- RunTSNE(pbmc, dims = 1:10)
Running this standard workflow produces the results above. However, the clustering is not fixed: depending on the resolution passed to FindClusters(), the same cells can be split into three, five, or even ten clusters. How do we choose the most appropriate number of clusters for the next steps of the analysis? This is where the clustree package can help us make an informed choice.
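Clustree reads cluster labels laid out as one column per resolution, with all columns sharing a common prefix. As a minimal sketch of that layout, the code below clusters the built-in iris data with base R k-means at k = 1 to 4. This is only a stand-in for running FindClusters() at several resolutions; the "K" prefix, the iris data, and the choice of k are illustrative assumptions.

```r
# Sketch: cluster the same data at several "resolutions" and keep one
# column of labels per resolution, all sharing the prefix "K".
# k-means on iris stands in for Seurat's FindClusters().
set.seed(42)
features <- scale(iris[, 1:4])   # standardised numeric columns

ks <- 1:4
iris_clusts <- data.frame(
  lapply(setNames(ks, paste0("K", ks)),   # names become K1, K2, K3, K4
         function(k) kmeans(features, centers = k, nstart = 10)$cluster)
)

head(iris_clusts)

# number of distinct clusters found at each resolution
sapply(iris_clusts, function(x) length(unique(x)))
```

If clustree is installed, clustree(iris_clusts, prefix = "K") should draw a tree for this table, in the same way that prefix = "RNA_snn_res." is used for the Seurat metadata.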

How to use Clustree
Load the clustree package into the R environment with the library() function, run FindClusters() at several resolutions, and then use the clustree() function to display all of the clustering results together:
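library(clustree)
# cluster at several resolutions; each run adds a column RNA_snn_res.<resolution> to the metadata
pbmc <- FindClusters(pbmc, resolution = c(0, 0.1, 0.5, 1, 2))
# draw the clustering tree from all metadata columns that share the prefix
clustree(pbmc@meta.data, prefix = "RNA_snn_res.")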
The clustree results are consistent with the second figure in this article: there are 4 cell populations at resolution 0.1, 9 at resolution 0.5, and 16 at resolution 2.
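Each arrow in the tree connects a cluster at one resolution to a cluster at the next resolution that shares cells with it. By default clustree colors the edges by the number of shared cells (count) and sets their transparency from the in-proportion (in_prop), the fraction of the higher-resolution cluster's cells that come from the lower-resolution cluster. The function below is a hypothetical base R re-implementation of that bookkeeping for two resolutions, written for illustration only; it is not clustree's own code, and the toy labels are made up.

```r
# Hypothetical sketch of the edge table behind a clustering tree.
# lo: cluster labels at a lower resolution; hi: labels of the same cells
# at the next, higher resolution.
tree_edges <- function(lo, hi) {
  tab <- table(from = lo, to = hi)                      # cells shared by each pair
  edges <- as.data.frame(tab, responseName = "count")
  edges <- edges[edges$count > 0, ]                     # keep pairs that share cells
  to_size <- table(hi)                                  # size of each child cluster
  # in_prop: share of the child cluster's cells that come from the parent
  edges$in_prop <- edges$count / as.numeric(to_size[as.character(edges$to)])
  rownames(edges) <- NULL
  edges
}

lo <- c(1, 1, 1, 2, 2)   # toy assignments at a lower resolution
hi <- c(1, 1, 2, 2, 3)   # the same five cells at a higher resolution
tree_edges(lo, hi)
```

Because every child cluster's cells come from some parent, the in_prop values of the edges entering one node always sum to 1; a node whose incoming edges all have low in_prop is being assembled from several parents, which is often a sign that the resolution is too high.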
That covers the core functionality of clustree. If you only want to see how the cell clusters change across resolutions, this is all you need. The clustree package also offers several additional options, for example:
1. Color each cell population by its mitochondrial content
clustree(pbmc@meta.data, prefix = "RNA_snn_res.", node_colour = "percent.mt", node_colour_aggr = "mean")
2. Add node labels
clustree(pbmc, prefix = "RNA_snn_res.", node_label = "RNA_snn_res.")
3. Color each cell population by the expression of a gene (here MS4A1)
clustree(pbmc, prefix = "RNA_snn_res.", node_colour = "MS4A1", node_colour_aggr = "median")
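Options such as node_colour together with node_colour_aggr summarize a per-cell value within each cluster and map that summary to the node color. As a sketch of the summarizing step alone, the hypothetical helper below applies an aggregation function (mean or median) to one metadata column per cluster with base R aggregate(). The toy metadata stands in for pbmc@meta.data; for a gene such as MS4A1 the per-cell values would be its expression levels instead of percent.mt.

```r
# Hypothetical helper: summarise a per-cell value within each cluster,
# the kind of reduction node_colour_aggr describes.
node_summary <- function(meta, cluster_col, value_col, aggr = mean) {
  out <- aggregate(meta[[value_col]],
                   by = list(cluster = meta[[cluster_col]]),
                   FUN = aggr)
  names(out)[2] <- value_col   # aggregate() names the value column "x"
  out
}

# toy metadata standing in for pbmc@meta.data
meta <- data.frame(
  RNA_snn_res.0.1 = factor(c(0, 0, 1, 1, 1)),
  percent.mt      = c(2, 4, 1, 3, 8)
)

node_summary(meta, "RNA_snn_res.0.1", "percent.mt", mean)    # like node_colour_aggr = "mean"
node_summary(meta, "RNA_snn_res.0.1", "percent.mt", median)  # like node_colour_aggr = "median"
```

The mean and median can disagree when a cluster contains a few outlier cells (cluster 1 above has one cell at 8), which is one reason to prefer the median for skewed values such as gene expression.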
4. Overlay the clustering tree on a dimensional reduction plot
The clustree_overlay() function superimposes the structure of the clustering tree on another visualization of the data, such as a UMAP or PCA embedding. This helps relate the tree to other aspects of the data, such as how the cells are distributed, where they concentrate, or which category they belong to. Here the embedding coordinates are first copied into the metadata so that they can be referenced by name:
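# copy the UMAP coordinates into the metadata as UMAP_1 and UMAP_2
pbmc <- AddMetaData(pbmc, Embeddings(pbmc, reduction = "umap"), col.name = c("UMAP_1", "UMAP_2"))
# copy the PCA coordinates too (PC_1, PC_2, ...); they are used for the side views below
pca.embeddings <- Embeddings(pbmc, reduction = "pca")
pbmc <- AddMetaData(pbmc, pca.embeddings, col.name = colnames(pca.embeddings))
# overlay the clustering tree on the UMAP
clustree_overlay(pbmc, prefix = "RNA_snn_res.", x_value = "UMAP_1", y_value = "UMAP_2")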
Use color for the cell points instead of the edges (the edges are then drawn in alt_colour):
clustree_overlay(pbmc, prefix = "RNA_snn_res.", x_value = "UMAP_1", y_value = "UMAP_2", use_colour = "points", alt_colour = "blue") 
Label the nodes:
clustree_overlay(pbmc, prefix = "RNA_snn_res.", x_value = "UMAP_1", y_value = "UMAP_2", label_nodes = TRUE)
Display the side views, which plot each axis against the clustering resolution:
overlay_list <- clustree_overlay(pbmc, prefix = "RNA_snn_res.", x_value = "PC_1", y_value = "PC_2", plot_sides = TRUE)
names(overlay_list)
#> [1] "overlay" "x_side"  "y_side"
# show the x_side plot
overlay_list$x_side
# show the y_side plot
overlay_list$y_side
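In the overlay, each cluster node is drawn at a position in the chosen x/y space. Assuming each node sits at the mean x_value and y_value of its cells, the node positions for one resolution could be computed as in the hypothetical helper below. The toy UMAP coordinates are made up for illustration, and this is a sketch of the idea rather than clustree_overlay()'s own code.

```r
# Hypothetical helper: place each cluster at the mean coordinates of its
# cells (assumption: mean aggregation of x_value and y_value).
cluster_centroids <- function(meta, cluster_col, x_value, y_value) {
  aggregate(meta[, c(x_value, y_value)],
            by = list(cluster = meta[[cluster_col]]),
            FUN = mean)
}

# toy metadata with embedding columns, standing in for pbmc@meta.data
meta <- data.frame(
  RNA_snn_res.0.1 = factor(c(0, 0, 1, 1)),
  UMAP_1 = c(-2, -4, 3, 5),
  UMAP_2 = c(1, 3, -1, -3)
)

cluster_centroids(meta, "RNA_snn_res.0.1", "UMAP_1", "UMAP_2")
```

Running the helper once per resolution column gives one set of node positions per level of the tree; connecting parents and children at adjacent levels then traces the tree across the embedding.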
Epilogue
Clustree is a powerful, easy-to-use R package that offers an intuitive way to visualize the structure and hierarchy of clustering trees. With it, we can better understand and explain the results of a cluster analysis and quickly see how cells regroup and relate to each other across resolutions, which makes it a valuable tool for analyzing single-cell data.