Foreword
Accurate cell type identification is a prerequisite for, and an important step in, the analysis of single-cell data. A previous article, "Basic operation and usage of SingleR", covered one annotation tool in detail; here I introduce another R package that automatically annotates cell types: scCATCH.
scCATCH builds an integrated cell type database that contains at least 353 cell types and 686 cell subtypes associated with 184 tissues, drawn from 2096 human and mouse cell type references. The cell type integration database can be accessed via the link here.
How to install the scCATCH package
# Install the scCATCH package
install.packages("scCATCH")
# Check the installed version
packageVersion("scCATCH")
After installing scCATCH, you can check the version with the packageVersion() function. The scCATCH version used in this article is v3.2.2. Now let's go straight to the workflow!
Sample data
In this example, we analyze the peripheral blood mononuclear cell (PBMC) single-cell dataset from 10X Genomics, which contains 2700 single cells sequenced on the Illumina NextSeq 500. The raw data can be downloaded here. We will use the already processed data for the analysis; the analysis archive can be downloaded here. For the detailed analysis steps behind this dataset, see the tutorial "Seurat V4.9.9 – A powerful R suite for single-cell analysis".
# Load packages
library(Seurat)
library(dplyr)
library(patchwork)
library(scCATCH)
# Load the analyzed PBMC dataset; use the correct path on your own computer (on Windows, change "\" to "/")
pbmc <- readRDS(file = "C:/Users/Administrator/Desktop/pbmc3k_final.rds")
pbmc
# Check the analysis results
DimPlot(pbmc, reduction = "umap", label = TRUE)
We can see that the basic PCA, UMAP, and t-SNE analyses have already been run on the pbmc dataset. Next, we use scCATCH to automatically annotate the cell types in the data.
Example of using scCATCH
The scCATCH package is built around two main functions, findmarkergene() and findcelltype(), which makes it convenient and concise to use. It can process single-cell data from different species, tissues, or cancers, and it lets analysts use a custom CellMatch reference or add more marker genes for annotation. A scCATCH object can be created directly from Seurat results without any additional data format conversion. Next, we use scCATCH to annotate the cell types in the pbmc data.
1. Analysis based on the peripheral blood cell type reference dataset
PBMC stands for peripheral blood mononuclear cell, so the pbmc single-cell data should be analyzed against a cell type reference from the same tissue. That reference can be found in the database integrated into scCATCH. The cell type integration database can be accessed via the link here.
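Because the tissue name has to match the reference exactly, it can help to list the available names before running the annotation. The following is a minimal sketch in base R, not part of scCATCH: `available_tissues()` is a hypothetical helper, and `toy_cellmatch` is a made-up stand-in that only mimics the `species`, `tissue`, and `cancer` columns used later in this article (its `celltype` and `gene` columns and all of its rows are invented for illustration).

```r
# Toy stand-in for the cellmatch reference table (hypothetical rows).
toy_cellmatch <- data.frame(
  species  = c("Human", "Human", "Human", "Mouse", "Human"),
  tissue   = c("Peripheral blood", "Blood", "Liver", "Blood", "Liver"),
  cancer   = c("Normal", "Normal", "Normal", "Normal", "Hepatocellular Cancer"),
  celltype = c("T Cell", "B Cell", "Hepatocyte", "B Cell", "Cancer Stem Cell"),
  gene     = c("CD3E", "MS4A1", "ALB", "Cd79a", "PROM1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: list the tissue names recorded for one species and
# one cancer setting, so the exact spelling can be copied into `tissue =`.
available_tissues <- function(marker_table, species, cancer = "Normal") {
  keep <- marker_table$species == species & marker_table$cancer == cancer
  sort(unique(marker_table$tissue[keep]))
}

human_tissues <- available_tissues(toy_cellmatch, species = "Human")
print(human_tissues)
```

With scCATCH loaded, passing `cellmatch` in place of `toy_cellmatch` should list the real tissue names, assuming non-cancer entries are labeled "Normal" in its `cancer` column.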
# Default settings: cell_min_pct = 0.25, logfc = 0.25
# Step 1: Create a scCATCH object from the log1p-normalized expression matrix in the Seurat object and the cluster assignment of each cell.
obj <- createscCATCH(data = pbmc[['RNA']]@data, cluster = as.character(Idents(pbmc)))
# Step 2: Find the marker genes of each cluster; the method used here is use_method = "1"
obj <- findmarkergene(object = obj,
                      species = "Human",
                      use_method = "1",
                      marker = cellmatch,
                      tissue = c("Peripheral blood"),
                      cell_min_pct = 0.25,
                      logfc = 0.25,
                      pvalue = 0.05,
                      verbose = TRUE)
# Step 3: Automatically identify the likely cell types
obj <- findcelltype(object = obj, verbose = TRUE)
# Step 4: View the result
View(obj@celltype)
The tissue argument is set to "Peripheral blood". There are two ways to find marker genes: use_method = "1" compares each cluster with every other cluster one at a time, while use_method = "2" compares each cluster with all other clusters taken together. Each method has its own advantages and disadvantages; the results are as follows:
use_method = "1" result
use_method = "2" result
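To make the difference between the two comparison styles concrete, here is a toy illustration in base R. It is not scCATCH's implementation: the expression values and cluster labels are made up, the difference in mean log-normalized expression is used as a simple stand-in for log fold change, and the rule that a gene must pass against every other cluster in the one-at-a-time style is an assumption made for the example.

```r
# Toy log-normalized expression of one gene in 12 cells (made-up values).
expr <- c(2.1, 1.9, 2.0,   # cluster A
          1.8, 2.0, 1.9,   # cluster B (expresses the gene almost as much as A)
          0.1, 0.0, 0.2,   # cluster C
          0.0, 0.1, 0.0)   # cluster D
cluster <- rep(c("A", "B", "C", "D"), each = 3)
logfc_cutoff <- 0.25

target <- "A"
others <- setdiff(unique(cluster), target)

# One-at-a-time style: compare the target cluster with each other cluster.
pairwise_diff <- sapply(others, function(cl) {
  mean(expr[cluster == target]) - mean(expr[cluster == cl])
})

# Pooled style: compare the target cluster with all remaining cells together.
pooled_diff <- mean(expr[cluster == target]) - mean(expr[cluster != target])

# Assumed rule for this sketch: the gene must clear the cutoff in every
# pairwise comparison to count as a marker in the one-at-a-time style.
passes_pairwise <- all(pairwise_diff > logfc_cutoff)
passes_pooled <- pooled_diff > logfc_cutoff

print(round(pairwise_diff, 2))
print(round(pooled_diff, 2))
print(c(pairwise = passes_pairwise, pooled = passes_pooled))
```

In this toy case the gene is shared by clusters A and B, so it fails the one-at-a-time comparison against B but still passes the pooled comparison, because clusters C and D pull the pooled average down. This is one reason the two settings can yield different marker lists and therefore different annotations.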
It is easy to see from the results that use_method = "1" matches the expected cell types in the pbmc data more closely. The figure below shows the cell type annotation of the pbmc data from the official Seurat tutorial, which serves as the reference answer; you can compare it with the scCATCH results.
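scCATCH reports one annotation per cluster, so comparing it with a per-cell annotation usually means spreading the cluster labels back onto the cells. The sketch below shows that lookup in base R on made-up data. The column names `cluster` and `cell_type` are assumptions about the layout of `obj@celltype`; check `colnames(obj@celltype)` before relying on them.

```r
# Toy stand-in for obj@celltype: one row per cluster (labels are made up;
# the column names `cluster` and `cell_type` are assumptions).
toy_celltype <- data.frame(
  cluster   = c("c1", "c2", "c3"),
  cell_type = c("T Cell", "Monocyte", "B Cell"),
  stringsAsFactors = FALSE
)

# Toy per-cell cluster assignments, standing in for
# as.character(Idents(pbmc)) from the steps above.
cell_clusters <- c("c1", "c3", "c2", "c1", "c3")

# Build a cluster -> label lookup and index it by each cell's cluster.
lookup <- setNames(toy_celltype$cell_type, toy_celltype$cluster)
cell_labels <- unname(lookup[cell_clusters])
print(cell_labels)
```

The resulting per-cell vector could then be stored as a metadata column on the Seurat object and plotted next to the tutorial annotation.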
2. Assume that the pbmc data set is a cancer cell data set
In addition to analyzing normal cells, scCATCH can also analyze cancer cell data. Now we assume that the pbmc data set is a cancer cell sample, and try to analyze it with scCATCH.
# Default settings are cell_min_pct = 0.25, logfc = 0.25; this run lowers both to 0.1
# Step 1: Create a scCATCH object from the log1p-normalized expression matrix in the Seurat object and the cluster assignment of each cell.
obj <- createscCATCH(data = pbmc[['RNA']]@data, cluster = as.character(Idents(pbmc)))
# Step 2: Find the marker genes of each cluster; the method used here is use_method = "1", and the cancer type is set to Hepatocellular Cancer
obj <- findmarkergene(object = obj,
                      species = "Human",
                      cancer = c("Hepatocellular Cancer"),
                      use_method = "1",
                      marker = cellmatch,
                      tissue = c("Blood", "Bone marrow", "Embryo", "Liver"),
                      cell_min_pct = 0.1,
                      logfc = 0.1,
                      pvalue = 0.05,
                      verbose = TRUE)
# Step 3: Automatically identify the likely cell types
obj <- findcelltype(obj)
# Step 4: View the result
View(obj@celltype)
The results above show that scCATCH annotates against whatever cell type reference the user specifies; it cannot work out the sample type of the single-cell data on its own. Therefore, take great care when choosing a reference dataset, and do not pick the wrong one!
3. Custom usage
To annotate against a different combination of tissues or cancers, analysts can subset the cellmatch reference table into the combination they need and set if_use_custom_marker = TRUE in the findmarkergene() function. The following custom commands are provided by scCATCH for reference; feel free to experiment with them:
# Example 1
cellmatch_new <- cellmatch[cellmatch$species == "Human" & cellmatch$tissue %in% c("Kidney", "Liver", "Lung", "Brain"), ]
obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new)
obj <- findcelltype(obj)
View(obj@celltype)
# Example 2
cellmatch_new <- cellmatch[cellmatch$species == "Human" & cellmatch$cancer %in% c("Lung Cancer", "Lymph node", "Renal Cell Carcinoma", "Prostate Cancer"), ]
obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new)
obj <- findcelltype(obj)
View(obj@celltype)
# Example 3
cellmatch_new <- cellmatch[cellmatch$species == "Human", ]
obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new)
obj <- findcelltype(obj)
View(obj@celltype)
# Example 4
cellmatch_new <- cellmatch[cellmatch$cancer %in% c("Lung Cancer", "Lymph node", "Renal Cell Carcinoma", "Prostate Cancer") | cellmatch$tissue %in% c("Kidney", "Liver", "Lung", "Brain"), ]
obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new)
obj <- findcelltype(obj)
View(obj@celltype)
Epilogue
scCATCH is an excellent, easy-to-use package for automated cell type identification. It can be used together with SingleR so that the annotations from the two packages cross-check each other. In addition, scCATCH offers many more cell type reference datasets than SingleR. If you know your own single-cell data well, scCATCH may give better results than SingleR.
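As a rough sketch of what such a cross-check could look like, the base R snippet below tabulates two per-cell label vectors against each other. All labels are made up for illustration. In practice the two packages use different naming vocabularies, so the labels would need to be harmonized to a common set of names first; that step is assumed to have been done here.

```r
# Made-up per-cell labels from two annotation tools, already harmonized
# to the same naming scheme (an assumption of this sketch).
sccatch_labels <- c("T Cell", "T Cell", "B Cell", "Monocyte", "B Cell", "T Cell")
singler_labels <- c("T Cell", "T Cell", "B Cell", "Monocyte", "NK Cell", "T Cell")

# Cross-tabulate the two annotations: agreements fall in cells where the
# row and column names match, disagreements everywhere else.
agreement_table <- table(scCATCH = sccatch_labels, SingleR = singler_labels)
print(agreement_table)

# Fraction of cells on which the two annotations agree.
agreement_rate <- mean(sccatch_labels == singler_labels)
print(agreement_rate)
```

Cells that fall into mismatched row and column names are the ones worth inspecting by hand, for example by checking known marker genes.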
References
Xin Shao, Jie Liao, Xiaoyan Lu, Rui Xue, Ni Ai, Xiaohui Fan, scCATCH: Automatic Annotation on Cell Types of Clusters from Single-Cell RNA Sequencing Data, iScience, Volume 23, Issue 3, 27 March 2020.