Escape v1.5.1 Suite - Single Cell Gene Set Pathway Analysis (ssGSEA) Example

foreword

Gene Set Enrichment Analysis (GSEA) is an analysis method commonly used in functional genomics (Functional genomics) research. Its purpose is to assess whether a specific gene set (gene set) has a statistically significant difference in gene expression data under a specific condition, such as a normal group and a diseased group sample. This method can help analysts understand which possible pathways (pathways) may be activated or inhibited by specific conditions. ssGSEA (Single-sample Gene Set Enrichment Analysis) is an extension of the GSEA method. Compared with the traditional GSEA method,ssGSEA is an analysis of individual samples, rather than comparing the difference between two different conditions, it is mainly used to assess the activity of a specific gene set in a single sample, and thus suitable for gene set pathway analysis of single-cell data.

Escape KitIt is an R package for analyzing ssGSEA, developed by the team of Nicholas Borcherding, which can perform pathway analysis, visualization, and significant analysis on samples. Basically, only one package can analyze ssGSEA from beginning to end, and there is no need to consider how to solve downstream problems. The result presentation and other issues, it is recommended for everyone to use.

Download and install the Escape suite

First, you must be on the computerInstall R language and RStudio, and then open the RStudio window and enter the installationEscape Kitcommand:

# Download devtools package install.packages("devtools") # Download Escape package devtools::install_github("ncborcherding/escape")

sample data

In this example, we will analyze two sets of single-cell data sets from peripheral blood (PBMC) of 10X Genomics, which respectively contain 2700 and 5025 single-cell data sets. The original data.rds save fileCandownload in my cloud. If you have doubts about the file, you can go to the 10X Genomics database to download it yourself, and then follow theSingle-cell data analysis: detailed explanation of batch effect and Seurat Integration analysisThis article is analyzed.

Dataset download location:

3k PBMCs from a Healthy Donor

5k Peripheral Blood Mononuclear Cells (PBMCs) from a Healthy Donor (v3 chemistry)

1. Load the kit and load the analyzed PBMC data set

Load the required packages into the R environment, and putpbmc3k5K_final.rdsThe data is loaded into R, the command used here isreadRDS()function,The path provided in the code is just an example path and needs to be modified according to the path where the actual file is located on the computer.

# load package library(Seurat) library(dplyr) library(patchwork) library(escape) # needs to use the correct path in the computer, you must change "\" to "/" pbmc.combined <- readRDS(file = "C: /Users/Administrator/Desktop/pbmc3k5K_final.rds") # Check input results DimPlot(pbmc.combined, reduction = "umap", group.by = "orig.ident") DimPlot(pbmc.combined, reduction = "umap") DimPlot(pbmc.combined, reduction = "umap", label = TRUE, split.by = "orig.ident")

2. Select the gene set you want to analyze (gene set)

Molecular Signature Database (MSigDB)It is a database containing tens of thousands of gene sets. Currently, there are two databases for human and mouse, which can be used for GO, KEGG, GSEA, ssGSEA and other analysis.

escape2

The Human Genome Database is divided into nine sub-databases, namely:

  • H: hallmark gene sets (cancer) feature gene set, a total of 50 gene sets, the most commonly used.
  • C1: positional gene sets, a total of 300 gene sets.
  • C2: curated gene sets, a total of 6495 gene sets.
  • C3: regulatory target gene sets, a total of 3713 gene sets.
  • C4: computational gene sets, a total of 858 gene sets.
  • C5: ontology gene sets, including biological process (BP), cellular component (CC) and molecular function (MF), with a total of 15,937 gene sets, most commonly used.
  • C6: oncogenic signatures gene sets, a total of 189 gene sets.
  • C7: immunological signatures gene sets, a total of 5219 gene sets, the most commonly used.
  • C8: cell type signature gene sets, a total of 830 gene sets.

The Escape package kindly includes this library in the package, so we just usegetGeneSets()function, we can select the gene set we want to analyze from the MSigDB database, by settinglibrary()The function can specify the desired gene set sub-database, the command is as follows:

# Extract the gene set you want to analyze from the MSigDB database, take hallmark gene sets as an example GS <- getGeneSets(library = "H")

3. Perform ssGSEA analysis

# Perform ssGSEA analysis ES <- enrichIt(obj = pbmc.combined, gene.sets = GS, groups = 1000, cores = 16) # Add analysis results to pbmc Seurat data pbmc.combined <- AddMetaData(pbmc.combined, ES )

4. Visualizations

The next step is to visualize the analyzed results. You need to download the dittoSeq package first. The method is as follows:

# Download the dittoSeq suite BiocManager::install("dittoSeq")

4.1 Heatmap (Heatmap)

# load dittoSeq suite library(dittoSeq) # load palette colorblind_vector <- colorRampPalette(c("#FF4B20", "#FFB433", "#C6FDEC", "#7AC5FF", "#0348A6")) # hot Figure (Heatmap) pbmc .combined@meta.data$active.idents <- pbmc.combined@active.ident dittoHeatmap(pbmc.combined, genes = NULL, metas = names(ES), heatmap.colors = rev(colorblind_vector(50)), annot.by = c("active.idents", "orig.ident"), fontsize = 7)
Escape-1

If you don't like the color of the palette, you can choose not to add it or change the color of the palette yourself:

dittoHeatmap(pbmc.combined, genes = NULL, metas = names(ES), annot.by = c("active.idents", "orig.ident"), fontsize = 7)
Escape-3

You can choose the required gene set by yourself:

dittoHeatmap(pbmc.combined, genes = NULL, metas = c("HALLMARK_APOPTOSIS", "HALLMARK_DNA_REPAIR", "HALLMARK_P53_PATHWAY"), heatmap.colors = rev(colorblind_vector(50)), annot.by = c("active.idents" , "orig.ident"), fontsize = 7)
Escape-4

4.2 Violin Plot

dittoPlot(pbmc.combined, "HALLMARK_DNA_REPAIR", group.by = "orig.ident") + scale_fill_manual(values = colorblind_vector(2))
Escape-5

Multiple gene set pathway violin plot instructions:

multi_dittoPlot(pbmc.combined, vars = c("HALLMARK_WNT_BETA_CATENIN_SIGNALING", "HALLMARK_XENOBIOTIC_METABOLISM", "HALLMARK_UV_RESPONSE_UP"), group.by = "orig.ident", plots = c("jitter", "vlnplot" , "boxplot"), ylab = "Enrichment Scores", theme = theme_classic() + theme(plot. title = element_text(size = 10)))
Escape-6

4.3 Hex Density Enrichment Plots

If you think that there is a certain correlation between the two pathways, you can use the dense enrichment map to make further observations:

# download required package BiocManager::install("hexbin") # dense enrichment map visualization dittoScatterHex(pbmc.combined, x.var = "HALLMARK_DNA_REPAIR", y.var = "HALLMARK_MTORC1_SIGNALING", do.contour = TRUE) + theme_classic () + scale_fill_gradientn(colors = rev(colorblind_vector(11))) + geom_vline(xintercept = 0, lty=2) + geom_hline(yintercept = 0, lty=2) 
Escape-7

The results of pbmc3k and pbmc5K can be converted tosplit.by()function separately:

dittoScatterHex(pbmc.combined, x.var = "HALLMARK_DNA_REPAIR", y.var = "HALLMARK_MTORC1_SIGNALING", do.contour = TRUE, split.by = "orig.ident") + theme_classic() + scale_fill_gradientn(colors = rev(colorblind_vect or (11))) + geom_vline(xintercept = 0, lty=2) + geom_hline(yintercept = 0, lty=2) 
Escape-8

4.4 Ridge Plot

ES2 <- data.frame(pbmc.combined[[]], Idents(pbmc.combined)) colnames(ES2)[ncol(ES2)] <- "cluster" ridgeEnrichment(ES2, gene.set = "HALLMARK_DNA_REPAIR", group = "orig.ident", add.rug = TRUE)
Escape-9
ridgeEnrichment(ES2, gene.set = "HALLMARK_DNA_REPAIR", group = "seurat_clusters", facet = "orig.ident", add.rug = TRUE)
Escape-10

4.5 Split Violin Plot

splitEnrichment(ES2, split = "orig.ident", gene.set = "HALLMARK_DNA_REPAIR") splitEnrichment(ES2, x.axis = "cluster", split = "orig.ident", gene.set = "HALLMARK_DNA_REPAIR")
Escape-11

5. Expanded Analysis - PCA Analysis

useperformPCA()The function performs PCA analysis on gene set enrichment scores, which can help analysts understand the variability and association between different gene sets in the gene set enrichment analysis results.

# Perform PCA analysis PCA <- performPCA(enriched = ES2, groups = c("orig.ident", "cluster")) pcaEnrichment(PCA, PCx = "PC1", PCy = "PC2", contours = FALSE, facet = "cluster") masterPCAPlot(ES2, gene.sets = names(GS), PCx = "PC1", PCy = "PC2", top.contribution = 10)
Escape-12

6. Significance Testing

Escape KitSignificance analysis is also provided. There are five methods in total, namely T.test, linear regression (LR), Wilcoxon Rank Sum Test (Wilcoxon), ANOVA, and Kruskal-Wallis (KW). Analysts can choose the appropriate one according to their needs. Analysis method, here take T.test as an example:

# significance analysis output <- getSignificance(ES2, group = "orig.ident", fit = "T.test") View(output)
Escape-13

epilogue

Escape KitIt is an R suite that I use very often to analyze ssGSEA. As long as you follow the teaching steps, you can basically get the required results, and it also comes with a visualization method, so I personally recommend this suite. After getting the results of ssGSEA, the next step is to explain the results biologically. There is no other shortcut, relying on your strong biological background and knowledge!

references

Borcherding, N., Vishwakarma, A., Voigt, AP et al. Mapping the immune environment in clear cell renal carcinoma by single-cell genomics. Commun Biol 4, 122 (2021).

I am very grateful for your sharing!!!
MillionQuesn
Million Quesn

A foreigner living in Taiwan, sharing the highlights of a sudden flash of inspiration.

Articles: 46

Leave a Reply

Your email address will not be published. Required fields are marked *