R language suite for building kinship trees - SNPRelate 1.34.1

Introduction

SNPRelate 1.34.1 is an R package for computing distance matrices between samples from VCF files and then building kinship trees or performing hierarchical clustering. Its main purpose is to provide fast, convenient functions for generating kinship trees or cluster analysis results from sequence data.

Install SNPRelate

You need to install R and RStudio first, then install the SNPRelate 1.34.1 package as follows:

# install package
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("SNPRelate")

Preliminary work before analysis

When using SNPRelate 1.34.1 to analyze data from different samples, all VCF results must first be combined into a single VCF file. We can use vcftools to merge all sequencing results; for the detailed merging steps, see the fourth subsection of "R language suite for building kinship trees - fastreeR 1.4.0".

Analyze the merged VCF samples

Before using any R package, you must first load it with library(). Here we use the VCF file provided by the author as an example; the file can be downloaded from here.

1. Load all R packages and datasets to be used

# load packages
library(gdsfmt)
library(SNPRelate)
library(ggplot2)
# check version
packageVersion("SNPRelate")
# path to the VCF dataset
vcf.fn <- "C:/Users/USER/Desktop/VCFExampleData.vcf"

2. Parse the dataset

Convert the VCF file to the GDS format, a more compact representation that makes later calculations faster. With method="biallelic.only", only biallelic SNPs are extracted. This step runs more slowly if the VCF file covers the entire genome.

# convert the VCF file to GDS
snpgdsVCF2GDS(vcf.fn, "data.gds", method="biallelic.only", verbose=TRUE)
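To check what a conversion produced, it can help to look inside a GDS file. The following is only a sketch: so that it runs without the author's VCF, it opens the example GDS file bundled with SNPRelate (located with snpgdsExampleFileName()) rather than data.gds, and the variable names gds, sample.id and snp.chr are chosen here for illustration.

```r
library(SNPRelate)  # attaches gdsfmt as a dependency

# Assumption: the example file bundled with SNPRelate stands in for "data.gds".
gds.fn <- snpgdsExampleFileName()
gds <- snpgdsOpen(gds.fn)

# Read the sample IDs and the chromosome code of every SNP.
sample.id <- read.gdsn(index.gdsn(gds, "sample.id"))
snp.chr <- read.gdsn(index.gdsn(gds, "snp.chromosome"))

cat("samples:", length(sample.id), "\n")
cat("SNPs:", length(snp.chr), "\n")
print(table(snp.chr))  # number of SNPs per chromosome

# Close the file when done so it can be reopened later.
snpgdsClose(gds)
```

To inspect your own data, replace gds.fn with "data.gds" once the conversion step has finished.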

3. Create the IBS Matrix

Create an identity-by-state (IBS) matrix. A DNA segment is identical by state in two or more individuals if it has the same nucleotide sequence in each of them, whether or not it was inherited from a common ancestor.
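To make the idea concrete, here is a minimal base R sketch of a mean IBS calculation on a toy genotype matrix. It is an illustration only, not SNPRelate's implementation: the genotype values are made up, the helper pair_ibs() is hypothetical, and the coding (0, 1 or 2 copies of one allele, NA for a missing call) is an assumption.

```r
# Toy genotypes: rows are SNPs, columns are samples.
# Each value counts copies of one allele (0, 1 or 2); NA marks a missing call.
geno <- matrix(
  c(2, 2, 0,
    1, 1, 1,
    0, 0, 2,
    2, 1, NA),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("snp", 1:4), c("A", "B", "C"))
)

# Mean IBS for one pair of samples:
# alleles shared across SNPs / (2 * number of SNPs called in both samples).
pair_ibs <- function(g1, g2) {
  ok <- !is.na(g1) & !is.na(g2)
  sum(2 - abs(g1[ok] - g2[ok])) / (2 * sum(ok))
}

# Fill the full sample-by-sample matrix.
n <- ncol(geno)
ibs <- matrix(1, n, n, dimnames = list(colnames(geno), colnames(geno)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    ibs[i, j] <- pair_ibs(geno[, i], geno[, j])
  }
}
print(round(ibs, 3))
```

Samples A and B differ by a single allele over four SNPs, so their value is high, while C shares far fewer alleles with either of them.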

# create the IBS matrix and cluster on it
genofile <- snpgdsOpen("data.gds")
set.seed(100)
ibs.hc <- snpgdsHCluster(snpgdsIBS(genofile, num.thread=2, autosome.only=FALSE))

4. Draw a dendrogram

# draw dendrogram
rv <- snpgdsCutTree(ibs.hc)
plot(rv$dendrogram, main="Dendrogram based on IBS")
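The general idea behind going from an IBS matrix to a dendrogram can be sketched in base R: turn similarity into distance (1 - IBS), then run hierarchical clustering. The matrix values below are made up, and the use of average linkage is an assumption for illustration; this is not a claim about what snpgdsHCluster does internally.

```r
# Toy IBS matrix for four samples (values are made up for illustration).
ids <- c("S1", "S2", "S3", "S4")
ibs <- matrix(
  c(1.00, 0.95, 0.70, 0.72,
    0.95, 1.00, 0.71, 0.69,
    0.70, 0.71, 1.00, 0.93,
    0.72, 0.69, 0.93, 1.00),
  nrow = 4, byrow = TRUE, dimnames = list(ids, ids)
)

# Turn similarity into distance, then cluster.
# Assumption: average linkage, chosen here for illustration.
d <- as.dist(1 - ibs)
hc <- hclust(d, method = "average")

plot(as.dendrogram(hc), main = "Toy dendrogram from 1 - IBS")

# Cut the tree into two groups and list the members.
groups <- cutree(hc, k = 2)
print(groups)
```

Samples with high pairwise IBS (S1 with S2, S3 with S4) join low in the tree, which is why closely related samples sit next to each other in the dendrogram.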

5. Dissimilarity matrices

This command creates a matrix of pairwise dissimilarities between all samples. If your analysis involves the X chromosome, change autosome.only = TRUE to autosome.only = FALSE.

# dissimilarity matrix analysis
dissMatrix <- snpgdsDiss(genofile, sample.id=NULL, autosome.only=TRUE,
    remove.monosnp=TRUE, maf=NaN, missing.rate=NaN, num.thread=2, verbose=TRUE)

6. Cluster Analysis

# cluster analysis
snpHCluster <- snpgdsHCluster(dissMatrix, sample.id=NULL, need.mat=TRUE, hang=0.01)
cutTree <- snpgdsCutTree(snpHCluster, z.threshold=15, outlier.n=5, n.perm=5000,
    samp.group=NULL, col.outlier="red", col.list=NULL, pch.outlier=4,
    pch.list=NULL, label.H=FALSE, label.Z=TRUE, verbose=TRUE)
cutTree
snpgdsClose(genofile)
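Printing the cut tree shows everything at once; often the group assignment of each sample is all you need. The sketch below runs the same dissimilarity, clustering and tree-cutting steps on the example GDS file bundled with SNPRelate (an assumption made so that the code is self-contained) and tabulates the result. The variable names gds, diss, hc, tree and groups are chosen here for illustration.

```r
library(SNPRelate)

# Assumption: the example GDS bundled with SNPRelate stands in for "data.gds".
gds <- snpgdsOpen(snpgdsExampleFileName())

set.seed(100)  # snpgdsCutTree uses permutations, so fix the seed
diss <- snpgdsDiss(gds, num.thread = 1, verbose = FALSE)
hc <- snpgdsHCluster(diss)
tree <- snpgdsCutTree(hc, verbose = FALSE)
snpgdsClose(gds)

# One row per sample with its assigned group.
groups <- data.frame(sample.id = tree$sample.id, group = tree$samp.group)
print(table(groups$group))  # group sizes
print(head(groups))
```

With your own data, the same data frame can be built from the cutTree object created in the step above.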

7. Draw a difference dendrogram

# draw the dissimilarity dendrogram
snpgdsDrawTree(cutTree, main="Dendrogram based on dissimilarity",
    edgePar=list(col=rgb(0.5, 0.5, 0.5, 0.75), t.col="black"),
    y.label.kinship=TRUE, leaflab="perpendicular", yaxis.kinship=FALSE)

References

Benbow Lab – Phylogeny form VCF file (https://benbowlab.github.io/Phylogeny.html)

I am very grateful for your sharing!!!
Million Quesn