Title: | Heritability of Gene Expression for Next-Generation Sequencing |
---|---|
Description: | Statistical framework to analyze heritability of gene expression based on next-generation sequencing data and simulating sequencing reads. Variance partition coefficients (VPC) are computed using linear mixed effects and generalized linear mixed effects models. Compound Poisson and negative binomial models are included. Reference: Rudra, Pratyaydipta, et al. "Model based heritability scores for high-throughput sequencing data." BMC bioinformatics 18.1 (2017): 143. |
Authors: | W. Jenny Shi [aut, cre], Pamela Russell [aut], Pratyaydipta Rudra [aut, cre], Brian Vestal [aut, cre], Katerina Kechris [aut], Laura Saba [aut] |
Maintainer: | W. Jenny Shi <[email protected]> |
License: | GPL-2 |
Version: | 1.0.1 |
Built: | 2025-02-15 03:40:33 UTC |
Source: | https://github.com/kechrislab/heritseq |
Calculate the CP VPC for one or more features following the model fitting function fit.CP().
computeVPC.CP(para)
computeVPC.CP(para)
para |
A |
A matrix consisting of VPC for
G features based on compound Poisson mixed models. Column name is
"CP-fit"; row names are the feature names.
## Compute VPC for each feature under compound Poisson mixed models. vpc.cp <- computeVPC.CP(para_cp) ## Visulize the distribution of the VPCs. hist(vpc.cp, breaks = 50, col = "cyan") ## Plot sorted VPCs. plot(sort(vpc.cp), ylab = "Heritability (h2)", ylim = c(0,1), main = "Sorted CP VPC scores") abline(h = 0.9, lty = 2, col = "red") text(50, 0.92, "h2 = 0.9", col = "red")
## Compute VPC for each feature under compound Poisson mixed models. vpc.cp <- computeVPC.CP(para_cp) ## Visulize the distribution of the VPCs. hist(vpc.cp, breaks = 50, col = "cyan") ## Plot sorted VPCs. plot(sort(vpc.cp), ylab = "Heritability (h2)", ylim = c(0,1), main = "Sorted CP VPC scores") abline(h = 0.9, lty = 2, col = "red") text(50, 0.92, "h2 = 0.9", col = "red")
Calculate the NB VPC for one or more features following the model fitting function fit.NB().
computeVPC.NB(para)
computeVPC.NB(para)
para |
A |
A matrix consisting of VPC for
features based on negative binomial mixed model. Column name
is "NB-fit"; row names are the feature names.
## Compute VPC for each feature under negative binomial mixed model. vpc.nb <- computeVPC.NB(para_nb) ## Visulize the distribution of the VPCs. hist(vpc.nb, breaks = 50, col = "cyan") ## Plot sorted VPCs. plot(sort(vpc.nb), ylab = "Heritability (h2)", ylim = c(0,1), main = "Sorted NB VPC scores") abline(h = 0.9, lty = 2, col = "red") text(50, 0.92, "h2 = 0.9", col = "red")
## Compute VPC for each feature under negative binomial mixed model. vpc.nb <- computeVPC.NB(para_nb) ## Visulize the distribution of the VPCs. hist(vpc.nb, breaks = 50, col = "cyan") ## Plot sorted VPCs. plot(sort(vpc.nb), ylab = "Heritability (h2)", ylim = c(0,1), main = "Sorted NB VPC scores") abline(h = 0.9, lty = 2, col = "red") text(50, 0.92, "h2 = 0.9", col = "red")
Fit a CPMM for one or more features and output the fit parameters. It is used before the function computeVPC.CP(). This function also allows to test the presence of heritability via random effect variance of the model.
fit.CP(CountMatrix, Strains, test = FALSE, optimizer = "nlminb")
fit.CP(CountMatrix, Strains, test = FALSE, optimizer = "nlminb")
CountMatrix |
Sequencing count matrix for one or more features. Each row is for one feature, and the columns are for samples. |
Strains |
Strain labels for the samples. |
test |
TRUE or FALSE (default). Test the presence of heritability
through examining the random effect variance |
optimizer |
A character string that determines which optimization routine is to be used. Possible choices are "nlminb" (default), "L-BFGS-B", and "bobyqa". |
A list with two objects. The first object is a
matrix indicating the fitted parameters for each
feature. The columns are ordered by intercept
, tweedie
parameter
, random effect variance
, and dispersion
. Row names are feature names. If the argument test is set to
be true, the second object of the list consists of p-values for testing
the hypothesis that random effects
;
otherwise, the second object is NULL.
## Fit CPMM for the first two features and test the presence of ## heritability. result.cp <- fit.CP(simData[1:2, ], strains, test = TRUE) ## Extract parameters para.cp <- result.cp[[1]] ## Extract p-values pval.cp <- result.cp[[2]]
## Fit CPMM for the first two features and test the presence of ## heritability. result.cp <- fit.CP(simData[1:2, ], strains, test = TRUE) ## Extract parameters para.cp <- result.cp[[1]] ## Extract p-values pval.cp <- result.cp[[2]]
Fit NBMM for one or more features and output the fit parameters. It is used before the function computeVPC.NB(). This function also allows to test the presence of heritability via random effect variance of the model. To fit a NBMM, the glmmADMB package is needed.
fit.NB(CountMatrix, Strains, test = FALSE)
fit.NB(CountMatrix, Strains, test = FALSE)
CountMatrix |
Sequencing count matrix for a list of features. Each row is for one feature, and the columns are for samples. |
Strains |
Strain labels for the samples. |
test |
TRUE or FALSE (default). Test the presence of heritability
through examining the random effect variance |
A list with two objects. The first object is a
matrix indicating the fitted parameters for each feature. The columns are
ordered by
.
Row names are feature names. If the argument test is set to
be true, the second object of the list consists of p-values for testing
the hypothesis that random effects
;
otherwise, the second object is NULL.
## Compute vpc for each feature under NBMM. This will take a while on the ## entire dataset. For the purpose of illustration, here we only fit on ## the first 2 features. library(glmmADMB) result.nb <- fit.NB(simData[1:2, ], strains)
## Compute vpc for each feature under NBMM. This will take a while on the ## entire dataset. For the purpose of illustration, here we only fit on ## the first 2 features. library(glmmADMB) result.nb <- fit.NB(simData[1:2, ], strains)
Fit the Gaussian-like data to LMM and compute the VPC values for one or more features.
fitComputeVPC.lmer(CountMatrix, Strains, PriorWeights = NULL, test = FALSE, VPCname = "LMM")
fitComputeVPC.lmer(CountMatrix, Strains, PriorWeights = NULL, test = FALSE, VPCname = "LMM")
CountMatrix |
Sequencing count matrix for one or more features. Each row is for one feature, and the columns are for samples. |
Strains |
Strain labels for the samples. |
PriorWeights |
Weights used in the lmer function in the package lme4. It is an optional vector used in the fitting process. |
test |
TRUE or FALSE (default). Test the presence of heritability
through examining the random effect variance |
VPCname |
Name of the VPC result, default = "LMM". |
A list with two objects. The first object is a
vector indicating the variance partition coefficients
(VPC). If the argument test is set to be true, the second object of
the list consists of p-values for testing the hypothesis that random
effects
; otherwise, the second
object is NULL.
## Compute VPC for the first two features under linear mixed models for Gaussian-like datasets. ## Provide normalized data and include hypothesis testing on presence of ## heritability: result.vst <- fitComputeVPC.lmer(simData_vst[1:2,], strains, test = TRUE) ## Extract parameters vpc.vst <- result.vst[[1]] ## Extract p-values pval.vst <- result.vst[[2]] ## Visulize the distribution of p-values. hist(pval.vst, breaks = 30, col = "cyan")
## Compute VPC for the first two features under linear mixed models for Gaussian-like datasets. ## Provide normalized data and include hypothesis testing on presence of ## heritability: result.vst <- fitComputeVPC.lmer(simData_vst[1:2,], strains, test = TRUE) ## Extract parameters vpc.vst <- result.vst[[1]] ## Extract p-values pval.vst <- result.vst[[2]] ## Visulize the distribution of p-values. hist(pval.vst, breaks = 30, col = "cyan")
Compute VPC CI based on parametric bootstrap for one or more features.
getBootCI(CountMatrix, Strains, which.features, num.boot, method = "NB-fit", alpha = 0.05, optimizer = "nlminb")
getBootCI(CountMatrix, Strains, which.features, num.boot, method = "NB-fit", alpha = 0.05, optimizer = "nlminb")
CountMatrix |
A |
Strains |
A |
which.features |
A |
num.boot |
Number of bootstraps. |
method |
Which method should be used, "CP-fit", "NB-fit" (default), or "VST". "VST" method bootstraps data under negative binomial mixed models. |
alpha |
A numerical value between 0 and 1, indicating the significance
level of the CI. The CI will be |
optimizer |
A character string that determines which optimization routine is to be used. It is only used for method = "CP-fit". Possible choices are "nlminb" (default), "L-BFGS-B", and "bobyqa". |
A list of two objects. The first object is a
matrix containing the CI. The second object consists of a
num.boot matrix of all bootsrapped VPC values.
## Compute CI based on 100 bootstrap samples for the first feature ## under NBMM. It takes a few minutes. NBboot <- getBootCI(simData, strains, 1, 100) ## Extract CI NBboot.ci <- NBboot[[1]] ## Extract vpcs NBboot.vpc <- NBboot[[2]] ## Compute CI based on 100 bootstrap samples for the first feature ## under vst. VSTboot <- getBootCI(simData, strains, 1, 100, method = "VST")
## Compute CI based on 100 bootstrap samples for the first feature ## under NBMM. It takes a few minutes. NBboot <- getBootCI(simData, strains, 1, 100) ## Extract CI NBboot.ci <- NBboot[[1]] ## Extract vpcs NBboot.vpc <- NBboot[[2]] ## Compute CI based on 100 bootstrap samples for the first feature ## under vst. VSTboot <- getBootCI(simData, strains, 1, 100, method = "VST")
Simulate a (possibly unbalanced) read matrix from CPMM.
For a compound Poisson (CP) random variable with mean
, its variance can be expressed as
, for some
. Under the CPMM, with
a
-link, the regression on the mean has the form:
getReadMatrix.CP(vec.num.rep, alphas, sigma2s, ps, phis)
getReadMatrix.CP(vec.num.rep, alphas, sigma2s, ps, phis)
vec.num.rep |
A vector of replicate numbers for each strain. |
alphas |
Intercept vector |
sigma2s |
Random effect variance vector |
ps |
Tweedie parameter in CP models, |
phis |
Dispersion parameter in CP models, |
A matrix with CP reads.
is the
total number of samples;
is the number of features. Column names
are sample names of the form "Ss_r", where S stands for sample, s is the
strain number, r is the replicate number within the strain. Row names
are the feature names of the form "Gene g", where g is the feature index.
## Generate a sequencing dataset with 5 features and 6 strains. ## Assign parameter values. rep.num <- c(3, 5, 2, 3, 4, 2) a0s <- c(-1, 1, 2, 5, 10) sig2s <- c(10, 0.2, 0.1, 0.03, 0.01) ps <- rep(1.5, 5) phis <- c(1.5, 1, 0.5, 0.1, 0.1) set.seed(1234) ## Generate reads: cpData <- getReadMatrix.CP(rep.num, a0s, sig2s, ps, phis) ## Generate strain names: str <- sapply(1:length(rep.num), function(x){ str.x <- paste0("S", x) return(rep(str.x, rep.num[x])) }) str <- do.call(c, str)
## Generate a sequencing dataset with 5 features and 6 strains. ## Assign parameter values. rep.num <- c(3, 5, 2, 3, 4, 2) a0s <- c(-1, 1, 2, 5, 10) sig2s <- c(10, 0.2, 0.1, 0.03, 0.01) ps <- rep(1.5, 5) phis <- c(1.5, 1, 0.5, 0.1, 0.1) set.seed(1234) ## Generate reads: cpData <- getReadMatrix.CP(rep.num, a0s, sig2s, ps, phis) ## Generate strain names: str <- sapply(1:length(rep.num), function(x){ str.x <- paste0("S", x) return(rep(str.x, rep.num[x])) }) str <- do.call(c, str)
Simulate a (possibly unbalanced) count matrix from NBMM.
Under NBMM, an observed number of reads aligned to feature/gene ,
, follows a negative binomial (NB) distribution with mean
and variance
, where
is the dispersion parameter, shared across strains. The
generalized linear model uses a
-link:
getReadMatrix.NB(vec.num.rep, alphas, sigma2s, phis)
getReadMatrix.NB(vec.num.rep, alphas, sigma2s, phis)
vec.num.rep |
A vector of replicate numbers for each strain. |
alphas |
Intercept vector |
sigma2s |
Random effect variance vector |
phis |
Dispersion parameter in NB models, |
A matrix with NB reads.
is the
total number of samples;
is the number of features. Column names
are sample names of the form "Ss_r", where S stands for sample, s is the
strain number, r is the replicate number within the strain. Row names
are the feature names of the form "Gene g", where g is the feature index.
## Generate a sequencing dataset with 5 features and 6 strains. ## Assign parameter values. rep.num <- c(3, 5, 2, 3, 4, 2) a0s <- c(-1, 1, 2, 5, 10) sig2s <- c(10, 0.2, 0.1, 0.03, 0.01) phis <- c(0.5, 1, 0.05, 0.01, 0.1) set.seed(1234) ## Generate reads: nbData <- getReadMatrix.NB(rep.num, a0s, sig2s, phis)
## Generate a sequencing dataset with 5 features and 6 strains. ## Assign parameter values. rep.num <- c(3, 5, 2, 3, 4, 2) a0s <- c(-1, 1, 2, 5, 10) sig2s <- c(10, 0.2, 0.1, 0.03, 0.01) phis <- c(0.5, 1, 0.05, 0.01, 0.1) set.seed(1234) ## Generate reads: nbData <- getReadMatrix.NB(rep.num, a0s, sig2s, phis)
Parameter matrix obtained from simData by fitting CPMM.
para_cp
para_cp
An object of class matrix
with 100 rows and 4 columns.
Parameter matrix obtained from simData by fitting NBMM.
para_nb
para_nb
An object of class matrix
with 100 rows and 3 columns.
A matrix containing simulated counts for 100 features (rows) and 175 samples (columns)
simData
simData
A matrix with 100 rows and 175 columns
Voom transformed version of simData.
simData_voom
simData_voom
An object of class matrix
with 881 rows and 175 columns.
Variance stabilize transformed version of simData.
simData_vst
simData_vst
An object of class matrix
with 881 rows and 175 columns.
List of strain names for the samples.
strains
strains
An object of class character
of length 175.
Weights used in the voom transformation.
weights_voom
weights_voom
An object of class matrix
with 881 rows and 175 columns.