Package 'HeritSeq'

Title: Heritability of Gene Expression for Next-Generation Sequencing
Description: Statistical framework to analyze heritability of gene expression based on next-generation sequencing data and simulating sequencing reads. Variance partition coefficients (VPC) are computed using linear mixed effects and generalized linear mixed effects models. Compound Poisson and negative binomial models are included. Reference: Rudra, Pratyaydipta, et al. "Model based heritability scores for high-throughput sequencing data." BMC bioinformatics 18.1 (2017): 143.
Authors: W. Jenny Shi [aut, cre], Pamela Russell [aut], Pratyaydipta Rudra [aut, cre], Brian Vestal [aut, cre], Katerina Kechris [aut], Laura Saba [aut]
Maintainer: W. Jenny Shi <[email protected]>
License: GPL-2
Version: 1.0.1
Built: 2025-02-15 03:40:33 UTC
Source: https://github.com/kechrislab/heritseq

Help Index


Calculate the compound Poisson (CP) variance partition coefficient (VPC) for one or more features.

Description

Calculate the CP VPC for one or more features following the model fitting function fit.CP().

Usage

computeVPC.CP(para)

Arguments

para

A G×4G \times 4 matrix of CP fit parameters for GG features, G1G\geq 1. The column order is intercept αg\alpha_g, random effect σg2(σg20)\sigma_g^2 (\sigma_g^2\geq0), tweedie parameter pg(1<pg<2)p_g (1<p_g<2), and dispersion ϕ(ϕ>0)\phi (\phi>0).

Value

A G×1G \times 1 matrix consisting of VPC for G features based on compound Poisson mixed models. Column name is "CP-fit"; row names are the feature names.

Examples

## Compute VPC for each feature under compound Poisson mixed models. 
vpc.cp <- computeVPC.CP(para_cp) 

## Visulize the distribution of the VPCs. 
hist(vpc.cp, breaks = 50, col = "cyan")

## Plot sorted VPCs.
plot(sort(vpc.cp), ylab = "Heritability (h2)", ylim = c(0,1), main = "Sorted CP VPC scores")
abline(h = 0.9, lty = 2, col = "red")
text(50, 0.92, "h2 = 0.9", col = "red")

Calculate the negative binomial (NB) variance partition coefficient (VPC) for one or more features.

Description

Calculate the NB VPC for one or more features following the model fitting function fit.NB().

Usage

computeVPC.NB(para)

Arguments

para

A G×3G \times 3 matrix of negative binomial fit parameters for GG features, G1G\geq 1. The column order is intercept αg\alpha_g, random effect σg2(σg20)\sigma_g^2 (\sigma_g^2\geq0), and dispersion ϕ(ϕ>0)\phi (\phi>0).

Value

A G×1G \times 1 matrix consisting of VPC for GG features based on negative binomial mixed model. Column name is "NB-fit"; row names are the feature names.

Examples

## Compute VPC for each feature under negative binomial mixed model.
vpc.nb <- computeVPC.NB(para_nb)

## Visulize the distribution of the VPCs. 
hist(vpc.nb, breaks = 50, col = "cyan")

## Plot sorted VPCs.
plot(sort(vpc.nb), ylab = "Heritability (h2)", ylim = c(0,1), 
main = "Sorted NB VPC scores")
abline(h = 0.9, lty = 2, col = "red")
text(50, 0.92, "h2 = 0.9", col = "red")

Fit compound Poisson mixed effect models (CPMM) for one or more features.

Description

Fit a CPMM for one or more features and output the fit parameters. It is used before the function computeVPC.CP(). This function also allows to test the presence of heritability via random effect variance of the model.

Usage

fit.CP(CountMatrix, Strains, test = FALSE, optimizer = "nlminb")

Arguments

CountMatrix

Sequencing count matrix for one or more features. Each row is for one feature, and the columns are for samples.

Strains

Strain labels for the samples.

test

TRUE or FALSE (default). Test the presence of heritability through examining the random effect variance σg2=0\sigma_g^2 = 0.

optimizer

A character string that determines which optimization routine is to be used. Possible choices are "nlminb" (default), "L-BFGS-B", and "bobyqa".

Value

A list with two objects. The first object is a G×4G \times 4 matrix indicating the fitted parameters for each feature. The columns are ordered by intercept αg\alpha_g, tweedie parameter pgp_g, random effect variance σg2\sigma^2_g, and dispersion ϕg\phi_g. Row names are feature names. If the argument test is set to be true, the second object of the list consists of p-values for testing the hypothesis that random effects σa2=0\sigma_a^2 = 0; otherwise, the second object is NULL.

Examples

## Fit CPMM for the first two features and test the presence of 
## heritability. 
result.cp <- fit.CP(simData[1:2, ], strains, test = TRUE)
## Extract parameters
para.cp <- result.cp[[1]]
## Extract p-values
pval.cp <- result.cp[[2]]

Fit negative binomial mixed models (NBMM) for one or more features.

Description

Fit NBMM for one or more features and output the fit parameters. It is used before the function computeVPC.NB(). This function also allows to test the presence of heritability via random effect variance of the model. To fit a NBMM, the glmmADMB package is needed.

Usage

fit.NB(CountMatrix, Strains, test = FALSE)

Arguments

CountMatrix

Sequencing count matrix for a list of features. Each row is for one feature, and the columns are for samples.

Strains

Strain labels for the samples.

test

TRUE or FALSE (default). Test the presence of heritability through examining the random effect variance σg2=0\sigma_g^2 = 0.

Value

A list with two objects. The first object is a G×3G \times 3 matrix indicating the fitted parameters for each feature. The columns are ordered by αg,σg2,ϕg\alpha_g, \sigma_g^2, \phi_g. Row names are feature names. If the argument test is set to be true, the second object of the list consists of p-values for testing the hypothesis that random effects σa2=0\sigma_a^2 = 0; otherwise, the second object is NULL.

Examples

## Compute vpc for each feature under NBMM. This will take a while on the
##  entire dataset. For the purpose of illustration, here we only fit on 
##  the first 2 features.
library(glmmADMB)
result.nb <- fit.NB(simData[1:2, ], strains)

Fit linear mixed models (LMM) and compute the VPC values for one or more features.

Description

Fit the Gaussian-like data to LMM and compute the VPC values for one or more features.

Usage

fitComputeVPC.lmer(CountMatrix, Strains, PriorWeights = NULL, test = FALSE,
  VPCname = "LMM")

Arguments

CountMatrix

Sequencing count matrix for one or more features. Each row is for one feature, and the columns are for samples.

Strains

Strain labels for the samples.

PriorWeights

Weights used in the lmer function in the package lme4. It is an optional vector used in the fitting process.

test

TRUE or FALSE (default). Test the presence of heritability through examining the random effect variance σg2=0\sigma_g^2 = 0.

VPCname

Name of the VPC result, default = "LMM".

Value

A list with two objects. The first object is a 1×G1 \times G vector indicating the variance partition coefficients (VPC). If the argument test is set to be true, the second object of the list consists of p-values for testing the hypothesis that random effects σa2=0\sigma_a^2 = 0; otherwise, the second object is NULL.

Examples

## Compute VPC for the first two features under linear mixed models for Gaussian-like datasets. 

## Provide normalized data and include hypothesis testing on presence of
## heritability:
result.vst <- fitComputeVPC.lmer(simData_vst[1:2,], strains, test = TRUE)
## Extract parameters
vpc.vst <- result.vst[[1]]
## Extract p-values
pval.vst <- result.vst[[2]]

## Visulize the distribution of p-values.
hist(pval.vst, breaks = 30, col = "cyan")

Compute variance partition coefficition (VPC) confidence intervals (CI) for one or more features.

Description

Compute VPC CI based on parametric bootstrap for one or more features.

Usage

getBootCI(CountMatrix, Strains, which.features, num.boot, method = "NB-fit",
  alpha = 0.05, optimizer = "nlminb")

Arguments

CountMatrix

A G×NG\times N count matrix. GG is the number of features; NN is the total number of samples.

Strains

A 1×N1\times N vector of strain labels corresponding to each sample.

which.features

A 1×k1\times k vector of select feature numbers for which CI is desired. kGk\leq G.

num.boot

Number of bootstraps.

method

Which method should be used, "CP-fit", "NB-fit" (default), or "VST". "VST" method bootstraps data under negative binomial mixed models.

alpha

A numerical value between 0 and 1, indicating the significance level of the CI. The CI will be 100(1α)100*(1-\alpha) percent CI. Default value is 0.05.

optimizer

A character string that determines which optimization routine is to be used. It is only used for method = "CP-fit". Possible choices are "nlminb" (default), "L-BFGS-B", and "bobyqa".

Value

A list of two objects. The first object is a k×2k \times 2 matrix containing the CI. The second object consists of a k×k \timesnum.boot matrix of all bootsrapped VPC values.

Examples

## Compute CI based on 100 bootstrap samples for the first feature 
##  under NBMM. It takes a few minutes.
NBboot <- getBootCI(simData, strains, 1, 100)
## Extract CI
NBboot.ci <- NBboot[[1]]
## Extract vpcs
NBboot.vpc <- NBboot[[2]]

## Compute CI based on 100 bootstrap samples for the first feature
##  under vst. 
VSTboot <- getBootCI(simData, strains, 1, 100, method = "VST")

Simulate a read matrix from compound Poisson mixed effect models (CPMM).

Description

Simulate a (possibly unbalanced) read matrix from CPMM. For a compound Poisson (CP) random variable YgsrY_{gsr} with mean μgs\mu_{gs}, its variance can be expressed as ϕgμgspg\phi_g\mu_{gs}^{p_g}, for some 1<pg<21<p_g<2. Under the CPMM, with a log\log-link, the regression on the mean has the form:
log(μgs)=αg+bgs,    bgsN(0,σg2).\log(\mu_{gs}) = \alpha_g+ b_{gs}, \;\;b_{gs}\sim N(0, \sigma^2_g).

Usage

getReadMatrix.CP(vec.num.rep, alphas, sigma2s, ps, phis)

Arguments

vec.num.rep

A vector of replicate numbers for each strain.

alphas

Intercept vector αg\alpha_g's, 1×num.features1 \times \texttt{num.features}.

sigma2s

Random effect variance vector σg2\sigma^2_g's, 1×num.features1 \times \texttt{num.features}.

ps

Tweedie parameter in CP models, pgp_g's, a 1×num.features1 \times \texttt{num.features} vector.

phis

Dispersion parameter in CP models, ϕg\phi_g's, a 1×num.features1 \times \texttt{num.features} vector.

Value

A G×NG \times N matrix with CP reads. NN is the total number of samples; GG is the number of features. Column names are sample names of the form "Ss_r", where S stands for sample, s is the strain number, r is the replicate number within the strain. Row names are the feature names of the form "Gene g", where g is the feature index.

Examples

## Generate a sequencing dataset with 5 features and 6 strains. 
## Assign parameter values.
rep.num <- c(3, 5, 2, 3, 4, 2)
a0s <- c(-1, 1, 2, 5, 10)
sig2s <- c(10, 0.2, 0.1, 0.03, 0.01)
ps <- rep(1.5, 5)
phis <- c(1.5, 1, 0.5, 0.1, 0.1)

set.seed(1234)
## Generate reads:
cpData <- getReadMatrix.CP(rep.num, a0s, sig2s, ps, phis)
## Generate strain names:
str <- sapply(1:length(rep.num), function(x){
  str.x <- paste0("S", x)
  return(rep(str.x, rep.num[x]))
})
str <- do.call(c, str)

Simulate a count matrix from negative binomial mixed effect models (NBMM).

Description

Simulate a (possibly unbalanced) count matrix from NBMM. Under NBMM, an observed number of reads aligned to feature/gene gg, YgsrY_{gsr}, follows a negative binomial (NB) distribution with mean μgs\mu_{gs} and variance μgs+ϕgμgs2\mu_{gs}+\phi_{g} \mu_{gs}^2, where ϕg\phi_g is the dispersion parameter, shared across strains. The generalized linear model uses a log\log-link:
log(μgs)=αg+bgs,    bgsN(0,σg2).\log(\mu_{gs}) = \alpha_g+ b_{gs}, \;\;b_{gs}\sim N(0, \sigma^2_g).

Usage

getReadMatrix.NB(vec.num.rep, alphas, sigma2s, phis)

Arguments

vec.num.rep

A vector of replicate numbers for each strain.

alphas

Intercept vector αg\alpha_g's, 1×num.features1 \times \texttt{num.features}.

sigma2s

Random effect variance vector σg2\sigma^2_g's, 1×num.features1 \times \texttt{num.features}.

phis

Dispersion parameter in NB models, ϕg\phi_g's, a 1×num.features1 \times \texttt{num.features} vector.

Value

A G×NG \times N matrix with NB reads. NN is the total number of samples; GG is the number of features. Column names are sample names of the form "Ss_r", where S stands for sample, s is the strain number, r is the replicate number within the strain. Row names are the feature names of the form "Gene g", where g is the feature index.

Examples

## Generate a sequencing dataset with 5 features and 6 strains. 
## Assign parameter values.
rep.num <- c(3, 5, 2, 3, 4, 2)
a0s <- c(-1, 1, 2, 5, 10)
sig2s <- c(10, 0.2, 0.1, 0.03, 0.01)
phis <- c(0.5, 1, 0.05, 0.01, 0.1)

set.seed(1234)
## Generate reads:
nbData <- getReadMatrix.NB(rep.num, a0s, sig2s, phis)

Parameter matrix obtained from simData by fitting CPMM.

Description

Parameter matrix obtained from simData by fitting CPMM.

Usage

para_cp

Format

An object of class matrix with 100 rows and 4 columns.


Parameter matrix obtained from simData by fitting NBMM.

Description

Parameter matrix obtained from simData by fitting NBMM.

Usage

para_nb

Format

An object of class matrix with 100 rows and 3 columns.


A simulated sequencing dataset.

Description

A matrix containing simulated counts for 100 features (rows) and 175 samples (columns)

Usage

simData

Format

A matrix with 100 rows and 175 columns


Voom transformed version of simData.

Description

Voom transformed version of simData.

Usage

simData_voom

Format

An object of class matrix with 881 rows and 175 columns.


Variance stabilize transformed version of simData.

Description

Variance stabilize transformed version of simData.

Usage

simData_vst

Format

An object of class matrix with 881 rows and 175 columns.


List of strain names for the samples.

Description

List of strain names for the samples.

Usage

strains

Format

An object of class character of length 175.


Weights used in the voom transformation.

Description

Weights used in the voom transformation.

Usage

weights_voom

Format

An object of class matrix with 881 rows and 175 columns.