Iteratively reweighted least squares (IRLS) procedure in CS-CORE (Rcpp)
CSCORE_IRLS_cpp.Rd
Estimate and test cell-type-specific co-expression using the IRLS procedure with optional covariate adjustment.
Source
Su, C., Xu, Z., Shan, X., Cai, B., Zhao, H., & Zhang, J. (2023). Cell-type-specific co-expression inference from single cell RNA-sequencing data. Nature Communications. doi: https://doi.org/10.1038/s41467-023-40503-7
Arguments
- X
A numeric matrix of UMI counts (
n x p
), wheren
is the number of cells andp
is the number of genes.- seq_depth
A numeric vector of sequencing depths of length
n
.- covariates
Optional. A numeric matrix of covariates (
n x K
) to be adjusted for in the moment-based regressions. Can be a length n vector if \(K=1\). IfNULL
, no covariates are adjusted for in the regression. Defaults toNULL
.- post_process
Optional. Logical; whether to rescale the estimated co-expressions to lie between –1 and 1. Defaults to
TRUE
.- covariate_level
Optional. A character string indicating whether covariates are assumed to affect the underlying gene expression levels (
"z"
) or the observed counts ("x"
). See the Details section for further explanation. Defaults to"z"
.- adjust_setting
Optional. A named logical vector of length 3; whether to adjust for covariates at the mean, variance, and covariance level. Must be named
c("mean", "var", "covar")
. Defaults toc(mean = T,var = T, covar = T)
.- IRLS_par
Optional. A named list of length 4 specifying parameters for the IRLS algorithm:
n_iter
Maximum number of iterations.
eps
Convergence threshold for log-ratio change
delta
, computed asabs(log(theta / theta_prev))
.verbose
Logical; whether to print the convergence metric (
delta
) at each iteration.conv
Character string; determine convergence based on
q95
(0.95 quantile) ormax
ofdelta
.
Defaults to
list(n_iter = 10, eps = 0.05, verbose = FALSE, cov = "q95")
.- return_all
Logical; whether to return all estimates, including the effect sizes for covariates. Defaults to
FALSE
.
Value
A list containing the following components:
- est
A \(p \times p\) matrix of co-expression estimates.
- p_value
A \(p \times p\) matrix of p-values.
- test_stat
A \(p \times p\) matrix of test statistics evaluating the statistical significance of co-expression.
- mu_beta
A \(k \times p\) matrix of regression coefficients from the mean model. Returned if
return_all = TRUE
.- sigma2_beta
A \(k \times p\) matrix of regression coefficients from the variance model. Returned if
return_all = TRUE
.- cov_beta
A \(k \times p \times p\) array of regression coefficients from the covariance model. Returned if
return_all = TRUE
.
Details
Let \(x_{ij}\) denote the UMI count of gene \(j\) in cell \(i\); \(s_i\) denote the sequencing depth; \(\mu_j,\sigma_{jj}, \sigma_{jj'}\) denote the mean, variance and covariance; \(c_{ik}\) denote additional covariate \(k\) for cell \(i\) (e.g. disease status or cellular state). The procedure consists of two main steps:
Mean and variance estimation: Estimate gene-specific mean and variance parameters using two moment-based regressions:
Mean model: \(x_{ij} = s_i (\mu_j + \sum_k c_{ik} \beta_k) + \epsilon_{ij}\)
Variance model: \((x_{ij} - s_i \mu_{ij})^2 = s_i \mu_{ij} + s_i^2 (\sigma_{jj} + \sum_k c_{ik} \gamma_k) + \eta_{ij}\), where \(\mu_{ij} = \mu_j + \sum_k c_{ik} \beta_k\)
Covariance estimation and hypothesis testing: Estimate gene-gene covariance and compute test statistics to assess the statistical significance of gene co-expression using a third moment-based regression:
Covariance model: \((x_{ij} - s_i \mu_{ij})(x_{ij'} - s_i \mu_{ij'}) = s_i^2 (\sigma_{jj'} + \sum_k c_{ik} \theta_k) + \xi_{ijj'}\)
We note that
The formulation above assumes that the covariates alter the mean / variance / covariance of underlying gene expression, rather than observed counts. If you believe that the covariates directly affect the observed counts independent of underlying gene expression (e.g. \(x_{ij}|z_{ij} \sim \text{Poisson}(s_i z_{ij}+\sum_k c_{ik} \beta_k)\) or \(x_{ij} = s_i \mu_j + \sum_k c_{ik} \beta_k + \epsilon_{ij}\)), please specify
covariate_level="x"
.The original CS-CORE published in https://doi.org/10.1038/s41467-023-40503-7 did not consider adjusting for covariates \(c_{ik}\)'s. This is equivalent to setting
covariates
toNULL
.
Note: This is an R wrapper for the CSCORE_IRLS_cpp_impl()
function implemented in Rcpp.
Examples
## Toy example:
## run CSCORE on a simulated independent gene pair
cscore_example <- CSCORE_IRLS_cpp(ind_gene_pair$counts, ind_gene_pair$seq_depths)
#> [INFO] IRLS converged after 2 iterations.
#> [INFO] Starting WLS for covariance at Tue Jul 1 21:41:10 2025
#> [INFO] 0.0000% co-expression estimates were greater than 1 and were set to 1.
#> [INFO] 0.0000% co-expression estimates were smaller than -1 and were set to -1.
#> [INFO] Finished WLS. Elapsed time: 0.0001 seconds.
## Estimated co-expression between two genes
cscore_example$est[1,2]
#> [1] 0.007820124
# close to 0: 0.007820124
## p-values
cscore_example$p_value[1,2]
#> [1] 0.961981
# not significant: 0.961981