Deconvolution of bulk RNA-seq data based on single-cell reference data. Eight bulk deconvolution methods, along with eight normalization methods and four transformation methods are available.
scdecon(
bulk,
ref,
phenodata,
filter_ref = TRUE,
marker_genes = NULL,
genes_to_remove = NULL,
min_pct_ct = 0.05,
decon_method = c("scaden", "CIBERSORT", "OLS", "nnls", "FARDEEP", "RLR", "MuSiC",
"SCDC", "scTAPE"),
norm_method_sc = c("none", "LogNormalize", "SCTransform", "scran", "scater", "Linnorm"),
norm_method_bulk = c("none", "TMM", "median_ratios", "TPM"),
trans_method_sc = c("none", "log2", "sqrt", "vst"),
trans_method_bulk = c("none", "log2", "sqrt", "vst"),
gene_length = NULL,
lfc_markers = log2(1.5),
marker_strategy = c("all", "pos_fc", "top_50p_logFC", "top_50p_AveExpr"),
to_remove = NULL,
ffpe_artifacts = FALSE,
model = NULL,
prop = NULL,
cibersortpath = NULL,
pythonpath = NULL,
tmpdir = NULL,
remove_tmpdir = TRUE,
seed = 1234,
nsamples = 1000,
return_value_only = FALSE,
verbose = FALSE
)
a matrix or data.frame of unnormalizaed & untransformed bulk RNA-seq gene expression values with rows representing genes, columns representing samples
a matrix or data.frame of untransformed scRNA-seq gene expression counts with rows representing genes, columns representing cells. This data will be used to deconvolute provided bulk RNA-seq data.
a data.frame with rows representing cells, columns representing cell attributes. It should at least contain the first three columns as:
cell barcodes
cell types
subject ids
logical value indicating whether outlier genes & cells should be removed from the provided reference data. Defaults to TRUE
a data.frame of two columns. First column represents cell types in ref; second column represents gene names of marker genes. If specified, those genes will be used to construct signature matrix for mark-gene based deconvolution methods, such as CIBERSORT, OLS, nnls, FARDEEP and RLR. Default to NULL, carry out differential analysis to identify marker genes for each cell type in ref.
a vector of gene names to remove from the reference scRNA-seq data. Default to NULL.
a numeric value indicating the minimum required proportion of expressing cells per cell type for marker gene identification. Only applicable when marker_genes is NULL. Default to 0.05.
character value specifying the deconvolution method to use. Has to be one of "scaden", "CIBERSORT", "OLS", "nnls", "FARDEEP", "RLR", "MuSiC", "SCDC", "scTAPE". See details for more information.
character value specifying the normalization method to use for reference data. Has to be one of "none","LogNormalize", "SCTransform", "scran", "scater", "Linnorm". See details for more information.
character value specifying the normalization method to use for bulk data. Has to be one of "none", "TMM", "median_ratios", "TPM". See details for more information.
character value specifying the transformation method to use for both bulk & reference data. Has to be one of "none", "log2", "sqrt", "vst". See details for more information.
character value specifying the transformation method to use for both bulk & reference data. Has to be one of "none", "log2", "sqrt", "vst". See details for more information.
a data.frame with two columns. The first column represents gene names that match with provided bulk data. The second column represents length of each gene. Only applicable when norm_method is selected as "TPM".
log2 fold change cutoff used to identify marker genes for deconvolution. The option only applicable to marker-gene based approaches, such as CIBERSORT, OLS, nnls, FARDEEP and RLR. Only applicable when marker_genes is NULL.
further strategy in selecting marker genes besides applying the log2 fold change cutoff. Can be chosen from: "all", "pos_fc", "top_50p_logFC" or "top_50p_AveExpr". See details for more information. Only applicable when marker_genes is NULL.
character value representing the cell type to remove from reference data. Only applicable to simulation experiments in evaluating effect of cell type removal from reference.
logical value indicating whether to add simulated ffpe artifacts in the bulk data. Only applicable to simulation experiments in evaluating the effect of FFPE artifacts.
pre-constructed ffpe model data. Can be downloaded from github: https://github.com/Liuy12/SCdeconR/blob/master/data/ffpemodel.rda.
a matrix or data.frame of simulated cell proportion values with rows representing cell types, columns representing samples. Only applicable to simulation experiments in evaluating the effect of cell type removal from reference.
full path to CIBERSORT.R script.
full path to python binary where scaden was installed with.
temporary processing directory for scaden or scTAPE.
a logical value indicating whether to remove tmpdir once scaden is completed. Default to TRUE.
random seed used for simulating FFPE artifacts. Only applicable when ffpe_artifacts is set to TRUE.
number of artificial bulk samples to simulate for scaden. Default to 1000.
return a list of values only without performing deconvolution. This could be helpful in cases where the user want to apply their own deconvolution algorithms. Default to FALSE.
a logical value indicating whether to print messages. Default to FALSE.
a list containing two or four elements.
a data.frame of predicted cell-type proportions, with rows representing cell types, columns representing samples.
a data.frame of fitting errors of the algorithm; first column represents sample names, second column represents RMSEs.
a data.frame of simulated cell proportion after removing the specified cell_type. Only applicable to simulation experiments.
a data.frame of marker genes used for deconvolution. Only applicable to marker-gene based deconvolution methods.
decon_method should be one of the following:
a deep learning based method using three multi-layer deep neural nets. To use scaden,
you need to firstly install scaden via pip
or conda
, the provide the python path to pythonpath
option.
a marker gene based support vectors regression approach. CIBERSOR does not allow redistribution. To use CIBERSORT,
you need to request the source code from the authors & provide the path of CIBERSORT.R script to cibersortpath
option.
ordinary least squares.
non-negative least squares.
robust regression using least trimmed squares
robust regression using an M estimator
multi-subject single-cell deconvolution
an ENSEMBLE method to integrate deconvolution results from different scRNA-seq datasets
Deep autoencoder based deconvolution
norm_method should be one of the following:
no normalization is performed.
LogNormalize
method from seurat.
TMM method from calcNormFactors
function from edgeR.
median ratio method from estimateSizeFactors,DESeqDataSet-method
function from DESeq2.
Transcript per million. TPM has to be chosen if ffpe_artifacts is set to TRUE.
SCTransform
method from Seurat.
computeSumFactors
method from scran.
librarySizeFactors
method from scater.
Linnorm
method from Linnorm.
trans_method should be one of the following:
no transformation is performed.
log2 transformation. 0.1 is added to the data to avoid logarithm of 0s.
square root transformation.
varianceStabilizingTransformation
method from DESeq2.
marker_strategy should be one of the following:
all genes passed fold change threshold will be used.
only genes with positive fold changes will be used.
only genes with top 50 percent positive fold changes will be used.
only genes with top 50 percent average expression will be used.
if (FALSE) {
ref_list <- c(paste0(system.file("extdata", package = "SCdeconR"), "/refdata/sample1"),
paste0(system.file("extdata", package = "SCdeconR"), "/refdata/sample2"))
phenopath1 <- paste0(system.file("extdata", package = "SCdeconR"),
"/refdata/phenodata_sample1.txt")
phenopath2 <- paste0(system.file("extdata", package = "SCdeconR"),
"/refdata/phenodata_sample2.txt")
phenodata_list <- c(phenopath1,phenopath2)
# construct integrated reference using harmony algorithm
refdata <- construct_ref(ref_list = ref_list,
phenodata_list = phenodata_list,
data_type = "cellranger",
method = "harmony",
group_var = "subjectid",
nfeature_rna = 50,
vars_to_regress = "percent_mt", verbose = FALSE)
phenodata <- data.frame(cellid = colnames(refdata),
celltypes = refdata$celltype,
subjectid = refdata$subjectid)
bulk_sim <- bulk_generator(ref = GetAssayData(refdata, slot = "data", assay = "SCT"),
phenodata = phenodata,
num_mixtures = 20,
num_mixtures_sprop = 1)
## perform deconvolution based on "OLS" algorithm
decon_res <- scdecon(bulk = bulk_sim[[1]],
ref = GetAssayData(refdata, slot = "data", assay = "SCT"),
phenodata = phenodata,
filter_ref = TRUE,
decon_method = "OLS",
norm_method = "none",
trans_method = "none",
marker_strategy = "all")
}