R/simulate_bulk.R
bulk_generator.Rd
Generate artificial bulk RNA-seq samples with random or pre-defined cell-type proportions for benchmarking deconvolution algorithms
bulk_generator(
  ref,
  phenodata,
  num_mixtures = 500,
  num_mixtures_sprop = 10,
  pool_size = 100,
  seed = 1234,
  prop = NULL,
  replace = FALSE
)
ref: a matrix-like object of gene expression values, with rows representing genes and columns representing cells.
phenodata: a data.frame with rows representing cells and columns representing cell attributes. Its first two columns must contain, in this order:
  cell barcodes
  cell types
num_mixtures: total number of simulated bulk samples. Must be a multiple of num_mixtures_sprop. Defaults to 500.
num_mixtures_sprop: number of simulated bulk samples that share the same simulated cell-type proportions. Only applicable when prop is not specified. These replicate samples are used to estimate bias and variance. Defaults to 10.
pool_size: number of cells used to construct each artificial bulk sample. Defaults to 100.
seed: random seed used for the simulation. Defaults to 1234.
prop: a data.frame with two columns: the first contains the unique cell types in phenodata, the second the corresponding cell-type proportions. If specified, bulk samples are simulated from these proportions.
replace: logical; whether to sample cells with replacement. Defaults to FALSE (sampling without replacement).
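The argument constraints above can be checked on toy inputs before calling the simulator. The sketch below uses a hypothetical helper, check_bulk_inputs(), which is not part of SCdeconR. It assumes the barcodes in the first column of phenodata are the column names of ref, and that a user-supplied prop has non-negative proportions summing to 1; neither assumption is stated explicitly in this page.

```r
# Hypothetical helper, NOT part of SCdeconR: checks inputs against the
# constraints documented for bulk_generator().
check_bulk_inputs <- function(ref, phenodata, num_mixtures = 500,
                              num_mixtures_sprop = 10, prop = NULL) {
  # Assumption: the barcodes in the first column of phenodata are the
  # column names of ref (one column per cell).
  stopifnot(all(colnames(ref) %in% phenodata[[1]]))
  if (is.null(prop)) {
    # Random proportions: num_mixtures must be a multiple of num_mixtures_sprop
    stopifnot(num_mixtures %% num_mixtures_sprop == 0)
  } else {
    # Fixed proportions: two columns, first column = unique cell types
    # that occur in the cell-type column (second column) of phenodata
    stopifnot(is.data.frame(prop), ncol(prop) == 2,
              !anyDuplicated(prop[[1]]),
              all(prop[[1]] %in% phenodata[[2]]))
    # Assumption: proportions are non-negative and sum to 1
    stopifnot(all(prop[[2]] >= 0), isTRUE(all.equal(sum(prop[[2]]), 1)))
  }
  invisible(TRUE)
}

# Toy inputs: 3 genes x 6 cells, two cell types
set.seed(1)
ref <- matrix(rpois(18, 5), nrow = 3,
              dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))
phenodata <- data.frame(cellid = colnames(ref),
                        celltypes = rep(c("T", "B"), each = 3))
prop <- data.frame(celltypes = c("T", "B"), proportion = c(0.7, 0.3))
check_bulk_inputs(ref, phenodata, prop = prop)  # passes silently
```

The num_mixtures check is applied only when prop is NULL, mirroring the note that num_mixtures_sprop is only applicable when prop is not specified.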
A list of two objects:
  the simulated bulk RNA-seq data, with rows representing genes and columns representing samples;
  the cell-type proportions used to simulate the bulk RNA-seq data, with rows representing cell types and columns representing samples.
If prop is not specified, cell-type proportion vectors are first generated at random, each with at least two cell types present. For each proportion vector, num_mixtures_sprop samples are then simulated, so num_mixtures / num_mixtures_sprop distinct vectors yield a total of num_mixtures samples. If prop is specified, all num_mixtures samples are simulated from that single proportion vector.
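The simulation scheme described above can be illustrated with a small base-R sketch. simulate_bulk_sketch() is hypothetical and is not SCdeconR's implementation. Its random-proportion scheme (uniform weights over a random subset of at least two cell types), the use of round(proportion * pool_size) cells per type, and the summation of expression across pooled cells are all assumptions; the package may generate proportions and aggregate cells differently.

```r
# Hypothetical sketch of the documented scheme, NOT SCdeconR's implementation.
# Assumptions: random proportions = uniform weights over >= 2 randomly chosen
# cell types; each type contributes round(proportion * pool_size) cells; a bulk
# sample is the row-wise sum of its pooled cells' expression.
simulate_bulk_sketch <- function(ref, phenodata, num_mixtures = 500,
                                 num_mixtures_sprop = 10, pool_size = 100,
                                 seed = 1234, prop = NULL, replace = FALSE) {
  set.seed(seed)
  celltypes <- sort(unique(as.character(phenodata[[2]])))
  if (is.null(prop)) {
    stopifnot(num_mixtures %% num_mixtures_sprop == 0, length(celltypes) >= 2)
    random_prop <- function() {
      k_choices <- 2:length(celltypes)
      k <- k_choices[sample.int(length(k_choices), 1)]  # number of types present
      present <- celltypes[sample.int(length(celltypes), k)]
      p <- setNames(numeric(length(celltypes)), celltypes)
      w <- runif(k)
      p[present] <- w / sum(w)
      p
    }
    # num_mixtures / num_mixtures_sprop distinct vectors, each repeated
    # num_mixtures_sprop times (replicates for bias/variance estimation)
    n_vec <- num_mixtures %/% num_mixtures_sprop
    vecs <- replicate(n_vec, random_prop())
    props <- vecs[, rep(seq_len(n_vec), each = num_mixtures_sprop), drop = FALSE]
  } else {
    # The same user-supplied proportion vector for every sample
    p <- setNames(numeric(length(celltypes)), celltypes)
    p[as.character(prop[[1]])] <- prop[[2]]
    props <- matrix(p, nrow = length(celltypes), ncol = num_mixtures,
                    dimnames = list(celltypes, NULL))
  }
  cells_by_type <- split(as.character(phenodata[[1]]), phenodata[[2]])
  sample_ids <- paste0("sample", seq_len(num_mixtures))
  bulk <- matrix(0, nrow = nrow(ref), ncol = num_mixtures,
                 dimnames = list(rownames(ref), sample_ids))
  for (j in seq_len(num_mixtures)) {
    # Rounding means the pooled cell count may differ slightly from pool_size
    n_cells <- round(props[, j] * pool_size)
    picked <- unlist(lapply(celltypes, function(ct) {
      pool <- cells_by_type[[ct]]
      # Errors if replace = FALSE and a type has fewer cells than requested
      pool[sample.int(length(pool), n_cells[[ct]], replace = replace)]
    }))
    bulk[, j] <- rowSums(ref[, picked, drop = FALSE])
  }
  colnames(props) <- sample_ids
  list(bulk, props)
}

# Toy reference: 5 genes x 60 cells, three cell types with 20 cells each
set.seed(42)
toy_ref <- matrix(rpois(5 * 60, 3), nrow = 5,
                  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:60)))
toy_pheno <- data.frame(cellid = colnames(toy_ref),
                        celltypes = rep(c("B", "T", "NK"), each = 20))
sim <- simulate_bulk_sketch(toy_ref, toy_pheno, num_mixtures = 6,
                            num_mixtures_sprop = 3, pool_size = 10)
dim(sim[[1]])  # 5 genes x 6 samples
fixed <- simulate_bulk_sketch(toy_ref, toy_pheno, num_mixtures = 4, pool_size = 10,
                              prop = data.frame(celltypes = c("B", "T", "NK"),
                                                proportion = c(0.5, 0.3, 0.2)))
```

Because each random proportion vector is repeated num_mixtures_sprop times, replicate columns share identical proportions but differ in the sampled cells, which is what makes them usable for estimating bias and variance.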
if (FALSE) {
  library(SCdeconR)
  library(Seurat)  # for GetAssayData()
  # paths to two example reference samples and their phenodata files
  extdata <- system.file("extdata", package = "SCdeconR")
  ref_list <- c(paste0(extdata, "/refdata/sample1"),
                paste0(extdata, "/refdata/sample2"))
  phenopath1 <- paste0(extdata, "/refdata/phenodata_sample1.txt")
  phenopath2 <- paste0(extdata, "/refdata/phenodata_sample2.txt")
  phenodata_list <- c(phenopath1, phenopath2)
  # construct integrated reference using harmony algorithm
  refdata <- construct_ref(ref_list = ref_list,
                           phenodata_list = phenodata_list,
                           data_type = "cellranger",
                           method = "harmony",
                           group_var = "subjectid",
                           nfeature_rna = 50,
                           vars_to_regress = "percent_mt", verbose = FALSE)
  # phenodata: cell barcodes first, cell types second
  phenodata <- data.frame(cellid = colnames(refdata),
                          celltypes = refdata$celltype,
                          subjectid = refdata$subjectid)
  # 20 bulk samples, each with its own random proportion vector
  bulk_sim <- bulk_generator(ref = GetAssayData(refdata, slot = "data", assay = "SCT"),
                             phenodata = phenodata,
                             num_mixtures = 20,
                             num_mixtures_sprop = 1)
}