Integration of scRNA-seq or snRNA-seq data using either Harmony or Seurat.

construct_ref(
  ref_list,
  phenodata_list,
  data_type = c("cellranger", "h5", "matrix"),
  method = c("harmony", "seurat"),
  group_var,
  nfeature_rna = 200,
  percent_mt = 40,
  vars_to_regress = c("percent_mt", "phase"),
  ex_features = NULL,
  cluster = TRUE,
  resolution = 0.8,
  verbose = TRUE,
  ...
)

Arguments

ref_list

a character vector of paths to scRNA-seq/snRNA-seq data. See data_type for accepted data types.

phenodata_list

a character vector of paths to metadata files, one for each element in ref_list. All metadata within phenodata_list should have consistent column names. Rows represent cells, and columns represent cell attributes such as cell type. Each element in phenodata_list should contain at least the following as its first two columns:

  1. cell barcodes

  2. cell types
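As an illustration of the layout described above, the following minimal sketch builds a small metadata table and writes it to disk. The column names (cellid, celltype, subjectid), the barcodes, and the tab-delimited format are assumptions made for this example; the source only requires that the first two columns hold cell barcodes and cell types.

```r
# Hypothetical metadata for three cells; names and values are illustrative only.
phenodata <- data.frame(
  cellid    = c("AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCGC-1", "AAACCTGAGAAACCTA-1"),
  celltype  = c("Epithelial", "Fibroblast", "Epithelial"),
  subjectid = c("subject1", "subject1", "subject1"),
  stringsAsFactors = FALSE
)

# Assumption: a tab-delimited text file with a header row.
pheno_path <- file.path(tempdir(), "phenodata_sample1.txt")
write.table(phenodata, pheno_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm that barcodes and cell types come first.
pheno_check <- read.delim(pheno_path, stringsAsFactors = FALSE)
```

Any additional columns, such as subjectid here, can then be referenced by name through group_var.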

data_type

data type of the input scRNA-seq/snRNA-seq data. Either a single character value from "cellranger", "h5", or "matrix", or a vector/list of such values with the same length as ref_list, indicating the data type of each element.

method

character value specifying the method to use. Has to be one of "harmony" or "seurat". See details for more information.

group_var

a vector of character values indicating which variables within phenodata_list metadata to use for integration. Only applicable when method is set to "harmony".

nfeature_rna

minimum number of features with non-zero UMIs. Cells with fewer features than nfeature_rna will be removed. Defaults to 200.

percent_mt

maximum percentage of UMIs mapped to mitochondrial (MT) genes. Cells with an MT percentage higher than percent_mt will be removed. Defaults to 40.
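The two thresholds nfeature_rna and percent_mt can be illustrated with a small base R sketch. This is not the package's internal code: the toy count matrix is made up, and identifying mitochondrial genes by the "^MT-" prefix is an assumption that fits human gene symbols.

```r
# Toy UMI count matrix: genes in rows, cells in columns (values are made up).
counts <- matrix(
  c(5, 0, 3, 1,   # cell1
    0, 0, 2, 0,   # cell2
    1, 9, 0, 0),  # cell3
  nrow = 4,
  dimnames = list(c("GENE1", "MT-CO1", "GENE2", "GENE3"),
                  c("cell1", "cell2", "cell3"))
)

# Thresholds lowered from the defaults so the toy data is informative.
nfeature_rna <- 2
percent_mt   <- 40

# Number of features with non-zero UMIs in each cell.
n_features <- colSums(counts > 0)

# Percentage of UMIs from mitochondrial genes (assumed "^MT-" prefix).
is_mt  <- grepl("^MT-", rownames(counts))
pct_mt <- 100 * colSums(counts[is_mt, , drop = FALSE]) / colSums(counts)

# Keep cells that pass both filters.
keep     <- n_features >= nfeature_rna & pct_mt <= percent_mt
filtered <- counts[, keep, drop = FALSE]
```

Here cell2 is dropped for having too few detected features and cell3 for its high MT percentage, so only cell1 remains in filtered.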

vars_to_regress

a vector of character values indicating the variables to regress out during the SCTransform normalization step. By default, MT percentage ("percent_mt") and cell cycle effects ("phase") are regressed out.

ex_features

a vector of character values indicating genes to exclude from anchor features. Those genes will not be considered as anchor features for integration, but will still be present in the integrated data.
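The effect of ex_features can be pictured as removing genes from the candidate anchor set while leaving the data itself untouched. The sketch below uses made-up gene names and is only a conceptual illustration in base R; how the package applies the exclusion internally is not described in the source.

```r
# Hypothetical genes present in the data and candidate anchor features.
all_genes          <- c("GENE1", "GENE2", "MALAT1", "XIST", "GENE3")
candidate_features <- c("GENE1", "MALAT1", "GENE3")

# Genes the user wants kept out of the anchor set.
ex_features <- c("MALAT1", "XIST")

# Excluded genes are dropped from the anchors only.
anchor_features <- setdiff(candidate_features, ex_features)

# The excluded genes are still part of the full gene list.
still_present <- ex_features %in% all_genes
```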

cluster

logical value indicating whether to perform clustering on the integrated data. If TRUE, unsupervised clustering will be performed, and the results will be saved in the "seurat_clusters" metadata column of the output Seurat object.

resolution

numeric value specifying the clustering resolution to use when cluster is set to TRUE. Defaults to 0.8.

verbose

logical value indicating whether to print messages.

...

additional parameters passed to SCTransform.

Value

a Seurat-class object.

Details

data_type can be chosen from:

cellranger

path to a directory containing the matrix.mtx, genes.tsv (or features.tsv), and barcodes.tsv files output by 10x Genomics Cell Ranger.

h5

path to an .h5 file output by 10x Genomics Cell Ranger.

matrix

path to a matrix-like file, with rows representing genes and columns representing cells.

SCTransform with vst.flavor = "v2" is used to normalize each dataset individually. The integration method can be either "harmony" or "seurat". Harmony is typically more memory efficient and is recommended when integrating a large number of cells.

Examples

if (FALSE) {
## random subset of two scRNA-seq datasets for breast tissue
library(SCdeconR)
library(doFuture)  # provides registerDoFuture()
library(future)    # provides plan()

extdata <- system.file("extdata", package = "SCdeconR")
ref_list <- file.path(extdata, "refdata", c("sample1", "sample2"))
phenodata_list <- file.path(extdata, "refdata",
                            c("phenodata_sample1.txt", "phenodata_sample2.txt"))

## Register backend for parallel processing
registerDoFuture()
plan("multisession", workers = 4)

## construct integrated reference data
refdata <- construct_ref(ref_list = ref_list,
                         phenodata_list = phenodata_list,
                         data_type = "cellranger",
                         method = "harmony",
                         group_var = "subjectid",
                         nfeature_rna = 50,
                         vars_to_regress = "percent_mt")
}