Preprocess RNA-seq data

Collect filtered raw counts, normalized sample-level data, phenotypic data, RNA-seq metadata, covariates, and outliers associated with a given tissue.

Usage

transcript_prep_data(
  tissue,
  sex = NULL,
  covariates = c("pct_globin", "RIN", "pct_umi_dup", "median_5_3_bias"),
  outliers = na.omit(MotrpacRatTraining6moData::OUTLIERS$viallabel),
  adjust_covariates = TRUE,
  center_scale = FALSE
)

Arguments

tissue: character, tissue abbreviation, one of MotrpacRatTraining6moData::TISSUE_ABBREV
sex: character, one of 'male' or 'female'. If NULL (the default), data from both sexes are returned.
covariates: character vector of covariates that correspond to column names of MotrpacRatTraining6moData::TRNSCRPT_META. Defaults to covariates that were used for the manuscript.
outliers: vector of viallabels to exclude from the returned data. Defaults to the non-missing values of MotrpacRatTraining6moData::OUTLIERS$viallabel
adjust_covariates: boolean, whether to adjust covariates using fix_covariates(). Only applies if covariates is not NULL.
center_scale: boolean, whether to center and scale continuous covariates within fix_covariates(). Only applies if adjust_covariates is also TRUE.
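
As a rough illustration of what centering and scaling a continuous covariate means, the sketch below uses base R only. It is not the implementation of fix_covariates(), whose internals are not described here; the RIN values are made up for the example.

```r
# Made-up RIN values for illustration only (not taken from the package data).
rin <- c(7.5, 8.0, 8.5, 9.0)

# scale() subtracts the mean and divides by the sample standard deviation.
rin_scaled <- as.numeric(scale(rin, center = TRUE, scale = TRUE))

# Equivalent manual computation.
rin_manual <- (rin - mean(rin)) / sd(rin)

print(rin_scaled)
```

After this transformation the covariate has mean 0 and standard deviation 1, which puts continuous covariates on a comparable scale before they enter a model.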

Value

named list of five items:

metadata: data frame of combined MotrpacRatTraining6moData::PHENO and MotrpacRatTraining6moData::TRNSCRPT_META, filtered to samples in tissue. If adjust_covariates = TRUE, missing values in covariates are imputed. If also center_scale = TRUE, continuous variables named by covariates are centered and scaled.
covariates: character vector of covariates to adjust for during differential analysis. For all tissues except VENACV, this vector is a (sub)set of the input list of covariates. Covariates are removed from this vector if there are too many missing values or if all values are constant. See fix_covariates() for more details. If tissue = "VENACV", the Ensembl ID for Ucp1 is also added as a covariate.
counts: data frame of raw counts with Ensembl IDs (which are also TRNSCRPT feature_IDs) as row names and vial labels as column names. See MotrpacRatTraining6moData::TRNSCRPT_RAW_COUNTS for details.
norm_data: data frame of TMM-normalized data with Ensembl IDs (which are also TRNSCRPT feature_IDs) as row names and vial labels as column names. See MotrpacRatTraining6moData::TRNSCRPT_NORM_DATA for details.
outliers: subset of the input outliers that were present in this tissue's data and were removed
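
The five components listed above can be checked and summarized with a small helper. The sketch below is only an illustration: summarize_prep() is a hypothetical function that is not part of the package, and the toy list uses made-up values shaped like the documented return value so that the code runs without the data package installed.

```r
# Hypothetical helper (not part of the package): confirm that an object has
# the five documented components and summarize their sizes.
summarize_prep <- function(x) {
  expected <- c("metadata", "covariates", "counts", "norm_data", "outliers")
  missing_items <- setdiff(expected, names(x))
  if (length(missing_items) > 0) {
    stop("Missing components: ", paste(missing_items, collapse = ", "))
  }
  list(
    n_samples = ncol(x$counts),          # vial labels are column names
    n_features = nrow(x$counts),         # Ensembl IDs are row names
    covariates = x$covariates,
    n_outliers_removed = length(x$outliers)
  )
}

# Toy stand-in with made-up values; the metadata column names are assumptions.
toy <- list(
  metadata = data.frame(viallabel = c("v1", "v2"), RIN = c(8.1, 7.9)),
  covariates = c("RIN"),
  counts = data.frame(v1 = c(10L, 0L, 5L), v2 = c(12L, 1L, 7L),
                      row.names = c("geneA", "geneB", "geneC")),
  norm_data = data.frame(v1 = c(1.2, -0.5, 0.3), v2 = c(1.4, -0.2, 0.6),
                         row.names = c("geneA", "geneB", "geneC")),
  outliers = character(0)
)

toy_summary <- summarize_prep(toy)
str(toy_summary)
```

With the package installed, the same helper could presumably be applied to the output of transcript_prep_data("SKM-GN") in place of the toy list.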

Examples

# Process gastrocnemius RNA-seq data with default parameters, i.e., return data from both 
# sexes, remove established outliers, impute missing values in default covariates 
gastroc_data1 = transcript_prep_data("SKM-GN")
#> TRNSCRPT_SKMGN_RAW_COUNTS
#> TRNSCRPT_SKMGN_NORM_DATA

# Same as above but do not remove outliers if they exist 
gastroc_data2 = transcript_prep_data("SKM-GN", outliers = NULL)
#> TRNSCRPT_SKMGN_RAW_COUNTS
#> TRNSCRPT_SKMGN_NORM_DATA

# Same as above but do not adjust existing variables in the metadata  
gastroc_data3 = transcript_prep_data("SKM-GN", covariates = NULL, outliers = NULL)
#> TRNSCRPT_SKMGN_RAW_COUNTS
#> TRNSCRPT_SKMGN_NORM_DATA

# Same as above but only return data from male samples
gastroc_data4 = transcript_prep_data("SKM-GN", covariates = NULL, outliers = NULL, sex = "male")
#> TRNSCRPT_SKMGN_RAW_COUNTS
#> TRNSCRPT_SKMGN_NORM_DATA

# Same as gastroc_data2 but also center and scale default continuous covariates
# in the returned metadata, which is also done within run_deseq()
# (called by transcript_timewise_dea())
gastroc_data5 = transcript_prep_data("SKM-GN", outliers = NULL, center_scale = TRUE)
#> TRNSCRPT_SKMGN_RAW_COUNTS
#> TRNSCRPT_SKMGN_NORM_DATA
