Skip to contents

Identify samples that fall outside of the specified range for principal components that explain some minimum variance.

Usage

call_pca_outliers(
  norm,
  min_pc_ve,
  scale = TRUE,
  plot = TRUE,
  verbose = TRUE,
  iqr_coef = 3,
  M = Inf,
  title = NULL
)

Arguments

norm

feature by sample data frame of normalized data

min_pc_ve

numeric, minimum percent variance explained by a PC to check it for outliers

scale

bool, whether to scale input data before PCA. TRUE by default.

plot

bool, whether to print PC plots before and after removing outliers. TRUE by default.

verbose

bool, whether to print descriptive strings. TRUE by default.

iqr_coef

numeric, flag PC outliers if they are outside of IQR * iqr_coef

M

integer, select M most variable features

title

character, substring to include in PC plot titles

Value

named list of four items:

pca_outliers

character vector of viallabels identified as outliers

prcomp_obj

result returned by prcomp() from PCA of normalized data without outliers removed

num_pcs

integer, number of PCs checked for outliers

pc_outliers_report

matrix of results with one row per outlier

Examples

bat_rna_data = transcript_prep_data("BAT", covariates = NULL, outliers = NULL)
#> TRNSCRPT_BAT_RAW_COUNTS
#> TRNSCRPT_BAT_NORM_DATA
bat_rna_outliers = call_pca_outliers(bat_rna_data$norm_data, 
                                     min_pc_ve=0.05, 
                                     iqr_coef=5, 
                                     M=1000, 
                                     title="Brown Adipose")
#>     PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8     PC9    PC10 
#> 0.17851 0.12381 0.09078 0.04275 0.02831 0.02600 0.02306 0.02091 0.02076 0.01946 
#> The first 3 PCs were selected to identify outliers.
#> PC outliers:
#>      PC    viallabel     PC value
#> [1,] "PC3" "90266016902" "23.213"
#> [2,] "PC3" "90416016902" "54.24" 
#> Plotting PCs with outliers flagged...

#> Plotting PCs with outliers removed...