Skip to contents

Wrapper for multi-tissue, multi-omic pathway enrichment of clustering or graphical results with a user-supplied list of pathways. Pathway enrichment is performed using a hypergeometric test separately for each unique combination of tissue, assay/ome, and cluster.

Usage

custom_cluster_pathway_enrichment(
  cluster_res,
  pathway_member_list,
  source = "custom",
  feature_to_gene = MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT,
  gene_identifier_type = "gene_symbol",
  universe = MotrpacRatTraining6moData::GENE_UNIVERSES$gene_symbol,
  add_ensembl_intersection = TRUE,
  min_input_set_size = 1,
  min_pw_set_size = 10,
  max_pw_set_size = 200,
  adjust_p = TRUE,
  num_cores = NULL,
  logfile = "/dev/null"
)

Arguments

cluster_res

Either a data frame or a list of lists. If a data frame, it needs at least two columns: "feature" and "cluster". The "feature" column should be in the format 'MotrpacRatTraining6moData::ASSAY_ABBREV;MotrpacRatTraining6moData::TISSUE_ABBREV;feature_ID'. If a list of lists, each sublist must be named with the cluster name (character string), and the values must be features in the format 'MotrpacRatTraining6moData::ASSAY_ABBREV;MotrpacRatTraining6moData::TISSUE_ABBREV;feature_ID'.

pathway_member_list

named list of character vectors where names are pathway names and values are pathway members. Pathway members must match values in the gene_identifer_type column of feature_to_gene.

source

optional character string to define the source of pathway_member_list

feature_to_gene

data frame, map between intersection_id_type and gene symbols. Columns must include "feature_ID", "gene_symbol", "ensembl_gene", and "kegg_id". MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT by default.

gene_identifier_type

character, column in feature_to_gene that matches the gene identifier type in universe. "gene_symbol" by default.

universe

list of lists of character vectors, named first by assay (i.e., MotrpacRatTraining6moData::ASSAY_ABBREV) and then by tissue (i.e., MotrpacRatTraining6moData::TISSUE_ABBREV). Vectors provide the full set of gene symbols associated with features tested during differential analysis. For example, universe$TRNSCRPT$LUNG should be a character vector of expressed genes in the lung, where the type of gene identifier matches gene_identifier_type. [MotrpacRatTraining6moData::GENE_UNIVERSES]$gene_symbol by default.

add_ensembl_intersection

bool, whether to add a intersection_ensembl column, which converts gene IDs in the intersection to Ensembl IDs

min_input_set_size

integer, input must have this minimum number of unique mappable gene IDs to attempt enrichment

min_pw_set_size

integer, pathway must have at least this many members to attempt enrichment

max_pw_set_size

integer, pathway must have no more than this many members to attempt enrichment

adjust_p

boolean, whether to adjust nominal p-values for multiple testing (IHW by tissue)

num_cores

optional integer, number of cores to register if parallel computing is desired

logfile

optional character, path to log of failed iterations

Value

data frame with enrichment results, or NULL if no enrichment results were returned:

term_size

integer, number of genes that are annotated to the term

query_size

integer, number of genes that were included in the query

intersection_size

integer, the number of genes in the input query that are annotated to the corresponding term

term_id

character, unique term/pathway identifier

source

character, the abbreviation of the data source for the term/pathway

term_name

character, term/pathway name

effective_domain_size

integer, the total number of genes in the universe used for the hypergeometric test

intersection

character, input gene IDs that intersect with the term/pathway

computed_p_value

double, nominal hypergeometric p-value

cluster

character, cluster label

tissue

character, tissue abbreviation, one of MotrpacRatTraining6moData::TISSUE_ABBREV

ome

character, assay abbreviation, one of MotrpacRatTraining6moData::ASSAY_ABBREV

adj_p_value

double, adjusted nominal p-value computed_p_value using IHW with tissue as a covariate

Examples

# Use graphical clusters as an example
cluster_res = extract_main_clusters()
# Pick a single graphical cluster
# Gastrocnemius features up-regulated in both males and females at 8 weeks of training
cluster_res = cluster_res[cluster_res$cluster == "SKM-GN:8w_F1_M1",]

# Make a toy pathway member list with human gene symbols
pathways = list("TCA cycle" = c('ACO2','CS','FH','MDH1','OGDH','PDHA1','PDHA2','SDHC','SUCLG1'))

# Convert human gene symbols to rat gene symbols 
data("RAT_TO_HUMAN_GENE", package = "MotrpacRatTraining6moData")
for (pw in names(pathways)){
  newmembers = c()
  for (m in pathways[[pw]]){
    # get rat symbol
    rat = RAT_TO_HUMAN_GENE$RAT_SYMBOL[RAT_TO_HUMAN_GENE$HUMAN_ORTHOLOG_SYMBOL == m][1]
    if(!is.na(rat)){
      newmembers = c(newmembers, rat)
    }
  }
  pathways[[pw]] = newmembers
}

# Perform pathway enrichment 
custom_cluster_pathway_enrichment(cluster_res, 
                                  pathway_member_list = pathways,
                                  min_pw_set_size = 1)
#> Number of cores not defined with 'num_cores' argument. Setting the number of cores to 1. Explicity define 'num_cores' to take advantage of parallelization. If you aren't sure how many cores are available to you, try 'length(parallel::mcaffinity()).'
#> Running enrichments for 6 gene sets on 1 core(s). This may take a while...
#> 
#> 6 out of 38 identifiers in SKM-GN:8w_F1_M1 ATAC SKM-GN are not in the current universe. Removing these features.
#> [1] "Edn2"           "AABR07030921.1" "U6"             "AABR07004761.1"
#> [5] "AABR07062157.1" "Y_RNA"         
#> 
#> 2 out of 5 identifiers in SKM-GN:8w_F1_M1 METHYL SKM-GN are not in the current universe. Removing these features.
#> [1] "Mir196a"        "AABR07002473.1"
#> Done.
#>   term_size query_size intersection_size   term_id source term_name
#> 1         6        121                 2 TCA_cycle custom TCA cycle
#> 2         9        339                 6 TCA_cycle custom TCA cycle
#>   effective_domain_size                 intersection computed_p_value
#> 1                  3920                      Cs,Mdh1     1.306742e-02
#> 2                  5756 Cs,Fh,Mdh1,Ogdh,Pdha1,Suclg1     2.885974e-06
#>                                                                                                intersection_ensembl
#> 1                                                                             ENSRNOG00000008103,ENSRNOG00000023520
#> 2 ENSRNOG00000003653,ENSRNOG00000008103,ENSRNOG00000025383,ENSRNOG00000005587,ENSRNOG00000023520,ENSRNOG00000005130
#>           cluster tissue     ome  adj_p_value
#> 1 SKM-GN:8w_F1_M1 SKM-GN PHOSPHO 1.306742e-02
#> 2 SKM-GN:8w_F1_M1 SKM-GN    PROT 5.771947e-06