Custom pathway enrichment for graphical clusters

Wrapper for multi-tissue, multi-omic pathway enrichment of clustering or graphical results with a user-supplied list of pathways. Pathway enrichment is performed using a hypergeometric test separately for each unique combination of tissue, assay/ome, and cluster.

Usage

custom_cluster_pathway_enrichment(
  cluster_res,
  pathway_member_list,
  source = "custom",
  feature_to_gene = MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT,
  gene_identifier_type = "gene_symbol",
  universe = MotrpacRatTraining6moData::GENE_UNIVERSES$gene_symbol,
  add_ensembl_intersection = TRUE,
  min_input_set_size = 1,
  min_pw_set_size = 10,
  max_pw_set_size = 200,
  adjust_p = TRUE,
  num_cores = NULL,
  logfile = "/dev/null"
)

Arguments

cluster_res: Either a data frame or a list of lists. If a data frame, it needs at least two columns: "feature" and "cluster". The "feature" column should be in the format 'MotrpacRatTraining6moData::ASSAY_ABBREV;MotrpacRatTraining6moData::TISSUE_ABBREV;feature_ID'. If a list of lists, each sublist must be named with the cluster name (character string), and the values must be features in the format 'MotrpacRatTraining6moData::ASSAY_ABBREV;MotrpacRatTraining6moData::TISSUE_ABBREV;feature_ID'.
pathway_member_list: named list of character vectors where names are pathway names and values are pathway members. Pathway members must match values in the gene_identifer_type column of feature_to_gene.
source: optional character string to define the source of pathway_member_list
feature_to_gene: data frame, map between intersection_id_type and gene symbols. Columns must include "feature_ID", "gene_symbol", "ensembl_gene", and "kegg_id". MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT by default.
gene_identifier_type: character, column in feature_to_gene that matches the gene identifier type in universe. "gene_symbol" by default.
universe: list of lists of character vectors, named first by assay (i.e., MotrpacRatTraining6moData::ASSAY_ABBREV) and then by tissue (i.e., MotrpacRatTraining6moData::TISSUE_ABBREV). Vectors provide the full set of gene symbols associated with features tested during differential analysis. For example, universe$TRNSCRPT$LUNG should be a character vector of expressed genes in the lung, where the type of gene identifier matches gene_identifier_type. [MotrpacRatTraining6moData::GENE_UNIVERSES]$gene_symbol by default.
add_ensembl_intersection: bool, whether to add a intersection_ensembl column, which converts gene IDs in the intersection to Ensembl IDs
min_input_set_size: integer, input must have this minimum number of unique mappable gene IDs to attempt enrichment
min_pw_set_size: integer, pathway must have at least this many members to attempt enrichment
max_pw_set_size: integer, pathway must have no more than this many members to attempt enrichment
adjust_p: boolean, whether to adjust nominal p-values for multiple testing (IHW by tissue)
num_cores: optional integer, number of cores to register if parallel computing is desired
logfile: optional character, path to log of failed iterations

Value

data frame with enrichment results, or NULL if no enrichment results were returned:

term_size: integer, number of genes that are annotated to the term
query_size: integer, number of genes that were included in the query
intersection_size: integer, the number of genes in the input query that are annotated to the corresponding term
term_id: character, unique term/pathway identifier
source: character, the abbreviation of the data source for the term/pathway
term_name: character, term/pathway name
effective_domain_size: integer, the total number of genes in the universe used for the hypergeometric test
intersection: character, input gene IDs that intersect with the term/pathway
computed_p_value: double, nominal hypergeometric p-value
cluster: character, cluster label
tissue: character, tissue abbreviation, one of MotrpacRatTraining6moData::TISSUE_ABBREV
ome: character, assay abbreviation, one of MotrpacRatTraining6moData::ASSAY_ABBREV
adj_p_value: double, adjusted nominal p-value computed_p_value using IHW with tissue as a covariate

Examples

# Use graphical clusters as an example
cluster_res = extract_main_clusters()
# Pick a single graphical cluster
# Gastrocnemius features up-regulated in both males and females at 8 weeks of training
cluster_res = cluster_res[cluster_res$cluster == "SKM-GN:8w_F1_M1",]

# Make a toy pathway member list with human gene symbols
pathways = list("TCA cycle" = c('ACO2','CS','FH','MDH1','OGDH','PDHA1','PDHA2','SDHC','SUCLG1'))

# Convert human gene symbols to rat gene symbols 
data("RAT_TO_HUMAN_GENE", package = "MotrpacRatTraining6moData")
for (pw in names(pathways)){
  newmembers = c()
  for (m in pathways[[pw]]){
    # get rat symbol
    rat = RAT_TO_HUMAN_GENE$RAT_SYMBOL[RAT_TO_HUMAN_GENE$HUMAN_ORTHOLOG_SYMBOL == m][1]
    if(!is.na(rat)){
      newmembers = c(newmembers, rat)
    }
  }
  pathways[[pw]] = newmembers
}

# Perform pathway enrichment 
custom_cluster_pathway_enrichment(cluster_res, 
                                  pathway_member_list = pathways,
                                  min_pw_set_size = 1)
#> Number of cores not defined with 'num_cores' argument. Setting the number of cores to 1. Explicity define 'num_cores' to take advantage of parallelization. If you aren't sure how many cores are available to you, try 'length(parallel::mcaffinity()).'
#> Running enrichments for 6 gene sets on 1 core(s). This may take a while...
#> 
#> 6 out of 38 identifiers in SKM-GN:8w_F1_M1 ATAC SKM-GN are not in the current universe. Removing these features.
#> [1] "Edn2"           "AABR07030921.1" "U6"             "AABR07004761.1"
#> [5] "AABR07062157.1" "Y_RNA"         
#> 
#> 2 out of 5 identifiers in SKM-GN:8w_F1_M1 METHYL SKM-GN are not in the current universe. Removing these features.
#> [1] "Mir196a"        "AABR07002473.1"
#> Done.
#>   term_size query_size intersection_size   term_id source term_name
#> 1         6        121                 2 TCA_cycle custom TCA cycle
#> 2         9        339                 6 TCA_cycle custom TCA cycle
#>   effective_domain_size                 intersection computed_p_value
#> 1                  3920                      Cs,Mdh1     1.306742e-02
#> 2                  5756 Cs,Fh,Mdh1,Ogdh,Pdha1,Suclg1     2.885974e-06
#>                                                                                                intersection_ensembl
#> 1                                                                             ENSRNOG00000008103,ENSRNOG00000023520
#> 2 ENSRNOG00000003653,ENSRNOG00000008103,ENSRNOG00000025383,ENSRNOG00000005587,ENSRNOG00000023520,ENSRNOG00000005130
#>           cluster tissue     ome  adj_p_value
#> 1 SKM-GN:8w_F1_M1 SKM-GN PHOSPHO 1.306742e-02
#> 2 SKM-GN:8w_F1_M1 SKM-GN    PROT 5.771947e-06

Usage

Arguments

Value

See also

Examples