Custom pathway enrichment for graphical clusters
Source:R/pathway_enrichment.R
custom_cluster_pathway_enrichment.Rd
Wrapper for multi-tissue, multi-omic pathway enrichment of clustering or graphical results with a user-supplied list of pathways. Pathway enrichment is performed using a hypergeometric test separately for each unique combination of tissue, assay/ome, and cluster.
Usage
custom_cluster_pathway_enrichment(
cluster_res,
pathway_member_list,
source = "custom",
feature_to_gene = MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT,
gene_identifier_type = "gene_symbol",
universe = MotrpacRatTraining6moData::GENE_UNIVERSES$gene_symbol,
add_ensembl_intersection = TRUE,
min_input_set_size = 1,
min_pw_set_size = 10,
max_pw_set_size = 200,
adjust_p = TRUE,
num_cores = NULL,
logfile = "/dev/null"
)
Arguments
- cluster_res
Either a data frame or a list of lists. If a data frame, it needs at least two columns: "feature" and "cluster". The "feature" column should be in the format 'MotrpacRatTraining6moData::ASSAY_ABBREV;MotrpacRatTraining6moData::TISSUE_ABBREV;feature_ID'. If a list of lists, each sublist must be named with the cluster name (character string), and the values must be features in the format 'MotrpacRatTraining6moData::ASSAY_ABBREV;MotrpacRatTraining6moData::TISSUE_ABBREV;feature_ID'.
- pathway_member_list
named list of character vectors where names are pathway names and values are pathway members. Pathway members must match values in the
gene_identifer_type
column offeature_to_gene
.- source
optional character string to define the source of
pathway_member_list
- feature_to_gene
data frame, map between
intersection_id_type
and gene symbols. Columns must include "feature_ID", "gene_symbol", "ensembl_gene", and "kegg_id". MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT by default.- gene_identifier_type
character, column in
feature_to_gene
that matches the gene identifier type inuniverse
. "gene_symbol" by default.- universe
list of lists of character vectors, named first by assay (i.e., MotrpacRatTraining6moData::ASSAY_ABBREV) and then by tissue (i.e., MotrpacRatTraining6moData::TISSUE_ABBREV). Vectors provide the full set of gene symbols associated with features tested during differential analysis. For example,
universe$TRNSCRPT$LUNG
should be a character vector of expressed genes in the lung, where the type of gene identifier matchesgene_identifier_type
.[MotrpacRatTraining6moData::GENE_UNIVERSES]$gene_symbol
by default.- add_ensembl_intersection
bool, whether to add a
intersection_ensembl
column, which converts gene IDs in the intersection to Ensembl IDs- min_input_set_size
integer, input must have this minimum number of unique mappable gene IDs to attempt enrichment
- min_pw_set_size
integer, pathway must have at least this many members to attempt enrichment
- max_pw_set_size
integer, pathway must have no more than this many members to attempt enrichment
- adjust_p
boolean, whether to adjust nominal p-values for multiple testing (IHW by tissue)
- num_cores
optional integer, number of cores to register if parallel computing is desired
- logfile
optional character, path to log of failed iterations
Value
data frame with enrichment results, or NULL if no enrichment results were returned:
term_size
integer, number of genes that are annotated to the term
query_size
integer, number of genes that were included in the query
intersection_size
integer, the number of genes in the input query that are annotated to the corresponding term
term_id
character, unique term/pathway identifier
source
character, the abbreviation of the data source for the term/pathway
term_name
character, term/pathway name
effective_domain_size
integer, the total number of genes in the universe used for the hypergeometric test
intersection
character, input gene IDs that intersect with the term/pathway
computed_p_value
double, nominal hypergeometric p-value
cluster
character, cluster label
tissue
character, tissue abbreviation, one of MotrpacRatTraining6moData::TISSUE_ABBREV
ome
character, assay abbreviation, one of MotrpacRatTraining6moData::ASSAY_ABBREV
adj_p_value
double, adjusted nominal p-value
computed_p_value
using IHW with tissue as a covariate
Examples
# Use graphical clusters as an example
cluster_res = extract_main_clusters()
# Pick a single graphical cluster
# Gastrocnemius features up-regulated in both males and females at 8 weeks of training
cluster_res = cluster_res[cluster_res$cluster == "SKM-GN:8w_F1_M1",]
# Make a toy pathway member list with human gene symbols
pathways = list("TCA cycle" = c('ACO2','CS','FH','MDH1','OGDH','PDHA1','PDHA2','SDHC','SUCLG1'))
# Convert human gene symbols to rat gene symbols
data("RAT_TO_HUMAN_GENE", package = "MotrpacRatTraining6moData")
for (pw in names(pathways)){
newmembers = c()
for (m in pathways[[pw]]){
# get rat symbol
rat = RAT_TO_HUMAN_GENE$RAT_SYMBOL[RAT_TO_HUMAN_GENE$HUMAN_ORTHOLOG_SYMBOL == m][1]
if(!is.na(rat)){
newmembers = c(newmembers, rat)
}
}
pathways[[pw]] = newmembers
}
# Perform pathway enrichment
custom_cluster_pathway_enrichment(cluster_res,
pathway_member_list = pathways,
min_pw_set_size = 1)
#> Number of cores not defined with 'num_cores' argument. Setting the number of cores to 1. Explicity define 'num_cores' to take advantage of parallelization. If you aren't sure how many cores are available to you, try 'length(parallel::mcaffinity()).'
#> Running enrichments for 6 gene sets on 1 core(s). This may take a while...
#>
#> 6 out of 38 identifiers in SKM-GN:8w_F1_M1 ATAC SKM-GN are not in the current universe. Removing these features.
#> [1] "Edn2" "AABR07030921.1" "U6" "AABR07004761.1"
#> [5] "AABR07062157.1" "Y_RNA"
#>
#> 2 out of 5 identifiers in SKM-GN:8w_F1_M1 METHYL SKM-GN are not in the current universe. Removing these features.
#> [1] "Mir196a" "AABR07002473.1"
#> Done.
#> term_size query_size intersection_size term_id source term_name
#> 1 6 121 2 TCA_cycle custom TCA cycle
#> 2 9 339 6 TCA_cycle custom TCA cycle
#> effective_domain_size intersection computed_p_value
#> 1 3920 Cs,Mdh1 1.306742e-02
#> 2 5756 Cs,Fh,Mdh1,Ogdh,Pdha1,Suclg1 2.885974e-06
#> intersection_ensembl
#> 1 ENSRNOG00000008103,ENSRNOG00000023520
#> 2 ENSRNOG00000003653,ENSRNOG00000008103,ENSRNOG00000025383,ENSRNOG00000005587,ENSRNOG00000023520,ENSRNOG00000005130
#> cluster tissue ome adj_p_value
#> 1 SKM-GN:8w_F1_M1 SKM-GN PHOSPHO 1.306742e-02
#> 2 SKM-GN:8w_F1_M1 SKM-GN PROT 5.771947e-06