Pathway enrichment for graphical clusters — cluster_pathway_enrichment

Wrapper for multi-tissue, multi-omic pathway enrichment of clustering or graphical results. Pathway enrichment is performed using gprofiler2::gost() separately for each unique combination of tissue, assay/ome, and cluster.

Usage

cluster_pathway_enrichment(
  cluster_res,
  databases = c("REAC", "KEGG"),
  feature_to_gene = MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT,
  gene_identifier_type = "ensembl_gene",
  universe = MotrpacRatTraining6moData::GENE_UNIVERSES$ensembl_gene,
  kegg_db_destination = NULL,
  fella_method = "hypergeom",
  min_input_set_size = 1,
  min_pw_set_size = 10,
  max_pw_set_size = 200,
  adjust_p = TRUE,
  num_cores = NULL,
  logfile = "/dev/null",
  maxattempt = 50
)

Arguments

cluster_res: Either a data frame or a list of lists. If a data frame, it needs at least two columns: "feature" and "cluster". The "feature" column should be in the format 'MotrpacRatTraining6moData::ASSAY_ABBREV;MotrpacRatTraining6moData::TISSUE_ABBREV;feature_ID'. If a list of lists, each sublist must be named with the cluster name (character string), and the values must be features in the format 'MotrpacRatTraining6moData::ASSAY_ABBREV;MotrpacRatTraining6moData::TISSUE_ABBREV;feature_ID'.
databases: character vector of g:Profiler pathway databases to query. "KEGG" and "REAC" (REACTOME) by default. Current options include: GO (GO:BP, GO:MF, GO:CC to select a particular GO branch), KEGG, REAC, TF, MIRNA, CORUM, HP, HPA, WP. See gprofiler2 documentation for an up-to-date list.
feature_to_gene: data frame, map between feature IDs and gene identifiers. Columns must include "feature_ID", "gene_symbol", "ensembl_gene", and "kegg_id". MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT by default.
gene_identifier_type: character, column in feature_to_gene that matches the gene identifier type in universe. "ensembl_gene" by default.
universe: list of lists of character vectors, named first by assay (i.e., MotrpacRatTraining6moData::ASSAY_ABBREV) and then by tissue (i.e., MotrpacRatTraining6moData::TISSUE_ABBREV). Vectors provide the full set of gene identifiers associated with features tested during differential analysis. For example, universe$TRNSCRPT$LUNG should be a character vector of expressed genes in the lung, where the type of gene identifier matches gene_identifier_type. MotrpacRatTraining6moData::GENE_UNIVERSES$ensembl_gene by default.
kegg_db_destination: character, target directory for KEGG database used for FELLA pathway enrichment with metabolites. Creates database if it doesn't exist yet.
fella_method: character, enrichment method for FELLA::enrich(), one of "hypergeom" or "diffusion", passed to run_fella()
min_input_set_size: integer, input must have this minimum number of unique mappable gene IDs to attempt enrichment with gprofiler2::gost()
min_pw_set_size: integer, pathway must have at least this many members to attempt enrichment with gprofiler2::gost()
max_pw_set_size: integer, pathway must have no more than this many members to attempt enrichment with gprofiler2::gost()
adjust_p: logical, whether to adjust nominal p-values for multiple testing (IHW with tissue as a covariate)
num_cores: optional integer, number of cores to register if parallel computing is desired
logfile: optional character, path to log of failed iterations
maxattempt: integer, max number of consecutive null results from gprofiler2::gost() before giving up
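
The input shapes described for cluster_res and universe can be sketched with toy data. This is a minimal illustration, not package code: the feature IDs, gene IDs, and the cluster names "cluster_A" and "cluster_B" are invented, and whether each element of the list form should be a character vector or a list is not specified above, so a character vector is assumed here.

```r
# Toy feature strings in the 'ASSAY_ABBREV;TISSUE_ABBREV;feature_ID' format.
# All IDs below are invented for illustration.
features <- c(
  "TRNSCRPT;LUNG;ENSRNOG00000000001",
  "TRNSCRPT;LUNG;ENSRNOG00000000002",
  "TRNSCRPT;SKM-GN;ENSRNOG00000000003"
)

# Form 1: data frame with "feature" and "cluster" columns
cluster_df <- data.frame(
  feature = features,
  cluster = c("cluster_A", "cluster_A", "cluster_B"),
  stringsAsFactors = FALSE
)

# Form 2: list named by cluster, holding the features of each cluster
# (assumption: a character vector per cluster is acceptable)
cluster_list <- split(cluster_df$feature, cluster_df$cluster)

# Split each feature string into its three components.
# This assumes the feature_ID itself contains no ";".
parts <- do.call(rbind, strsplit(cluster_df$feature, ";", fixed = TRUE))
colnames(parts) <- c("ome", "tissue", "feature_ID")

# universe: named first by assay, then by tissue
universe <- list(
  TRNSCRPT = list(
    LUNG = c("ENSRNOG00000000001", "ENSRNOG00000000002", "ENSRNOG00000000004"),
    `SKM-GN` = c("ENSRNOG00000000003", "ENSRNOG00000000005")
  )
)
universe$TRNSCRPT$LUNG
```

Each ome and tissue combination found in the features would need a matching entry in universe, with gene identifiers of the type named by gene_identifier_type.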

Value

data frame with enrichment results, or NULL if no enrichment results were returned:

query: character, name of the input query; by default, the position of the query in the input with the prefix "query_" (from gprofiler2::gost())
term_size: integer, number of genes that are annotated to the term (from gprofiler2::gost())
query_size: integer, number of genes that were included in the query (from gprofiler2::gost())
intersection_size: integer, the number of genes in the input query that are annotated to the corresponding term (from gprofiler2::gost())
precision: double, the proportion of genes in the input list that are annotated to the function, defined as intersection_size/query_size (from gprofiler2::gost())
recall: double, the proportion of functionally annotated genes that the query recovers, defined as intersection_size/term_size (from gprofiler2::gost())
term_id: character, unique term/pathway identifier (from gprofiler2::gost())
source: character, the abbreviation of the data source for the term/pathway (from gprofiler2::gost())
term_name: character, term/pathway name (from gprofiler2::gost())
effective_domain_size: integer, the total number of genes in the universe used for the hypergeometric test (from gprofiler2::gost())
source_order: integer, numeric order for the term within its data source (from gprofiler2::gost())
parents: list of term IDs that are hierarchically directly above the term. For non-hierarchical data sources this points to an artificial root node (from gprofiler2::gost()).
evidence_codes: character, comma-separated evidence codes (from gprofiler2::gost())
intersection: character, input gene IDs that intersect with the term/pathway (from gprofiler2::gost())
gost_adj_p_value: double, hypergeometric p-value as adjusted by gprofiler2::gost(). This adjustment does not account for the multiple ome/tissue/cluster combinations tested, so the value is for reference only and should not be used to filter results unless the input contained a single ome/tissue/cluster combination.
computed_p_value: double, nominal hypergeometric p-value, computed from the gprofiler2::gost() output
cluster: character, cluster label
tissue: character, tissue abbreviation, one of MotrpacRatTraining6moData::TISSUE_ABBREV
ome: character, assay abbreviation, one of MotrpacRatTraining6moData::ASSAY_ABBREV
adj_p_value: double, computed_p_value adjusted for multiple testing using IHW with tissue as a covariate
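
The relationship between the size columns, precision, recall, and a nominal hypergeometric p-value can be sketched as follows. The numbers are invented, and the p-value uses the textbook upper-tail hypergeometric test via stats::phyper(); this is an assumption about how computed_p_value could be derived, not a copy of the function's internals.

```r
# One toy result row (invented numbers)
res <- data.frame(
  term_size = 40L,              # genes annotated to the term
  query_size = 25L,             # genes in the query
  intersection_size = 5L,       # query genes annotated to the term
  effective_domain_size = 10000L # genes in the universe
)

# Definitions given in the column descriptions above
res$precision <- res$intersection_size / res$query_size
res$recall <- res$intersection_size / res$term_size

# Upper-tail hypergeometric probability P(X >= intersection_size):
# draw query_size genes from a universe containing term_size annotated genes.
res$computed_p_value <- phyper(
  res$intersection_size - 1,
  res$term_size,
  res$effective_domain_size - res$term_size,
  res$query_size,
  lower.tail = FALSE
)
res
```

Subtracting 1 from intersection_size is needed because phyper() with lower.tail = FALSE returns P(X > q) rather than P(X >= q).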

Details

FELLA::enrich() is used for pathway enrichment of metabolites; gprofiler2::gost() is used for all other omes assuming features have been mapped to genes.

Pathway enrichments driven by a single gene are excluded.
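
A filter of this kind could look like the sketch below. It assumes that "driven by a single gene" means an intersection_size of 1; the exact criterion used by the function is not stated here, and the term IDs are invented.

```r
# Toy enrichment results (invented term IDs and sizes)
toy <- data.frame(
  term_id = c("toy:0001", "toy:0002", "toy:0003"),
  intersection_size = c(1L, 3L, 7L),
  stringsAsFactors = FALSE
)

# Assumption: keep only terms supported by more than one query gene
kept <- toy[toy$intersection_size > 1, ]
kept$term_id
```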

This function was used to generate MotrpacRatTraining6moData::GRAPH_PW_ENRICH.

Examples

if (FALSE) { # \dontrun{
# Use graphical clusters as an example
cluster_res = extract_main_clusters()
# Pick a single graphical cluster
# Gastrocnemius features up-regulated in both males and females at 8 weeks of training
cluster_res = cluster_res[cluster_res$cluster == "SKM-GN:8w_F1_M1",]

# Example 1: Run pathway enrichment for this cluster on a single core
pw_enrich = cluster_pathway_enrichment(cluster_res)

# Example 2: Run pathway enrichment for this cluster on 4 cores
pw_enrich = cluster_pathway_enrichment(cluster_res, num_cores = 4)

# Example 3: Same as above, but include metabolites. 
# Use FELLA's hypergeometric method for enrichment. 
pw_enrich = cluster_pathway_enrichment(cluster_res, 
                                       num_cores = 4,
                                       kegg_db_destination = "~/KEGGdb/test",
                                       fella_method = "hypergeom")
} # }