
Wrapper for multi-tissue, multi-omic pathway enrichment of clustering or graphical results. Pathway enrichment is performed using gprofiler2::gost() separately for each unique combination of tissue, assay/ome, and cluster.

Usage

cluster_pathway_enrichment(
  cluster_res,
  databases = c("REAC", "KEGG"),
  feature_to_gene = MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT,
  gene_identifier_type = "ensembl_gene",
  universe = MotrpacRatTraining6moData::GENE_UNIVERSES$ensembl_gene,
  kegg_db_destination = NULL,
  fella_method = "hypergeom",
  min_input_set_size = 1,
  min_pw_set_size = 10,
  max_pw_set_size = 200,
  adjust_p = TRUE,
  num_cores = NULL,
  logfile = "/dev/null",
  maxattempt = 50
)

Arguments

cluster_res

Either a data frame or a list of lists. If a data frame, it needs at least two columns: "feature" and "cluster". The "feature" column should be in the format 'MotrpacRatTraining6moData::ASSAY_ABBREV;MotrpacRatTraining6moData::TISSUE_ABBREV;feature_ID'. If a list of lists, each sublist must be named with the cluster name (character string), and the values must be features in the format 'MotrpacRatTraining6moData::ASSAY_ABBREV;MotrpacRatTraining6moData::TISSUE_ABBREV;feature_ID'.
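As an illustration of the two accepted input shapes, the sketch below builds a small cluster_res by hand. The feature IDs and cluster names are hypothetical, and the list form uses one character vector of features per cluster, which is an assumption about what "list of lists" means here.

```r
# Hypothetical feature IDs in 'ASSAY_ABBREV;TISSUE_ABBREV;feature_ID' format
features <- c(
  "TRNSCRPT;LUNG;ENSRNOG00000000001",
  "TRNSCRPT;LUNG;ENSRNOG00000000002",
  "TRNSCRPT;SKM-GN;ENSRNOG00000000003"
)

# Option 1: data frame with the two required columns
cluster_df <- data.frame(
  feature = features,
  cluster = c("cluster_A", "cluster_A", "cluster_B"),
  stringsAsFactors = FALSE
)

# Option 2: the same assignments as a named list, one element per cluster
# (assumption: each element holds that cluster's features)
cluster_list <- split(cluster_df$feature, cluster_df$cluster)

# Sanity check: split each feature into its three components
parts <- do.call(rbind, strsplit(cluster_df$feature, ";", fixed = TRUE))
colnames(parts) <- c("ome", "tissue", "feature_ID")
```

Splitting on ";" only works when the feature_ID itself contains no semicolon, which holds for the made-up IDs above.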

databases

character vector of g:Profiler pathway databases to query. "KEGG" and "REAC" (REACTOME) by default. Current options include: GO (GO:BP, GO:MF, GO:CC to select a particular GO branch), KEGG, REAC, TF, MIRNA, CORUM, HP, HPA, WP. See gprofiler2 documentation for an up-to-date list.

feature_to_gene

data frame, map between feature IDs and gene identifiers. Columns must include "feature_ID", "gene_symbol", "ensembl_gene", and "kegg_id". MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT by default.

gene_identifier_type

character, column in feature_to_gene that matches the gene identifier type in universe. "ensembl_gene" by default.

universe

list of lists of character vectors, named first by assay (i.e., MotrpacRatTraining6moData::ASSAY_ABBREV) and then by tissue (i.e., MotrpacRatTraining6moData::TISSUE_ABBREV). Vectors provide the full set of gene identifiers associated with features tested during differential analysis. For example, universe$TRNSCRPT$LUNG should be a character vector of expressed genes in the lung, where the type of gene identifier matches gene_identifier_type. MotrpacRatTraining6moData::GENE_UNIVERSES$ensembl_gene by default.
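A minimal sketch of the nesting described above, using made-up Ensembl-style IDs. A real universe would list every gene tested in each assay and tissue.

```r
# Hypothetical background gene sets, named by assay and then by tissue
universe <- list(
  TRNSCRPT = list(
    LUNG = c("ENSRNOG00000000001", "ENSRNOG00000000002", "ENSRNOG00000000004"),
    `SKM-GN` = c("ENSRNOG00000000003", "ENSRNOG00000000004")
  )
)

# Background for transcriptomic features in the lung
lung_background <- universe$TRNSCRPT$LUNG

# Names containing "-" need backticks or [[ ]] indexing
skm_background <- universe[["TRNSCRPT"]][["SKM-GN"]]
```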

kegg_db_destination

character, target directory for KEGG database used for FELLA pathway enrichment with metabolites. Creates database if it doesn't exist yet.

fella_method

character, enrichment method for FELLA::enrich(), one of "hypergeom" or "diffusion"; passed to run_fella().

min_input_set_size

integer, input must have this minimum number of unique mappable gene IDs to attempt enrichment with gprofiler2::gost()

min_pw_set_size

integer, pathway must have at least this many members to attempt enrichment with gprofiler2::gost()

max_pw_set_size

integer, pathway must have no more than this many members to attempt enrichment with gprofiler2::gost()

adjust_p

boolean, whether to adjust nominal p-values for multiple testing (IHW by tissue)

num_cores

optional integer, number of cores to register if parallel computing is desired

logfile

optional character, path to log of failed iterations

maxattempt

integer, max number of consecutive null results from gprofiler2::gost() before giving up

Value

data frame with enrichment results, or NULL if no enrichment results were returned:

query

character, the name of the input query, which by default is the position of the query in the input with the prefix "query_" (from gprofiler2::gost())

term_size

integer, number of genes that are annotated to the term (from gprofiler2::gost())

query_size

integer, number of genes that were included in the query (from gprofiler2::gost())

intersection_size

integer, the number of genes in the input query that are annotated to the corresponding term (from gprofiler2::gost())

precision

double, the proportion of genes in the input list that are annotated to the function, defined as intersection_size/query_size (from gprofiler2::gost())

recall

double, the proportion of functionally annotated genes that the query recovers, defined as intersection_size/term_size (from gprofiler2::gost())
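To make the precision and recall definitions concrete, here is a small worked example with invented counts:

```r
# Invented counts for a single term
intersection_size <- 8    # query genes annotated to the term
query_size <- 40          # genes in the query
term_size <- 100          # genes annotated to the term

precision <- intersection_size / query_size  # 8 / 40 = 0.2
recall <- intersection_size / term_size      # 8 / 100 = 0.08
```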

term_id

character, unique term/pathway identifier (from gprofiler2::gost())

source

character, the abbreviation of the data source for the term/pathway (from gprofiler2::gost())

term_name

character, term/pathway name (from gprofiler2::gost())

effective_domain_size

integer, the total number of genes in the universe used for the hypergeometric test (from gprofiler2::gost())

source_order

integer, numeric order for the term within its data source (from gprofiler2::gost())

parents

list of term IDs that are hierarchically directly above the term. For non-hierarchical data sources this points to an artificial root node (from gprofiler2::gost()).

evidence_codes

character, comma-separated evidence codes (from gprofiler2::gost())

intersection

character, input gene IDs that intersect with the term/pathway (from gprofiler2::gost())

gost_adj_p_value

double, improperly adjusted hypergeometric p-value from gprofiler2::gost(). For reference only; should not be used to filter results unless there was only a single ome/tissue/cluster combination in the input.

computed_p_value

double, nominal hypergeometric p-value, computed from the gprofiler2::gost() output
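The sketch below shows one standard way to obtain a nominal over-representation p-value from the columns above, using the upper tail of the hypergeometric distribution. The helper name hypergeom_p is hypothetical, and this is an assumption about the calculation, not necessarily the exact code the package runs.

```r
# Hypothetical helper: P(X >= intersection_size) when drawing query_size
# genes from a universe of effective_domain_size genes, of which term_size
# are annotated to the term
hypergeom_p <- function(intersection_size, term_size, query_size,
                        effective_domain_size) {
  stats::phyper(
    q = intersection_size - 1,
    m = term_size,
    n = effective_domain_size - term_size,
    k = query_size,
    lower.tail = FALSE
  )
}

# Invented counts for illustration
p <- hypergeom_p(
  intersection_size = 8,
  term_size = 100,
  query_size = 40,
  effective_domain_size = 10000
)
```

Subtracting 1 from intersection_size makes the tail probability inclusive of the observed overlap.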

cluster

character, cluster label

tissue

character, tissue abbreviation, one of MotrpacRatTraining6moData::TISSUE_ABBREV

ome

character, assay abbreviation, one of MotrpacRatTraining6moData::ASSAY_ABBREV

adj_p_value

double, computed_p_value adjusted for multiple testing using IHW with tissue as a covariate

Details

FELLA::enrich() is used for pathway enrichment of metabolites; gprofiler2::gost() is used for all other omes assuming features have been mapped to genes.

Pathway enrichments driven by a single gene are excluded.
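One plausible reading of this exclusion is that rows with an intersection_size of 1 are dropped. The toy data frame below illustrates that reading; the values are invented and the function's actual filter may differ.

```r
# Toy results table with invented values
toy_res <- data.frame(
  term_id = c("term_1", "term_2", "term_3"),
  intersection_size = c(1L, 3L, 5L),
  stringsAsFactors = FALSE
)

# Assumption: "driven by a single gene" means only one query gene overlaps the term
filtered <- toy_res[toy_res$intersection_size > 1, ]
```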

This function was used to generate MotrpacRatTraining6moData::GRAPH_PW_ENRICH.

Examples

if (FALSE) {
# Use graphical clusters as an example
cluster_res = extract_main_clusters()
# Pick a single graphical cluster
# Gastrocnemius features up-regulated in both males and females at 8 weeks of training
cluster_res = cluster_res[cluster_res$cluster == "SKM-GN:8w_F1_M1",]

# Example 1: Run pathway enrichment for this cluster on a single core
pw_enrich = cluster_pathway_enrichment(cluster_res)

# Example 2: Run pathway enrichment for this cluster on 4 cores
pw_enrich = cluster_pathway_enrichment(cluster_res, num_cores = 4)

# Example 3: Same as above, but include metabolites. 
# Use FELLA's hypergeometric method for enrichment. 
pw_enrich = cluster_pathway_enrichment(cluster_res, 
                                       num_cores = 4,
                                       kegg_db_destination = "~/KEGGdb/test",
                                       fella_method = "hypergeom")
}