Pathway enrichment results for graphical clusters (nodes, edges, and paths) of interest

Usage

GRAPH_PW_ENRICH

Format

A data frame with 156906 rows and 22 variables:

query

character, not used, carried over from gprofiler2::gost() output

significant

logical, not used, carried over from gprofiler2::gost() output

term_size

double, effective pathway size from gprofiler2::gost() output

query_size

integer, size of input, i.e. list of Ensembl genes associated with differential features

intersection_size

double, size of the intersection between the input and the pathway members

precision

double, the proportion of genes in the input list that are annotated to the function (defined as intersection_size/query_size)

recall

double, the proportion of functionally annotated genes that the query recovers (defined as intersection_size/term_size)

term_id

character, pathway term ID

source

character, database corresponding to the pathway, one of: "KEGG", "REAC"

term_name

character, pathway name

effective_domain_size

integer, size of the custom background Ensembl gene set

source_order

integer, not used, carried over from gprofiler2::gost() output

parents

list, pathway parent(s)

evidence_codes

character, not used, carried over from gprofiler2::gost() output

intersection

character, intersection between input and pathway (Ensembl IDs). NA for metabolomics enrichments

gost_adj_p_value

double, BH-adjusted p-value returned by gprofiler2::gost(). Ignored because these p-values are adjusted only within each tissue/ome/cluster combination; use the adj_p_value column instead.

computed_p_value

double, nominal hypergeometric p-value, computed from the gprofiler2::gost() output

cluster

character, graphical cluster (node, edge, or path) name

tissue

character, tissue abbreviation, one of TISSUE_ABBREV. Note that VENACV, OVARY, and TESTES were not included in the graphical representation of differential features due to missing groups (e.g., females trained for 1 week).

ome

character, assay abbreviation, one of ASSAY_ABBREV

kegg_id

character, pathway ID returned from FELLA::enrich()

adj_p_value

double, IHW FDR, calculated using IHW::ihw() with tissue as a covariate

graphical_cluster

character, cluster column with tissue prefix removed
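
The precision and recall columns are simple ratios of the size columns defined above. As a minimal sketch, the following recomputes them for a toy data frame; the term IDs and counts are made up for illustration and are not taken from GRAPH_PW_ENRICH.

```r
# Toy rows with hypothetical term IDs and counts (not real results)
toy <- data.frame(
  term_id = c("hypothetical:0001", "hypothetical:0002"),
  intersection_size = c(5, 12),
  query_size = c(100L, 100L),
  term_size = c(50, 150)
)

# precision: fraction of the input genes that belong to the pathway
toy$precision <- toy$intersection_size / toy$query_size

# recall: fraction of the pathway members recovered by the input
toy$recall <- toy$intersection_size / toy$term_size

print(toy)
```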

Details

All non-metabolite training-regulated features (5% FDR) were mapped to Ensembl gene symbols using FEATURE_TO_GENE. Training-regulated metabolites were mapped to KEGG IDs. Graphical clusters of interest were defined as the ten largest paths, two largest nodes, and two largest single edges with at least 20 features in each tissue, as well as all 8-week nodes. For each such cluster, we performed pathway enrichment analysis separately for each ome, using the Ensembl genes (or KEGG IDs for metabolites) associated with the differential features in that ome.

For gene-centric omes (i.e., all but metabolomics), we performed enrichment analysis of KEGG and Reactome rat pathways (organism "rnorvegicus") using gprofiler2::gost() with custom backgrounds defined by GENE_UNIVERSES. Only pathways with at least 10 and up to 200 members were tested. Because gprofiler2::gost() only returns adjusted p-values, we recalculated nominal p-values using a one-tailed hypergeometric test, which is consistent with how gprofiler2::gost() calculates enrichments. See MotrpacRatTraining6mo::cluster_pathway_enrichment() for implementation.
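
A one-tailed hypergeometric p-value of this kind can be sketched from four of the columns documented above. This is an illustration using stats::phyper(), not the code in cluster_pathway_enrichment(); the helper name hypergeom_p and the example counts are hypothetical.

```r
# Upper-tail hypergeometric p-value: the probability of seeing at least
# `intersection_size` pathway members in a query of `query_size` genes
# drawn from a background of `effective_domain_size` genes, of which
# `term_size` belong to the pathway.
# `hypergeom_p` is a hypothetical helper name used only in this sketch.
hypergeom_p <- function(intersection_size, term_size,
                        query_size, effective_domain_size) {
  stats::phyper(
    q = intersection_size - 1,             # P(X > q) equals P(X >= intersection_size)
    m = term_size,                         # pathway members in the background
    n = effective_domain_size - term_size, # non-members in the background
    k = query_size,                        # number of genes in the query
    lower.tail = FALSE
  )
}

# Made-up counts for illustration only
p <- hypergeom_p(
  intersection_size = 5,
  term_size = 50,
  query_size = 100,
  effective_domain_size = 10000
)
print(p)
```

Subtracting 1 from the intersection size matters: with lower.tail = FALSE, phyper() returns P(X > q), so passing the observed overlap directly would exclude the observed value from the tail.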

For metabolites, we performed enrichment of KEGG pathways using the hypergeometric method in FELLA::enrich() with custom backgrounds defined by GENE_UNIVERSES. See MotrpacRatTraining6mo::run_fella() for implementation.

Pathway enrichment analysis p-values were adjusted across all results using Independent Hypothesis Weighting (IHW) with tissue as a covariate.
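
The adjustment step can be sketched roughly as follows, assuming the Bioconductor package IHW is installed. The p-values and tissue labels are simulated, and the alpha of 0.05 is an assumption made for this example; neither reflects the settings used to build GRAPH_PW_ENRICH.

```r
# Simulated nominal p-values with a categorical tissue covariate
# (placeholder tissue labels; not real results)
set.seed(1)
toy <- data.frame(
  computed_p_value = runif(6000),
  tissue = factor(sample(c("T1", "T2", "T3"), 6000, replace = TRUE))
)

if (requireNamespace("IHW", quietly = TRUE)) {
  # Weight hypotheses by tissue, then extract the weighted adjusted p-values.
  # alpha = 0.05 is illustrative only.
  res <- IHW::ihw(
    pvalues = toy$computed_p_value,
    covariates = toy$tissue,
    alpha = 0.05
  )
  toy$adj_p_value <- IHW::adj_pvalues(res)
  print(summary(toy$adj_p_value))
} else {
  message("IHW is not installed; skipping the adjustment sketch.")
}
```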