Pathway enrichment results for graphical clusters (nodes, edges, and paths) of interest
Format
A data frame with 156906 rows and 22 variables:
- query
- character, not used, carried over from gprofiler2::gost() output
- significant
- logical, not used, carried over from gprofiler2::gost() output
- term_size
- double, effective pathway size from gprofiler2::gost() output
- query_size
- integer, size of the input, i.e., the number of Ensembl genes associated with differential features
- intersection_size
- double, size of the intersection between the input and the pathway members 
- precision
- double, the proportion of genes in the input list that are annotated to the function (defined as intersection_size/query_size)
- recall
- double, the proportion of functionally annotated genes that the query recovers (defined as intersection_size/term_size)
- term_id
- character, pathway term ID 
- source
- character, database corresponding to the pathway, one of: "KEGG", "REAC" 
- term_name
- character, pathway name 
- effective_domain_size
- integer, size of the custom background Ensembl gene set 
- source_order
- integer, not used, carried over from gprofiler2::gost() output
- parents
- list, pathway parent(s) 
- evidence_codes
- character, not used, carried over from gprofiler2::gost() output
- intersection
- character, intersection between input and pathway (Ensembl IDs). NA for metabolomics enrichments 
- gost_adj_p_value
- double, BH-adjusted p-value returned by gprofiler2::gost(). Ignored because these p-values are only adjusted within each tissue/ome/cluster combination. Use the adj_p_value column instead.
- computed_p_value
- double, nominal hypergeometric p-value, computed from the gprofiler2::gost() output
- cluster
- character, graphical cluster (node, edge, or path) name 
- tissue
- character, tissue abbreviation, one of TISSUE_ABBREV. Note that VENACV, OVARY, and TESTES were not included in the graphical representation of differential features due to missing groups (e.g., females trained for 1 week).
- ome
- character, assay abbreviation, one of ASSAY_ABBREV 
- kegg_id
- character, pathway ID returned from FELLA::enrich()
- adj_p_value
- double, IHW FDR, calculated using IHW::ihw() with tissue as a covariate
- graphical_cluster
- character, cluster column with tissue prefix removed
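The precision and recall columns follow directly from the size columns defined above. The sketch below recomputes them on a toy data frame; the term IDs and counts are made up for illustration and are not taken from the actual table.

```r
# Toy rows mimicking a few columns of the enrichment table.
# All values here are hypothetical, chosen only to make the arithmetic easy to follow.
toy <- data.frame(
  term_id = c("toy_term_1", "toy_term_2"),
  term_size = c(50, 120),          # effective pathway size
  query_size = c(40L, 40L),        # number of input genes
  intersection_size = c(10, 6)     # input genes that are pathway members
)

# precision: fraction of the input annotated to the pathway
toy$precision <- toy$intersection_size / toy$query_size

# recall: fraction of the pathway recovered by the input
toy$recall <- toy$intersection_size / toy$term_size

print(toy)
```

A pathway with high recall but low precision is mostly covered by the input, even though it accounts for only a small share of the input genes.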
Details
All non-metabolite training-regulated features (5% FDR) were mapped to Ensembl gene symbols using FEATURE_TO_GENE. Training-regulated metabolites were mapped to KEGG IDs. For each graphical cluster of interest (i.e., the ten largest paths, two largest nodes, and two largest single edges with at least 20 features in each tissue, as well as all 8-week nodes), we performed pathway enrichment analysis separately for the Ensembl genes (or KEGG IDs for metabolites) associated with differential features in each ome.
For gene-centric omes (i.e., all but metabolomics), we performed enrichment analysis of KEGG and REACTOME rat pathways (organism "rnorvegicus") using gprofiler2::gost() with custom backgrounds defined by GENE_UNIVERSES. Only pathways with at least 10 and up to 200 members were tested. Because gprofiler2::gost() only returns adjusted p-values, we recalculated nominal p-values using a one-tailed hypergeometric test, which is consistent with how gprofiler2::gost() calculates enrichments. See MotrpacRatTraining6mo::cluster_pathway_enrichment() for implementation.
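As an illustration of the nominal p-value recalculation, a one-tailed (upper-tail) hypergeometric test can be written with base R's phyper() using the size columns in this table. This is a minimal sketch, not the package's implementation: the helper name hypergeom_p is hypothetical, and the example counts are made up.

```r
# Hypothetical helper: probability of observing at least `intersection_size`
# pathway members when drawing `query_size` genes from a background of
# `effective_domain_size` genes, of which `term_size` belong to the pathway.
hypergeom_p <- function(intersection_size, term_size, query_size,
                        effective_domain_size) {
  stats::phyper(
    q = intersection_size - 1,               # P(X > k - 1) equals P(X >= k)
    m = term_size,                           # pathway members in the background
    n = effective_domain_size - term_size,   # non-members in the background
    k = query_size,                          # number of genes drawn (the input)
    lower.tail = FALSE
  )
}

# Made-up example: 10 of 40 input genes fall in a 50-gene pathway,
# with a background of 1000 genes.
p <- hypergeom_p(intersection_size = 10, term_size = 50,
                 query_size = 40, effective_domain_size = 1000)
print(p)
```

Subtracting 1 from the intersection size matters: with lower.tail = FALSE, phyper() returns P(X > q), so passing the intersection size unchanged would exclude the observed overlap from the tail.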
For metabolites, we performed enrichment of KEGG pathways using the hypergeometric method in FELLA::enrich() with custom backgrounds defined by GENE_UNIVERSES. See MotrpacRatTraining6mo::run_fella() for implementation.
Pathway enrichment analysis p-values were adjusted across all results using Independent Hypothesis Weighting (IHW) with tissue as a covariate.
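The adjustment step can be sketched as follows on simulated data. Several things here are assumptions rather than details from this page: the tissue labels, the simulated p-values, and alpha = 0.05. The calls to IHW::ihw() and IHW::adj_pvalues() are guarded so that the script still runs when the Bioconductor package IHW is not installed.

```r
set.seed(1)
n <- 6000

# Simulated stand-in for the combined enrichment results (not real data).
toy <- data.frame(
  tissue = factor(sample(c("TISSUE_A", "TISSUE_B", "TISSUE_C"), n, replace = TRUE)),
  computed_p_value = runif(n)
)

# Add some signal to one tissue so the covariate is informative.
signal <- toy$tissue == "TISSUE_A" & runif(n) < 0.2
toy$computed_p_value[signal] <- rbeta(sum(signal), 0.5, 20)

if (requireNamespace("IHW", quietly = TRUE)) {
  # A factor covariate is treated as categorical: one group per tissue.
  ihw_res <- IHW::ihw(
    pvalues = toy$computed_p_value,
    covariates = toy$tissue,
    alpha = 0.05  # assumed significance level, for illustration only
  )
  toy$adj_p_value <- IHW::adj_pvalues(ihw_res)
  print(head(toy))
} else {
  message("Package 'IHW' is not installed; skipping the adjustment step.")
}
```

Because the weights are learned per tissue, tissues with more signal can receive a larger share of the testing budget than they would under a single BH adjustment across all results.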