Pathway enrichment for graphical clusters
Source: R/pathway_enrichment.R
cluster_pathway_enrichment.Rd
Wrapper for multi-tissue, multi-omic pathway enrichment of clustering or graphical results.
Pathway enrichment is performed using gprofiler2::gost() separately for each unique combination of tissue, assay/ome, and cluster.
Usage
cluster_pathway_enrichment(
  cluster_res,
  databases = c("REAC", "KEGG"),
  feature_to_gene = MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT,
  gene_identifier_type = "ensembl_gene",
  universe = MotrpacRatTraining6moData::GENE_UNIVERSES$ensembl_gene,
  kegg_db_destination = NULL,
  fella_method = "hypergeom",
  min_input_set_size = 1,
  min_pw_set_size = 10,
  max_pw_set_size = 200,
  adjust_p = TRUE,
  num_cores = NULL,
  logfile = "/dev/null",
  maxattempt = 50
)
Arguments
- cluster_res
Either a data frame or a list of lists. If a data frame, it needs at least two columns: "feature" and "cluster". The "feature" column should be in the format 'MotrpacRatTraining6moData::ASSAY_ABBREV;MotrpacRatTraining6moData::TISSUE_ABBREV;feature_ID'. If a list of lists, each sublist must be named with the cluster name (character string), and the values must be features in the format 'MotrpacRatTraining6moData::ASSAY_ABBREV;MotrpacRatTraining6moData::TISSUE_ABBREV;feature_ID'.
- databases
character vector of g:Profiler pathway databases to query. "KEGG" and "REAC" (REACTOME) by default. Current options include: GO (GO:BP, GO:MF, GO:CC to select a particular GO branch), KEGG, REAC, TF, MIRNA, CORUM, HP, HPA, WP. See gprofiler2 documentation for an up-to-date list.
- feature_to_gene
data frame, map between intersection_id_type and gene symbols. Columns must include "feature_ID", "gene_symbol", "ensembl_gene", and "kegg_id". MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT by default.
- gene_identifier_type
character, column in feature_to_gene that matches the gene identifier type in universe. "ensembl_gene" by default.
- universe
list of lists of character vectors, named first by assay (i.e., MotrpacRatTraining6moData::ASSAY_ABBREV) and then by tissue (i.e., MotrpacRatTraining6moData::TISSUE_ABBREV). Vectors provide the full set of gene identifiers associated with features tested during differential analysis. For example, universe$TRNSCRPT$LUNG should be a character vector of expressed genes in the lung, where the type of gene identifier matches gene_identifier_type. MotrpacRatTraining6moData::GENE_UNIVERSES$ensembl_gene by default.
- kegg_db_destination
character, target directory for KEGG database used for FELLA pathway enrichment with metabolites. Creates database if it doesn't exist yet.
- fella_method
character, enrichment method for FELLA::enrich(), one of "hypergeom" or "diffusion", passed to run_fella()
- min_input_set_size
integer, input must have this minimum number of unique mappable gene IDs to attempt enrichment with gprofiler2::gost()
- min_pw_set_size
integer, pathway must have at least this many members to attempt enrichment with gprofiler2::gost()
- max_pw_set_size
integer, pathway must have no more than this many members to attempt enrichment with gprofiler2::gost()
- adjust_p
boolean, whether to adjust nominal p-values for multiple testing (IHW by tissue)
- num_cores
optional integer, number of cores to register if parallel computing is desired
- logfile
optional character, path to log of failed iterations
- maxattempt
integer, max number of consecutive null results from gprofiler2::gost() before giving up
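As a minimal sketch of the two cluster_res layouts described above, the base R snippet below builds a toy data frame, derives an equivalent list keyed by cluster, and groups features into the ome/tissue/cluster combinations that are enriched separately. The feature IDs and cluster labels are invented for illustration, and parse_features() is a hypothetical helper, not a function from this package.

```r
# Toy input; feature IDs and cluster labels are invented for illustration.
cluster_df <- data.frame(
  feature = c(
    "TRNSCRPT;SKM-GN;ENSRNOG00000000001",
    "TRNSCRPT;SKM-GN;ENSRNOG00000000002",
    "TRNSCRPT;LUNG;ENSRNOG00000000003",
    "TRNSCRPT;LUNG;ENSRNOG00000000004"
  ),
  cluster = c("clusterA", "clusterA", "clusterA", "clusterB"),
  stringsAsFactors = FALSE
)

# List form: one element per cluster, named by the cluster label,
# holding that cluster's features.
cluster_list <- split(cluster_df$feature, cluster_df$cluster)

# Hypothetical helper: split "ASSAY;TISSUE;feature_ID" into its three parts.
parse_features <- function(df) {
  parts <- strsplit(df$feature, ";", fixed = TRUE)
  stopifnot(all(lengths(parts) == 3))
  df$ome <- vapply(parts, `[`, character(1), 1)
  df$tissue <- vapply(parts, `[`, character(1), 2)
  df$feature_ID <- vapply(parts, `[`, character(1), 3)
  df
}

parsed <- parse_features(cluster_df)

# One group of feature IDs per unique ome/tissue/cluster combination.
combos <- split(
  parsed$feature_ID,
  list(parsed$ome, parsed$tissue, parsed$cluster),
  drop = TRUE, sep = "|"
)
```

Each element of combos corresponds to one enrichment query in the sense of the description above; how the function itself organizes these groups internally is not specified here.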
Value
data frame with enrichment results, or NULL if no enrichment results were returned:
query
character, the name of the input query which by default is the order of query with the prefix "query_" (from gprofiler2::gost())
term_size
integer, number of genes that are annotated to the term (from gprofiler2::gost())
query_size
integer, number of genes that were included in the query (from gprofiler2::gost())
intersection_size
integer, the number of genes in the input query that are annotated to the corresponding term (from gprofiler2::gost())
precision
double, the proportion of genes in the input list that are annotated to the function, defined as intersection_size/query_size (from gprofiler2::gost())
recall
double, the proportion of functionally annotated genes that the query recovers, defined as intersection_size/term_size (from gprofiler2::gost())
term_id
character, unique term/pathway identifier (from gprofiler2::gost())
source
character, the abbreviation of the data source for the term/pathway (from gprofiler2::gost())
term_name
character, term/pathway name (from gprofiler2::gost())
effective_domain_size
integer, the total number of genes in the universe used for the hypergeometric test (from gprofiler2::gost())
source_order
integer, numeric order for the term within its data source (from gprofiler2::gost())
parents
list of term IDs that are hierarchically directly above the term. For non-hierarchical data sources this points to an artificial root node (from gprofiler2::gost()).
evidence_codes
character, comma-separated evidence codes (from gprofiler2::gost())
intersection
character, input gene IDs that intersect with the term/pathway (from gprofiler2::gost())
gost_adj_p_value
double, improperly adjusted hypergeometric p-value from gprofiler2::gost(). For reference only; should not be used to filter results unless there was only a single ome/tissue/cluster combination in the input.
computed_p_value
double, nominal hypergeometric p-value, computed from the gprofiler2::gost() output
cluster
character, cluster label
tissue
character, tissue abbreviation, one of MotrpacRatTraining6moData::TISSUE_ABBREV
ome
character, assay abbreviation, one of MotrpacRatTraining6moData::ASSAY_ABBREV
adj_p_value
double, nominal p-value (computed_p_value) adjusted using IHW with tissue as a covariate
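The size columns above are related by simple arithmetic, which the following sketch checks on invented numbers. The precision and recall formulas are the ones stated in the column descriptions. The p-value line is an assumption: it applies the standard one-sided hypergeometric over-representation test with stats::phyper(), which may differ in detail from how computed_p_value is actually derived.

```r
# Invented values standing in for one row of the results.
term_size <- 50                 # genes annotated to the term
query_size <- 20                # genes in the query
intersection_size <- 5          # query genes annotated to the term
effective_domain_size <- 10000  # genes in the universe

# Definitions taken from the column descriptions.
precision <- intersection_size / query_size
recall <- intersection_size / term_size

# Assumed formula: probability of drawing at least `intersection_size`
# term genes when sampling `query_size` genes from the universe.
p_value <- phyper(
  intersection_size - 1,
  term_size,
  effective_domain_size - term_size,
  query_size,
  lower.tail = FALSE
)
```

Subtracting 1 from intersection_size is needed because phyper() with lower.tail = FALSE returns P(X > q), while the over-representation test asks for P(X >= intersection_size).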
Details
FELLA::enrich() is used for pathway enrichment of metabolites; gprofiler2::gost() is used for all other omes, assuming features have been mapped to genes.
Pathway enrichments driven by a single gene are excluded.
This function was used to generate MotrpacRatTraining6moData::GRAPH_PW_ENRICH.
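To illustrate the kind of filtering described above, here is a small base R sketch on a toy results table. It assumes that "driven by a single gene" corresponds to intersection_size equal to 1, and it uses a 0.05 cutoff on adj_p_value purely as an example; neither detail is confirmed by this page, and the term IDs are placeholders.

```r
# Toy results; term IDs and values are placeholders, not real output.
res <- data.frame(
  term_id = c("term_1", "term_2", "term_3"),
  intersection_size = c(1L, 4L, 7L),
  adj_p_value = c(0.001, 0.20, 0.01),
  stringsAsFactors = FALSE
)

# Assumption: a single-gene enrichment has intersection_size == 1.
multi_gene <- res[res$intersection_size > 1, ]

# Example cutoff only; choose a threshold appropriate to the analysis.
significant <- multi_gene[multi_gene$adj_p_value < 0.05, ]
```

Note that filtering on adj_p_value rather than gost_adj_p_value follows the guidance in the Value section when more than one ome/tissue/cluster combination was tested.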
Examples
if (FALSE) { # \dontrun{
# Use graphical clusters as an example
cluster_res = extract_main_clusters()
# Pick a single graphical cluster
# Gastrocnemius features up-regulated in both males and females at 8 weeks of training
cluster_res = cluster_res[cluster_res$cluster == "SKM-GN:8w_F1_M1",]
# Example 1: Run pathway enrichment for this cluster on a single core
pw_enrich = cluster_pathway_enrichment(cluster_res)
# Example 2: Run pathway enrichment for this cluster on 4 cores
pw_enrich = cluster_pathway_enrichment(cluster_res, num_cores = 4)
# Example 3: Same as above, but include metabolites.
# Use FELLA's hypergeometric method for enrichment.
pw_enrich = cluster_pathway_enrichment(cluster_res,
                                       num_cores = 4,
                                       kegg_db_destination = "~/KEGGdb/test",
                                       fella_method = "hypergeom")
} # }