Perform pathway enrichment for a list of genes.
Wrapper for gprofiler2::gost()
.
Usage
gene_pathway_enrichment(
input,
background,
organism = "rnorvegicus",
databases = c("REAC", "WP", "KEGG"),
min_pw_set_size = 10,
max_pw_set_size = 200,
return_gem = FALSE
)
Arguments
- input
vector of gene identifiers
- background
vector of gene identifiers in universe/background. Must include the input. Should include all genes that were candidates for
input
, e.g., expressed genes. If performing pathway enrichment on genes identified throughMotrpacRatTraining6moData
, see MotrpacRatTraining6moData::GENE_UNIVERSES. Seecustom_bg
argument forgprofiler2::gost()
.- organism
character, species name. Default: "rnorvegicus" See
organism
argument forgprofiler2::gost()
.- databases
character, vector of databases in which to search. See
sources
argument forgprofiler2::gost()
.- min_pw_set_size
integer, exclude pathways smaller than this. Default: 10
- max_pw_set_size
integer, exclude pathways larger than this. Default: 200
- return_gem
bool, whether to return a data frame compatible with the Generic Enrichment Map (GEM) file format, which can be used as an input for the Cytoscape EnrichmentMap application. Default: FALSE
Value
a data frame of pathway enrichment results. If return_gem=TRUE
,
column names are changed add added to be compatible with the Generic Enrichment Map (GEM) file format.
Otherwise, the columns are defined as follows:
query
character, the name of the input query which by default is the order of query with the prefix "query_"
significant
bool, whether the
gprofiler2::gost()
enrichment p-value is less than 0.05term_size
integer, number of genes that are annotated to the term
query_size
integer, number of genes that were included in the query (from
gprofiler2::gost()
)intersection_size
integer, the number of genes in the input query that are annotated to the corresponding term
precision
double, the proportion of genes in the input list that are annotated to the function, defined as
intersection_size/query_size
recall
double, the proportion of functionally annotated genes that the query recovers, defined as
intersection_size/term_size
term_id
character, unique term/pathway identifier
source
character, the abbreviation of the data source for the term/pathway
term_name
character, term/pathway name
effective_domain_size
integer, the total number of genes in the universe used for the hypergeometric test
source_order
integer, numeric order for the term within its data source
parents
list of term IDs that are hierarchically directly above the term. For non-hierarchical data sources this points to an artificial root node
evidence_codes
character, comma-separated evidence codes
intersection
character, input gene IDs that intersect with the term/pathway
gost_adj_p_value
double, improperly adjusted hypergeometric p-value from
gprofiler2::gost()
. For reference only; should not be used to filter results unless there was only a single list of genes in the input.computed_p_value
double, nominal hypergeometric p-value, computed from the
gprofiler2::gost()
outputBH_adj_p_value
double, BH-adjusted p-values, calculated on
computed_p_value
Examples
# Perform pathway enrichment for differential transcripts in the liver
diff = MotrpacRatTraining6moData::TRAINING_REGULATED_FEATURES
input_feat = diff$feature_ID[diff$tissue == "LIVER" & diff$assay == "TRNSCRPT"]
map = MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT
input = unique(map$gene_symbol[map$feature_ID %in% input_feat])
background = MotrpacRatTraining6moData::GENE_UNIVERSES$gene_symbol$TRNSCRPT$LIVER
res = gene_pathway_enrichment(input, background)
head(res)
#> query significant term_size query_size intersection_size precision
#> 1 query_1 TRUE 18 253 8 0.03162055
#> 2 query_1 TRUE 17 253 8 0.03162055
#> 3 query_1 TRUE 17 253 6 0.02371542
#> 4 query_1 TRUE 62 253 9 0.03557312
#> 5 query_1 TRUE 13 253 5 0.01976285
#> 6 query_1 TRUE 104 253 11 0.04347826
#> recall term_id source term_name
#> 1 0.4444444 KEGG:00100 KEGG Steroid biosynthesis
#> 2 0.4705882 WP:WP632 WP Cholesterol metabolism
#> 3 0.3529412 REAC:R-RNO-191273 REAC Cholesterol biosynthesis
#> 4 0.1451613 REAC:R-RNO-8957322 REAC Metabolism of steroids
#> 5 0.3846154 REAC:R-RNO-3371571 REAC HSF1-dependent transactivation
#> 6 0.1057692 REAC:R-RNO-211859 REAC Biological oxidations
#> effective_domain_size source_order parents
#> 1 13525 13 KEGG:00000
#> 2 13525 120 WP:000000
#> 3 13525 275 REAC:R-RNO-8957322
#> 4 13525 826 REAC:R-RNO-556833
#> 5 13525 626 REAC:R-RNO-3371556
#> 6 13525 171 REAC:R-RNO-1430728
#> evidence_codes
#> 1 KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG
#> 2 WP,WP,WP,WP,WP,WP,WP,WP
#> 3 REAC,REAC,REAC,REAC,REAC,REAC
#> 4 REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC
#> 5 REAC,REAC,REAC,REAC,REAC
#> 6 REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC
#> intersection
#> 1 Hsd17b7,Cyp51,Sqle,Dhcr7,Tm7sf2,Fdft1,Msmo1,Lss
#> 2 Idi1,Apoc2,Dhcr7,Fdft1,Msmo1,Fdps,Mvk,Lss
#> 3 Cyp51,Sqle,Dhcr7,Tm7sf2,Fdft1,Msmo1
#> 4 Cyp51,Serpina6,Sqle,Slc27a5,Stard4,Dhcr7,Tm7sf2,Fdft1,Msmo1
#> 5 Fkbp4,Cryab,Hsp90ab1,Hspb8,Hspa8
#> 6 Tpst1,Nnmt,Cyp51,Ugp2,Cyb5r3,Glyat,Gclm,Slc26a2,Hsp90ab1,Cyp2a1,Cyp4f39
#> gost_adj_p_value computed_p_value BH_adj_p_value
#> 1 1.299433e-07 4.997818e-10 1.536829e-07
#> 2 1.834283e-08 2.821974e-10 1.536829e-07
#> 3 9.165948e-05 4.204563e-07 8.619355e-05
#> 4 2.732293e-04 2.078412e-06 3.083229e-04
#> 5 2.732293e-04 2.506691e-06 3.083229e-04
#> 6 3.311635e-04 3.797747e-06 3.892691e-04