Perform pathway enrichment for a list of genes.
Wrapper for gprofiler2::gost().
Usage
gene_pathway_enrichment(
input,
background,
organism = "rnorvegicus",
databases = c("REAC", "WP", "KEGG"),
min_pw_set_size = 10,
max_pw_set_size = 200,
return_gem = FALSE
)Arguments
- input
vector of gene identifiers
- background
vector of gene identifiers in universe/background. Must include the input. Should include all genes that were candidates for
input, e.g., expressed genes. If performing pathway enrichment on genes identified throughMotrpacRatTraining6moData, see MotrpacRatTraining6moData::GENE_UNIVERSES. Seecustom_bgargument forgprofiler2::gost().- organism
character, species name. Default: "rnorvegicus" See
organismargument forgprofiler2::gost().- databases
character, vector of databases in which to search. See
sourcesargument forgprofiler2::gost().- min_pw_set_size
integer, exclude pathways smaller than this. Default: 10
- max_pw_set_size
integer, exclude pathways larger than this. Default: 200
- return_gem
bool, whether to return a data frame compatible with the Generic Enrichment Map (GEM) file format, which can be used as an input for the Cytoscape EnrichmentMap application. Default: FALSE
Value
a data frame of pathway enrichment results. If return_gem=TRUE,
column names are changed add added to be compatible with the Generic Enrichment Map (GEM) file format.
Otherwise, the columns are defined as follows:
querycharacter, the name of the input query which by default is the order of query with the prefix "query_"
significantbool, whether the
gprofiler2::gost()enrichment p-value is less than 0.05term_sizeinteger, number of genes that are annotated to the term
query_sizeinteger, number of genes that were included in the query (from
gprofiler2::gost())intersection_sizeinteger, the number of genes in the input query that are annotated to the corresponding term
precisiondouble, the proportion of genes in the input list that are annotated to the function, defined as
intersection_size/query_sizerecalldouble, the proportion of functionally annotated genes that the query recovers, defined as
intersection_size/term_sizeterm_idcharacter, unique term/pathway identifier
sourcecharacter, the abbreviation of the data source for the term/pathway
term_namecharacter, term/pathway name
effective_domain_sizeinteger, the total number of genes in the universe used for the hypergeometric test
source_orderinteger, numeric order for the term within its data source
parentslist of term IDs that are hierarchically directly above the term. For non-hierarchical data sources this points to an artificial root node
evidence_codescharacter, comma-separated evidence codes
intersectioncharacter, input gene IDs that intersect with the term/pathway
gost_adj_p_valuedouble, improperly adjusted hypergeometric p-value from
gprofiler2::gost(). For reference only; should not be used to filter results unless there was only a single list of genes in the input.computed_p_valuedouble, nominal hypergeometric p-value, computed from the
gprofiler2::gost()outputBH_adj_p_valuedouble, BH-adjusted p-values, calculated on
computed_p_value
Examples
# Perform pathway enrichment for differential transcripts in the liver
diff = MotrpacRatTraining6moData::TRAINING_REGULATED_FEATURES
input_feat = diff$feature_ID[diff$tissue == "LIVER" & diff$assay == "TRNSCRPT"]
map = MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT
input = unique(map$gene_symbol[map$feature_ID %in% input_feat])
background = MotrpacRatTraining6moData::GENE_UNIVERSES$gene_symbol$TRNSCRPT$LIVER
res = gene_pathway_enrichment(input, background)
head(res)
#> query significant term_size query_size intersection_size precision
#> 1 query_1 TRUE 18 253 8 0.03162055
#> 2 query_1 TRUE 17 253 8 0.03162055
#> 3 query_1 TRUE 17 253 6 0.02371542
#> 4 query_1 TRUE 62 253 9 0.03557312
#> 5 query_1 TRUE 13 253 5 0.01976285
#> 6 query_1 TRUE 104 253 11 0.04347826
#> recall term_id source term_name
#> 1 0.4444444 KEGG:00100 KEGG Steroid biosynthesis
#> 2 0.4705882 WP:WP632 WP Cholesterol metabolism
#> 3 0.3529412 REAC:R-RNO-191273 REAC Cholesterol biosynthesis
#> 4 0.1451613 REAC:R-RNO-8957322 REAC Metabolism of steroids
#> 5 0.3846154 REAC:R-RNO-3371571 REAC HSF1-dependent transactivation
#> 6 0.1057692 REAC:R-RNO-211859 REAC Biological oxidations
#> effective_domain_size source_order parents
#> 1 13525 13 KEGG:00000
#> 2 13525 120 WP:000000
#> 3 13525 275 REAC:R-RNO-8957322
#> 4 13525 826 REAC:R-RNO-556833
#> 5 13525 626 REAC:R-RNO-3371556
#> 6 13525 171 REAC:R-RNO-1430728
#> evidence_codes
#> 1 KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG
#> 2 WP,WP,WP,WP,WP,WP,WP,WP
#> 3 REAC,REAC,REAC,REAC,REAC,REAC
#> 4 REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC
#> 5 REAC,REAC,REAC,REAC,REAC
#> 6 REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC
#> intersection
#> 1 Hsd17b7,Cyp51,Sqle,Dhcr7,Tm7sf2,Fdft1,Msmo1,Lss
#> 2 Idi1,Apoc2,Dhcr7,Fdft1,Msmo1,Fdps,Mvk,Lss
#> 3 Cyp51,Sqle,Dhcr7,Tm7sf2,Fdft1,Msmo1
#> 4 Cyp51,Serpina6,Sqle,Slc27a5,Stard4,Dhcr7,Tm7sf2,Fdft1,Msmo1
#> 5 Fkbp4,Cryab,Hsp90ab1,Hspb8,Hspa8
#> 6 Tpst1,Nnmt,Cyp51,Ugp2,Cyb5r3,Glyat,Gclm,Slc26a2,Hsp90ab1,Cyp2a1,Cyp4f39
#> gost_adj_p_value computed_p_value BH_adj_p_value
#> 1 1.299433e-07 4.997818e-10 1.536829e-07
#> 2 1.834283e-08 2.821974e-10 1.536829e-07
#> 3 9.165948e-05 4.204563e-07 8.619355e-05
#> 4 2.732293e-04 2.078412e-06 3.083229e-04
#> 5 2.732293e-04 2.506691e-06 3.083229e-04
#> 6 3.311635e-04 3.797747e-06 3.892691e-04