Gene pathway enrichment — gene_pathway_enrichment • MotrpacRatTraining6mo

Perform pathway enrichment for a list of genes. Wrapper for gprofiler2::gost().

Usage

gene_pathway_enrichment(
  input,
  background,
  organism = "rnorvegicus",
  databases = c("REAC", "WP", "KEGG"),
  min_pw_set_size = 10,
  max_pw_set_size = 200,
  return_gem = FALSE
)

Arguments

input: vector of gene identifiers
background: vector of gene identifiers in universe/background. Must include the input. Should include all genes that were candidates for input, e.g., expressed genes. If performing pathway enrichment on genes identified through MotrpacRatTraining6moData, see MotrpacRatTraining6moData::GENE_UNIVERSES. See custom_bg argument for gprofiler2::gost().
organism: character, species name. Default: "rnorvegicus" See organism argument for gprofiler2::gost().
databases: character, vector of databases in which to search. See sources argument for gprofiler2::gost().
min_pw_set_size: integer, exclude pathways smaller than this. Default: 10
max_pw_set_size: integer, exclude pathways larger than this. Default: 200
return_gem: bool, whether to return a data frame compatible with the Generic Enrichment Map (GEM) file format, which can be used as an input for the Cytoscape EnrichmentMap application. Default: FALSE

Value

a data frame of pathway enrichment results. If return_gem=TRUE, column names are changed add added to be compatible with the Generic Enrichment Map (GEM) file format. Otherwise, the columns are defined as follows:

query: character, the name of the input query which by default is the order of query with the prefix "query_"
significant: bool, whether the gprofiler2::gost() enrichment p-value is less than 0.05
term_size: integer, number of genes that are annotated to the term
query_size: integer, number of genes that were included in the query (from gprofiler2::gost())
intersection_size: integer, the number of genes in the input query that are annotated to the corresponding term
precision: double, the proportion of genes in the input list that are annotated to the function, defined as intersection_size/query_size
recall: double, the proportion of functionally annotated genes that the query recovers, defined as intersection_size/term_size
term_id: character, unique term/pathway identifier
source: character, the abbreviation of the data source for the term/pathway
term_name: character, term/pathway name
effective_domain_size: integer, the total number of genes in the universe used for the hypergeometric test
source_order: integer, numeric order for the term within its data source
parents: list of term IDs that are hierarchically directly above the term. For non-hierarchical data sources this points to an artificial root node
evidence_codes: character, comma-separated evidence codes
intersection: character, input gene IDs that intersect with the term/pathway
gost_adj_p_value: double, improperly adjusted hypergeometric p-value from gprofiler2::gost(). For reference only; should not be used to filter results unless there was only a single list of genes in the input.
computed_p_value: double, nominal hypergeometric p-value, computed from the gprofiler2::gost() output
BH_adj_p_value: double, BH-adjusted p-values, calculated on computed_p_value

Examples

# Perform pathway enrichment for differential transcripts in the liver
diff = MotrpacRatTraining6moData::TRAINING_REGULATED_FEATURES
input_feat = diff$feature_ID[diff$tissue == "LIVER" & diff$assay == "TRNSCRPT"]
map = MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT
input = unique(map$gene_symbol[map$feature_ID %in% input_feat])
background = MotrpacRatTraining6moData::GENE_UNIVERSES$gene_symbol$TRNSCRPT$LIVER
res = gene_pathway_enrichment(input, background)
head(res)
#>     query significant term_size query_size intersection_size  precision
#> 1 query_1        TRUE        18        253                 8 0.03162055
#> 2 query_1        TRUE        17        253                 8 0.03162055
#> 3 query_1        TRUE        17        253                 6 0.02371542
#> 4 query_1        TRUE        62        253                 9 0.03557312
#> 5 query_1        TRUE        13        253                 5 0.01976285
#> 6 query_1        TRUE       104        253                11 0.04347826
#>      recall            term_id source                      term_name
#> 1 0.4444444         KEGG:00100   KEGG           Steroid biosynthesis
#> 2 0.4705882           WP:WP632     WP         Cholesterol metabolism
#> 3 0.3529412  REAC:R-RNO-191273   REAC       Cholesterol biosynthesis
#> 4 0.1451613 REAC:R-RNO-8957322   REAC         Metabolism of steroids
#> 5 0.3846154 REAC:R-RNO-3371571   REAC HSF1-dependent transactivation
#> 6 0.1057692  REAC:R-RNO-211859   REAC          Biological oxidations
#>   effective_domain_size source_order            parents
#> 1                 13525           13         KEGG:00000
#> 2                 13525          120          WP:000000
#> 3                 13525          275 REAC:R-RNO-8957322
#> 4                 13525          826  REAC:R-RNO-556833
#> 5                 13525          626 REAC:R-RNO-3371556
#> 6                 13525          171 REAC:R-RNO-1430728
#>                                           evidence_codes
#> 1                KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG
#> 2                                WP,WP,WP,WP,WP,WP,WP,WP
#> 3                          REAC,REAC,REAC,REAC,REAC,REAC
#> 4           REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC
#> 5                               REAC,REAC,REAC,REAC,REAC
#> 6 REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC
#>                                                              intersection
#> 1                         Hsd17b7,Cyp51,Sqle,Dhcr7,Tm7sf2,Fdft1,Msmo1,Lss
#> 2                               Idi1,Apoc2,Dhcr7,Fdft1,Msmo1,Fdps,Mvk,Lss
#> 3                                     Cyp51,Sqle,Dhcr7,Tm7sf2,Fdft1,Msmo1
#> 4             Cyp51,Serpina6,Sqle,Slc27a5,Stard4,Dhcr7,Tm7sf2,Fdft1,Msmo1
#> 5                                        Fkbp4,Cryab,Hsp90ab1,Hspb8,Hspa8
#> 6 Tpst1,Nnmt,Cyp51,Ugp2,Cyb5r3,Glyat,Gclm,Slc26a2,Hsp90ab1,Cyp2a1,Cyp4f39
#>   gost_adj_p_value computed_p_value BH_adj_p_value
#> 1     1.299433e-07     4.997818e-10   1.536829e-07
#> 2     1.834283e-08     2.821974e-10   1.536829e-07
#> 3     9.165948e-05     4.204563e-07   8.619355e-05
#> 4     2.732293e-04     2.078412e-06   3.083229e-04
#> 5     2.732293e-04     2.506691e-06   3.083229e-04
#> 6     3.311635e-04     3.797747e-06   3.892691e-04