Skip to contents

Perform pathway enrichment for a list of genes. Wrapper for gprofiler2::gost().

Usage

gene_pathway_enrichment(
  input,
  background,
  organism = "rnorvegicus",
  databases = c("REAC", "WP", "KEGG"),
  min_pw_set_size = 10,
  max_pw_set_size = 200,
  return_gem = FALSE
)

Arguments

input

vector of gene identifiers

background

vector of gene identifiers in universe/background. Must include the input. Should include all genes that were candidates for input, e.g., expressed genes. If performing pathway enrichment on genes identified through MotrpacRatTraining6moData, see MotrpacRatTraining6moData::GENE_UNIVERSES. See custom_bg argument for gprofiler2::gost().

organism

character, species name. Default: "rnorvegicus" See organism argument for gprofiler2::gost().

databases

character, vector of databases in which to search. See sources argument for gprofiler2::gost().

min_pw_set_size

integer, exclude pathways smaller than this. Default: 10

max_pw_set_size

integer, exclude pathways larger than this. Default: 200

return_gem

bool, whether to return a data frame compatible with the Generic Enrichment Map (GEM) file format, which can be used as an input for the Cytoscape EnrichmentMap application. Default: FALSE

Value

a data frame of pathway enrichment results. If return_gem=TRUE, column names are changed add added to be compatible with the Generic Enrichment Map (GEM) file format. Otherwise, the columns are defined as follows:

query

character, the name of the input query which by default is the order of query with the prefix "query_"

significant

bool, whether the gprofiler2::gost() enrichment p-value is less than 0.05

term_size

integer, number of genes that are annotated to the term

query_size

integer, number of genes that were included in the query (from gprofiler2::gost())

intersection_size

integer, the number of genes in the input query that are annotated to the corresponding term

precision

double, the proportion of genes in the input list that are annotated to the function, defined as intersection_size/query_size

recall

double, the proportion of functionally annotated genes that the query recovers, defined as intersection_size/term_size

term_id

character, unique term/pathway identifier

source

character, the abbreviation of the data source for the term/pathway

term_name

character, term/pathway name

effective_domain_size

integer, the total number of genes in the universe used for the hypergeometric test

source_order

integer, numeric order for the term within its data source

parents

list of term IDs that are hierarchically directly above the term. For non-hierarchical data sources this points to an artificial root node

evidence_codes

character, comma-separated evidence codes

intersection

character, input gene IDs that intersect with the term/pathway

gost_adj_p_value

double, improperly adjusted hypergeometric p-value from gprofiler2::gost(). For reference only; should not be used to filter results unless there was only a single list of genes in the input.

computed_p_value

double, nominal hypergeometric p-value, computed from the gprofiler2::gost() output

BH_adj_p_value

double, BH-adjusted p-values, calculated on computed_p_value

Examples

# Perform pathway enrichment for differential transcripts in the liver
diff = MotrpacRatTraining6moData::TRAINING_REGULATED_FEATURES
input_feat = diff$feature_ID[diff$tissue == "LIVER" & diff$assay == "TRNSCRPT"]
map = MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT
input = unique(map$gene_symbol[map$feature_ID %in% input_feat])
background = MotrpacRatTraining6moData::GENE_UNIVERSES$gene_symbol$TRNSCRPT$LIVER
res = gene_pathway_enrichment(input, background)
head(res)
#>     query significant term_size query_size intersection_size  precision
#> 1 query_1        TRUE        18        253                 8 0.03162055
#> 2 query_1        TRUE        17        253                 8 0.03162055
#> 3 query_1        TRUE        17        253                 6 0.02371542
#> 4 query_1        TRUE        62        253                 9 0.03557312
#> 5 query_1        TRUE        13        253                 5 0.01976285
#> 6 query_1        TRUE       104        253                11 0.04347826
#>      recall            term_id source                      term_name
#> 1 0.4444444         KEGG:00100   KEGG           Steroid biosynthesis
#> 2 0.4705882           WP:WP632     WP         Cholesterol metabolism
#> 3 0.3529412  REAC:R-RNO-191273   REAC       Cholesterol biosynthesis
#> 4 0.1451613 REAC:R-RNO-8957322   REAC         Metabolism of steroids
#> 5 0.3846154 REAC:R-RNO-3371571   REAC HSF1-dependent transactivation
#> 6 0.1057692  REAC:R-RNO-211859   REAC          Biological oxidations
#>   effective_domain_size source_order            parents
#> 1                 13525           13         KEGG:00000
#> 2                 13525          120          WP:000000
#> 3                 13525          275 REAC:R-RNO-8957322
#> 4                 13525          826  REAC:R-RNO-556833
#> 5                 13525          626 REAC:R-RNO-3371556
#> 6                 13525          171 REAC:R-RNO-1430728
#>                                           evidence_codes
#> 1                KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG,KEGG
#> 2                                WP,WP,WP,WP,WP,WP,WP,WP
#> 3                          REAC,REAC,REAC,REAC,REAC,REAC
#> 4           REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC
#> 5                               REAC,REAC,REAC,REAC,REAC
#> 6 REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC,REAC
#>                                                              intersection
#> 1                         Hsd17b7,Cyp51,Sqle,Dhcr7,Tm7sf2,Fdft1,Msmo1,Lss
#> 2                               Idi1,Apoc2,Dhcr7,Fdft1,Msmo1,Fdps,Mvk,Lss
#> 3                                     Cyp51,Sqle,Dhcr7,Tm7sf2,Fdft1,Msmo1
#> 4             Cyp51,Serpina6,Sqle,Slc27a5,Stard4,Dhcr7,Tm7sf2,Fdft1,Msmo1
#> 5                                        Fkbp4,Cryab,Hsp90ab1,Hspb8,Hspa8
#> 6 Tpst1,Nnmt,Cyp51,Ugp2,Cyb5r3,Glyat,Gclm,Slc26a2,Hsp90ab1,Cyp2a1,Cyp4f39
#>   gost_adj_p_value computed_p_value BH_adj_p_value
#> 1     1.299433e-07     4.997818e-10   1.536829e-07
#> 2     1.834283e-08     2.821974e-10   1.536829e-07
#> 3     9.165948e-05     4.204563e-07   8.619355e-05
#> 4     2.732293e-04     2.078412e-06   3.083229e-04
#> 5     2.732293e-04     2.506691e-06   3.083229e-04
#> 6     3.311635e-04     3.797747e-06   3.892691e-04