Worker function run by custom_cluster_pathway_enrichment(). Not intended to be run independently.

Usage

pathway_hypergeom_test(
  feature_to_gene,
  universe,
  cluster_res,
  pathway_member_list,
  source,
  gene_identifier_type,
  iterations,
  i,
  min_input_set_size,
  logfile,
  add_ensembl_intersection
)

Arguments

feature_to_gene

data frame, map between intersection_id_type and gene symbols. Columns must include "feature_ID", "gene_symbol", "ensembl_gene", and "kegg_id".

universe

list of lists of character vectors, named first by assay (i.e., MotrpacRatTraining6moData::ASSAY_ABBREV) and then by tissue (i.e., MotrpacRatTraining6moData::TISSUE_ABBREV). Vectors provide the full set of gene symbols associated with features tested during differential analysis. For example, universe$TRNSCRPT$LUNG should be a character vector of expressed genes in the lung, where the type of gene identifier matches gene_identifier_type. MotrpacRatTraining6moData::GENE_UNIVERSES$gene_symbol by default.

cluster_res

Either a data frame or a list of lists. If a data frame, it needs at least two columns: "feature" and "cluster". The "feature" column should be in the format 'MotrpacRatTraining6moData::ASSAY_ABBREV;MotrpacRatTraining6moData::TISSUE_ABBREV;feature_ID'. If a list of lists, each sublist must be named with the cluster name (character string), and the values must be features in the format 'MotrpacRatTraining6moData::ASSAY_ABBREV;MotrpacRatTraining6moData::TISSUE_ABBREV;feature_ID'.

pathway_member_list

named list of character vectors where names are pathway names and values are pathway members. Pathway members must match values in the gene_identifier_type column of feature_to_gene.

source

optional character string to define the source of pathway_member_list

gene_identifier_type

character, column in feature_to_gene that matches the gene identifier type in universe. "gene_symbol" by default.

iterations

data frame passed from custom_cluster_pathway_enrichment() that defines each iteration of this function

i

integer index passed from custom_cluster_pathway_enrichment() that specifies which row of iterations to use

min_input_set_size

integer, input must have this minimum number of unique mappable gene IDs to attempt enrichment

logfile

optional character, path to log of failed iterations

add_ensembl_intersection

bool, whether to add an intersection_ensembl column, which converts gene IDs in the intersection to Ensembl IDs
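
The input shapes described above can be sketched with toy data. This is only an illustration under stated assumptions: the feature IDs, gene symbols, cluster names, and pathway names are made up, and the list form of cluster_res is shown as a named list with one element per cluster, which is one reading of the description above. In practice these objects are prepared and passed in by custom_cluster_pathway_enrichment().

```r
# Hypothetical toy inputs; all IDs and names below are invented for illustration.

# cluster_res as a data frame: features are labeled "ASSAY_ABBREV;TISSUE_ABBREV;feature_ID"
cluster_res_df <- data.frame(
  feature = c("TRNSCRPT;LUNG;f1", "TRNSCRPT;LUNG;f2", "TRNSCRPT;LUNG;f3"),
  cluster = c("up", "up", "down"),
  stringsAsFactors = FALSE
)

# The same clusters as a named list (cluster name -> features in that cluster)
cluster_res_list <- split(cluster_res_df$feature, cluster_res_df$cluster)

# universe: list of lists of character vectors, named by assay and then by tissue
universe <- list(
  TRNSCRPT = list(LUNG = c("GeneA", "GeneB", "GeneC", "GeneD"))
)

# pathway_member_list: pathway name -> pathway members,
# using the same identifier type as the universe
pathway_member_list <- list(
  pathway_1 = c("GeneA", "GeneB"),
  pathway_2 = c("GeneC", "GeneD", "GeneE")
)

# Split each feature label into its assay, tissue, and feature ID components
parts <- do.call(rbind, strsplit(cluster_res_df$feature, ";", fixed = TRUE))
colnames(parts) <- c("ome", "tissue", "feature_ID")
```

Keeping the assay and tissue inside the feature label is what lets a single cluster_res object be matched against the corresponding universe entry, here universe$TRNSCRPT$LUNG.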

Value

data table with enrichment results, or NULL if no enrichment results were returned. The table has the following columns:

term_size

integer, number of genes that are annotated to the term

query_size

integer, number of genes that were included in the query

intersection_size

integer, the number of genes in the input query that are annotated to the corresponding term

term_id

character, unique term/pathway identifier

source

character, the abbreviation of the data source for the term/pathway

term_name

character, term/pathway name

effective_domain_size

integer, the total number of genes in the universe used for the hypergeometric test

intersection

character, input gene IDs that intersect with the term/pathway

computed_p_value

double, nominal hypergeometric p-value

cluster

character, cluster label

tissue

character, tissue abbreviation, one of MotrpacRatTraining6moData::TISSUE_ABBREV

ome

character, assay abbreviation, one of MotrpacRatTraining6moData::ASSAY_ABBREV
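
How the size columns relate to computed_p_value can be sketched with base R. This is a minimal illustration of a standard one-sided hypergeometric (over-representation) test, assuming the p-value is the upper tail P(X >= intersection_size); it is not the package's internal code, and the gene names and set sizes are made up.

```r
# Hypothetical toy values standing in for one cluster and one pathway
universe_genes <- paste0("Gene", 1:100)            # genes in the universe
pathway_genes  <- paste0("Gene", 1:10)             # members of one pathway
query_genes    <- paste0("Gene", c(1:4, 51:56))    # genes mapped from one cluster

# Restrict the pathway and the query to genes present in the universe
term  <- intersect(pathway_genes, universe_genes)
query <- intersect(query_genes, universe_genes)

effective_domain_size <- length(universe_genes)
term_size             <- length(term)
query_size            <- length(query)
intersection          <- intersect(query, term)
intersection_size     <- length(intersection)

# Upper-tail probability of drawing at least intersection_size pathway genes
# when sampling query_size genes from the universe without replacement
computed_p_value <- phyper(
  q = intersection_size - 1,
  m = term_size,
  n = effective_domain_size - term_size,
  k = query_size,
  lower.tail = FALSE
)
```

Subtracting 1 from intersection_size matters because phyper() with lower.tail = FALSE returns P(X > q), so q = intersection_size - 1 gives the probability of an overlap at least as large as the one observed.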