Worker function run by custom_cluster_pathway_enrichment(). Not intended to be run independently.
Usage
pathway_hypergeom_test(
feature_to_gene,
universe,
cluster_res,
pathway_member_list,
source,
gene_identifier_type,
iterations,
i,
min_input_set_size,
logfile,
add_ensembl_intersection
)
Arguments
- feature_to_gene
data frame, map between intersection_id_type and gene symbols. Columns must include "feature_ID", "gene_symbol", "ensembl_gene", and "kegg_id".
- universe
list of lists of character vectors, named first by assay (i.e., MotrpacRatTraining6moData::ASSAY_ABBREV) and then by tissue (i.e., MotrpacRatTraining6moData::TISSUE_ABBREV). Vectors provide the full set of gene identifiers associated with features tested during differential analysis. For example, universe$TRNSCRPT$LUNG should be a character vector of expressed genes in the lung, where the type of gene identifier matches gene_identifier_type. MotrpacRatTraining6moData::GENE_UNIVERSES$gene_symbol by default.
- cluster_res
Either a data frame or a list of lists. If a data frame, it needs at least two columns: "feature" and "cluster". The "feature" column should be in the format 'MotrpacRatTraining6moData::ASSAY_ABBREV;MotrpacRatTraining6moData::TISSUE_ABBREV;feature_ID'. If a list of lists, each sublist must be named with the cluster name (character string), and the values must be features in the format 'MotrpacRatTraining6moData::ASSAY_ABBREV;MotrpacRatTraining6moData::TISSUE_ABBREV;feature_ID'.
- pathway_member_list
named list of character vectors where names are pathway names and values are pathway members. Pathway members must match values in the gene_identifier_type column of feature_to_gene.
- source
optional character string defining the source of pathway_member_list
- gene_identifier_type
character, column in feature_to_gene that matches the gene identifier type in universe. "gene_symbol" by default.
- iterations
data frame passed from custom_cluster_pathway_enrichment() that defines each iteration of this function
- i
integer index passed from custom_cluster_pathway_enrichment() that specifies which row of iterations to use
- min_input_set_size
integer, input must have this minimum number of unique mappable gene IDs to attempt enrichment
- logfile
optional character, path to log of failed iterations
- add_ensembl_intersection
logical, whether to add an intersection_ensembl column, which converts gene IDs in the intersection to Ensembl IDs
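The argument formats above can be illustrated with a small base R sketch. It builds hypothetical toy versions of feature_to_gene, universe, and cluster_res (in both accepted forms), splits each 'ASSAY;TISSUE;feature_ID' string, maps one cluster's features to unique, non-missing gene IDs, and applies a min_input_set_size check. All data values and the helper map_cluster_genes() are hypothetical, and restricting query genes to the assay/tissue universe is an assumption; this is not the package's own implementation.

```r
# Hypothetical toy inputs (not real MoTrPAC data)
feature_to_gene <- data.frame(
  feature_ID   = c("f1", "f2", "f3", "f4"),
  gene_symbol  = c("Actb", "Gapdh", "Gapdh", NA),
  ensembl_gene = c("ENSRNOG01", "ENSRNOG02", "ENSRNOG02", NA),
  kegg_id      = c("k1", "k2", "k2", NA),
  stringsAsFactors = FALSE
)

# universe: named by assay, then tissue
universe <- list(TRNSCRPT = list(LUNG = c("Actb", "Gapdh", "Myh7")))

# cluster_res as a list: names are cluster labels,
# values are "ASSAY;TISSUE;feature_ID" strings
cluster_list <- list(
  cluster_1 = c("TRNSCRPT;LUNG;f1", "TRNSCRPT;LUNG;f2", "TRNSCRPT;LUNG;f3"),
  cluster_2 = c("TRNSCRPT;LUNG;f4")
)

# Equivalent data frame form with "feature" and "cluster" columns
cluster_df <- data.frame(
  feature = unlist(cluster_list, use.names = FALSE),
  cluster = rep(names(cluster_list), lengths(cluster_list)),
  stringsAsFactors = FALSE
)

# Split "ASSAY;TISSUE;feature_ID" into its three parts
parts <- do.call(rbind, strsplit(cluster_df$feature, ";", fixed = TRUE))
cluster_df$assay      <- parts[, 1]
cluster_df$tissue     <- parts[, 2]
cluster_df$feature_ID <- parts[, 3]

# Hypothetical helper: unique, non-missing gene IDs for one cluster
map_cluster_genes <- function(cl, cluster_df, feature_to_gene,
                              gene_identifier_type = "gene_symbol") {
  ids <- cluster_df$feature_ID[cluster_df$cluster == cl]
  genes <- feature_to_gene[[gene_identifier_type]][feature_to_gene$feature_ID %in% ids]
  unique(genes[!is.na(genes)])
}

# Assumption: query genes are restricted to the assay/tissue universe
input_genes <- intersect(
  map_cluster_genes("cluster_1", cluster_df, feature_to_gene),
  universe$TRNSCRPT$LUNG
)

# Only attempt enrichment if enough unique mappable gene IDs remain
min_input_set_size <- 2
attempt_enrichment <- length(input_genes) >= min_input_set_size
```

Here cluster_2 maps to no gene IDs (its only feature has a missing gene_symbol), so it would be skipped for any min_input_set_size of 1 or more.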
Value
data table with enrichment results, or NULL if no enrichment results were returned:
term_size
integer, number of genes that are annotated to the term
query_size
integer, number of genes that were included in the query
intersection_size
integer, the number of genes in the input query that are annotated to the corresponding term
term_id
character, unique term/pathway identifier
source
character, the abbreviation of the data source for the term/pathway
term_name
character, term/pathway name
effective_domain_size
integer, the total number of genes in the universe used for the hypergeometric test
intersection
character, input gene IDs that intersect with the term/pathway
computed_p_value
double, nominal hypergeometric p-value
cluster
character, cluster label
tissue
character, tissue abbreviation, one of MotrpacRatTraining6moData::TISSUE_ABBREV
ome
character, assay abbreviation, one of MotrpacRatTraining6moData::ASSAY_ABBREV
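The count columns above (intersection_size, query_size, term_size, effective_domain_size) are the four quantities of a hypergeometric test. A minimal sketch of a one-sided over-representation p-value using base R's phyper(), assuming that computed_p_value is the upper-tail probability P(X >= intersection_size) with no multiple-testing correction; the helper name hypergeom_p() and the numbers are hypothetical.

```r
# Upper-tail hypergeometric p-value: probability of observing at least
# intersection_size term members among query_size genes drawn from a
# universe of effective_domain_size genes, term_size of which are in the term.
hypergeom_p <- function(intersection_size, query_size, term_size,
                        effective_domain_size) {
  phyper(
    q = intersection_size - 1,              # P(X > k - 1) == P(X >= k)
    m = term_size,                          # term members in the universe
    n = effective_domain_size - term_size,  # non-members in the universe
    k = query_size,                         # number of genes drawn (the query)
    lower.tail = FALSE
  )
}

# Hypothetical numbers: 5 of 20 query genes hit a 50-gene term
# in a 10,000-gene universe
p <- hypergeom_p(intersection_size = 5, query_size = 20, term_size = 50,
                 effective_domain_size = 10000)
```

Because the test is one-sided, an intersection_size of 0 always yields a p-value of 1.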