Analyze Fuzzy C-Means Clustering Results with CAMERA-PR
Test for localization of molecular signatures to cluster
centroids. Performs modified upper-tail signed rank tests with
TMSig::cameraPR.matrix on the matrices of cluster membership
probabilities obtained from fuzzy c-means (FCM) clustering. Identifies
molecular signatures that closely follow the trajectories of each cluster's
centroid.
Usage
run_cluster_cameraPR(
  FCM,
  selected_omes = c("transcript-rna-seq", "prot-pr", "prot-ph", "metab"),
  selected_tissues = "all",
  database = names(MotrpacHumanPreSuspensionAnalysis::MOLECULAR_SIGNATURES),
  path_to_gmt = NULL,
  min_size = 5L,
  overlap_cutoff = 0.7
)
Arguments
- FCM
fuzzy c-means (FCM) clustering results. Output of run_cmeans. This should be a named list of objects of class fclust.
- selected_omes
character; one or more character strings selected from the following options: "transcript-rna-seq", "prot-pr", "prot-ph", and "metab" (all metabolomics platforms). Passed to load_differential_analysis.
- selected_tissues
character; passed to load_differential_analysis. One or more of the following: "all", "muscle", "adipose", or "blood".
- database
character; one or more names specifying the database(s) to test. Options are (case insensitive) "BIOCARTA", "KEGG_MEDICUS", "PID", "REACTOME", "WP" (WikiPathways database), "GOBP", "GOCC", "GOMF", "MITOCARTA" (MitoCarta3.0 database), "PSP" (PhosphoSitePlus kinases; only valid when selected_omes contains "prot-ph"), or "REFMET" (RefMet chemical subclasses; only valid when selected_omes contains "metab"). See MOLECULAR_SIGNATURES for details.
- path_to_gmt
character; (optional) path to one or more GMT files. Passed to TMSig::readGMT. If provided, database is ignored.
- min_size
integer; the minimum set size for testing.
- overlap_cutoff
numeric; the minimum proportion of genes in each set that must appear in a given dataset. Used to pre-filter sets. Does not affect "metab" or "prot-ph" results. This will always be 0.1 for "prot-ol" results.
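The interaction between min_size and overlap_cutoff can be illustrated with a minimal base R sketch. This is not the package's implementation: the gene sets, the background of measured genes, and the order in which the two filters are applied are all assumptions made for illustration.

```r
# Hypothetical gene sets (names and members are made up for illustration)
gene_sets <- list(
  SET_A = c("G1", "G2", "G3", "G4", "G5", "G6"),
  SET_B = c("G1", "G2", "G7", "G8", "G9", "G10", "G11", "G12", "G13", "G14"),
  SET_C = c("G1", "G2", "G3")
)

# Hypothetical background: genes measured in one tissue/assay dataset
background <- paste0("G", 1:8)

min_size <- 5L
overlap_cutoff <- 0.7

# Size of each set as defined in the database (GMT file)
set_size_DB <- lengths(gene_sets)

# Restrict each set to the genes that were actually measured
filtered_sets <- lapply(gene_sets, intersect, y = background)
set_size <- lengths(filtered_sets)

# Proportion of each set that appears in the dataset
size_ratio <- round(set_size / set_size_DB, digits = 3)

# Keep sets that are large enough and sufficiently covered
# (assumed filter order; the package may apply these differently)
keep <- set_size >= min_size & size_ratio >= overlap_cutoff
filtered_sets <- filtered_sets[keep]

names(filtered_sets)
```

In this toy example, only SET_A passes both filters: SET_B retains 4 of its 10 genes, and SET_C is fully measured but has fewer than min_size members.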
Value
An object of class data.frame with the following columns:
- tissue
factor; the tissue.
- assay
factor; the omics assay.
- cluster
factor; the cluster number. The clusters have the same meaning across omics assays measured in the same tissue, but they have no such relationship across tissues.
- collection
factor; the broad molecular signature collection. See SET_TO_ID for details.
- database
factor; the molecular signature database. See SET_TO_ID for details.
- set_id
character; a unique ID for the molecular signature. See SET_TO_ID for details.
- set
character; the molecular signature being tested. For global proteomics and transcriptomics, these are gene sets. For phosphoproteomics, these are kinase sets.
- set_short
character; a shortened version of set. See SET_TO_ID for details.
- set_size
integer; the number of molecules in the set that were present in the DA results for that specific tissue/assay combination.
- set_size_DB
integer; the number of molecules in the set, as defined in the GMT file.
- size_ratio
numeric; the ratio of set_size to set_size_DB, rounded to the nearest thousandth. A measure of confidence that the gene set being tested is correctly described by the entry in the set column. While smaller values do not necessarily indicate that the results are unreliable, terms from the gene set databases should be treated with caution.
- p_value
numeric; the upper-tail p-value.
- adj_p_value
numeric; the BH-adjusted p-value. P-values are adjusted within each combination of tissue, assay, collection, and cluster.
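The grouped BH adjustment described for adj_p_value can be sketched in base R with p.adjust and ave. This is only an illustration of adjusting within each combination of tissue, assay, collection, and cluster; the data frame below, including the "GO" collection label, is hypothetical, and the package may implement the adjustment differently.

```r
# Hypothetical results table; all values are made up for illustration
res <- data.frame(
  tissue     = rep("muscle", 4),
  assay      = rep("prot-pr", 4),
  collection = rep("GO", 4),       # hypothetical collection label
  cluster    = c(1L, 1L, 2L, 2L),
  p_value    = c(0.01, 0.04, 0.03, 0.20)
)

# Apply the Benjamini-Hochberg adjustment separately within each
# tissue/assay/collection/cluster combination
res$adj_p_value <- ave(
  res$p_value,
  res$tissue, res$assay, res$collection, res$cluster,
  FUN = function(p) p.adjust(p, method = "BH")
)

res
```

Because each cluster forms its own group here, the two p-values in cluster 1 are adjusted independently of the two in cluster 2, so the adjusted values differ from those that a single adjustment over all four tests would give.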
Examples
if (FALSE) { # \dontrun{
  FCM <- run_cmeans()

  # Run CAMERA-PR with all available molecular signatures
  cluster_res <- run_cluster_cameraPR(FCM = FCM)

  head(cluster_res)
} # }