Take a set of differential analysis results, either for a specific tissue and ome or from a user-supplied table, convert to GCT format, and write to file for GSEA.
Usage
prepare_gsea_input(
  tissue = NULL,
  assay = NULL,
  gene_id_type = "gene_symbol",
  outdir = ".",
  outfile_prefix = NULL,
  input = NULL,
  cast_vars = c("sex", "comparison_group"),
  feature_to_gene_map = NULL
)
Arguments
- tissue
character, tissue abbreviation, one of MotrpacRatTraining6moData::TISSUE_ABBREV. Must be specified if custom input is not provided.
- assay
character, assay abbreviation, one of "PROT", "PHOSPHO", "TRNSCRPT", "ACETYL", "UBIQ"
- gene_id_type
character, gene identifier type. Must match the gene ID type in the gene set database you plan on using for GSEA. One of "gene_symbol", "entrez", "ensembl", "refseq". Default: "gene_symbol"
- outdir
character, output directory for GCT file. The directory is created if it does not already exist. Current directory by default.
- outfile_prefix
character, prefix for output GCT file. By default, this prefix includes the specified tissue and assay and current date. Must be specified for custom input data.
- input
optional data frame of custom differential analysis results, for users who want to run this analysis on their own data. Required columns are either "tscore", "feature_ID", and cast_vars, or "tscore", cast_vars, and the gene identifier indicated by gene_id_type. If a "feature_ID" column exists but no column corresponding to gene_id_type, then feature_to_gene_map must map between "feature_ID" and gene_id_type.
- cast_vars
character vector of column names in the differential analysis results that are used to convert the table from long to wide format, with t-scores as the value variable. See data.table::dcast() for more details. Default: "sex", "comparison_group"
- feature_to_gene_map
data frame, map between "feature_ID" and gene_id_type. MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT if not otherwise specified.
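The role of cast_vars may be easier to see on a small table. The following is a minimal sketch of a long-to-wide cast with data.table::dcast(), not the function's actual implementation. The toy table, its values, and the names da_long, cast_formula, and da_wide are assumptions made for illustration.

```r
library(data.table)

# Toy long-format differential analysis results (all values are made up).
da_long <- data.table(
  gene_symbol      = rep(c("Actb", "Gapdh"), each = 4),
  sex              = rep(c("female", "female", "male", "male"), times = 2),
  comparison_group = rep(c("1w", "8w"), times = 4),
  tscore           = c(1.2, -0.4, 2.5, 0.3, -1.1, 0.8, 0.0, -2.2)
)

cast_vars <- c("sex", "comparison_group")

# Build the formula gene_symbol ~ sex + comparison_group, then cast to wide
# format with t-scores as the value variable.
cast_formula <- as.formula(
  paste("gene_symbol ~", paste(cast_vars, collapse = " + "))
)
da_wide <- dcast(da_long, cast_formula, value.var = "tscore")

print(da_wide)
```

In this sketch each combination of the cast_vars values becomes one column (female_1w, female_8w, male_1w, male_8w), and each gene becomes one row.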
Details
T-scores from the timewise differential analysis results are used for scores. Feature-level data is summarized into gene-level data using the maximum absolute t-score.
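As a rough illustration of the gene-level summary, the sketch below collapses several features per gene to a single t-score in base R. It assumes that the signed t-score with the largest absolute value is kept, which this page does not state explicitly. The toy data and the helper pick_max_abs are hypothetical and are not part of the package.

```r
# Toy feature-level t-scores; several features can map to the same gene.
# All values are made up for illustration.
feature_scores <- data.frame(
  feature_ID  = c("f1", "f2", "f3", "f4", "f5"),
  gene_symbol = c("Actb", "Actb", "Gapdh", "Gapdh", "Gapdh"),
  tscore      = c(1.5, -3.2, 0.4, 2.1, -1.9),
  stringsAsFactors = FALSE
)

# Assumption: keep the signed t-score that has the largest magnitude.
pick_max_abs <- function(x) x[which.max(abs(x))]

# One row per gene, summarized with the helper above.
gene_scores <- aggregate(
  tscore ~ gene_symbol,
  data = feature_scores,
  FUN = pick_max_abs
)

print(gene_scores)
```

Here Actb is summarized as -3.2 and Gapdh as 2.1. For real differential analysis results the same summary would presumably be applied within each combination of the cast_vars columns, not across the whole table.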
Examples
# Input for GSEA where gene set databases use gene symbols as IDs.
# This applies for the default gene set database
# when 'method="gsea"' in 'ssGSEA2_wrapper()'.
prepare_gsea_input("HEART","PROT",outdir="/tmp")
#> PROT_HEART_DA
#> Saving file to /tmp/MotrpacRatTraining6mo_gsea_HEART_PROT_20240930.gct
#> Dimensions of matrix: [8760x8]
#> Setting precision to 4
#> Saved.
#> [1] "/tmp/MotrpacRatTraining6mo_gsea_HEART_PROT_20240930.gct"
prepare_gsea_input("LIVER","PHOSPHO",outdir="/tmp")
#> PHOSPHO_LIVER_DA
#> Saving file to /tmp/MotrpacRatTraining6mo_gsea_LIVER_PHOSPHO_20240930.gct
#> Dimensions of matrix: [8120x8]
#> Setting precision to 4
#> Saved.
#> [1] "/tmp/MotrpacRatTraining6mo_gsea_LIVER_PHOSPHO_20240930.gct"
prepare_gsea_input("LIVER","ACETYL",outdir="/tmp")
#> ACETYL_LIVER_DA
#> Saving file to /tmp/MotrpacRatTraining6mo_gsea_LIVER_ACETYL_20240930.gct
#> Dimensions of matrix: [2403x8]
#> Setting precision to 4
#> Saved.
#> [1] "/tmp/MotrpacRatTraining6mo_gsea_LIVER_ACETYL_20240930.gct"
prepare_gsea_input("LIVER","UBIQ",outdir="/tmp")
#> UBIQ_LIVER_DA
#> Saving file to /tmp/MotrpacRatTraining6mo_gsea_LIVER_UBIQ_20240930.gct
#> Dimensions of matrix: [2529x8]
#> Setting precision to 4
#> Saved.
#> [1] "/tmp/MotrpacRatTraining6mo_gsea_LIVER_UBIQ_20240930.gct"
prepare_gsea_input("LIVER","TRNSCRPT",outdir="/tmp")
#> TRNSCRPT_LIVER_DA
#> Saving file to /tmp/MotrpacRatTraining6mo_gsea_LIVER_TRNSCRPT_20240930.gct
#> Dimensions of matrix: [14276x8]
#> Setting precision to 4
#> Saved.
#> [1] "/tmp/MotrpacRatTraining6mo_gsea_LIVER_TRNSCRPT_20240930.gct"
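# Input for GSEA with Mitocarta (i.e., 'method="gsea_mitocarta"'
# in 'ssGSEA2_wrapper()'), which uses RefSeq IDs.
# Note that the default output path is the same as for the gene symbol
# call above, so set 'outfile_prefix' to keep both files.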
prepare_gsea_input("LIVER","TRNSCRPT",outdir="/tmp",gene_id_type="refseq")
#> TRNSCRPT_LIVER_DA
#> Saving file to /tmp/MotrpacRatTraining6mo_gsea_LIVER_TRNSCRPT_20240930.gct
#> Dimensions of matrix: [17079x8]
#> Setting precision to 4
#> Saved.
#> [1] "/tmp/MotrpacRatTraining6mo_gsea_LIVER_TRNSCRPT_20240930.gct"
# "Custom" input
res = combine_da_results(tissues = "KIDNEY", assays = "PROT")
#> PROT_KIDNEY_DA
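# add a dummy "gene_symbol" column (a copy of "feature_ID") so the input
# already contains the gene identifier indicated by 'gene_id_type'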
res$gene_symbol = res$feature_ID
prepare_gsea_input(input=res, outdir="/tmp", outfile_prefix="KIDNEY_PROT")
#> Saving file to /tmp/KIDNEY_PROT.gct
#> Dimensions of matrix: [9852x8]
#> Setting precision to 4
#> Saved.
#> [1] "/tmp/KIDNEY_PROT.gct"