
Take a set of differential analysis results, either for a specific tissue and ome or from a user-supplied table, convert to GCT format, and write to file for GSEA.

Usage

prepare_gsea_input(
  tissue = NULL,
  assay = NULL,
  gene_id_type = "gene_symbol",
  outdir = ".",
  outfile_prefix = NULL,
  input = NULL,
  cast_vars = c("sex", "comparison_group"),
  feature_to_gene_map = NULL
)

Arguments

tissue

character, tissue abbreviation, one of MotrpacRatTraining6moData::TISSUE_ABBREV. Must be specified if custom input is not provided.

assay

character, assay abbreviation, one of "PROT", "PHOSPHO", "TRNSCRPT", "ACETYL", "UBIQ"

gene_id_type

character, gene identifier type. Must match the gene ID type in the gene set database you plan on using for GSEA. One of "gene_symbol", "entrez", "ensembl", "refseq". Default: "gene_symbol"

outdir

character, output directory for GCT file. The directory is created if it does not already exist. Current directory by default.

outfile_prefix

character, prefix for output GCT file. By default, this prefix includes the specified tissue and assay and the current date, e.g., "MotrpacRatTraining6mo_gsea_HEART_PROT_20231110". Must be specified for custom input data.

input

optional data frame of custom differential analysis results. Required columns are either "tscore", "feature_ID", and the columns named in cast_vars, or "tscore", the columns named in cast_vars, and a column named after gene_id_type (for example, "gene_symbol"). If a "feature_ID" column exists but no column corresponding to gene_id_type, then feature_to_gene_map must map between "feature_ID" and gene_id_type.

cast_vars

character vector of column names in the differential analysis results that are used to convert the table from long to wide format, with t-scores as the value variable. See data.table::dcast() for more details. Default: c("sex", "comparison_group")

feature_to_gene_map

data frame, map between "feature_ID" and gene_id_type. MotrpacRatTraining6moData::FEATURE_TO_GENE_FILT if not otherwise specified.
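
The long-to-wide conversion described for cast_vars can be illustrated with a small sketch. This is not the package's internal code; it only shows how data.table::dcast() behaves on a toy table. The data, the gene symbols, and the object names (long, cast_formula, wide) are hypothetical.

```r
library(data.table)

# Hypothetical long-format results: one row per gene, sex, and comparison group.
long <- data.table(
  gene_symbol      = rep(c("Actb", "Gapdh"), each = 4),
  sex              = rep(c("female", "female", "male", "male"), times = 2),
  comparison_group = rep(c("1w", "8w"), times = 4),
  tscore           = c(1.2, -0.5, 0.3, 2.1, -1.7, 0.8, 0.0, -2.4)
)

cast_vars <- c("sex", "comparison_group")

# Build the formula "gene_symbol ~ sex + comparison_group" from cast_vars.
cast_formula <- as.formula(
  paste("gene_symbol ~", paste(cast_vars, collapse = " + "))
)

# One row per gene, one column per sex/comparison_group combination,
# with t-scores as the values.
wide <- dcast(long, cast_formula, value.var = "tscore")
print(wide)
```

Each combination of the cast_vars values becomes one column (here "female_1w", "female_8w", "male_1w", "male_8w"), which corresponds to one sample column in the resulting matrix.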

Value

character, path of the GCT file

Details

T-scores from the timewise differential analysis results are used as the scores for GSEA. When multiple features map to the same gene, feature-level data are summarized to the gene level using the maximum absolute t-score.
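
The gene-level summarization can be sketched as follows. This is a minimal illustration, not the package's implementation. The feature IDs, gene symbols, and object names are hypothetical, and the sketch assumes that the sign of the selected t-score is retained, which the documentation does not state explicitly.

```r
library(data.table)

# Hypothetical feature-level results: three features for a single contrast.
feature_level <- data.table(
  feature_ID       = c("site_a", "site_b", "site_c"),
  sex              = "male",
  comparison_group = "8w",
  tscore           = c(1.5, -3.2, 0.7)
)

# Hypothetical feature-to-gene map (stands in for feature_to_gene_map).
feature_map <- data.table(
  feature_ID  = c("site_a", "site_b", "site_c"),
  gene_symbol = c("Akt1", "Akt1", "Mtor")
)

# Attach a gene identifier to every feature.
annotated <- merge(feature_level, feature_map, by = "feature_ID")

# Within each gene and contrast, keep the t-score with the largest
# absolute value. Assumption: the signed value is kept, not abs(tscore).
gene_level <- annotated[
  , .(tscore = tscore[which.max(abs(tscore))]),
  by = .(gene_symbol, sex, comparison_group)
]
print(gene_level)
```

In this toy example the two Akt1 features collapse to a single row with a t-score of -3.2, and Mtor keeps its only value, 0.7.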

Examples

# Input for GSEA where gene set databases use gene symbols as IDs.
# This applies for the default gene set database 
# when 'method="gsea"' in 'ssGSEA2_wrapper()'.
prepare_gsea_input("HEART","PROT",outdir="/tmp")
#> PROT_HEART_DA
#> Saving file to  /tmp/MotrpacRatTraining6mo_gsea_HEART_PROT_20231110.gct 
#> Dimensions of matrix: [8760x8]
#> Setting precision to 4
#> Saved.
#> [1] "/tmp/MotrpacRatTraining6mo_gsea_HEART_PROT_20231110.gct"
prepare_gsea_input("LIVER","PHOSPHO",outdir="/tmp")
#> PHOSPHO_LIVER_DA
#> Saving file to  /tmp/MotrpacRatTraining6mo_gsea_LIVER_PHOSPHO_20231110.gct 
#> Dimensions of matrix: [8120x8]
#> Setting precision to 4
#> Saved.
#> [1] "/tmp/MotrpacRatTraining6mo_gsea_LIVER_PHOSPHO_20231110.gct"
prepare_gsea_input("LIVER","ACETYL",outdir="/tmp")
#> ACETYL_LIVER_DA
#> Saving file to  /tmp/MotrpacRatTraining6mo_gsea_LIVER_ACETYL_20231110.gct 
#> Dimensions of matrix: [2403x8]
#> Setting precision to 4
#> Saved.
#> [1] "/tmp/MotrpacRatTraining6mo_gsea_LIVER_ACETYL_20231110.gct"
prepare_gsea_input("LIVER","UBIQ",outdir="/tmp")
#> UBIQ_LIVER_DA
#> Saving file to  /tmp/MotrpacRatTraining6mo_gsea_LIVER_UBIQ_20231110.gct 
#> Dimensions of matrix: [2529x8]
#> Setting precision to 4
#> Saved.
#> [1] "/tmp/MotrpacRatTraining6mo_gsea_LIVER_UBIQ_20231110.gct"
prepare_gsea_input("LIVER","TRNSCRPT",outdir="/tmp")
#> TRNSCRPT_LIVER_DA
#> Saving file to  /tmp/MotrpacRatTraining6mo_gsea_LIVER_TRNSCRPT_20231110.gct 
#> Dimensions of matrix: [14276x8]
#> Setting precision to 4
#> Saved.
#> [1] "/tmp/MotrpacRatTraining6mo_gsea_LIVER_TRNSCRPT_20231110.gct"

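# Input for GSEA with Mitocarta (i.e., 'method="gsea_mitocarta"' 
# in 'ssGSEA2_wrapper()'), which uses RefSeq IDs.
# Note: the default output path does not include the gene ID type, so this
# call writes to the same path as the gene-symbol TRNSCRPT example above.
# Set 'outfile_prefix' to keep both files.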
prepare_gsea_input("LIVER","TRNSCRPT",outdir="/tmp",gene_id_type="refseq")
#> TRNSCRPT_LIVER_DA
#> Saving file to  /tmp/MotrpacRatTraining6mo_gsea_LIVER_TRNSCRPT_20231110.gct 
#> Dimensions of matrix: [17079x8]
#> Setting precision to 4
#> Saved.
#> [1] "/tmp/MotrpacRatTraining6mo_gsea_LIVER_TRNSCRPT_20231110.gct"

# "Custom" input
res = combine_da_results(tissues = "KIDNEY", assays = "PROT")
#> PROT_KIDNEY_DA
# add a dummy "gene_symbol" column (here, a copy of feature_ID)
res$gene_symbol = res$feature_ID
prepare_gsea_input(input=res, outdir="/tmp", outfile_prefix="KIDNEY_PROT")
#> Saving file to  /tmp/KIDNEY_PROT.gct 
#> Dimensions of matrix: [9852x8]
#> Setting precision to 4
#> Saved.
#> [1] "/tmp/KIDNEY_PROT.gct"