Load raw counts and return the filtered and normalized data. Alternatively, the user can provide a numeric data frame of raw RNA-seq counts.
Usage
transcript_normalize_counts(
  tissue,
  min_cpm = 0.5,
  min_num_samples = 2,
  norm_method = "TMM",
  counts = NULL
)
Arguments
- tissue
character, tissue abbreviation, one of MotrpacRatTraining6moData::TISSUE_ABBREV
- min_cpm
double, retain genes with more than min_cpm counts per million in at least min_num_samples samples
- min_num_samples
double, retain genes with more than min_cpm counts per million in at least min_num_samples samples
- norm_method
character, one of c("TMM","TMMwsp","RLE","upperquartile","none"). "TMM" by default.
- counts
optional user-supplied numeric data frame or matrix where row names are gene IDs and column names are sample identifiers
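The filtering rule described by min_cpm and min_num_samples can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy matrix, the object names (lib_size, cpm_mat, keep, filtered), and the plain library-size CPM calculation are assumptions made only to show how the two thresholds interact.

```r
# Toy count matrix (made-up values): rows are genes, columns are samples
counts <- matrix(
  c(100, 200, 150,
      0,   1,   0,
     50,   0,  60,
    850, 799, 790),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

min_cpm <- 0.5
min_num_samples <- 2

# Counts per million: scale each column by its library size (column total)
lib_size <- colSums(counts)
cpm_mat <- t(t(counts) / lib_size) * 1e6

# Keep genes with more than min_cpm CPM in at least min_num_samples samples
keep <- rowSums(cpm_mat > min_cpm) >= min_num_samples
filtered <- counts[keep, , drop = FALSE]

rownames(filtered)
```

In this toy example gene2 exceeds min_cpm in only one sample, so it is dropped, while the other three genes are retained.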
Details
Note that while this function uses the same code that generated the normalized RNA-seq data tables (MotrpacRatTraining6moData::TRNSCRPT_NORM_DATA) and the normalized RNA-seq data available through the MoTrPAC Data Hub, transcript_normalize_counts(tissue) yields slightly fewer genes than its corresponding MotrpacRatTraining6moData::TRNSCRPT_NORM_DATA object. Investigation of this discrepancy suggests minor functional differences between the versions of edgeR::cpm() used ~2.5 years apart. Find more details in this GitHub issue.
Examples
norm_data = transcript_normalize_counts("LUNG")
# Simulate "user-supplied data"
counts = load_sample_data("LUNG", "TRNSCRPT", normalized=FALSE)
#> TRNSCRPT_LUNG_RAW_COUNTS
counts = df_to_numeric(counts)
norm_data = transcript_normalize_counts(counts = counts)