Meta-regression of metabolomics differential analysis results — METAB_DA

Timewise summary statistics and training FDR from differential analysis (DA) that tests the effect of training on each metabolite within each sex. One data frame per tissue.

For metabolites measured on more than one platform, the consensus meta-regression result is provided when the platforms agree sufficiently (see Details). For metabolites measured on a single platform, these results are identical to METAB_DA.

Usage

METAB_ADRNL_DA_METAREG

METAB_BAT_DA_METAREG

METAB_COLON_DA_METAREG

METAB_CORTEX_DA_METAREG

METAB_HEART_DA_METAREG

METAB_HIPPOC_DA_METAREG

METAB_HYPOTH_DA_METAREG

METAB_KIDNEY_DA_METAREG

METAB_LIVER_DA_METAREG

METAB_LUNG_DA_METAREG

METAB_OVARY_DA_METAREG

METAB_PLASMA_DA_METAREG

METAB_SKMGN_DA_METAREG

METAB_SKMVL_DA_METAREG

METAB_SMLINT_DA_METAREG

METAB_SPLEEN_DA_METAREG

METAB_TESTES_DA_METAREG

METAB_VENACV_DA_METAREG

METAB_WATSC_DA_METAREG

Format

A data frame with 30 variables:

feature: character, unique feature identifier in the format 'ASSAY_ABBREV;TISSUE_ABBREV;feature_ID' only for training-regulated features at 5% IHW FDR. For redundant differential features, 'feature_ID' is prepended with the specific platform to make unique identifiers. See REPEATED_FEATURES for details.
assay: character, assay abbreviation, one of ASSAY_ABBREV
assay_code: character, assay code used in data release. See MotrpacBicQC::assay_codes.
tissue: character, tissue abbreviation, one of TISSUE_ABBREV
tissue_code: character, tissue code used in data release. See MotrpacBicQC::bic_animal_tissue_code.
feature_ID: character, MoTrPAC feature identifier
dataset: character, metabolomics platform (metab-u-ionpneg, metab-u-lrpneg, metab-u-lrppos, metab-u-hilicpos, metab-u-rpneg, metab-u-rppos, metab-t-amines, metab-t-oxylipneg, metab-t-tca, metab-t-nuc, metab-t-acoa, metab-t-ka) the feature was measured in. 'meta-reg' specifies results from the metabolomics meta-regression for repeated features.
site: character, Chemical Analysis Site (CAS) name
is_targeted: logical, is this a targeted platform?
sex: character, one of 'male' or 'female'
comparison_group: character, time point of trained animals compared to the sex-matched sedentary control animals, one of '1w', '2w', '4w', '8w'
p_value: double, unadjusted p-value for the difference between 'comparison_group' and the group of sex-matched sedentary control animals
adj_p_value: double, adjusted p-value from 'p_value' column. P-values are BY-adjusted across all datasets within a given assay/ome.
logFC: double, log fold-change where the numerator is 'comparison_group', e.g., '1w', and the denominator is the group of sex-matched sedentary control animals
logFC_se: double, standard error of the log fold-change
tscore: double, t statistic
zscore: double, z statistic
covariates: character, comma-separated list of adjustment variables or NA
comparison_average_intensity: double, average intensity among the replicates in 'comparison_group'
reference_average_intensity: double, average intensity among the replicates in the group of sex-matched sedentary control animals
metabolite_refmet: character, RefMet name of metabolite
cv: double, feature coefficient of variation in the dataset
metabolite: character, name of metabolite as appears in the CAS's data
control_cv: double, feature coefficient of variation in the dataset
mz: double, mass over charge
rt: double, retention time
neutral_mass: double, neutral mass
meta_reg_het_p: double, for metabolites with multiple measurements, the meta-regression heterogeneity p-value, where a smaller p-value indicates more disagreement between platforms. One value per feature (not per training group).
meta_reg_pvalue: double, for metabolites with multiple measurements, the meta-regression p-value. One value per feature (not per training group).
selection_fdr: double, adjusted training p-value used to select training-regulated analytes. P-values are IHW-adjusted across all datasets within a given assay with tissue as a covariate.

An object of class data.frame with 9086 rows and 30 columns.

An object of class data.frame with 1888 rows and 30 columns.

An object of class data.frame with 1672 rows and 30 columns.

An object of class data.frame with 9872 rows and 30 columns.

An object of class data.frame with 8920 rows and 30 columns.

An object of class data.frame with 1736 rows and 30 columns.

An object of class data.frame with 10376 rows and 30 columns.

An object of class data.frame with 10264 rows and 30 columns.

An object of class data.frame with 10208 rows and 30 columns.

An object of class data.frame with 912 rows and 30 columns.

An object of class data.frame with 8696 rows and 30 columns.

An object of class data.frame with 8472 rows and 30 columns.

An object of class data.frame with 1584 rows and 30 columns.

An object of class data.frame with 1800 rows and 30 columns.

An object of class data.frame with 1816 rows and 30 columns.

An object of class data.frame with 872 rows and 30 columns.

An object of class data.frame with 1278 rows and 30 columns.

An object of class data.frame with 8808 rows and 30 columns.
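
As a hedged illustration of how the columns above can be used, the sketch below builds a small toy data frame with a few of the documented columns and extracts the features whose selection_fdr is at most 0.05. The values are invented, training_regulated() is a hypothetical helper rather than a function from any MoTrPAC package, and treating the 5% threshold as inclusive is an assumption.

```r
# Toy stand-in for one of the *_DA_METAREG data frames; all values are invented.
toy <- data.frame(
  feature_ID = rep(c("met_A", "met_B", "met_C"), each = 2),
  sex = rep(c("female", "male"), times = 3),
  comparison_group = "8w",
  logFC = c(0.8, 0.6, -0.1, 0.05, -0.9, -0.7),
  selection_fdr = rep(c(0.01, 0.40, 0.04), each = 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: IDs of features that pass the selection FDR threshold.
# selection_fdr is one value per feature, so duplicates across rows are dropped.
training_regulated <- function(df, fdr = 0.05) {
  keep <- !is.na(df$selection_fdr) & df$selection_fdr <= fdr
  unique(df$feature_ID[keep])
}

regulated <- training_regulated(toy)
```

The same helper could in principle be applied to any of the data frames listed under Usage, since they share the columns used here.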

Details

The metabolomics data were collected using different platforms from six different sites. Moreover, some platforms were targeted (i.e., quantified a predefined small set of metabolites of interest), whereas other platforms were untargeted. There were 1116 cases in which at least two sites measured the same metabolite in the same tissue. In these cases, we used meta-regression to integrate the differential analysis results, implemented using the metafor R package.

For a given metabolite \(m\), the input to this analysis was the set of timewise effect sizes \(y_{g,p}\) and their variances \(v_{g,p}\). Here \(g\) denotes the analysis group, i.e., the combination of training time point and sex for which the summary statistics were computed in the timewise differential analysis, and \(p \in \{1, \ldots, n_{m}\}\) denotes the platform.

If \(m\) had data from at least three platforms, of which at least one was untargeted and at least one was targeted, then we added nested random effects for both the platform and the targeted status. That is, in metafor’s notation we used "mods = ~ 0 + analysis_group" and "random = list(~ analysis_group | platform, ~ analysis_group | is_targeted)". The inner | outer notation defines a blockwise dependence structure for the random effects: observations with different outer values are assumed to be independent, whereas observations with the same outer value may be dependent according to their inner values. If the targeted status was redundant (e.g., there were only two platforms, one targeted and one untargeted), then we kept the platform-level random effects only.
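
The rule for choosing the random-effects structure can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: choose_meta_reg_spec() is a hypothetical helper, and the platform names in the example are picked from the dataset column only for illustration.

```r
# Hypothetical helper: choose the model specification for one metabolite,
# given the platforms it was measured on and their targeted status.
choose_meta_reg_spec <- function(platform, is_targeted) {
  # Collapse to one row per platform in case the input has one row per analysis group.
  per_platform <- unique(data.frame(platform = platform, is_targeted = is_targeted))
  n_platforms <- nrow(per_platform)
  mixed_status <- any(per_platform$is_targeted) && any(!per_platform$is_targeted)

  # The platform-level random effect is always included.
  random <- list(~ analysis_group | platform)
  if (n_platforms >= 3 && mixed_status) {
    # Targeted status is not redundant with platform, so add its random effect too.
    random <- c(random, list(~ analysis_group | is_targeted))
  }
  list(mods = ~ 0 + analysis_group, random = random)
}

spec_two <- choose_meta_reg_spec(
  c("metab-t-amines", "metab-u-hilicpos"),
  c(TRUE, FALSE)
)
spec_three <- choose_meta_reg_spec(
  c("metab-t-amines", "metab-u-hilicpos", "metab-u-rppos"),
  c(TRUE, FALSE, FALSE)
)
```

Assuming a data frame d with hypothetical columns y and v for the effect sizes and variances, the returned elements could then be passed to metafor, e.g. metafor::rma.mv(yi = y, V = v, mods = spec_three$mods, random = spec_three$random, data = d).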

In practice, the default metafor optimizer failed to converge in 154 cases. In these cases, we changed the optimizer settings to optimizer = "nloptr", algorithm = "NLOPT_LN_SBPLX". Optimization still failed in 61 of these cases. When both the default and alternative optimizers failed, we opted for a standard fixed-effect model without random effects. In the remaining cases, we had 361 models with random effects for both the platform and the targeted status, and 694 models with a platform-only random effect.
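
The fallback order described here can be sketched generically as below. The fitting functions are stubs so that the example runs without metafor; fit_with_fallback() and the stub names are hypothetical, and treating a convergence failure as an R error is an assumption about how the failure surfaces.

```r
# Try each fitting function in order and return the first one that succeeds.
# `fitters` is a named list of zero-argument functions (hypothetical wrappers).
fit_with_fallback <- function(fitters) {
  for (name in names(fitters)) {
    fit <- tryCatch(fitters[[name]](), error = function(e) NULL)
    if (!is.null(fit)) {
      return(list(strategy = name, fit = fit))
    }
  }
  stop("all fitting strategies failed")
}

# Stub fitters standing in for the real model calls, so the sketch runs as-is.
# With metafor, the second fitter would pass
# control = list(optimizer = "nloptr", algorithm = "NLOPT_LN_SBPLX") to rma.mv(),
# and the third would fit a model without the `random` argument.
stub_fitters <- list(
  default_optimizer = function() stop("did not converge"),
  nloptr_sbplx      = function() stop("did not converge"),
  fixed_effect      = function() "fixed-effect fit"
)

result <- fit_with_fallback(stub_fitters)
```

Recording the name of the strategy that succeeded makes it straightforward to tabulate how many models ended up in each branch.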

As a summary of each model, we kept the overall model p-value (i.e., the moderators p-value) \(QMp\), which was used as the training p-value, and the residual heterogeneity p-value \(QEp\). We flagged 103 models with \(QEp < 0.001\) as having excessive heterogeneity. Using this definition, we partitioned the meta-analysis results into three classes: (1) excessive heterogeneity with a targeted platform (57 cases), (2) excessive heterogeneity without a targeted platform (46 cases), and (3) low heterogeneity (1013 cases). For class (1), we discarded the meta-analysis and kept only the targeted platform-level results for \(m\). For class (2), we discarded the meta-analysis and kept all the platform-level results for \(m\) as-is. For class (3), we kept the meta-analysis results only (i.e., discarded the platform-level results) and used the meta-regression model to calculate the timewise summary statistics, i.e., time- and sex-specific meta-analysis effect sizes, their variances, and their p-values.
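
The three-way partition can be written as a small decision rule. The sketch below is illustrative only: classify_meta_reg() and its labels are hypothetical, and the inputs are invented values rather than results from the analysis.

```r
# Hypothetical helper mirroring the three classes described above.
# QEp: residual heterogeneity p-value per model.
# has_targeted: whether the metabolite was measured on at least one targeted platform.
classify_meta_reg <- function(QEp, has_targeted, het_cutoff = 0.001) {
  ifelse(
    QEp >= het_cutoff,
    "keep_meta_regression",            # class (3): low heterogeneity
    ifelse(
      has_targeted,
      "keep_targeted_platforms",       # class (1): excessive heterogeneity, targeted available
      "keep_all_platforms"             # class (2): excessive heterogeneity, untargeted only
    )
  )
}

decisions <- classify_meta_reg(
  QEp = c(0.5, 1e-5, 1e-6),
  has_targeted = c(TRUE, TRUE, FALSE)
)
```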

Reproduce our analysis with MotrpacRatTraining6mo::metab_meta_regression().