Get and Validate the Entire RefMet Database from Metabolomics Workbench
Source: R/metabolomics_data_dictionary.R
get_and_validate_mdd.Rd
This function fetches and validates the Metabolomics Data Dictionary from the Metabolomics Workbench. It provides an option to remove duplicates.
Value
Returns a data frame with the following columns:
refmet_name: Character; the standardized RefMet name.
pubchem_cid: Character; the PubChem compound ID.
lm_id: Character; the LIPID MAPS ID.
inchi_key: Character; the International Chemical Identifier Key.
exactmass: Numeric; the exact mass of the metabolite.
formula: Character; the chemical formula of the metabolite.
super_class: Character; the superclass category of the metabolite.
main_class: Character; the main class category of the metabolite.
sub_class: Character; the subclass category of the metabolite.
hmdb_id: Character; the Human Metabolome Database ID.
kegg_id: Character; the Kyoto Encyclopedia of Genes and Genomes ID.
Each row of the data frame represents one metabolite entry from the Metabolomics Workbench Data Dictionary. Rows are unique by refmet_name only when duplicates have been removed.
Details
This function downloads the entire RefMet database from the Metabolomics
Workbench using their REST API. The data is initially fetched in JSON format and
then converted to a data frame. The function checks for the presence of a 'name'
column in the data frame, renaming it to 'refmet_name' for consistency. It also
provides an option to remove duplicate entries based on the 'refmet_name' column.
If duplicates are found and remove_duplications is FALSE, the function will
list the duplicated IDs but will not remove them. This can be helpful for reviewing
the data quality and consistency.
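The rename and duplicate handling described above can be sketched on a small in-memory data frame, without calling the REST API. This is a minimal illustration in base R, not the package's actual implementation: the helper name validate_refmet_sketch and the toy data are assumptions made for the example, and only the remove_duplications and verbose argument names come from this page.

```r
# Hypothetical helper for illustration only; the real function may differ.
validate_refmet_sketch <- function(df, remove_duplications = FALSE, verbose = TRUE) {
  # Rename 'name' to 'refmet_name' for consistency, if 'name' is present.
  if ("name" %in% names(df)) {
    names(df)[names(df) == "name"] <- "refmet_name"
  }
  if (!"refmet_name" %in% names(df)) {
    stop("Expected a 'name' or 'refmet_name' column")
  }

  # Flag every repeat of a refmet_name after its first occurrence.
  dup <- duplicated(df$refmet_name)
  if (any(dup)) {
    dup_ids <- unique(df$refmet_name[dup])
    if (verbose) {
      message("Duplicated refmet_name values: ", paste(dup_ids, collapse = ", "))
    }
    # Drop the repeats only when explicitly requested.
    if (remove_duplications) {
      df <- df[!dup, , drop = FALSE]
    }
  }

  rownames(df) <- NULL
  df
}

# Toy input standing in for the converted JSON response (made-up rows).
toy <- data.frame(
  name = c("Glucose", "Alanine", "Glucose"),
  exactmass = c(180.0634, 89.0477, 180.0634),
  stringsAsFactors = FALSE
)

kept <- validate_refmet_sketch(toy, remove_duplications = FALSE, verbose = FALSE)
deduped <- validate_refmet_sketch(toy, remove_duplications = TRUE, verbose = FALSE)
```

With remove_duplications = FALSE the sketch only reports the repeated names and returns all rows; with TRUE it keeps the first occurrence of each refmet_name.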
Examples
if (FALSE) { # \dontrun{
refmet <- get_and_validate_mdd(remove_duplications = TRUE, verbose = TRUE)
head(refmet)
} # }