R/metabolomics_data_dictionary.R
get_and_validate_mdd.Rd
This function fetches and validates the Metabolomics Data Dictionary from the Metabolomics Workbench. It can optionally remove duplicate entries.
get_and_validate_mdd(remove_duplications = FALSE, verbose = TRUE)
remove_duplications
Logical; if TRUE, removes duplicate entries based on the refmet_name column.
verbose
Logical; if TRUE (default), displays progress messages and warnings during function execution.
Returns a data frame with the following columns:
refmet_name
Character; the standardized RefMet name.
pubchem_cid
Character; the PubChem compound ID.
lm_id
Character; the LIPID MAPS ID.
inchi_key
Character; the International Chemical Identifier Key.
exactmass
Numeric; the exact mass of the metabolite.
formula
Character; the chemical formula of the metabolite.
super_class
Character; the superclass category of the metabolite.
main_class
Character; the main class category of the metabolite.
sub_class
Character; the subclass category of the metabolite.
hmdb_id
Character; the Human Metabolome Database ID.
kegg_id
Character; the Kyoto Encyclopedia of Genes and Genomes ID.
Each row of the data frame represents one metabolite entry from the Metabolomics Workbench Data Dictionary. Duplicate refmet_name values may remain unless remove_duplications is TRUE.
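One way to confirm that a returned data frame carries the columns listed above is sketched here. The helper name check_mdd_columns and the toy data frame are illustrative assumptions and are not part of the package; the toy values are placeholders, not real RefMet entries.

```r
# Columns documented in the return value of get_and_validate_mdd().
expected_mdd_columns <- c(
  "refmet_name", "pubchem_cid", "lm_id", "inchi_key", "exactmass",
  "formula", "super_class", "main_class", "sub_class", "hmdb_id", "kegg_id"
)

# Hypothetical helper (not part of the package): return the documented
# columns that are absent from `df`.
check_mdd_columns <- function(df, expected = expected_mdd_columns) {
  setdiff(expected, names(df))
}

# Toy stand-in for the real result, with made-up placeholder values.
toy <- data.frame(
  refmet_name = c("Metabolite A", "Metabolite B"),
  exactmass = c(100.5, 200.25),
  stringsAsFactors = FALSE
)

# Character vector of documented columns that `toy` does not have.
missing_cols <- check_mdd_columns(toy)
```

An empty result from check_mdd_columns() would indicate that every documented column is present.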
This function downloads the entire RefMet database from the Metabolomics
Workbench using their REST API. The data is initially fetched in JSON format and
then converted to a data frame. The function checks for the presence of a 'name'
column in the data frame, renaming it to 'refmet_name' for consistency. It also
provides an option to remove duplicate entries based on the 'refmet_name' column.
If duplicates are found and remove_duplications is FALSE, the function will
list the duplicated IDs but will not remove them. This can be helpful for reviewing
the data quality and consistency.
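The rename and duplicate handling described above can be illustrated with a minimal sketch on an in-memory data frame. The function name tidy_mdd is hypothetical, the download step is omitted, and the sketch reports duplicated refmet_name values as an assumption about what is listed; the actual implementation may differ.

```r
# Hypothetical sketch of the rename-and-deduplicate step; the real
# function's internals may differ.
tidy_mdd <- function(df, remove_duplications = FALSE, verbose = TRUE) {
  # Rename 'name' to 'refmet_name' when the column is present.
  if ("name" %in% names(df)) {
    names(df)[names(df) == "name"] <- "refmet_name"
  }

  # Flag every repeat of a refmet_name after its first occurrence.
  dup_rows <- duplicated(df$refmet_name)

  if (any(dup_rows)) {
    if (verbose) {
      message(
        "Duplicated refmet_name values: ",
        paste(unique(df$refmet_name[dup_rows]), collapse = ", ")
      )
    }
    # Drop the repeats only when explicitly requested.
    if (remove_duplications) {
      df <- df[!dup_rows, , drop = FALSE]
    }
  }
  df
}

# Toy input with one repeated name (placeholder values, not real data).
toy <- data.frame(
  name = c("A", "B", "A"),
  exactmass = c(1, 2, 1),
  stringsAsFactors = FALSE
)

kept <- tidy_mdd(toy, remove_duplications = FALSE, verbose = FALSE)
deduped <- tidy_mdd(toy, remove_duplications = TRUE, verbose = FALSE)
```

In this sketch, kept retains all rows while deduped keeps only the first occurrence of each refmet_name, mirroring the reporting-only versus removal behavior described above.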
if (FALSE) {
refmet <- get_and_validate_mdd(remove_duplications = TRUE, verbose = TRUE)
head(refmet)
}