This feature-to-gene map associates every feature tested in differential analysis with a gene and includes all current gene identifiers available in RGD as of 11/12/2020.
A data frame with 4044034 rows and 9 variables:
double, Entrez gene ID
character, MoTrPAC feature identifier
integer, RGD gene ID
character, official gene symbol
character, semicolon-separated list of deprecated or alias gene symbols
character, Ensembl gene ID
double, for ATAC and METHYL features only. Distance from the closest edge of the feature to the start or end of the closest gene, whichever is closer. A value of 0 means the feature overlaps the gene. A negative value means the feature is upstream of "geneStart". A positive value means the feature is downstream of "geneEnd". Note that "geneStart" and "geneEnd" are strand-agnostic, i.e. "geneStart" is always less than "geneEnd", even if the gene is on the negative strand ("geneStrand" == 2).
character, a version of the annotations with many corrections. Values include: "Distal Intergenic", "Promoter (<=1kb)", "Exon", "Promoter (1-2kb)", "Downstream (<5kb)", "Upstream (<5kb)", "5' UTR", "Intron", "3' UTR", "Overlaps Gene"
kegg_id: character, KEGG ID for METAB features only. See MotrpacBicQC::metabolomics_data_dictionary for more details.
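The sign convention of the distance column for ATAC and METHYL features can be illustrated with a small sketch. The helper `signed_distance()` and its argument names are hypothetical and exist only for illustration; this is not how the package computes the column, only a restatement of the rules above under the assumption that coordinates are strand-agnostic and inclusive.

```r
# Hypothetical helper illustrating the documented sign convention.
# It is not part of any MoTrPAC package.
signed_distance <- function(feature_start, feature_end, gene_start, gene_end) {
  # Coordinates are strand-agnostic: start is never greater than end
  stopifnot(feature_start <= feature_end, gene_start <= gene_end)

  if (feature_end < gene_start) {
    # Feature lies entirely before geneStart: negative distance
    return(-(gene_start - feature_end))
  }
  if (feature_start > gene_end) {
    # Feature lies entirely after geneEnd: positive distance
    return(feature_start - gene_end)
  }
  # Any overlap between feature and gene gives 0
  0
}

upstream   <- signed_distance(100, 200, 500, 900)    # feature before the gene
downstream <- signed_distance(1000, 1100, 500, 900)  # feature after the gene
overlap    <- signed_distance(850, 950, 500, 900)    # feature overlaps the gene
```

Here `upstream` is negative, `downstream` is positive, and `overlap` is 0, matching the three cases described for the column.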
All proteomics feature IDs (RefSeq accessions) were mapped to gene
symbols and Entrez IDs using NCBI’s "gene2refseq" mapping files
(downloaded on 2020/12/18).
Epigenomics features were mapped to the nearest gene using ChIPseeker::annotatePeak()
with Ensembl gene annotation (Rattus norvegicus release 96).
See MotrpacRatTraining6mo::get_peak_annotations for implementation.
Gene symbols, Entrez IDs, Ensembl IDs, and RGD IDs were mapped to each
other using RGD’s rat gene annotation
(generated on 2021/11/12).
For fast(er) indexing, convert this object to a [data.table::data.table()] and use
[data.table::setkey()] to set the key to the column you are matching.
This dramatically improves performance.
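A minimal sketch of the keyed lookup described above, using a toy data frame in place of the real map. The object name `feature_map` and the column names `feature_ID` and `gene_symbol` are assumptions made for illustration; substitute the actual object and the column you are matching on.

```r
library(data.table)

# Toy stand-in for the feature-to-gene map (the real object has about 4 million rows).
# Object and column names are illustrative assumptions.
feature_map <- data.frame(
  feature_ID = c("feat_A", "feat_B", "feat_C", "feat_A"),
  gene_symbol = c("Gene1", "Gene2", "Gene3", "Gene4"),
  stringsAsFactors = FALSE
)

# Convert the data frame to a data.table
map_dt <- data.table::data.table(feature_map)

# setkey() sorts the table by the column in place and marks it as the key
data.table::setkey(map_dt, feature_ID)

# Keyed subsetting uses binary search rather than scanning every row
hits <- map_dt[.(c("feat_A", "feat_C"))]
print(hits)
```

A feature can map to more than one row, so a keyed lookup may return more rows than the number of IDs queried, as `feat_A` does here.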