This feature-to-gene map associates every feature tested in differential analysis with a gene and includes all current gene identifiers available in RGD as of 2021/11/12.

Usage

FEATURE_TO_GENE

Format

A data frame with 4044034 rows and 9 variables:

entrez_gene

double, Entrez gene ID

feature_ID

character, MoTrPAC feature identifier

rgd_gene

integer, RGD gene ID

gene_symbol

character, official gene symbol

old_gene_symbol

character, semicolon-separated list of deprecated or alias gene symbols

ensembl_gene

character, Ensembl gene ID

relationship_to_gene

double, for ATAC and METHYL features only. Distance from the closest edge of the feature to the start or end of the closest gene, whichever is closer. A value of 0 means there is non-zero overlap between the feature and the gene. A negative value means the feature is upstream of "geneStart". A positive value means the feature is downstream of "geneEnd". Note that "geneStart" and "geneEnd" are strand-agnostic, i.e. "geneStart" is always less than "geneEnd", even if the gene is on the negative strand ("geneStrand" == 2).

custom_annotation

character, a version of the ChIPseeker annotations with many corrections. Values include: "Distal Intergenic", "Promoter (<=1kb)", "Exon", "Promoter (1-2kb)", "Downstream (<5kb)", "Upstream (<5kb)", "5' UTR", "Intron", "3' UTR", "Overlaps Gene"

kegg_id

character, KEGG ID for METAB features only. See MotrpacBicQC::metabolomics_data_dictionary for more details.
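
The sign convention described for relationship_to_gene can be illustrated with a minimal sketch. The helper signed_distance_to_gene() and its arguments are hypothetical and not part of the package, and the coordinates are made up. The sketch assumes strand-agnostic coordinates (gene_start less than gene_end) and is not necessarily how the column was computed.

```r
# Hypothetical helper, not part of the package: signed distance from a feature
# to a gene, following the convention described for relationship_to_gene.
# Coordinates are strand-agnostic, so gene_start < gene_end is assumed.
signed_distance_to_gene <- function(feature_start, feature_end,
                                    gene_start, gene_end) {
  if (feature_end < gene_start) {
    # Feature lies entirely before geneStart: negative distance
    return(feature_end - gene_start)
  }
  if (feature_start > gene_end) {
    # Feature lies entirely after geneEnd: positive distance
    return(feature_start - gene_end)
  }
  # Feature and gene overlap
  0
}

upstream   <- signed_distance_to_gene(100, 200, 500, 900)    # -300
downstream <- signed_distance_to_gene(1000, 1100, 500, 900)  # 100
overlap    <- signed_distance_to_gene(450, 600, 500, 900)    # 0
```

In each case the distance is measured from the feature edge nearest the gene, so a feature ending 300 bases before "geneStart" gets -300 regardless of the gene's strand.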

Source

pass1b-06/analysis/resources/master_feature_to_gene_20211116.RData

Details

All proteomics feature IDs (RefSeq accessions) were mapped to gene symbols and Entrez IDs using NCBI’s "gene2refseq" mapping files (https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz, downloaded on 2020/12/18). Epigenomics features were mapped to the nearest gene using ChIPseeker::annotatePeak() with Ensembl gene annotation (Rattus norvegicus release 96). See MotrpacRatTraining6mo::get_peak_annotations for implementation. Gene symbols, Entrez IDs, Ensembl IDs, and RGD IDs were mapped to each other using RGD’s rat gene annotation (https://download.rgd.mcw.edu/data_release/RAT/GENES_RAT.txt, generated on 2021/11/12).

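For faster lookups, convert this object to a data.table with data.table::data.table() and use data.table::setkey() to set the key to the column you are matching on. Keyed subsets locate rows by binary search rather than scanning all of the roughly 4 million rows, which makes repeated lookups much faster.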
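
A minimal sketch of the keyed lookup, assuming the data.table package is installed. The toy_map data frame is a made-up stand-in for FEATURE_TO_GENE, and its feature IDs and gene symbols are illustrative, not real values. The sketch uses data.table::as.data.table() for the conversion, which is one of several equivalent ways to do it.

```r
library(data.table)

# Toy stand-in for FEATURE_TO_GENE with made-up rows; only a few columns shown
toy_map <- data.frame(
  feature_ID  = c("feat_A", "feat_B", "feat_C"),
  gene_symbol = c("Gene1", "Gene2", "Gene1"),
  entrez_gene = c(1, 2, 1),
  stringsAsFactors = FALSE
)

# Convert to a data.table and key it on the column used for matching
toy_dt <- as.data.table(toy_map)
setkey(toy_dt, feature_ID)

# Keyed subset: rows are located by binary search on feature_ID
# and returned in the order of the requested IDs
hits <- toy_dt[.(c("feat_C", "feat_A"))]
```

With the real object, the same pattern would apply after replacing toy_map with FEATURE_TO_GENE and choosing whichever column you match on as the key.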