Subset of FEATURE_TO_GENE that excludes non-differential epigenetic features.
Format
A data frame with 241572 rows and 9 variables:
entrez_gene
double, Entrez gene ID
feature_ID
character, MoTrPAC feature identifier
rgd_gene
integer, RGD gene ID
gene_symbol
character, official gene symbol
old_gene_symbol
character, semicolon-separated list of deprecated or alias gene symbols
ensembl_gene
character, Ensembl gene ID
relationship_to_gene
double, for ATAC and METHYL features only. Distance from the closest edge of the feature to the start or end of the closest gene, whichever is closer. A value of 0 means the feature overlaps the gene. A negative value means the feature is upstream of "geneStart". A positive value means the feature is downstream of "geneEnd". Note that "geneStart" and "geneEnd" are strand-agnostic, i.e. "geneStart" is always less than "geneEnd", even if the gene is on the negative strand ("geneStrand" == 2).
custom_annotation
character, a version of the ChIPseeker annotations with many corrections. Values include: "Distal Intergenic", "Promoter (<=1kb)", "Exon", "Promoter (1-2kb)", "Downstream (<5kb)", "Upstream (<5kb)", "5' UTR", "Intron", "3' UTR", "Overlaps Gene"
kegg_id
character, KEGG ID for METAB features only. See MotrpacBicQC::metabolomics_data_dictionary for more details.
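The sign convention described for relationship_to_gene can be illustrated with a minimal sketch. The function name and its arguments are hypothetical and are not part of any MoTrPAC package. The sketch assumes inclusive coordinates on a single chromosome and is not the implementation used to build this data set.

```r
# Hypothetical helper illustrating the sign convention only.
# Coordinates are strand-agnostic, so gene_start < gene_end always holds.
relationship_to_gene_sketch <- function(feat_start, feat_end, gene_start, gene_end) {
  stopifnot(feat_start <= feat_end, gene_start < gene_end)
  if (feat_end < gene_start) {
    # Feature lies entirely before "geneStart": negative distance
    return(feat_end - gene_start)
  }
  if (feat_start > gene_end) {
    # Feature lies entirely after "geneEnd": positive distance
    return(feat_start - gene_end)
  }
  # Feature and gene share at least one position
  0
}

relationship_to_gene_sketch(100, 200, 500, 900)    # -300
relationship_to_gene_sketch(1000, 1100, 500, 900)  # 100
relationship_to_gene_sketch(450, 550, 500, 900)    # 0
```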
Details
All proteomics feature IDs (RefSeq accessions) were mapped to gene
symbols and Entrez IDs using NCBI’s "gene2refseq" mapping files
(https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz, downloaded on 2020/12/18).
Epigenomics features were mapped to the nearest gene using ChIPseeker::annotatePeak()
with Ensembl gene annotation (Rattus norvegicus release 96).
See MotrpacRatTraining6mo::get_peak_annotations for implementation.
Gene symbols, Entrez IDs, Ensembl IDs, and RGD IDs were mapped to each
other using RGD’s rat gene annotation
(https://download.rgd.mcw.edu/data_release/RAT/GENES_RAT.txt, generated on 2021/11/12).
For fast(er) indexing, convert this object to a data.table::data.table() and use
data.table::setkey() to set the key to the column you are matching.
This dramatically improves performance.
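A minimal sketch of the keyed-lookup pattern, assuming the data.table package is installed. The toy data frame below is a made-up stand-in for this object, and its feature IDs and gene symbols are placeholders rather than real values.

```r
library(data.table)

# Toy stand-in for the packaged data frame; all values are invented.
feature_to_gene <- data.frame(
  feature_ID = c("feat_C", "feat_A", "feat_B"),
  gene_symbol = c("Gene3", "Gene1", "Gene2"),
  stringsAsFactors = FALSE
)

# Convert to a data.table and key it on the column used for matching.
dt <- as.data.table(feature_to_gene)
setkey(dt, feature_ID)  # sorts by feature_ID and marks it as the key

# Keyed lookup: binary search on the key rather than a full vector scan.
hits <- dt[.(c("feat_A", "feat_C")), nomatch = NULL]
hits$gene_symbol
```

With a key set, the lookup uses binary search on the sorted column. Choose the key to match the column you query most often, for example feature_ID when mapping features to genes.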