RRBS feature annotation
Format
A data frame with 7585076 rows and 12 variables:
Chr
integer, chromosome
Locus
character, base pair range of feature
EntrezID
character, Entrez ID of closest gene
Symbol
character, gene symbol of closest gene
Strand
character, strand
Width
integer, width of the genomic locus represented by the feature
NumSites
integer, number of sites merged to generate the feature; sites are merged according to their correlation pattern in the data using an unsupervised analysis
Sites
character, comma-separated string of the specific sites that were merged to generate the feature
LocStart
integer, feature start in base pairs
LocEnd
integer, feature end in base pairs
tissue
character, tissue abbreviation, one of TISSUE_ABBREV
feature_ID
character, MoTrPAC feature identifier
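As a minimal sketch of how the Sites and NumSites columns relate, the example below splits the comma-separated Sites string and checks that the number of entries matches NumSites. The toy rows, feature IDs, and site identifiers are made up for illustration and do not reflect the real identifier format.

```r
# Toy rows mimicking the Sites and NumSites columns.
# The feature and site identifiers are hypothetical placeholders.
toy <- data.frame(
  feature_ID = c("feature_A", "feature_B"),
  NumSites = c(3L, 1L),
  Sites = c("site_1,site_2,site_3", "site_9"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into a character vector of sites.
site_list <- strsplit(toy$Sites, ",", fixed = TRUE)
names(site_list) <- toy$feature_ID

# The number of sites per feature should agree with NumSites.
stopifnot(lengths(site_list) == toy$NumSites)

# Long format: one row per (feature, site) pair.
long <- data.frame(
  feature_ID = rep(toy$feature_ID, lengths(site_list)),
  site = unlist(site_list, use.names = FALSE),
  stringsAsFactors = FALSE
)
```

The same pattern should carry over to the full table, although with several million rows it may be worth subsetting to one tissue before expanding to long format.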
Details
The METHYL feature annotation is only available via download from Google Cloud Storage:
https://storage.googleapis.com/motrpac-rat-training-6mo-extdata/epigen-rda/METHYL_FEATURE_ANNOT.rda
Use MotrpacRatTraining6mo::load_methyl_feature_annotation() to download and return this file.
Only CpG sites with methylation coverage of >=10 in all samples were included for downstream analysis, and normalization was performed separately in each tissue. Individual CpG sites were divided into 500 base-pair windows and clustered using the Markov Clustering algorithm via the MCL R package (Jager, 2015). To apply MCL, an undirected graph was constructed for each 500 base-pair window, linking individual sites if their correlation was >=0.7. MCL was chosen for this task because it: (1) determines the number of clusters internally, (2) identifies homogeneous clusters, and (3) keeps single sites that are not correlated with other sites as singletons (clusters of size one).
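The graph-construction step described above can be sketched in base R as follows. This is an illustration only, not the code used to build the table: the toy positions and methylation values are made up, the windows are assumed to be fixed, non-overlapping 500 base-pair bins, and build_window_graph() is a hypothetical helper. The resulting adjacency matrices are the kind of input that would then be passed to MCL.

```r
# Hypothetical CpG positions (bp) and toy methylation values
# (rows = samples, columns = sites).
positions <- c(s1 = 100L, s2 = 250L, s3 = 480L, s4 = 520L, s5 = 900L)
meth <- cbind(
  s1 = c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0),
  s2 = c(1.1, 2.0, 3.2, 3.9, 5.1, 6.0),
  s3 = c(3.0, 1.0, 4.0, 1.0, 5.0, 2.0),
  s4 = c(6.0, 5.0, 4.0, 3.0, 2.0, 1.0),
  s5 = c(5.9, 5.1, 4.0, 3.1, 1.9, 1.0)
)

# Assumption: windows are non-overlapping 500 bp bins (1-500, 501-1000, ...).
window_id <- (positions - 1L) %/% 500L

# Link two sites when their correlation across samples is >= threshold.
build_window_graph <- function(values, threshold = 0.7) {
  adj <- cor(values) >= threshold
  diag(adj) <- FALSE  # no self-links in the input graph
  adj * 1L            # logical -> 0/1 adjacency matrix
}

# One undirected graph (adjacency matrix) per window.
sites_by_window <- split(names(positions), window_id)
graphs <- lapply(sites_by_window, function(sites) {
  build_window_graph(meth[, sites, drop = FALSE])
})
```

In this toy example, s1 and s2 are linked in the first window while s3 has no links, so a clustering step that preserves singletons would keep s3 as a cluster of size one.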
Given these sites, this table was generated using MotrpacRatTraining6mo::get_peak_annotations().
relationship_to_gene is the shortest distance between the feature and the start or end of the closest gene. It is 0 if the feature has any overlap with the gene.
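The distance rule can be illustrated with a small helper. This is a sketch, not the package's implementation: distance_to_gene() is a hypothetical function, coordinates are assumed to be inclusive with start <= end, and the distance is returned unsigned, whereas the actual column may encode direction differently.

```r
# Hypothetical helper: 0 if the feature overlaps the gene, otherwise the
# gap between the feature and the nearest gene boundary (unsigned).
distance_to_gene <- function(feat_start, feat_end, gene_start, gene_end) {
  overlaps <- feat_start <= gene_end && feat_end >= gene_start
  if (overlaps) {
    return(0L)
  }
  if (feat_end < gene_start) {
    gene_start - feat_end  # feature lies entirely before the gene
  } else {
    feat_start - gene_end  # feature lies entirely after the gene
  }
}

distance_to_gene(100L, 200L, 150L, 400L)  # partial overlap
distance_to_gene(100L, 200L, 300L, 400L)  # feature before the gene
distance_to_gene(500L, 600L, 300L, 400L)  # feature after the gene
```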
custom_annotation fixes many issues with the ChIPseeker annotation (v1.22.1).