Sequence-identity-based mapping between rat and human protein phosphorylation sites
Format
A data frame with 202610 rows and 2 variables:
ptm_id_rat_refseq
character, RefSeq ID for rat phosphosite
ptm_id_human_uniprot
character, UniProt ID for human phosphosite
Details
We used the NCBI Reference Protein Sequence database (RefSeq) to annotate protein IDs. Most available Post-Translational Modification (PTM) resources and tools are built for human proteins; rat annotation is lacking. To leverage the human information, we mapped PTM sites from rat to human using a bioinformatics approach. Briefly, we used BLASTp to align all rat sequences against the human reviewed UniProt FASTA sequence database (download date: 02/03/2021). The median protein sequence identity between rat and human is 85%. Only alignments with a sequence identity greater than 60% were included for mapping. For most proteins, BLASTp outputs multiple pairwise alignments (one-to-many). In those cases, we selected the alignment with the largest "positives" and "identities" values and required an exact match for the S/T/Y residues identified in this study. As a result, we confidently mapped 73.5% of all the phosphorylation sites we identified.
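The selection logic described above can be illustrated with a minimal Python sketch. It is not the code used to build this dataset. The `Alignment` class, its field names, the function names, and the example IDs are all hypothetical. Two points the description leaves open are treated as assumptions here: alignments are ranked by "positives" first and "identities" second, and the residue check is applied only to the top-ranked alignment.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MIN_IDENTITY = 60.0          # alignments must exceed this percent identity
PHOSPHO_RESIDUES = {"S", "T", "Y"}


@dataclass
class Alignment:
    """One pairwise BLASTp hit (hypothetical container, not a BLAST API)."""
    human_id: str
    pct_identity: float
    pct_positives: float
    q_start: int   # 1-based start of the aligned region in the rat sequence
    s_start: int   # 1-based start of the aligned region in the human sequence
    q_aln: str     # aligned rat sequence, gaps as "-"
    s_aln: str     # aligned human sequence, gaps as "-"


def map_position(aln: Alignment, rat_pos: int) -> Optional[Tuple[int, str]]:
    """Return (human position, human residue) aligned to rat_pos, or None."""
    q_pos = aln.q_start - 1
    s_pos = aln.s_start - 1
    for q_char, s_char in zip(aln.q_aln, aln.s_aln):
        if q_char != "-":
            q_pos += 1
        if s_char != "-":
            s_pos += 1
        if q_char != "-" and q_pos == rat_pos:
            # A gap on the human side means the site has no counterpart.
            return None if s_char == "-" else (s_pos, s_char)
    return None  # rat_pos lies outside the aligned region


def map_site(rat_residue: str, rat_pos: int,
             alignments: List[Alignment]) -> Optional[Tuple[str, str, int]]:
    """Map one rat phosphosite to (human ID, residue, position), or None."""
    if rat_residue not in PHOSPHO_RESIDUES:
        return None
    candidates = [a for a in alignments if a.pct_identity > MIN_IDENTITY]
    if not candidates:
        return None
    # Assumption: rank by positives, then identities; keep only the best hit.
    best = max(candidates, key=lambda a: (a.pct_positives, a.pct_identity))
    hit = map_position(best, rat_pos)
    if hit is None:
        return None
    human_pos, human_residue = hit
    # Require an exact residue match (S to S, T to T, Y to Y).
    if human_residue != rat_residue:
        return None
    return best.human_id, human_residue, human_pos


# Toy example with made-up sequences and IDs.
hits = [
    Alignment("HUMAN_A", 85.0, 92.0, 1, 5, "MKS-PTA", "MKSAPTA"),
    Alignment("HUMAN_B", 55.0, 70.0, 1, 1, "MKSPTA", "MKAPSA"),
]
mapped = map_site("T", 5, hits)
```

In this toy example the second hit is discarded by the identity threshold, and the rat threonine at position 5 is carried across a one-residue insertion in the human sequence, so `mapped` is `("HUMAN_A", "T", 10)`.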