Abstract
We have analyzed existing methods and developed new ones for automatically assigning S-adenosylmethionine (AdoMet)-dependent methyltransferase function to genomic open reading frames on the basis of their predicted protein sequences. A large class of AdoMet-dependent methyltransferases shares a common binding fold for the AdoMet cofactor in the form of a seven-stranded twisted beta-sheet; this structural similarity is mirrored in a degenerate sequence similarity that we refer to as methyltransferase signature motifs. These motifs are the basis of our assignments. We find that simple pattern matching based on the motif sequences is of limited utility, and that a new method, "sensitized matrices for scoring methyltransferases" (SM2), built with modified versions of the MEME and MAST tools, gives greatly improved results for the Saccharomyces cerevisiae genome. From our analysis, we conclude that this class of methyltransferases makes up approximately 0.6-1.6% of the genes in the yeast, human, mouse, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, and Escherichia coli genomes. We provide lists of uncharacterized genes that we consider to have a high probability of encoding methyltransferases, as candidates for future biochemical analysis.
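The contrast between simple pattern matching and matrix-based scoring can be illustrated with a minimal sketch. It is not the SM2 method and does not use MEME or MAST: it builds a generic log-odds position-specific scoring matrix with a uniform background, and the aligned motif instances, the consensus pattern, and the test sequences are all hypothetical toy data chosen for illustration.

```python
import math
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical aligned motif instances (toy data, not taken from the paper).
TRAINING = ["VLDLGCGTG", "VLDVGCGSG", "ILDLGSGTG", "VVDLGAGTG"]

# A strict consensus pattern of the kind simple pattern matching might use
# (also hypothetical).
PATTERN = re.compile(r"[VI]LD[LV]GCG[TS]G")


def build_pssm(seqs, pseudocount=1.0):
    """Build a log2-odds scoring matrix from equal-length aligned sequences.

    Assumes a uniform background frequency and adds a pseudocount to every
    residue so that unseen residues get a finite (negative) score.
    """
    length = len(seqs[0])
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        column = [s[pos] for s in seqs]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def best_window(pssm, protein):
    """Return (score, start) of the highest-scoring window in the protein."""
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(protein) - width + 1):
        score = sum(pssm[i][protein[start + i]] for i in range(width))
        best = max(best, (score, start))
    return best


if __name__ == "__main__":
    pssm = build_pssm(TRAINING)
    degenerate = "MKTAVVDLGAGTGQQ"  # contains a variant the pattern rejects
    print("pattern match:", PATTERN.search(degenerate) is not None)
    print("best matrix window:", best_window(pssm, degenerate))
```

In this toy case the degenerate variant fails the exact pattern but still receives a high matrix score, which is the general reason a scoring matrix tolerates degenerate motifs better than a fixed pattern does.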