Reference
Citation
Lin, M.F., Jungreis, I., Kellis, M. (2011). PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics 27(13): i275-i282.
FlyBase ID
FBrf0213957
Publication Type
Research paper
Abstract
As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multispecies nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. We show that PhyloCSF's classification performance in 12-species Drosophila genome alignments exceeds all other methods we compared in a previous study. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE, and as interest grows in long non-coding RNAs, often initially recognized by their lack of protein coding potential rather than conserved RNA secondary structures. The Objective Caml source code and executables for GNU/Linux and Mac OS X are freely available at http://compbio.mit.edu/PhyloCSF. Contact: mlin@mit.edu; manoli@mit.edu.
PubMed ID
PubMed Central ID
PMC3117341
Associated Information
Comments
Associated Files
Other Information
Secondary IDs
    Language of Publication
    English
    Additional Languages of Abstract
    Parent Publication
    Publication Type
    Journal
    Abbreviation
    Bioinformatics
    Title
    Bioinformatics
    Publication Year
    1998-
    ISBN/ISSN
    1367-4803
    Data From Reference
    Genes (1)