FlyBase Genome Annotators, (2019-). Gene model assessment based on new PhyloCSF data. 
Publication Type: FlyBase analysis
Related Publication(s)
Research paper

Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci.
Mudge et al., 2019, Genome Res. 29(12): 2073-2087 [FBrf0244540]

Associated Information

FlyBase curators have been re-evaluating existing gene model annotations and creating new annotations based on conserved genomic extents with protein-coding signatures, using an updated PhyloCSF analysis by Irwin Jungreis and colleagues.

As of release 6.32, this analysis has resulted in 42 new protein-coding gene annotations, 8 of which correspond to lncRNA genes reannotated as coding and 16 of which are dicistronic or polycistronic. Thirty-one new pseudogenes have been annotated, including 6 protein-coding genes reannotated as pseudogenes and 4 lncRNA genes reannotated as pseudogenes. Five existing protein-coding genes were newly identified as carrying mutations in the sequenced strain, and 2 new protein-coding annotations correspond to a gene split supported by the PhyloCSF data. Almost 150 additional gene models were improved or corrected, including 35 new stop-codon readthroughs and 2 cases annotated with a non-AUG start codon. For 22 genes, a comment has been added indicating that a non-AUG translation start may be used, although no such alternative start was annotated. Fifteen calls correspond to prior updates, including 10 stop-codon readthroughs based on a similar analysis. The PhyloCSF assessment also flags regions of known mutations in the strain; these have not been included in the list of associated genes below.

Data From Reference
Genes (259)
