Annotation of snoRNAs in CAF1 genome assemblies
-----------------------------------------------

Non-redundant sets of snoRNA predictions in the CAF1 Drosophila genomes are
available from:

  ftp.sanger.ac.uk/pub4/sgj/annotation/caf1/1.0/


Annotating D. melanogaster snoRNAs
----------------------------------

Methods
-------

Mapping of Drosophila melanogaster snoRNAs available from the public
databases (EMBL and RefSeq Genome). The D. melanogaster genome assembly used
is from:

  http://rana.lbl.gov/drosophila/caf1/all_caf1.tar.gz

- 291 sequences with snoRNA annotations were obtained from the EMBL and
  RefSeq Genome databases (14/07/06).
- snoRNA fragments, whole-gene submissions and multiple-gene submissions
  accounted for elsewhere in the dataset were manually removed.
- 276 sequences were mapped to the CAF1 assembly using BLAST
  (WUBLASTN 2.0 -- params "W=3 -kap").
- Overlapping hits were manually inspected and the longer of the two snoRNA
  annotations was retained. Details of the overlapping mappings are provided
  below. Some of these have redundant Flybase entries.
- All BLAST annotations were extended to the full length of the snoRNA query
  sequence.
- 250 snoRNA annotations are provided in the file dmel_snorna.gff3.

***** NOTES *****

- Numerical summary of 276 sequences -> 250 mappings in the CAF1 genome:

     19  mappings excluded due to overlapping hits (redundant Flybase entries)
      5  mappings excluded due to overlapping hits (non-redundant Flybase entries)
      2  sequences failed to map to the D. melanogaster CAF1 genome
    250  non-redundant genome mappings
    ---
    276  database sequences

- In the 'Attributes' field, the Name tag used for the snoRNA is that of the
  sequence that was mapped. The other synonym (in some cases the better-known
  name) is given under the Alias tag. The Flybase identifiers and Genbank
  accessions for the mapped and overlapping/shorter sequences are provided,
  where available, under the Dbxref tag.
- The 63 snoRNAs with NR Genbank accessions correspond to the Flybase
  release 4.3 gff annotations.
- Remaining unresolved annotations:

  (1) Sequences AJ784385 and NR_002093. These sequences overlap by only
      30 bp. Both are currently mapped to the CAF1 assembly, as it was
      unclear why both entries exist and which should be retained. The
      longer sequence, AJ784385, has been experimentally verified and is
      possibly the preferred annotation. It does not currently have a
      Flybase entry.

      Unresolved overlapping annotations:

      Genbank     Name          Flybase
      ---------   -----------   -----------
      AJ784385    Me28S-Gm980   .
      NR_002093   snoRNA:M      FBgn0044508

  (2) 2 Genbank sequences failed to map to the D. melanogaster CAF1 genome.
      Neither of these sequences has a Flybase entry.

      Unmapped sequences:

      Genbank     Name          Flybase
      ---------   -----------   -------
      AJ809564    psi28s-2996   .
      AJ784386    Me18S-Cm419   .


Tables of overlapping snoRNA mappings

Column headers:

  (1) Genbank accession of mapped sequence
  (2) snoRNA name for mapped sequence
  (3) Flybase identifier for the mapped sequence
  (4) Genbank accession for duplicate, overlapping or shorter sequences
  (5) Synonym for snoRNA currently used in FlyBase
  (6) Current Flybase identifier for this snoRNA

19 instances of overlapping hits with redundant Flybase entries:

  (1)         (2)            (3)           (4)         (5)          (6)
  AJ809562    psi28s-1180    FBgn0065063   NR_002542   snoRNA:535   FBgn0083002
  AJ629273    psi18S-1347c   FBgn0065075   NR_002546   snoRNA:203   FBgn0083052
  AJ809560    psi18s-176     FBgn0065068   NR_002554   snoRNA:314   FBgn0083043
  AJ629267    psi28S-1837a   FBgn0065079   NR_002477   snoRNA:11    FBgn0082996
  AJ629276    psi28S-3327c   FBgn0065062   NR_002485   snoRNA:586   FBgn0082965
  AJ809571    psi28s-2566    FBgn0065072   NR_002489   snoRNA:269   FBgn0082985
  AJ809574    psi28s-3186    FBgn0065077   NR_002491   snoRNA:165   FBgn0082977
  AJ809591    psi18s-1275    FBgn0065050   NR_002494   snoRNA:783   FBgn0083056
  AJ629212    psi28S-2149    FBgn0065078   NR_002505   snoRNA:143   FBgn0082992
  AJ629207    psi18S-1377a   FBgn0065067   NR_002506   snoRNA:328   FBgn0083051
  AJ629277    psi28S-2179    FBgn0065054   NR_002459   snoRNA:734   FBgn0082991
  AJ809593    psi28s-3342    FBgn0065061   NR_002461   snoRNA:644   FBgn0082964
  AJ629266    psi28S-3436b   FBgn0065052   NR_002463   snoRNA:75    FBgn0082955
  AJ629265    psi28S-3436a   FBgn0065056   NR_002464   snoRNA:708   FBgn0082956
  AY805215    orphan_CD2     FBgn0065065   NR_002541   snoRNA:461   FBgn0082920
  AJ629258    psi18S-841a    FBgn0065060   NR_002551   snoRNA:66    FBgn0083019
  AJ809569    psi28s-2622    FBgn0065069   NR_002472   snoRNA:3     FBgn0082984
  AJ629201    psi28S-2876    FBgn0065049   NR_002467   snoRNA:825   FBgn0082982
  AJ809558    DmOr_aca5      FBgn0065074   NR_002540   snoRNA:227   FBgn0082921

4 instances of overlapping hits (multiple Genbank accessions under one
Flybase entry; 5 duplicate accessions in total):

  (1)         (2)            (3)           (4)                   (5)         (6)
  AJ809549    psi28s-2648    FBgn0065055   NR_002512             snoRNA:72   .
  AJ809592    psi28s-291     FBgn0065064   NR_002496             snoRNA:50   .
  NR_001718   snoRNA:Z1      FBgn0015543   U46015                .           .
  AF089836    snoRNA H1      FBgn0026169   NR_001911, AJ809561   .           .


Mapping annotated melanogaster snoRNAs to the other CAF1 genomes
----------------------------------------------------------------

Methods
-------

Drosophila CAF1 genome assemblies from:

  http://rana.lbl.gov/drosophila/caf1/all_caf1.tar.gz

- A non-redundant set of 250 snoRNA sequences was previously annotated in
  the D. melanogaster CAF1 assembly (see above). 2 additional
  D. melanogaster snoRNA sequences available from the public databases,
  which failed to map to the D. melanogaster CAF1 assembly, were included in
  this analysis of the other 11 genomes.
- Regions in the genome assemblies with similarity to each of the 252
  D. melanogaster snoRNA sequences were identified using BLAST
  (WUBLASTN 2.0 -- params "W=3 -kap").
- All regions with BLAST hit e-values < 1e-6 were accepted as significant
  hits.
- Regions with BLAST hit e-values > 1e-6 and < 1e-2 were accepted only if
  syntenic with the D. melanogaster snoRNA genome mappings. The
  MAVID/MERCATOR whole genome alignments were used for the synteny analysis.
  Only BLAST hits which mapped to the same genome alignment fragment, with
  the same orientation as the D. melanogaster snoRNA, were accepted as
  significant.
- Only the regions of the HSP alignment are provided in the annotations. The
  genome co-ordinates have not been extended to the full length of the query
  sequences, and in many instances these annotations will not represent the
  complete snoRNA gene.
- The highest scoring BLAST hit for each genome region was used to provide
  the putative snoRNA annotations.
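
The acceptance rules above can be sketched in Python as follows. This is an
illustrative sketch only, not the code used to produce the annotations. The
Hit and Reference records, their field names, and the functions accept_hit
and best_per_region are all hypothetical. Further assumptions: a synteny
check reduces to comparing an alignment fragment identifier and an
orientation; a query with no D. melanogaster mapping can only be accepted
under the strict threshold; an e-value of exactly 1e-6 (not covered by the
rules as written) is rejected; and "genome region" means a set of
overlapping hits on the same sequence.

```python
from dataclasses import dataclass
from typing import List, Optional

STRICT_EVALUE = 1e-6   # hits below this are accepted outright
RELAXED_EVALUE = 1e-2  # hits between the two thresholds need synteny support


@dataclass(frozen=True)
class Hit:
    """One BLAST HSP in a target genome (hypothetical record layout)."""
    query: str      # name of the D. melanogaster snoRNA query
    seqid: str      # scaffold or chromosome carrying the hit
    start: int      # 1-based inclusive start
    end: int        # 1-based inclusive end
    score: float    # BLAST score
    evalue: float   # BLAST e-value
    fragment: str   # assumed id of the whole genome alignment fragment
    strand: str     # orientation within that fragment, "+" or "-"


@dataclass(frozen=True)
class Reference:
    """Placement of the D. melanogaster snoRNA in the genome alignment."""
    fragment: str
    strand: str


def accept_hit(hit: Hit, ref: Optional[Reference]) -> bool:
    """Apply the two-tier e-value rule with a synteny check."""
    if hit.evalue < STRICT_EVALUE:
        return True
    if STRICT_EVALUE < hit.evalue < RELAXED_EVALUE:
        # Assumption: syntenic means same fragment and same orientation.
        return (
            ref is not None
            and hit.fragment == ref.fragment
            and hit.strand == ref.strand
        )
    return False


def best_per_region(hits: List[Hit]) -> List[Hit]:
    """Keep the highest scoring hit among overlapping hits on a sequence."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        overlaps = any(
            hit.seqid == k.seqid and hit.start <= k.end and k.start <= hit.end
            for k in kept
        )
        if not overlaps:
            kept.append(hit)
    return kept
```

A caller would first filter each query's hits with accept_hit, passing the
Reference for that query (or None for the 2 unmapped sequences), and then
pass the accepted hits for a genome to best_per_region.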