FB2026_01, released March 12, 2026
Reference Report
Reference
Citation
Levis, R. (2016.3.18). Correcting FlyBase sequence locations for BG and KG insertions. 
FlyBase ID
FBrf0231396
Publication Type
Personal communication to FlyBase
Abstract
PubMed ID
PubMed Central ID
Text of Personal Communication
I used the FlyBase datasets for BG and KG lines to get a list of FBti's for all of the BG and KG lines with FB records. I then used the FlyBase batch download tool to download FB data for these FBti's and imported the data into Excel spreadsheets. I deleted the rows of records for which there was no stock available. I filtered for sequence locations that were either incorrect (> 10 bp discrepancy), missing, or for which the FlyBase sequence location was entered as a range. For this round of corrections for the lines with missing sequence locations, I only considered insertions for which FlyBase had a GenBank accession for the insertion flank. I checked the mapping in the GDP database to make sure that the mapping wasn't ambiguous. With the exception of the insertions in histone genes (which I've considered separately), none of the insertions mapped to tandemly repeated segments.
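The filtering step described above was done in Excel. As a rough illustration only, the same three-way test (missing, entered as a range, or more than 10 bp from the GDP location) might be sketched in Python as follows. The function name, argument names, and return labels are hypothetical and do not come from the workbook or from FlyBase.

```python
from typing import Optional


def needs_correction(fb_min: Optional[int],
                     fb_max: Optional[int],
                     gdp_pos: Optional[int],
                     tolerance: int = 10) -> Optional[str]:
    """Return why a record would be flagged, or None if it looks fine.

    fb_min / fb_max: FlyBase sequence location bounds (None if missing).
    gdp_pos: location from the GDP database (None if unavailable).
    tolerance: allowed discrepancy in bp (10, per the text above).
    All names here are illustrative assumptions.
    """
    if fb_min is None or fb_max is None:
        return "missing"
    if fb_min != fb_max:
        return "range"
    if gdp_pos is not None and abs(fb_min - gdp_pos) > tolerance:
        return "incorrect"
    return None
```

A record at FlyBase position 100 with a GDP position of 111 would be flagged as "incorrect" under this sketch, while one with a GDP position of 110 would pass.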
I've broken down the corrections by category and used a separate spreadsheet in the attached Excel workbook for each category.
1) Location correction - the rows are sorted in ascending order of the discrepancy between the GDP and FB locations (column AC). All 41 of these are single insertions, as far as we know, so these aren't cases in which the locations of two insertions got switched. There are nine insertions for which the discrepancy is < 25 bp and you may not even want to bother correcting these. At the other extreme, there are 18 insertions for which the discrepancy is > 99 kb. All 18 of these are on either the X or 2R scaffolds and it is apparent to me that these are cases of the locations being recorded in FlyBase with the wrong release number. The errors were then propagated with each new assembly release. In some cases, the FlyBase gene is recorded correctly, in other cases it isn't. I've given the GDP gene assignments based on a gene model release that is from 2014, I think. The GDP_gene_1 and GDP_gene_2 entries are for insertions within the annotated limits of the gene. The GDP_upstream_gene_1 and GDP_upstream_gene_2 entries are for insertions that are upstream of the 5' end of the gene, within 500 bp.
2) Location correction - range - There are seven KG lines on this spreadsheet for which the FB sequence location is recorded as a range of 151 - 926 bp (column L). None of these are cases in which the two flanks map at the min and max of the range.
3) No current loc - There are 13 KG insertions on this list. Ten of these are double insertions and I think these are probably cases in which the GDP submitted both flanks to GenBank, but submitted the sequence location to FlyBase/BDSC for only one of the insertions. Of the ten double insertion lines, FlyBase has two records for seven of them and will have to create new records for the other three (column D).
4) histone - These are five KG lines that have an insertion in one of the tandemly repeated histone genes. I think that the data for all of these were submitted at a time when the reference genomic sequence had only one histone repeat unit as a placeholder for the array. FlyBase has no location or gene for KG02480. FlyBase has a unique sequence location for each of the other four histone insertions. I've given a release 6 location minimum and maximum for each of these five insertions. I've entered the gene name and FBgn for the gene family, rather than the individual copy.
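The thresholds mentioned in category 1 (discrepancies under 25 bp, discrepancies over 99 kb, and upstream assignments within 500 bp of a gene's 5' end) can be illustrated with a short Python sketch. It assumes 1 kb = 1,000 bp, treats "within 500 bp" as inclusive, and uses hypothetical function names and labels; the actual assignments were made in the GDP database and the attached workbook, not with this code.

```python
from typing import Optional


def discrepancy_bin(fb_pos: int, gdp_pos: int) -> str:
    """Bin the absolute FB-vs-GDP discrepancy using the thresholds in the text."""
    d = abs(fb_pos - gdp_pos)
    if d < 25:
        return "minor"          # < 25 bp: may not be worth correcting
    if d > 99_000:
        return "large"          # > 99 kb (assuming 1 kb = 1,000 bp)
    return "intermediate"


def gene_column(pos: int, gene_start: int, gene_end: int,
                strand: str, window: int = 500) -> Optional[str]:
    """Pick the kind of gene column an insertion would fall under.

    gene_start <= gene_end are the annotated limits; strand is "+" or "-".
    Returns "GDP_gene" for insertions within the annotated limits,
    "GDP_upstream_gene" for insertions upstream of the 5' end by at most
    `window` bp (inclusive bound is an assumption), otherwise None.
    """
    if gene_start <= pos <= gene_end:
        return "GDP_gene"
    # The 5' end is gene_start on the + strand and gene_end on the - strand.
    dist = gene_start - pos if strand == "+" else pos - gene_end
    if 0 < dist <= window:
        return "GDP_upstream_gene"
    return None
```

Under these assumptions, an insertion 300 bp before the start of a plus-strand gene would fall under an upstream column, while the same position relative to a minus-strand gene would not, because it lies past the 3' end.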
DOI
Associated Information
Comments
Associated Files
File date: 2016.3.18 ; File size: 38784 ; File format: xlsx ; File name: BG_KG_location_correction_forFB.xlsx
Other Information
Secondary IDs
    Language of Publication
    English
    Additional Languages of Abstract
    Parent Publication
    Publication Type
    Abbreviation
    Title
    ISBN/ISSN
    Data From Reference
    Alleles (5)
    Genes (66)
    Natural transposons (5)
    Insertions (71)
    Transgenic Constructs (1)