FB2026_02 , released June 18, 2026
Reference Report
Open Close
Reference
Citation
Carvalho, A.B., Kim, B.Y., Uno, F. (2026). Strong bias in long-read sequencing prevents assembly of Drosophila melanogaster Y-linked genes.  Genome Res. 36(1): 71--82.
FlyBase ID
FBrf0264314
Publication Type
Research paper
Abstract
Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are generally considered free from sequence composition bias, a key factor, alongside read length, that explains their success in producing high-quality genome assemblies. Indeed, there had been very few reports of bias, the clearest one against GA-rich repeats in the human genome. However, our study reveals a systematic failure of both technologies to sequence and assemble specific exons of Drosophila melanogaster genes, indicating an overlooked limitation. Namely, multiple Y-linked exons are nearly or completely absent from raw reads produced by deep sequencing with state-of-the-art Nanopore (10.4 flow cells, 200× coverage) and PacBio (HiFi 50×). The same exons are accurately assembled using Illumina 67× coverage. We find that these missing exons are consistently located near simple satellite sequences, in which sequencing fails at multiple levels: read initiation (very few reads start within satellite regions), read elongation (satellite-containing reads are shorter on average), and basecalling (quality scores drop as sequencing enters a satellite sequence). These findings challenge the assumption that long-read technologies are unbiased and reveal a critical barrier to assembling sequences near repetitive regions. As large-scale sequencing projects move toward telomere-to-telomere assemblies in a wide range of organisms, recognizing and addressing these biases will be important to achieving truly complete and accurate genomes. Additionally, the underrepresented Y-linked exons provide a valuable benchmark for refining those sequencing technologies while improving the assembly of the highly heterochromatic and often neglected Drosophila Y Chromosome.
PubMed ID
PubMed Central ID
PMC12758392 (PMC) (EuropePMC)
Associated Information
Comments
Associated Files
Other Information
Secondary IDs
    Language of Publication
    English
    Additional Languages of Abstract
    Parent Publication
    Publication Type
    Journal
    Abbreviation
    Genome Res.
    Title
    Genome Research
    Publication Year
    1995-
    ISBN/ISSN
    1088-9051
    Data From Reference