cDNA clones from a variety of libraries (LD_pBS_cDNA) were analyzed together to generate a non-redundant set of full-length cDNA clones. First, the 5' ESTs of the clones were sequenced and grouped by sequence, in order to select the clones that extended furthest upstream. The remaining 9,080 clones were then clustered on the basis of their 3' EST sequences. Duplicate clones were eliminated. So were clones lacking a poly(A) tail and chimeric clones, in which the 5' and 3' ESTs did not align to nearby positions in the genome. In this way, a total of 5,849 validated cDNA clones (average insert size 2.2 kb) were selected to generate the Drosophila Gene Collection.
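The selection steps above can be sketched as a small filtering pipeline. This is a minimal illustration under stated assumptions, not the procedure actually used to build the collection. The `Clone` record, its fields, the function names and the `MAX_EST_DISTANCE` threshold are all hypothetical. The text says only "some proximity", so the 100 kb cutoff is an arbitrary placeholder. The sketch also assumes every transcript lies on the + strand, so a smaller 5' coordinate means "further upstream". When several clones share a 3' cluster, it keeps the one extending furthest upstream; the source does not say how duplicates were resolved.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the source only requires the two ESTs to align "with some proximity".
MAX_EST_DISTANCE = 100_000  # bp, assumed


@dataclass(frozen=True)
class Clone:
    clone_id: str
    five_cluster: str   # group of clones with similar 5' EST sequence
    three_cluster: str  # group of clones with similar 3' EST sequence
    chrom5: str         # chromosome arm the 5' EST aligns to
    pos5: int           # genomic start of the 5' EST (assumes + strand)
    chrom3: str         # chromosome arm the 3' EST aligns to
    pos3: int           # genomic position of the 3' EST
    has_polya: bool     # whether a poly(A) tail was detected


def upstream_key(c: Clone) -> tuple:
    # Smaller pos5 = extends further upstream; clone_id breaks ties deterministically.
    return (c.pos5, c.clone_id)


def furthest_upstream(clones: list[Clone]) -> list[Clone]:
    """Step 1: within each 5' EST group, keep only the clone extending furthest upstream."""
    best: dict[str, Clone] = {}
    for c in clones:
        cur = best.get(c.five_cluster)
        if cur is None or upstream_key(c) < upstream_key(cur):
            best[c.five_cluster] = c
    return list(best.values())


def is_chimeric(c: Clone, max_dist: int = MAX_EST_DISTANCE) -> bool:
    """A clone is treated as chimeric if its 5' and 3' ESTs do not align near each other."""
    return c.chrom5 != c.chrom3 or abs(c.pos3 - c.pos5) > max_dist


def select_nonredundant(clones: list[Clone]) -> list[Clone]:
    """Steps 2-4: drop clones without poly(A) or that look chimeric, then keep one clone per 3' cluster."""
    kept: dict[str, Clone] = {}
    for c in furthest_upstream(clones):
        if not c.has_polya or is_chimeric(c):
            continue
        prior = kept.get(c.three_cluster)
        # Duplicate resolution (assumed): keep the clone that extends furthest upstream.
        if prior is None or upstream_key(c) < upstream_key(prior):
            kept[c.three_cluster] = c
    return sorted(kept.values(), key=lambda c: c.clone_id)
```

Filtering out poly(A)-less and chimeric clones before resolving duplicates means that a valid clone can still represent a 3' cluster whose upstream-most member was rejected. Whether the original pipeline applied the filters in this order is not stated.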