Abstract
In the present work we show that in the Drosophila genome (which covers a 37–51% GC range at a DNA size of approx. 50 kb) a linear correlation holds between the GC (or GC3) levels of individual coding sequences and the GC levels of the long (>50 kb) genomic sequences embedding them. This correlation allows us to position the two compositional distributions, (a) that of coding sequences and (b) that of long DNA segments, relative to each other and to calculate gene concentration across the compositional range of the Drosophila genome. Using this approach, we show that gene concentration increases with increasing GC of the regions embedding the genes, reaching a 7-fold higher level in the GC-richest regions compared with the GC-poorest regions. The gene distribution of the Drosophila genome is, therefore, similar to (although less striking than) that of the human genome, whereas it is very different from that of the Arabidopsis genome, which has about the same size as the Drosophila genome.
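
The quantities named above can be illustrated with a minimal Python sketch. It is not the procedure used in this work; it assumes GC3 means the GC level of third codon positions in an in-frame coding sequence, uses an ordinary least-squares line for the correlation, and expresses gene concentration as genes per Mb within GC bins. All function names and the binning scheme are hypothetical choices made for illustration.

```python
def gc_percent(seq):
    """GC level (%) of a DNA sequence, ignoring non-ACGT characters."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc3_percent(cds):
    """GC level (%) of third codon positions; assumes cds starts in frame."""
    return gc_percent(cds[2::3])


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gene_concentration(gene_region_gc, segments, edges):
    """Genes per Mb in each GC bin (hypothetical binning scheme).

    gene_region_gc: GC% of the long region embedding each gene
    segments: (GC%, length in kb) for each long DNA segment
    edges: ascending bin edges; bin i covers [edges[i], edges[i + 1])
    """
    def bin_index(value):
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                return i
        return None  # value falls outside the binned range

    n_bins = len(edges) - 1
    gene_counts = [0] * n_bins
    total_kb = [0.0] * n_bins
    for gc in gene_region_gc:
        i = bin_index(gc)
        if i is not None:
            gene_counts[i] += 1
    for gc, length_kb in segments:
        i = bin_index(gc)
        if i is not None:
            total_kb[i] += length_kb
    # Bins with no DNA have no defined concentration.
    return [
        count / (kb / 1000.0) if kb > 0 else None
        for count, kb in zip(gene_counts, total_kb)
    ]
```

In this sketch, fitting `linear_fit` to pairs of (coding-sequence GC3, embedding-region GC) gives a line that could be used to estimate the GC level of the region embedding a gene when only its coding sequence is known; those estimates would then be passed to `gene_concentration` as `gene_region_gc`.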