Abstract
Voice conversion (VC) modifies the speech signal of a source speaker so that it sounds as if it were spoken by a target speaker. In this paper, the transformation is determined from identical Arabic utterances spoken by the source and target speakers; these utterances are time-aligned using the dynamic time warping (DTW) algorithm. A conversion function based on a Gaussian mixture model (GMM) transforms the spectral envelope, represented by line spectral frequencies (LSF), and the residuals are converted using three residual prediction techniques. We also compare these techniques on the conversion of several Arabic utterances. The quality of the transformed utterances is measured using subjective and objective evaluations.