Abstract
We believe that joint optimization of acoustic and language models exploits the inherent correlation between them and is therefore expected to achieve better recognition performance. Such an approach should be particularly effective for robust speech recognition, where testing conditions differ from those of training. The acoustic and language models are integrated into a unified decoding graph using weighted finite-state transducers (WFSTs). In this paper, we report experimental results of the joint optimization of acoustic and language models on the Resource Management (RM1) continuous speech recognition task. The results show that the proposed joint optimization approach is effective under noisy conditions for unseen test utterances, achieving relative word error rate reductions of 7% to 17% across different noise levels. These results support our expectation that the proposed joint optimization approach is robust.