Authors: Mohamed Abou-Zleikha, Zheng-Hua Tan, Mads Græsbøll Christensen, Søren Holdt Jensen
Type: Conference paper
Abstract: Statistical speech recognition based on hidden Markov models (HMMs) has been used in many applications with a high degree of success. However, the dissimilarity between the training and test data is known to have a considerable effect on the recognition accuracy. The purpose of this work is to propose a cluster-based acoustic model adaptation method to solve this problem. We use a density forest, a very promising ensemble data clustering method, to cluster the data and use the maximum a posteriori (MAP) method to build cluster-based adapted Gaussian mixture models (GMMs) in HMM speech recognition. Specifically, a set of bagged versions of the training data for each state in the HMM is generated, and each of these versions is used to generate one GMM and one tree in the density forest. Thereafter, an acoustic model forest is built by replacing the data of each leaf (cluster) in each tree with the corresponding GMM adapted by the leaf data using the MAP method.
The results show that the proposed approach achieves about 3.8% (absolute) lower phone error rate (PER) compared with the standard HMM/GMM and about 0.8% (absolute) lower PER compared with bagged HMM/GMM.
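
Illustration: the bagging and per-leaf MAP adaptation steps described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses one-dimensional features, adapts only the GMM means with the common relevance-factor form of MAP, and does not build the density forest itself. The function names and the relevance factor `tau` are hypothetical; the abstract does not say which GMM parameters are adapted or how the prior is weighted.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bootstrap_sample(data, rng):
    """Draw one bagged version of the data: same size, sampled with replacement."""
    return [rng.choice(data) for _ in data]


def map_adapt_means(weights, means, variances, leaf_data, tau=10.0):
    """Return MAP-adapted means of a 1-D GMM given the data of one leaf.

    tau is a hypothetical relevance factor (an assumption, not from the
    paper): larger values keep the adapted means closer to the prior means.
    """
    n_comp = len(means)
    soft_counts = [0.0] * n_comp    # n_k: sum of responsibilities per component
    weighted_sums = [0.0] * n_comp  # sum of responsibility * x per component
    for x in leaf_data:
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:
            continue  # point is numerically unexplained by every component
        for k in range(n_comp):
            gamma = likes[k] / total
            soft_counts[k] += gamma
            weighted_sums[k] += gamma * x
    # Interpolate between each prior mean and the leaf's weighted data mean.
    return [(tau * means[k] + weighted_sums[k]) / (tau + soft_counts[k])
            for k in range(n_comp)]


if __name__ == "__main__":
    rng = random.Random(0)
    state_data = [rng.gauss(0.0, 1.0) for _ in range(200)]
    bagged = bootstrap_sample(state_data, rng)       # one bagged version
    leaf = [x for x in bagged if x > 0.5]            # stand-in for one leaf (cluster)
    prior = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])    # toy GMM: weights, means, variances
    print(map_adapt_means(*prior, leaf_data=leaf))
```

In this form a leaf with little data leaves the prior means almost unchanged, while a well-populated leaf pulls each mean toward the data that the component explains, which is the usual motivation for using MAP rather than re-estimating a GMM from each cluster alone.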