Authors: Mohamed Abou-Zleikha, Zheng-Hua Tan, Mads Græsbøll Christensen, Søren Holdt Jensen
Type: Conference paper
Abstract:
Detecting speaker change points is an important stage in several audio and speech applications. Traditionally, the BIC-based approach has been the most widely used for this task, with a very good degree of success. The main criticism of BIC-based approaches is the presence of a penalty parameter in the BIC function. This parameter has to be tuned for particular environmental and acoustic conditions, which biases BIC-based approaches toward the data used in the tuning process.
In this paper, we propose an ensemble-based approach for speaker segmentation. A forest of segmentation trees is constructed, where each tree is trained using a sampled version of the speech segment. During the construction of each tree, a set of randomly selected points is examined as potential segmentation points; the one with the highest ∆BIC is taken as the segmentation point, and the same process is applied to the segments on its left and right. The process stops when a segment is smaller than a threshold. At each node, the highest ∆BIC is stored together with the associated point index. After the model is built, the accumulated ∆BIC for each point is calculated over all trees. The positions of the local maxima are then taken as the speaker change points.
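The procedure described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes the common full-covariance Gaussian form of ∆BIC, subsampling of frames without replacement as the "sampled version" of the segment, and a Hann-window smoothing step before picking local maxima. All function names, parameter names, and default values (`lam`, `min_len`, `n_candidates`, `sample_frac`, `win`) are hypothetical.

```python
import numpy as np

def delta_bic(X, t, lam=1.0):
    """Delta-BIC for splitting frames X (n, d) at index t.
    Assumes one full-covariance Gaussian per segment (an assumption here)."""
    n, d = X.shape

    def logdet(Z):
        # Small diagonal term keeps the covariance well conditioned.
        cov = np.cov(Z, rowvar=False, bias=True).reshape(d, d) + 1e-6 * np.eye(d)
        return np.linalg.slogdet(cov)[1]

    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    gain = 0.5 * (n * logdet(X) - t * logdet(X[:t]) - (n - t) * logdet(X[t:]))
    return gain - lam * penalty

def segmentation_forest(X, n_trees=50, sample_frac=0.8, n_candidates=10,
                        min_len=20, lam=1.0, seed=0):
    """Return the Delta-BIC accumulated per frame over all trees."""
    rng = np.random.default_rng(seed)
    n_frames = len(X)
    scores = np.zeros(n_frames)
    margin = min_len // 2  # frames kept on each side of a candidate split
    for _ in range(n_trees):
        # Hypothetical sampling scheme: a sorted random subset of the frames.
        idx = np.sort(rng.choice(n_frames, size=int(sample_frac * n_frames),
                                 replace=False))
        stack = [(0, len(idx))]
        while stack:
            lo, hi = stack.pop()
            if hi - lo < min_len:  # stopping criterion: segment too small
                continue
            positions = np.arange(lo + margin, hi - margin + 1)
            cands = rng.choice(positions, size=min(n_candidates, len(positions)),
                               replace=False)
            seg = X[idx[lo:hi]]
            vals = [delta_bic(seg, int(t) - lo, lam) for t in cands]
            best = int(np.argmax(vals))
            t = int(cands[best])
            scores[idx[t]] += vals[best]  # node stores best Delta-BIC and index
            stack += [(lo, t), (t, hi)]   # recurse on left and right parts
    return scores

def change_points(scores, win=25):
    """Local maxima of the smoothed accumulated scores (smoothing is assumed)."""
    kernel = np.hanning(win)
    s = np.convolve(scores, kernel / kernel.sum(), mode="same")
    peaks = []
    for i in range(len(s)):
        a, b = max(0, i - win), min(len(s), i + win + 1)
        if s[i] > 0 and s[i] == s[a:b].max():
            if not peaks or i - peaks[-1] > win:
                peaks.append(i)
    return peaks
```

Given a feature matrix `X` of shape (frames, dimensions), for example MFCC frames, `change_points(segmentation_forest(X))` returns candidate frame indices. In this sketch the penalty weight `lam` still appears inside `delta_bic`; the decision itself is taken from the shape of the accumulated score rather than from the sign of a single ∆BIC value.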