Active learning for the prediction of prosodic phrase boundaries in Chinese speech synthesis systems using conditional random fields

Prosodic structure contributes to speech production and comprehension. Predicting appropriate phrase boundaries is one of the crucial problems in achieving natural-sounding synthesized speech. Unfortunately, obtaining human annotations of prosodic phrases to train a supervised system is laborious and costly. Active learning has been shown to reduce the labeling effort required for supervised learning.

This study explores active learning techniques with the aim of reducing the amount of human-annotated data needed to attain a given level of performance. It presents an active learning approach to predicting prosodic phrase boundaries in unrestricted Chinese text. Experiments show that, in most of the cases considered, the active selection strategies for labeling prosodic phrase boundaries perform as well as or better than random data selection.
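The contrast between active and random selection can be illustrated with a minimal sketch. The abstract does not say which selection strategies the study uses, so the example below assumes least-confidence sampling, one common choice. The function names and the confidence scores are hypothetical and are not taken from the paper.

```python
import random


def least_confidence_select(confidences, k):
    """Return indices of the k sentences the model is least sure about.

    Assumption: confidences[i] is the model's probability for its most
    likely boundary label sequence on unlabeled sentence i (for example,
    the probability of the best path under a conditional random field).
    """
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return ranked[:k]


def random_select(n, k, seed=0):
    """Baseline: pick k of n sentence indices uniformly at random."""
    return random.Random(seed).sample(range(n), k)


# Hypothetical confidence scores for five unlabeled sentences.
scores = [0.91, 0.42, 0.77, 0.15, 0.63]
picked = least_confidence_select(scores, 2)
baseline = random_select(len(scores), 2)
```

In a full loop, the selected sentences would be sent to an annotator, added to the training set, and the model retrained before the next round of selection.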
