Prosodic structure contributes to speech production and comprehension. One of the crucial problems in achieving natural-sounding synthesized speech is the prediction of appropriate phrase boundaries. Unfortunately, obtaining human annotations of prosodic phrases to train a supervised system can be laborious and costly. Active learning has proven effective in reducing the labeling effort required for supervised learning.
This study explores active learning techniques with the aim of reducing the amount of human-annotated data needed to attain a given level of performance. It presents an active learning approach to predicting prosodic phrase boundaries in unrestricted Chinese text. Experiments show that, in most of the cases considered, the active selection strategies for labeling prosodic phrase boundaries match or exceed the performance of random data selection.