Modelling of Lithuanian speech diphthongs. (English) Zbl 1262.68157

Summary: The goal of the paper is to develop a method for modelling Lithuanian speech diphthongs. We use a formant-based synthesizer for this modelling, with a second-order quasipolynomial chosen as the formant model in the time domain. The general diphthong model is a multi-input single-output (MISO) system consisting of two parts: the first part corresponds to the first vowel of the diphthong and the second part to the other vowel. The system is excited by semi-periodic impulses with a smooth transition from one vowel to the other.
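To make the MISO structure concrete, here is a minimal NumPy sketch of the forward (synthesis) model. It rests on assumptions not fixed by the summary: each formant is taken as a degree-2 polynomial envelope times a damped cosine, the excitation is strictly periodic rather than semi-periodic, and the two vowel channels receive complementary amplitudes that follow an arctangent law (the transition law the summary names). All function names, formant values and the `k_mid`/`steepness` parameters are hypothetical and illustrative, not the paper's estimates.

```python
import numpy as np

FS = 48_000  # samples per second, matching the 48 kHz recording


def formant(t, c, damping, freq, phase):
    """Hypothetical formant impulse response (an assumed parametrization):
    (c0 + c1*t + c2*t**2) * exp(-damping*t) * cos(2*pi*freq*t + phase)."""
    envelope = c[0] + c[1] * t + c[2] * t**2
    return envelope * np.exp(-damping * t) * np.cos(2 * np.pi * freq * t + phase)


def vowel_ir(formants, n):
    """Impulse response of one vowel channel: the sum of its formant responses."""
    t = np.arange(n) / FS
    return sum(formant(t, *f) for f in formants)


def arctan_weights(k, k_mid, steepness):
    """Amplitude of the second-vowel input at pulse index k: rises smoothly
    from about 0 to about 1 along an arctangent curve centred at k_mid."""
    return 0.5 + np.arctan(steepness * (np.asarray(k) - k_mid)) / np.pi


def miso_diphthong(h_a, h_i, period, n_pulses, k_mid, steepness):
    """Two inputs, one output: both channels are driven by the same pulse
    positions with complementary amplitudes, and their outputs are summed."""
    n = period * n_pulses + max(len(h_a), len(h_i))
    x_a = np.zeros(n)
    x_i = np.zeros(n)
    k = np.arange(n_pulses)
    w = arctan_weights(k, k_mid, steepness)
    x_a[k * period] = 1.0 - w  # first vowel fades out
    x_i[k * period] = w        # second vowel fades in
    y_a = np.convolve(x_a, h_a)[:n]
    y_i = np.convolve(x_i, h_i)[:n]
    return y_a + y_i


# Illustrative two-formant vowels (made-up values, not estimated from data).
h_a = vowel_ir([((1.0, 0.0, 0.0), 80.0, 700.0, 0.0),
                ((0.5, 0.0, 0.0), 120.0, 1200.0, 0.0)], 960)
h_i = vowel_ir([((1.0, 0.0, 0.0), 80.0, 300.0, 0.0),
                ((0.4, 0.0, 0.0), 150.0, 2300.0, 0.0)], 960)
# period=400 samples corresponds to a 120 Hz pitch at 48 kHz.
y = miso_diphthong(h_a, h_i, period=400, n_pulses=40, k_mid=20, steepness=0.5)
```

Splitting the excitation into two weighted pulse trains keeps each vowel's formant model fixed while the output moves gradually from one vowel to the other; a semi-periodic excitation would only change the pulse positions `k * period`.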
We derive the parametric input-output equations for quasipolynomial formants, define a new notion, the convoluted basic signal matrix, and derive parametric formulas for the minimization functional of the convoluted output data. A new formant parameter estimation algorithm for convoluted data, based on the Levenberg-Marquardt approach, is derived and presented in stepwise form. The Lithuanian diphthong /ai/ was selected as an example and recorded with the following parameters: PCM, 48 kHz, 16 bit, stereo. Two characteristic pitches of the vowels /a/ and /i/ are chosen, and equidistant samples of these pitches are used to estimate the parameters of the MISO formant models of the vowels. The transition from the vowel /a/ to the vowel /i/ is achieved by changing the excitation impulse amplitudes according to an arctangent law. The method was tested by listening, and the Fourier transforms of the real data and of the MISO model output were compared. In the audio tests, the real and simulated diphthongs could not be distinguished; the magnitude and phase responses show only small differences.
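As a rough illustration of Levenberg-Marquardt formant estimation, the sketch below fits one hypothetical quasipolynomial formant directly to a synthetic, noise-free segment. It is a generic textbook LM loop (Marquardt's diagonal scaling, central-difference Jacobian), not the paper's convoluted-data algorithm, which works with the convoluted basic signal matrix; the parametrization, the millisecond/kHz units and all numeric values are assumptions chosen to keep the parameters on similar scales.

```python
import numpy as np

FS_KHZ = 48.0  # 48 kHz sampling; time in ms so the parameters have similar magnitudes


def formant(p, t):
    """Hypothetical formant model with t in ms, d in 1/ms and f in kHz:
    (c0 + c1*t + c2*t**2) * exp(-d*t) * cos(2*pi*f*t + phi)."""
    c0, c1, c2, d, f, phi = p
    return (c0 + c1 * t + c2 * t**2) * np.exp(-d * t) * np.cos(2 * np.pi * f * t + phi)


def jacobian(fun, p, eps=1e-7):
    """Central-difference Jacobian of the residual vector with respect to p."""
    m = fun(p).size
    J = np.empty((m, p.size))
    for j in range(p.size):
        dp = np.zeros_like(p)
        dp[j] = eps * max(1.0, abs(p[j]))
        J[:, j] = (fun(p + dp) - fun(p - dp)) / (2 * dp[j])
    return J


def levenberg_marquardt(fun, p0, max_iter=200, lam=1e-3, tol=1e-12):
    """Minimise ||fun(p)||^2, returning the parameters and the final cost."""
    p = np.asarray(p0, dtype=float)
    r = fun(p)
    cost = r @ r
    for _ in range(max_iter):
        J = jacobian(fun, p)
        A = J.T @ J
        g = J.T @ r
        # Damped normal equations: (J^T J + lam * diag(J^T J)) step = -J^T r
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), -g)
        r_new = fun(p + step)
        cost_new = r_new @ r_new
        if cost_new < cost:  # accept: behave more like Gauss-Newton
            p, r, cost = p + step, r_new, cost_new
            lam = max(lam / 10.0, 1e-12)
        else:                # reject: behave more like gradient descent
            lam = min(lam * 10.0, 1e12)
        if np.linalg.norm(step) < tol * (np.linalg.norm(p) + tol):
            break
    return p, cost


t = np.arange(480) / FS_KHZ                    # 10 ms of samples, in ms
p_true = np.array([1.0, 0.2, -0.05, 0.3, 0.7, 0.3])
data = formant(p_true, t)                      # synthetic, noise-free segment
p0 = np.array([0.8, 0.0, 0.0, 0.25, 0.69, 0.0])  # rough starting guess


def residual(p):
    return formant(p, t) - data


initial_cost = residual(p0) @ residual(p0)
p_hat, final_cost = levenberg_marquardt(residual, p0)
```

The damping term `lam` interpolates between Gauss-Newton steps (small `lam`, fast near the solution) and short gradient-like steps (large `lam`, safe far from it); as with any nonlinear fit of a damped sinusoid, a starting frequency close to the true one matters.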

MSC:

68T10 Pattern recognition, speech recognition
68T50 Natural language processing
68U15 Computing methodologies for text processing; mathematical typography