Just tested the vowel-onset approach — it's working! Words like 'Chevron' sound more connected, transitions are smoother. One side effect: vowels feel slightly faster since we're adding 20ms onset segments. Tweaking the timing now.
-
@Tamasg A suggestion is to add a locus timing parameter in the formant table. Some vowels have longer or shorter locus times.
-
@rommix0 High vowels like /i/ probably need shorter transitions than low vowels like /ɑ/. We currently have a global 10-20ms onset. Adding it to the phoneme table makes sense. Would that be something like:
i:
  cf1: 310
  cf2: 2020
  locusTimeMs: 12 # Short for high vowels
ɑ:
  cf1: 784
  cf2: 1552
  locusTimeMs: 25 # Longer for low vowels
Any rules of thumb for which vowels need longer/shorter locus times?
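For illustration, here is a minimal sketch of how a per-phoneme `locusTimeMs` could fall back to the global onset when a table entry leaves it out. The dict layout, the `DEFAULT_LOCUS_TIME_MS` value, the `locus_time_ms` helper, and the /ə/ row are all assumptions made for the example, not the project's actual code.

```python
# Hypothetical in-memory form of the phoneme table sketched above.
# cf1/cf2/locusTimeMs mirror the proposed keys; the /ə/ row is made up
# purely to show an entry without an override.
PHONEMES = {
    "i": {"cf1": 310, "cf2": 2020, "locusTimeMs": 12},
    "ɑ": {"cf1": 784, "cf2": 1552, "locusTimeMs": 25},
    "ə": {"cf1": 500, "cf2": 1500},
}

# Assumed global default, picked from inside the current 10-20ms range.
DEFAULT_LOCUS_TIME_MS = 15


def locus_time_ms(phoneme: str) -> float:
    """Return the per-phoneme locus time, or the global default if unset."""
    entry = PHONEMES.get(phoneme, {})
    return entry.get("locusTimeMs", DEFAULT_LOCUS_TIME_MS)


if __name__ == "__main__":
    for symbol in ("i", "ɑ", "ə"):
        print(symbol, locus_time_ms(symbol))
```

Keeping the override optional means existing table entries keep working unchanged while individual vowels get tuned.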
-
@Tamasg DECTalk actually has locus times usually about 35 ms to 50 ms. Longer than what you're suggesting. From my testing, between 30 ms and 70 ms is most natural. 50 ms is most common.
@rommix0 So roughly:
• Short vowels / fast speech: ~30ms
• Normal: ~40-50ms
• Long vowels / careful speech: ~60-70ms
Does F1 vs F2/F3 have the same locus time, or do they transition at different rates?
-
@Tamasg Same timing.
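A rough sketch of what "same timing" could mean in code: one shared locus time drives every formant from its locus value to its vowel target. Linear interpolation, the `formants_at` name, and the locus values are assumptions for illustration; only the /ɑ/ targets (784, 1552) come from the table earlier in the thread.

```python
def formants_at(t_ms, locus_hz, target_hz, locus_time_ms):
    """Move every formant linearly from its locus to its target.

    A single locus_time_ms is shared by all formants, so F1, F2 and F3
    reach their targets at the same moment.
    """
    if locus_time_ms <= 0:
        return tuple(float(hz) for hz in target_hz)
    # Fraction of the transition completed, clamped to [0, 1].
    a = min(max(t_ms / locus_time_ms, 0.0), 1.0)
    return tuple(l + (g - l) * a for l, g in zip(locus_hz, target_hz))


if __name__ == "__main__":
    locus = (300, 1800)    # invented consonant locus (F1, F2) in Hz
    target = (784, 1552)   # /ɑ/ targets from the table above
    for t in (0, 25, 50, 80):
        print(t, formants_at(t, locus, target, locus_time_ms=50))
```

After the locus time has elapsed the values simply hold at the target, so a longer vowel does not stretch the transition.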
-
@rommix0 Ha! Went from 10ms (too fast) to 30-40ms (gritted/thick mouth, especially on stops like 'button').
Thinking maybe different consonant types need different locus times? Like fricatives need longer (sustained noise → vowel) but stops need shorter (burst → quick transition)?
Or is it that stops handle locus differently because of the closure/burst? Hmm hmm.
-
@Tamasg That sounds right. Closures happen quickly.
-
@rommix0 In our synthesizer, each phoneme token has ONE set of formant values. The DSP interpolates between tokens via a fade parameter.
When we tried inserting a short 'onset' segment at locus values before vowels, it created artifacts: either a 'gritted' sound (segment too long) or a 'shaky' wobble (multiple segments accumulating). How does DECTalk/Klatt handle this? Does each segment have separate START and END formant values that the synth interpolates between? Or is there some other mechanism for smooth within-vowel transitions?
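To make the START/END idea in the question concrete, here is a hedged sketch of a segment that carries both sets of formant values, so the transition happens inside the vowel and no separate onset token is needed. This is not a description of how DECTalk or Klatt actually work; the `Segment` fields, the `render` function, the 5 ms frame step, and the locus values are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

Formants = Tuple[float, ...]


@dataclass
class Segment:
    duration_ms: float
    start: Formants        # formant values at the segment's first frame (Hz)
    end: Formants          # values reached after transition_ms, then held
    transition_ms: float   # 0 means the segment sits at `end` throughout


def render(segments: List[Segment], frame_ms: float = 5.0) -> List[Formants]:
    """Expand segments into one formant tuple per frame."""
    frames = []
    for seg in segments:
        n_frames = max(1, round(seg.duration_ms / frame_ms))
        for i in range(n_frames):
            t = i * frame_ms
            # Fraction of the in-segment transition completed so far.
            a = 1.0 if seg.transition_ms <= 0 else min(t / seg.transition_ms, 1.0)
            frames.append(tuple(s + (e - s) * a for s, e in zip(seg.start, seg.end)))
    return frames


if __name__ == "__main__":
    locus = (300, 1800)  # invented locus values for the preceding consonant
    track = render([
        Segment(20, locus, locus, 0),            # consonant held at its locus
        Segment(100, locus, (784, 1552), 50),    # /ɑ/ glides to target in 50 ms
    ])
    for frame in track:
        print(frame)
```

Because the vowel begins exactly where the consonant ended, there is no extra token boundary for the fade to cross, which is the kind of accumulation the 'shaky' wobble seemed to come from.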