wow. this has smoothened out the voice a lot. No more thump on "glottal sharpness."
DSP v7: Replace biquad resonators with trapezoidal SVF, add cosine/log-domain interpolation

== Trapezoidal State-Variable Filter (SVF) Resonators ==
The cascade and parallel formant resonators now use Andrew Simper's
trapezoidal-integrated SVF method (Cytomic) instead of the classic
Klatt biquad (Direct Form II).

Benefits:
- Zipper-free formant sweeps: frequency and Q are independent parameters,
so continuously varying formant frequencies during coarticulation and
diphthongs no longer risk coefficient discontinuities or clicks.
- Better low-sample-rate stability: the old biquad lost coefficient
precision near Nyquist, contributing to "old cell phone" artifacts at
11025 Hz. The SVF spreads precision more evenly across the spectrum.
- Nyquist proximity damping: a quadratic bandwidth widening curve above
60% of Nyquist prevents the SVF from over-resonating where the old
biquad naturally degraded. This keeps fricatives clean at 11025 Hz
while having zero effect at 22050+ Hz.

The anti-resonator (nasal zero, rN0) uses a dedicated FIR (all-zero)
filter rather than the SVF notch output. The SVF notch places zeros on
the unit circle (infinitely deep null), which is too aggressive for
speech nasalization. The FIR places zeros inside the unit circle at a
depth controlled by bandwidth, matching the behavior expected by existing
phoneme data.
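
For readers who want to see the pieces above in code, here is a minimal Python sketch. It is not the actual DSP v7 source. The SVF update follows the published Cytomic trapezoidal form. Taking the low-pass output as the resonator (unity gain at DC, like a Klatt resonator) is an assumption, as are the names `TrapezoidalSVF`, `damped_bandwidth` and `FirAntiResonator`. The `max_widen` factor is made up, since the post only gives the 60% knee and the quadratic shape. The anti-resonator is modelled on the classic Klatt inverse-resonator coefficients, which may differ from what the commit does.

```python
import math


def damped_bandwidth(freq_hz, bw_hz, sample_rate, knee=0.6, max_widen=2.0):
    """Widen the bandwidth quadratically above `knee` * Nyquist.

    `max_widen` is a hypothetical tuning value, not taken from the post.
    """
    nyquist = 0.5 * sample_rate
    start = knee * nyquist
    if freq_hz <= start:
        return bw_hz  # below the knee: no change at all
    t = min((freq_hz - start) / (nyquist - start), 1.0)
    return bw_hz * (1.0 + max_widen * t * t)


class TrapezoidalSVF:
    """Trapezoidal-integrated state-variable filter used as a resonator."""

    def __init__(self, sample_rate, freq_hz, bw_hz):
        self.sample_rate = sample_rate
        self.ic1eq = 0.0  # integrator states
        self.ic2eq = 0.0
        self.set_params(freq_hz, bw_hz)

    def set_params(self, freq_hz, bw_hz):
        # Clamp below Nyquist so tan() stays finite.
        freq_hz = min(freq_hz, 0.49 * self.sample_rate)
        g = math.tan(math.pi * freq_hz / self.sample_rate)  # pre-warped frequency
        k = bw_hz / freq_hz                                 # damping, k = 1/Q
        self.a1 = 1.0 / (1.0 + g * (g + k))
        self.a2 = g * self.a1
        self.a3 = g * self.a2

    def process(self, x):
        v3 = x - self.ic2eq
        v1 = self.a1 * self.ic1eq + self.a2 * v3                   # band-pass
        v2 = self.ic2eq + self.a2 * self.ic1eq + self.a3 * v3      # low-pass
        self.ic1eq = 2.0 * v1 - self.ic1eq
        self.ic2eq = 2.0 * v2 - self.ic2eq
        return v2  # assumption: low-pass output serves as the formant resonator


class FirAntiResonator:
    """Two-zero FIR with zeros at radius exp(-pi * bw / fs), inside the unit circle."""

    def __init__(self, sample_rate, freq_hz, bw_hz):
        self.sample_rate = sample_rate
        self.x1 = 0.0
        self.x2 = 0.0
        self.set_params(freq_hz, bw_hz)

    def set_params(self, freq_hz, bw_hz):
        r = math.exp(-math.pi * bw_hz / self.sample_rate)  # zero radius, < 1
        b = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / self.sample_rate)
        c = -r * r
        a = 1.0 - b - c
        # Inverse of the Klatt resonator, normalised to unity gain at DC.
        self.a, self.b, self.c = 1.0 / a, -b / a, -c / a

    def process(self, x):
        y = self.a * x + self.b * self.x1 + self.c * self.x2
        self.x2, self.x1 = self.x1, x
        return y
```

Calling `set_params(f, damped_bandwidth(f, bw, fs))` once per sample during a formant sweep is cheap here, because the coefficients are recomputed from `g` and `k` with no pole/zero bookkeeping. A narrower bandwidth moves the FIR zeros closer to the unit circle and deepens the nasal dip, but never to an infinite null.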
-
@Tamasg The SVF makes a lot of sense: SVFs modulate well and direct form biquads don't. If I stay with formant+bandwidth I'll also switch to that. However, I'm currently exploring line spectral pairs, so I might not.
I'm also wondering if you should drive this at a higher sampling rate. The BLT gives you frequency compression near Nyquist, which is not present in the analog source. I think you'd want to add an additional lowpass around 4-5 kHz for voiced.
@raph My trapezoidal SVF has the same issue, since trapezoidal integration is mathematically equivalent to the BLT. At 11025 Hz the upper formants (F5/F6) are getting squeezed noticeably near Nyquist. Pre-warping fixes center frequency but not bandwidth shape.

The voiced lowpass idea has merit. I have spectral tilt on the source already, but a proper LP at 4-5 kHz before the resonator bank could help. Oversampling just the voiced path is tempting too, since noise sources don't alias the same way.
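
To put a rough number on the squeeze, here is a small Python sketch. It takes a pre-warped second-order band-pass with Q = fc / bw, finds the -3 dB edges of its analog prototype, and maps them back through the BLT frequency warp. The function name `realized_bandwidth` and the example formant values are illustrative assumptions, not figures from either synthesizer.

```python
import math


def realized_bandwidth(fc_hz, bw_hz, sample_rate):
    """-3 dB width (Hz) of a BLT band-pass designed for centre fc_hz, Q = fc_hz / bw_hz."""
    q = fc_hz / bw_hz
    # Pre-warping: the analog centre that lands exactly on fc_hz after the BLT.
    fa = (sample_rate / math.pi) * math.tan(math.pi * fc_hz / sample_rate)
    # -3 dB edges of the analog band-pass prototype.
    half = 1.0 / (2.0 * q)
    root = math.sqrt(1.0 + half * half)
    lo_a = fa * (root - half)
    hi_a = fa * (root + half)

    def to_digital(f_analog):
        # BLT frequency mapping, analog Hz to digital Hz.
        return (sample_rate / math.pi) * math.atan(math.pi * f_analog / sample_rate)

    return to_digital(hi_a) - to_digital(lo_a)


if __name__ == "__main__":
    # Hypothetical upper formant: 4500 Hz centre, 200 Hz requested bandwidth.
    for fs in (11025.0, 22050.0, 44100.0):
        print(fs, round(realized_bandwidth(4500.0, 200.0, fs), 1))
```

At 11025 Hz the realized width comes out far below the requested 200 Hz, while at 44100 Hz it is close to nominal. That matches the point that pre-warping pins the centre frequency but leaves the bandwidth compressed, and it is one way to see why running the voiced path at a higher rate helps.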