Tried to extract my own glottal pulse to make the synth sound more human. Learned my voice is too gentle for radio. Sadness fills my soul. That's probably why I didn't stick with radio shows.
I recorded sustained vowels and used IAIF (Iterative Adaptive Inverse Filtering) to extract my glottal waveform: the raw "buzz" from the vocal folds before the vocal tract (throat, mouth, lips) shapes it into vowels.
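For anyone curious what IAIF actually does, here is a heavily simplified single-frame sketch in Python (NumPy/SciPy). It is not the code used for this experiment. It skips the high-pass pre-filter and framing a real implementation would have, and the function names, LPC orders, and integrator coefficient `rho` are my assumptions. The idea is to alternately estimate the glottal tilt and the vocal-tract filter with LPC, inverse-filter each one out, and integrate to undo lip radiation.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter


def lpc(x, order):
    """Inverse-filter coefficients [1, -a1, ..., -ap] via the autocorrelation method."""
    xw = x * np.hanning(len(x))
    r = np.correlate(xw, xw, mode="full")[len(xw) - 1 : len(xw) + order]  # lags 0..order
    a = solve_toeplitz(r[:order], r[1 : order + 1])  # Yule-Walker equations
    return np.concatenate(([1.0], -a))


def iaif(frame, fs, vt_order=None, glottal_order=4, rho=0.99):
    """Simplified single-frame IAIF; returns an estimate of the glottal flow."""
    if vt_order is None:
        vt_order = int(fs // 1000) + 2  # common LPC rule of thumb (assumption)

    def integrate(x):
        # leaky integrator; approximately undoes the lip-radiation differentiation
        return lfilter([1.0], [1.0, -rho], x)

    # 1) remove a crude 1st-order estimate of the glottal spectral tilt
    y = lfilter(lpc(frame, 1), [1.0], frame)
    # 2) first vocal-tract estimate from y; inverse-filter the frame, integrate
    g = integrate(lfilter(lpc(y, vt_order), [1.0], frame))
    # 3) better glottal estimate from g, removed from the original frame
    y = integrate(lfilter(lpc(g, glottal_order), [1.0], frame))
    # 4) refined vocal tract; inverse-filter and integrate -> glottal flow
    return integrate(lfilter(lpc(y, vt_order), [1.0], frame))
```

Feed it short frames (tens of milliseconds) of a steady vowel; np.correlate here is quadratic in frame length, so it is not meant for whole recordings at once.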
What I expected: Rich, characterful human excitation to replace the mathematical model.
What I got: A softer, breathier sound than pure math!
The mathematical LF (Liljencrants-Fant) model with sharpness cranked to 10 actually produces MORE harmonics than my actual voice does. That "chest-resonant radio announcer" sound? That's an aggressive glottal snap, a very abrupt closure of the vocal folds, which not everyone has.
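The "sharper pulse, more harmonics" effect is easy to demonstrate. This Python/NumPy sketch does not implement the LF model or the synth's sharpness setting; it swaps in the simpler Rosenberg pulse, shortens its closing phase, and measures how much energy lands in harmonic 10 and above relative to the fundamental. It measures on the flow derivative, which is roughly what radiates from the lips. The parameter values and the `first_upper=10` cutoff are arbitrary choices for illustration.

```python
import numpy as np


def rosenberg_period(n_samples, open_frac=0.4, close_frac=0.16):
    """One period of a Rosenberg-style glottal flow pulse (peak value 1)."""
    t = np.arange(n_samples) / n_samples
    pulse = np.zeros(n_samples)
    rising = t < open_frac
    pulse[rising] = 0.5 * (1 - np.cos(np.pi * t[rising] / open_frac))
    falling = (t >= open_frac) & (t < open_frac + close_frac)
    pulse[falling] = np.cos(np.pi * (t[falling] - open_frac) / (2 * close_frac))
    return pulse  # remaining samples: closed phase, zero flow


def upper_harmonic_level_db(pulse, first_upper=10):
    """Energy in harmonics >= first_upper relative to the fundamental, in dB,
    measured on the flow derivative (circular difference over one period)."""
    deriv = np.diff(pulse, append=pulse[0])
    spectrum = np.abs(np.fft.rfft(deriv))  # bin k = harmonic k for one period
    upper = np.sqrt(np.sum(spectrum[first_upper:] ** 2))
    return 20 * np.log10(upper / spectrum[1])


for close in (0.30, 0.16, 0.05):
    level = upper_harmonic_level_db(rosenberg_period(512, close_frac=close))
    print(f"closing phase {close:.2f} of the period: upper harmonics {level:+.1f} dB")
```

The faster the closure, the bigger the jump in the flow derivative at the closing instant, and that jump is what spreads energy into the upper harmonics.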
-
@Tamasg I have a reasonably good voice for that sort of junk, used to do voiceover/imaging professionally. Just for sake of curiosity, what needs to be captured for input?
-
@BorrisInABox Oh cool! For the extraction I recorded 5 sounds:
1. "ahh" sustained at normal pitch (~5 sec)
2. "ahh" sustained at low pitch (~5 sec)
3. "ahh" sustained at high pitch (~5 sec)
4. "shhh" sustained fricative (~5 sec)
5. "th" sustained unvoiced (~3 sec)
The "ahh" vowels are for glottal pulse extraction at different F0s (fundamental frequencies). The "shhh" and "th" recordings are for noise/frication characteristics.
Recording tips:
• Condenser or dynamic mic (I used a Blue Snowball; the AT2005 was too noisy)
• Peaks around -5 to -8 dBFS (NOT quiet: my first attempt at -30 dB was useless)
• Steady volume, no vibrato
• Quiet room
• 44100 Hz sample rate, mono
The key is getting a clean, loud, boring sustained vowel: no expression, just a pure steady tone. The more monotone, the better for extraction!
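A quick way to check a take against these tips before running extraction, sketched in Python/NumPy. It reads the "-5 to -8 dB" peak target as dBFS and assumes float samples scaled to [-1.0, 1.0]. The 50 ms RMS window used to judge "steady volume" and the function names are my assumptions.

```python
import numpy as np


def peak_dbfs(samples):
    """Peak level in dBFS, for float samples scaled to [-1.0, 1.0]."""
    peak = np.max(np.abs(samples))
    return -np.inf if peak == 0 else float(20 * np.log10(peak))


def level_report(samples, fs=44100, window_s=0.05, target=(-8.0, -5.0)):
    """Peak level plus how much the short-term RMS wanders (in dB)."""
    hop = int(fs * window_s)
    n_frames = len(samples) // hop
    frames = np.asarray(samples[: n_frames * hop]).reshape(n_frames, hop)
    rms_db = 20 * np.log10(np.sqrt(np.mean(frames**2, axis=1)) + 1e-12)
    peak = peak_dbfs(samples)
    return {
        "peak_dbfs": peak,
        "peak_in_target": target[0] <= peak <= target[1],
        "rms_spread_db": float(np.std(rms_db)),  # small = steady volume
    }
```

A take like the -30 dB first attempt fails `peak_in_target`; vibrato or a drifting volume shows up as a larger `rms_spread_db`.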
-
@Tamasg @BorrisInABox Should an aggressive autotune be used to make it nice and uniform? Or would that introduce too much distortion?
-
@Tamasg Not a prob. I can do something later tonight when it's less noisy here. I'm just curious to see what it does.
-
@x0 @BorrisInABox Don't think so. Autotune messes with the waveform shape (the thing we're trying to capture!). It also isn't needed, because the extraction works like this:
1. We detect F0 (fundamental frequency) from the recording
2. Find individual glottal periods (pitch peak to pitch peak)
3. Resample each period to a common length
4. Average them together
So natural pitch drift is fine: resampling normalizes each period's length, and averaging smooths out the cycle-to-cycle variation. Recorded at ~85 Hz but wobbled between 82 and 88 Hz? Doesn't matter; we're extracting the shape, not the pitch.
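The four extraction steps can be sketched roughly like this in Python/NumPy. Assumptions: F0 comes from a plain autocorrelation peak (a swapped-in method, not necessarily what the real pipeline uses), periods are marked by picking the waveform maximum near each expected position, and `estimate_f0`, `period_marks`, and `average_pulse` are hypothetical names. In practice you would presumably run this on the inverse-filtered glottal signal rather than on raw audio.

```python
import numpy as np


def estimate_f0(x, fs, fmin=60.0, fmax=400.0):
    """Crude F0 estimate from the strongest autocorrelation peak in [fmin, fmax]."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    spec = np.fft.rfft(x, n=2 * len(x))  # zero-pad so the autocorrelation doesn't wrap
    ac = np.fft.irfft(np.abs(spec) ** 2)[: len(x)]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag


def period_marks(x, period):
    """One mark per cycle: the maximum within +/-20% of where the next cycle should be."""
    marks = [int(np.argmax(x[:period]))]
    while True:
        guess = marks[-1] + period
        lo, hi = guess - period // 5, guess + period // 5
        if hi >= len(x):
            break
        marks.append(lo + int(np.argmax(x[lo:hi])))
    return np.array(marks)


def average_pulse(x, fs, n_points=256):
    """Cut mark-to-mark cycles, resample each to n_points, and average them."""
    period = int(round(fs / estimate_f0(x, fs)))
    marks = period_marks(x, period)
    cycles = []
    for start, end in zip(marks[:-1], marks[1:]):
        seg = x[start:end]
        pos = np.linspace(0, len(seg) - 1, n_points)  # common length for every cycle
        cycles.append(np.interp(pos, np.arange(len(seg)), seg))
    return np.mean(cycles, axis=0)
```

Resampling with `np.interp` is what makes the wobble harmless: at 44.1 kHz an 88 Hz cycle is about 501 samples and an 82 Hz cycle about 538, but both end up 256 points long before averaging.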
-
@Tamasg @BorrisInABox Lol, what would be totally amusing is if you used things that aren't glottal pulses at all for the waveforms, like the input to wavetable synths. I've got one from a washing machine that sounds like a screaming metallic monster, and it would be epic to have that as a direct vocal source, more direct than a mere vocoder.
-
@x0 @BorrisInABox Lol! The wild thing is... it would technically work? The formant filters don't care what you feed them. They just shape whatever harmonic-rich input they get. So we could get: WASHING MACHINE DEMON → formant filters → "h̷̰͝e̵̢͠l̷̨͘l̷͚̚o̵̱͝" The formants would still try to impose vowel shapes on the chaos. It would be cursed.
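To make "the formant filters don't care what you feed them" concrete, here is a minimal cascade of Klatt-style two-pole resonators in Python (NumPy/SciPy). The formant frequencies are rough textbook-style values for an "ah" vowel and the bandwidths are guesses; none of it is SpeechPlayer's actual filter code. Any array works as `source`: a glottal pulse train, white noise, or a washing-machine recording.

```python
import numpy as np
from scipy.signal import lfilter

# (frequency Hz, bandwidth Hz): approximate "ah"-like formants; bandwidths assumed
AH_FORMANTS = ((730, 90), (1090, 110), (2440, 170))


def resonator(freq, bw, fs):
    """Two-pole resonator (b, a) with unity gain at DC, Klatt-style."""
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * freq / fs
    a = np.array([1.0, -2 * r * np.cos(theta), r * r])
    b = np.array([np.sum(a)])  # A = 1 - B - C in Klatt's notation
    return b, a


def formant_filter(source, fs, formants=AH_FORMANTS):
    """Run any excitation through a cascade of formant resonators."""
    y = np.asarray(source, dtype=float)
    for freq, bw in formants:
        b, a = resonator(freq, bw, fs)
        y = lfilter(b, a, y)
    return y
```

Feeding it noise or a clangy, inharmonic source still produces energy bunched around the formant frequencies, which is exactly the "vowel shapes imposed on chaos" effect.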

-
@Tamasg @BorrisInABox It would sound fantastic for certain kinds of production effects, though it can't give you fricatives. Pretty sure that already exists, though: there are some formant plug-ins, like Sonivox Vocalizer. But I think they only do three formants.
-
@Tamasg @BorrisInABox Maybe that should be a side project, an accessible formant shaper plug-in that you could put as a send from a wavetable synth and get total epicness out of it :). Although VST development is also cursed. Maybe CLAP?
-
@x0 @BorrisInABox I could give it a try and build you a speechplayer.dll for it. Would be nuts. Would sound broken. Never for production. But I have a version of SpeechPlayer with glottal table support, so I'd just use librosa to extract what I need and hand you a speechplayer.dll lol
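On the playback side, a "glottal table" is presumably a single-cycle waveform replayed at whatever F0 the synth asks for. I don't know SpeechPlayer's actual table format or API, so this is a generic wavetable oscillator sketch in Python/NumPy (`wavetable_osc` is a hypothetical name): a phase accumulator plus linear interpolation between table entries.

```python
import numpy as np


def wavetable_osc(table, f0, fs, n_samples):
    """Play a single-cycle table at a constant or per-sample F0 (Hz)."""
    table = np.asarray(table, dtype=float)
    size = len(table)
    f0 = np.broadcast_to(np.asarray(f0, dtype=float), (n_samples,))
    phase = np.cumsum(f0 / fs) % 1.0  # position in the cycle, in [0, 1)
    pos = phase * size
    i0 = np.floor(pos).astype(int) % size
    i1 = (i0 + 1) % size  # wrap around to the start of the cycle
    frac = pos - np.floor(pos)
    return (1 - frac) * table[i0] + frac * table[i1]  # linear interpolation
```

A table with sharp edges (like a washing-machine cycle) will alias when played at higher pitches; real wavetable synths usually band-limit their tables, which this sketch does not do.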
-
@Tamasg @BorrisInABox Or it could be like what SoftVoice did, with all the different sources. Wait a minute! Is that the problem you're having with a female source? Do you need a real female glottal pulse to start from?
-
@BorrisInABox @Tamasg I got A.Liv on the Surge Discord to kindly work with my source material, and the results are now called Exocat's Metalodon, in the 3rd-party wavetables folder of Surge's factory data.
-
@Tamasg I wonder how I would sound. Interesting.
-
@Tamasg Ooo, audio please.
-
@x0 @BorrisInABox Sadly, I realized a female voice would require formant frequency tuning for the phonemes. Right now, if we just put a female glottal shape over that, at best it would sound out of place on top of the deeper, male-characteristic voice. Theoretically... a female voice with a sharper glottal closure would actually give us MORE harmonics to work with, not fewer! Would be genuinely interesting to compare, though: extract a female glottal pulse and see if the shape is meaningfully different.
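If someone does extract a female glottal pulse, one quick way to see whether the shape is meaningfully different is to compare the two single-cycle pulses directly. This Python/NumPy sketch uses two metrics of my own choosing (not from the thread): the smallest RMS difference over all circular shifts after normalizing, and the slope of the harmonic levels in dB per octave, where a less negative slope means more upper-harmonic energy. Both pulses are assumed to be the same length, e.g. 256-point averaged cycles; the function names are hypothetical.

```python
import numpy as np


def normalize(pulse):
    """Remove the mean and scale to a peak magnitude of 1."""
    p = np.asarray(pulse, dtype=float)
    p = p - p.mean()
    return p / np.max(np.abs(p))


def shape_distance(a, b):
    """Smallest RMS difference between two normalized pulses over all circular shifts."""
    a, b = normalize(a), normalize(b)
    return min(np.sqrt(np.mean((a - np.roll(b, k)) ** 2)) for k in range(len(b)))


def harmonic_tilt_db_per_octave(pulse, n_harmonics=20):
    """Least-squares slope of harmonic level (dB) vs. log2(harmonic number)."""
    spec = np.abs(np.fft.rfft(normalize(pulse)))[1 : n_harmonics + 1]
    k = np.arange(1, n_harmonics + 1)
    slope, _ = np.polyfit(np.log2(k), 20 * np.log10(spec + 1e-12), 1)
    return slope
```

A sawtooth-like pulse comes out near -6 dB per octave; a pulse with a sharper closure than another should show a less negative tilt.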
-
@mcourcel lol sounds horrible!
-
@Tamasg Hehehehehe lolol, sort of like eSpeak.
-
@mcourcel Yep. This gave me some really good insight into what eSpeak did to fuck up SpeechPlayer: mainly, changing its glottal source a lot. Hahahaha, good lesson-learning!