Tried to extract my own glottal pulse to make the synth sound more human.
-
@x0 @BorrisInABox Don't think so. It messes with the waveform shape (the thing we're trying to capture!)
1. We detect F0 (fundamental frequency) from the recording
2. Find individual glottal periods (pitch peak to pitch peak)
3. Resample each period to a common length
4. Average them together
So natural pitch drift is fine - the averaging smooths it out. Recorded at ~85 Hz but wobbled between 82-88? Doesn't matter, we're extracting the shape, not the pitch.
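For anyone who wants to try steps 2-4 above, here is a minimal sketch in plain Python. It assumes the period boundaries (`marks`) have already been found, for example from a pitch tracker such as librosa's `pyin` plus peak picking. The function names, the 256-sample target length, and the synthetic test signal are all made up for illustration; this is not the actual extraction script.

```python
import math

def resample_cycle(cycle, length):
    """Linearly interpolate one pitch period to `length` samples.

    The cycle is treated as periodic, so the last sample wraps to the first.
    """
    n = len(cycle)
    out = []
    for i in range(length):
        pos = i * n / length
        lo = int(pos)
        frac = pos - lo
        out.append(cycle[lo] * (1 - frac) + cycle[(lo + 1) % n] * frac)
    return out

def average_pulse(signal, marks, length=256):
    """Cut `signal` at period marks, resample each period, then average."""
    cycles = [signal[a:b] for a, b in zip(marks, marks[1:]) if b - a >= 2]
    resampled = [resample_cycle(c, length) for c in cycles]
    return [sum(col) / len(resampled) for col in zip(*resampled)]

# Synthetic stand-in for a recording: same shape, drifting period length
# (roughly 82-88 Hz at a 16 kHz sample rate).
def demo_shape(p):
    return math.sin(2 * math.pi * p) + 0.5 * math.sin(4 * math.pi * p)

demo_signal, demo_marks = [], [0]
for period_len in [182, 188, 195, 190, 185]:
    demo_signal.extend(demo_shape(i / period_len) for i in range(period_len))
    demo_marks.append(len(demo_signal))

demo_pulse = average_pulse(demo_signal, demo_marks)
```

Because every period is stretched to the same length before averaging, the drift in period length drops out and only the shape is left.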
-
@Tamasg @BorrisInABox Lol, what would be totally amusing is if you used waveforms that aren't glottal pulses at all, like the input to wavetable synths. I've got one from a washing machine that sounds like a screaming metallic monster, and it would be epic to have that as a direct vocal source, more direct than a mere vocoder.
@x0 @BorrisInABox Lol! The wild thing is... it would technically work? The formant filters don't care what you feed them. They just shape whatever harmonic-rich input they get. So we could get: WASHING MACHINE DEMON → formant filters → "h̷̰͝e̵̢͠l̷̨͘l̷͚̚o̵̱͝" The formants would still try to impose vowel shapes on the chaos. It would be cursed.
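A rough sketch of why that works: a formant filter is just a resonator, and it shapes whatever you feed it. The code below uses a Klatt-style two-pole resonator in cascade. Whether SpeechPlayer's filters look exactly like this is an assumption, and the formant values and the "harsh" source are invented for illustration.

```python
import math

def resonator_coeffs(freq, bandwidth, sample_rate):
    """Klatt-style two-pole resonator coefficients (unity gain at DC)."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth * t)
    b = 2.0 * math.exp(-math.pi * bandwidth * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c
    return a, b, c

def resonate(signal, freq, bandwidth, sample_rate):
    """Run y[n] = a*x[n] + b*y[n-1] + c*y[n-2] over the signal."""
    a, b, c = resonator_coeffs(freq, bandwidth, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def apply_formants(source, formants, sample_rate):
    """Pass the source through one resonator per (frequency, bandwidth) pair."""
    for freq, bandwidth in formants:
        source = resonate(source, freq, bandwidth, sample_rate)
    return source

# Assumed, illustrative values for an "ah"-like vowel (not from the thread).
AH_FORMANTS = [(700.0, 100.0), (1200.0, 120.0), (2600.0, 160.0)]

# Any harmonic-rich cycle will do as a source; this one is arbitrary junk.
harsh_cycle = [1.0 if (i * 7) % 32 < 9 else -1.0 for i in range(160)]
cursed_vowel = apply_formants(harsh_cycle * 100, AH_FORMANTS, 16000)
```

The filter never checks whether the input looks like a glottal pulse; it only boosts energy near each formant frequency and attenuates the rest.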

-
@Tamasg @BorrisInABox It would sound fantastic for certain kinds of production effects, though. Although it can't give you fricatives. Pretty sure that already exists though, there are some formant plug-ins, like sonivox vocalizer. But I think they only do three formants.
-
@Tamasg @BorrisInABox Maybe that should be a side project, an accessible formant shaper plug-in that you could put as a send from a wavetable synth and get total epicness out of it :). Although VST development is also cursed. Maybe CLAP?
-
@x0 @BorrisInABox could give it a try for you and give you a speechplayer.dll for it. Would be nuts. Would sound broken. Never for production. But I have a version of the speechplayer with glottal table support, so I just use librosa to extract what I need and hand you a speechplayer.dll lol
-
@Tamasg @BorrisInABox Or it's like what softvoice did, all the different sources. Wait a minute! Is that the problem you're having with a female source? Do you need a real female glottal pulse to start from?
-
Tried to extract my own glottal pulse to make the synth sound more human. Learned my voice is too gentle for radio. Sadness fills my soul. That's probably why I didn't stick with radio shows.
I recorded sustained vowels and used IAIF (Iterative Adaptive Inverse Filtering) to extract my glottal waveform - the raw "buzz" before your throat shapes it into vowels.
What I expected: Rich, characterful human excitation to replace the mathematical model.
What I got: A softer, breathier sound than pure math!
The mathematical LF model with sharpness cranked to 10 actually produces MORE harmonics than my actual voice does. That "chest resonant radio announcer" sound? That's aggressive glottal snap that not everyone has.
@Tamasg I wonder how I would sound. Interesting.
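To make the "sharper closure means more harmonics" point concrete, here is a toy comparison. These two pulses are crude stand-ins, not the LF model and not IAIF output: one opens and closes smoothly, the other rises slowly and snaps shut. The function names and the split at harmonic 10 are arbitrary choices for illustration.

```python
import cmath
import math

def soft_pulse(n=256, open_quotient=0.6):
    """Raised-cosine pulse: smooth opening and smooth closing."""
    n_open = int(n * open_quotient)
    return [0.5 * (1 - math.cos(2 * math.pi * i / n_open)) if i < n_open else 0.0
            for i in range(n)]

def sharp_pulse(n=256, open_quotient=0.6):
    """Slow quadratic rise followed by an abrupt drop to zero."""
    n_open = int(n * open_quotient)
    return [(i / n_open) ** 2 if i < n_open else 0.0 for i in range(n)]

def harmonic_magnitudes(cycle, count):
    """Magnitudes of harmonics 1..count of a single cycle (naive DFT)."""
    n = len(cycle)
    mags = []
    for k in range(1, count + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(cycle))
        mags.append(abs(acc) / n)
    return mags

def high_harmonic_fraction(cycle, split=10, count=60):
    """Share of harmonic energy that sits above harmonic number `split`."""
    mags = harmonic_magnitudes(cycle, count)
    total = sum(m * m for m in mags)
    high = sum(m * m for m in mags[split:])
    return high / total

soft_share = high_harmonic_fraction(soft_pulse())
sharp_share = high_harmonic_fraction(sharp_pulse())
print(f"soft: {soft_share:.6f}  sharp: {sharp_share:.6f}")
```

The abrupt closure puts a discontinuity in the waveform, which is what pushes energy into the upper harmonics; the smooth pulse rolls off much faster.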
-
@Tamasg Ooo, audio please.
-
@x0 @BorrisInABox Sadly, I realized a female voice would require formant frequency tuning for the phonemes. Right now, if we just put a female glottal shape over that, at best it would just sound aliased on top of the deeper, male-characteristic voice. Theoretically... a female voice with a sharper glottal closure would actually give us MORE harmonics to work with, not fewer! Would be genuinely interesting to compare though - extract a female glottal pulse and see if the shape is meaningfully different.
-
@mcourcel lol sounds horrible!
-
@Tamasg Hehehehehe lolol, sort of like E-Speak.
-
@mcourcel yep. This gave me some real good insight into what Espeak did to fuck up SpeechPlayer, mainly changing its glottal source a lot. Hahahaha good lesson-learning!
-
@BorrisInABox @Tamasg I got A.Liv on the Surge discord to kindly work with my source material and the results are now called Exocat's Metalodon in the 3rd-party wavetables folder of Surge's factory data.
@BorrisInABox @Tamasg This is the raw source material, which I later trimmed and did some noise reduction on, and then A.Liv carefully turned it into something with a consistent period so it could be made into wavetables, I think at 2048 samples per frame.
-
@BorrisInABox @Tamasg So if you actually wanted to do that for whatever reason, I can send you the wavetables, which are already fixed-length single-cycle waveforms, unless you already have Surge.
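For reference, a fixed-length single-cycle frame is easy to turn into a pitched source: step through the table with a phase accumulator and interpolate. This is a generic wavetable oscillator sketch, not how Surge or SpeechPlayer's glottal table support actually reads its tables; the function name and the sine test table are assumptions for illustration.

```python
import math

def render_wavetable(table, f0, sample_rate, num_samples):
    """Play one single-cycle table at pitch `f0` using linear interpolation."""
    n = len(table)
    phase = 0.0  # position within the cycle, in [0, 1)
    out = []
    for _ in range(num_samples):
        pos = phase * n
        lo = int(pos)
        frac = pos - lo
        out.append(table[lo % n] * (1 - frac) + table[(lo + 1) % n] * frac)
        phase += f0 / sample_rate
        phase -= int(phase)  # wrap back into [0, 1)
    return out

# A 2048-sample sine stands in for one wavetable frame here.
sine_frame = [math.sin(2 * math.pi * i / 2048) for i in range(2048)]
tone = render_wavetable(sine_frame, 100.0, 16000, 1600)
```

Swapping `sine_frame` for any other 2048-sample frame gives that frame's timbre at the requested pitch, which is all a table-driven excitation source needs.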
-
@x0 @BorrisInABox @Tamasg Hehehe lololol! The spin sounds cool. Like a light saber.
-
@x0 @BorrisInABox lol! Can't even explain what it did, but it definitely introduces a metallic quality unlike any I've heard in speech synthesis before. Not even as tube-like as mine was when I tried it, but boy is it bad. That grindiness really shows through.