Sadness fills my soul.
-
Sadness fills my soul. SpeechBox is never going to sound like Eloquence. Might as well abandon it. People still find it too sharp and sibilant. Better synths will come along, perhaps something neural anyway. It's just a waste of time in the end. Sadness.
@Tamasg You don't need to abandon it. Just because it's not Eloquence doesn't mean your work was wasted.
-
@Tamasg I'm so glad there are things that *don't* sound like Eloquence. The more things that don't, the better. Carry on.
@FreakyFwoof lol but Eloquence is the gold standard, the one that so many can listen to without their ears getting exhausted. I think in the community DECTalk is probably a close second. I usually find people in either group, and then there are the more niche groups who like harsher things like Espeak or the hybrid formant-concatenated stuff. For many, SpeechBox just falls too far below DECTalk: above Espeak, but not comfortably enough to become their daily driver, because RHVoice or the other options are "good enough" for their ears and formant stuff is too robotic. Fair point in that way.
-
@Tamasg It's the opposite of my gold standard because I didn't grow up hearing it. I have NV Speech Player, but not the one you're talking about, unless it's the same thing? Always happy to have options.
-
@Tamasg It's still worthwhile to implement a synthesizer that runs with the same efficiency that Eloquence had. There are some appliances, medical devices, etc. that implement a GUI on a microcontroller. If sighted people can have a UI on a device with limited computing power, then we should be able to have a speech interface on that same class of device. That's why I don't like the idea of giving up on formant TTS and resigning ourselves to neural TTS being *the* future.
-
@FreakyFwoof ha, SpeechBox (mostly) sounds the same. I've reduced a lot of the clickiness the old Sawtooth engine had, especially on the "D" phoneme and "T" endings. Very clicky. I am glad at least that I could accomplish that in the SpeechBox version, but I still couldn't move the needle on it sounding "more like Eloquence from the 90s than Eloquence from the 2000s," as someone commented to me the other day.
-
@Tamasg Of course, it's up to you to decide whether you still find the project worthwhile.
-
@Tamasg Good. A new generation of blind kids can grow up with a new sound. Perfectly acceptable thing to do. Moving the needle forward not backward is very much acceptable.
-
@Tamasg Why so quick to give up?
-
@Tamasg To my ears, and they are not the best ears, it's too muddy. I have a hard time teasing words apart. The sound is fine; I think it's the diction. But this is coming from the guy who uses Samantha compact on everything except for my work computer, where I can't install any speech engines besides Eloquence.
-
@rommix0 well, now I think the DSP isn't the issue; it's the phonemes and the passes. DECTalk knows things like "desktop" getting a 60% smaller fricative burst than the word "start." That's what I'm missing. Coarticulation is great, but it just pulls vowels and consonants more naturally to their targets. These other engines had way more sophisticated rules than just "same fricative / affricate durations" passed to the DSP. I've understood a lot more about how pitch controls prosody, but there's this ingredient missing and I don't think I'll ever find it lol
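To make the idea of a context-dependent burst rule concrete, here is a minimal sketch in Python. It is not DECTalk's or SpeechBox's actual logic: the `Segment` type, the `reduce_cluster_bursts` function, and the trigger condition (a stop directly followed by another stop) are all hypothetical, and the 0.4 factor simply mirrors the "60% reduction" figure quoted above. Which segment a real engine attenuates, and by how much, is not established in this thread.

```python
from dataclasses import dataclass, replace
from typing import List

# Hypothetical phoneme inventory for the example; a real engine would use its own.
STOPS = {"p", "t", "k", "b", "d", "g"}


@dataclass(frozen=True)
class Segment:
    phoneme: str
    burst_gain: float = 1.0  # 1.0 means a full-strength release burst


def reduce_cluster_bursts(segments: List[Segment], factor: float = 0.4) -> List[Segment]:
    """Scale the burst of any stop that is immediately followed by another stop.

    The condition and the factor are illustrative assumptions, not values
    taken from any real synthesizer.
    """
    out = []
    for i, seg in enumerate(segments):
        nxt = segments[i + 1].phoneme if i + 1 < len(segments) else None
        if seg.phoneme in STOPS and nxt in STOPS:
            seg = replace(seg, burst_gain=seg.burst_gain * factor)
        out.append(seg)
    return out


def to_segments(phonemes: List[str]) -> List[Segment]:
    return [Segment(p) for p in phonemes]


# "desktop": the /k/ sits directly before /t/, so its burst is scaled down.
desktop = reduce_cluster_bursts(to_segments(["d", "ɛ", "s", "k", "t", "ɑ", "p"]))
# "start": no stop-stop cluster, so every burst keeps full gain.
start = reduce_cluster_bursts(to_segments(["s", "t", "ɑ", "ɹ", "t"]))
```

The point is only that the rule looks at the neighbours of a segment before the frame reaches the DSP, instead of sending the same burst every time.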
-
@Tamasg Have you considered adding formant target readjustment rules to your program? That's something DECTalk did, especially for back vowels after alveolar sounds.
-
@Tamasg It doesn't need to sound like Eloquence. I really like how it is becoming more intelligible with each version. It has the potential to be even better than Espeak! In its current state it is way more pleasant to hear than Espeak for sure; it just needs to improve pronunciation.
-
@rommix0 yeah, I think the CMUDict work is leading me towards this. It's shown me that part of my problem is just getting vowel stress and cluster targets from Espeak's IPA rather than from actual broken-down words made by linguists studying it deeply. So the data file is just all the words, broken down into IPA notation through a Python script, with Espeak tie-bars and such. Things like, "'frisco ˈfɹɪskoʊ". Rewriting the rules like that first, and then not doing an overlay to "correct" for Espeak's quirks, is where this is heading, and I think some more formant target passes can come along with it. We have the EndCF1-3 and EndPF1-3 wired up per frame now, but obviously wiring it up isn't the same thing as using it right.
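For readers wondering what such a conversion script might look like, here is a rough Python sketch, assuming the CMUdict plain-text format (a word followed by ARPABET phones, with stress digits on the vowels). It is not the actual SpeechBox script: the function name `cmudict_line_to_ipa`, the partial mapping table, and the stress-placement shortcut are assumptions made for illustration.

```python
# Partial ARPABET-to-IPA table, just enough for the example; affricates use tie-bars.
ARPABET_TO_IPA = {
    "F": "f", "R": "ɹ", "S": "s", "K": "k", "T": "t", "D": "d", "P": "p",
    "CH": "t͡ʃ", "JH": "d͡ʒ",
    "IH": "ɪ", "OW": "oʊ", "AA": "ɑ", "EH": "ɛ",
}
VOWELS = {"IH", "OW", "AA", "EH"}
# CMUdict stress digits: 1 = primary, 2 = secondary, 0 = unstressed.
STRESS_MARKS = {"1": "ˈ", "2": "ˌ", "0": ""}


def cmudict_line_to_ipa(line):
    """Convert one 'WORD PH1 PH2 ...' line into (word, ipa_string)."""
    word, *phones = line.split()
    out = []
    onset_start = 0  # position in `out` where the current consonant run began
    for phone in phones:
        if phone[-1].isdigit():
            base, stress = phone[:-1], phone[-1]
        else:
            base, stress = phone, "0"
        if base in VOWELS:
            mark = STRESS_MARKS[stress]
            if mark:
                # Simplification: put the mark before the whole preceding
                # consonant run. Real syllabification would split medial clusters.
                out.insert(onset_start, mark)
            out.append(ARPABET_TO_IPA[base])
            onset_start = len(out)
        else:
            out.append(ARPABET_TO_IPA[base])
    return word.lower(), "".join(out)


entry = cmudict_line_to_ipa("'FRISCO F R IH1 S K OW0")
```

With that input, `entry` comes out as the pair `("'frisco", "ˈfɹɪskoʊ")`, matching the example in the post. The greedy stress placement puts the mark too early in words with medial clusters, so a real script would need proper syllable boundaries.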
-
@Tamasg You might want to check this file too. It's got some good stuff on target adjustment.
https://github.com/dectalk/DECtalkMini/blob/dectalk-develop/include/p_us_st0.c
-
@muchanchoasado @Tamasg Exactly what I was thinking, over time this is getting better.
-
@rommix0 so it looks like DECTalk used a 3-layer approach to modifying this. I have Layer 2 well defined, but not the first and third layers. Layer 1 is specific, large Hz offsets for known phoneme pairs, not computed from a formula but hard-coded. Then Layer 3 is the forward and backward rules. Very helpful to know.
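As an illustration of what a hard-coded "Layer 1" could look like, here is a small Python sketch of a pair-offset table applied to formant targets. Everything in it is hypothetical: the `PAIR_OFFSETS` table, the function name, and the Hz values are invented for the example and are not taken from the DECtalk source. The one idea borrowed from the thread is raising the targets of a back vowel after an alveolar sound.

```python
from typing import Dict, List, Tuple

Formants = Tuple[float, float, float]  # (F1, F2, F3) targets in Hz

# Hard-coded offsets in Hz keyed by (previous phoneme, current phoneme).
# The pairs and numbers are made up for illustration only.
PAIR_OFFSETS: Dict[Tuple[str, str], Formants] = {
    ("t", "u"): (0.0, 250.0, 0.0),
    ("d", "u"): (0.0, 250.0, 0.0),
}


def apply_pair_offsets(phonemes: List[str], targets: List[Formants]) -> List[Formants]:
    """Shift each segment's formant targets if its (previous, current) pair is listed."""
    adjusted = list(targets)
    for i in range(1, len(phonemes)):
        offset = PAIR_OFFSETS.get((phonemes[i - 1], phonemes[i]))
        if offset is not None:
            adjusted[i] = tuple(t + o for t, o in zip(targets[i], offset))
    return adjusted


# "two": /t/ followed by /u/; the vowel's F2 target is raised by the table entry.
phonemes = ["t", "u"]
targets = [(400.0, 1800.0, 2600.0), (300.0, 870.0, 2240.0)]
adjusted = apply_pair_offsets(phonemes, targets)
```

A lookup like this would run before any formula-based coarticulation pass, so the later layers start from targets that are already corrected for the known pairs.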
-
@Tamasg @alexchapman Yeah, it is really exciting to follow the development of a new formant synth after a long time.
-
@Tamasg Yeah it's good stuff. It's a good place to start.
-
@rommix0 ah, now that filename suddenly makes sense from earlier, nice handle change
