Submitted by candidhorse4 t3_10tvggb in MachineLearning
suflaj t1_j7c73af wrote
Reply to comment by candidhorse4 in What text to speech does this guy use? [R] by candidhorse4
Make no mistake - there is no TTS more humanlike than Azure at the moment, but the exact voice was likely fiddled with a bit to get the exact pronunciation, or run through a filter.
Two days ago I was comparing all the state-of-the-art TTS systems, and while Google's Neural2 came close to the video, it does not offer voices similar to the one in the video.
candidhorse4 OP t1_j7cg5xb wrote
Have you tried murf.ai and WellSaid Labs?
suflaj t1_j7ckp0d wrote
Yes. Although they are impressive in the number of languages and voices, they do not match Azure's more expressive prosody. I have listened to far too many robocalls, so that kind of magic is gone for me.
Someone else might consider it more humanlike, as it's all subjective. Have they published benchmark scores yet?
candidhorse4 OP t1_j7cnaci wrote
I don't think they have. So what do you think overall: which one is the best at replicating the human voice with all its nuances?
suflaj t1_j7cpf1d wrote
Azure
This is due to two issues that both of these have and that Azure mitigates to an extent:
- they both lack humanity, i.e. they can at most be convincing as human prompt readers, but not anything else
- those without a good ear and good headphones probably do not notice a certain ring those two have, which a human voice cannot produce. It might be that this effect is added to make the voices sharper, but ultimately it lets people like me, as well as robovoice detectors, more easily identify them as TTS
candidhorse4 OP t1_j7dscul wrote
Which Azure voices are the most realistic?