pronunciaai t1_j6l49ij wrote

Yeah, I work in the space (mispronunciation detection), and there's no lack of frameworks (speechbrain, NeMo, and thunder-speech being the more useful ones for custom stuff imo). The barrier to entry is all the stuff you have to learn to do audio ML, plus the pain points around things like CTC. To get more people actively working on speech and voice, I think tutorials are needed more than frameworks.