Submitted by pvp239 t3_1035jt4 in MachineLearning
Speechbox is built on the premise that Whisper is good enough to transcribe pretty much any English speech. Furthermore, Whisper was trained to predict punctuated, orthographic text.
Speechbox leverages Whisper's quality to "unnormalize" audio transcriptions, i.e. restore their punctuation and casing (see the example below), which makes them more useful for downstream applications while guaranteeing that the exact same words are used.
"we are going to the san francisco beach" can have multiple meanings:
=>
- We are going to the San Francisco beach!
- We are going to the San Francisco beach?
- We are going to the San Francisco beach.
Speechbox will pick the correct one for you 😉
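To make the "same words, different punctuation" constraint concrete, here is a minimal sketch in plain Python. It is not Speechbox's actual implementation: `normalize`, `pick_punctuation`, and the `toy_scores` table are hypothetical names and made-up numbers, standing in for the scores a speech model like Whisper would assign to each candidate given the audio.

```python
import string


def normalize(text: str) -> list[str]:
    """Lowercase and strip punctuation so only the word sequence remains."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def pick_punctuation(transcript: str, candidates: list[str], score) -> str:
    """Return the best-scoring candidate whose words match the transcript exactly."""
    target = normalize(transcript)
    # Discard any candidate that adds, drops, or changes a word.
    valid = [c for c in candidates if normalize(c) == target]
    if not valid:
        raise ValueError("no candidate preserves the original words")
    return max(valid, key=score)


# Made-up scores (assumption): a real system would derive these from the audio.
toy_scores = {
    "We are going to the San Francisco beach!": -4.1,
    "We are going to the San Francisco beach?": -6.3,
    "We are going to the San Francisco beach.": -1.2,
}

candidates = list(toy_scores) + ["We went to the San Francisco beach."]
best = pick_punctuation(
    "we are going to the san francisco beach",
    candidates,
    toy_scores.__getitem__,
)
print(best)
```

The candidate with different words ("We went ...") is filtered out before scoring, so the result can only differ from the input in punctuation and casing.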
👉 GitHub: https://github.com/huggingface/speechbox
🤗 Demo: https://huggingface.co/spaces/speechbox/whisper-restore-punctuation
sloganking t1_j2xnk3k wrote
Have Whisper's hallucinations been improved yet? I know that before, it could sometimes derail and repeat itself nonsensically.
Its highs seem the highest, but its lows are, well... nonsensical.