This is really cool. I'm trying to understand how this works... would you have to store transcripts of all 800 million videos on YouTube? How often does this transcript database get updated?
Thanks! The transcripts get added on demand, the first time a user asks to search a video. It wouldn't make sense to index all of YouTube given its size. We're also able to fetch transcripts pretty quickly, so there's no need to pre-cache a transcript that no user has asked for yet.
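To make the on-demand idea concrete, here's a rough Python sketch of a fetch-on-first-request cache. It's only an illustration, not the actual implementation: `TranscriptIndex`, `fake_fetch`, and the in-memory dict are hypothetical stand-ins for whatever real transcript fetcher and storage you'd use.

```python
from typing import Callable, Dict


class TranscriptIndex:
    """Hypothetical on-demand transcript store: nothing is fetched until asked for."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch                 # assumed function: video id -> transcript text
        self._cache: Dict[str, str] = {}    # stand-in for a real database
        self.fetch_count = 0                # counts how often the fetcher is actually called

    def get(self, video_id: str) -> str:
        # Fetch and store the transcript only on the first request for this video.
        if video_id not in self._cache:
            self.fetch_count += 1
            self._cache[video_id] = self._fetch(video_id)
        return self._cache[video_id]

    def search(self, video_id: str, query: str) -> bool:
        # Case-insensitive substring match against the (possibly just-fetched) transcript.
        return query.lower() in self.get(video_id).lower()


def fake_fetch(video_id: str) -> str:
    # Placeholder fetcher for the example; a real one would call a transcript source.
    return f"transcript text for {video_id}"
```

Searching the same video twice only calls the fetcher once, and videos nobody searches never get stored at all.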
A more detailed overview of how it works can be found here: