Comments


MurdrWeaponRocketBra t1_j6093np wrote

This is really cool. I'm trying to understand how this works... would you have to store transcripts of all 800 million videos on YouTube? How often does this transcript database get updated?

2

Notaprumber t1_j60ekmp wrote

95% of YouTube is reposted garbage, or some dude pointing up with his face below a video

28

rikspik t1_j60g2yt wrote

I would have guessed youtube to be second, after wikipedia. Looks like I was way off then. How do you compare? Pageviews?

1

Thenerdy9 t1_j60h6td wrote

yes! yes yes! Where is this search engine gimme gimme :)

1

DenL4242 t1_j60ko3y wrote

If Mt. Everest were a cow, it would be the largest cow on earth.

250

Ruleyoumind t1_j60mcbl wrote

Do you have a link to the search engine?

2

actvdecay t1_j61a7j7 wrote

I wonder what an AI chatbot trained on the YouTube library would say…

0

NovaticFlame t1_j61as5y wrote

How is this beautiful? There’s not even a y axis FFS

88

BlizzardArms t1_j61c6rd wrote

This data just makes me wonder what we’d know if Alexandria hadn’t burned

14

Hentai_Yoshi t1_j61l1pg wrote

Bro, did you never pay attention in school? Label your y axis. Like, millions of what?

22

Lethlnjektn t1_j61mlm8 wrote

The library of Alexandria sheds a tear because of 999,999,990 terrible hours of "content" on YouTube

2

Putoigituresse t1_j61nqnd wrote

I’m actually crazy impressed by how low Reddit is on that list. I have a hard time believing the Library of Congress has more text than all of Reddit, Twitter, youtube, and Wikipedia combined.

20

ZeusTheRecluse t1_j62mm1t wrote

I'm left wondering:

  1. if youtube is only third, what the hell is in the Library of Congress and British Library... seriously...

  2. Why have I never heard of the Library and Archives Canada (I. Am. Canadian).

  3. Wikipedia sooooo small???? damn, wow....

4

Delta4o t1_j62yelk wrote

Reddit would be a library where every other book would be an NSFW question from askreddit

2

Purplekeyboard t1_j62z8oy wrote

Your AI search engine doesn't seem to work. I tried searching on multiple things from youtube videos, like "I gotta have more cowbell", and it produced results which didn't in any way relate to what I searched on.

1

HieronymusGoa t1_j62zqq3 wrote

...it'd be a very shitty library ^^ and I love youtube.

1

miskathonic t1_j633f7k wrote

The Library of Alexandria had maybe 100,000 books' worth of scrolls containing ??? written at a time when the smart people thought disease was caused by bad air.

There probably was some dope shit, but there's an order of magnitude more educational content on YouTube than burned in the LoA

3

rose1983 t1_j637lqz wrote

If YouTube was a library, it should be named Sturgeon’s Library.

1

3022_Dispatch t1_j637m0j wrote

The next time someone shows me a data table as definitive proof of some ridiculous idea they hold, I’m going to share this post

1

Demolisher94 t1_j638fuw wrote

If my grandma had wheels, she would be a bicycle!

3

Andulias t1_j63dgrr wrote

And if my grandmother had wheels, she would've been a bike.

3

ezenn t1_j63dwai wrote

I get a feeling that with the growing number of subscribers in this subreddit, the quality of posts is decreasing. What is the quality of the data here, and what does it tell us?

2

insane9001 t1_j63e1z1 wrote

What is the Y axis? Surely that must be a requirement for posting graphs in this sub

1

YetiGuy t1_j63hlcl wrote

Your argument is so far off though. I mean at least a YouTube and a library are comparable.

Let me fix that. If Mt Everest was a popsicle, it’d be the largest popsicle in the world. /s

1

Ikbeneenpaard t1_j63l4gb wrote

If Reddit were a library, it would be a shitty library.

2

M3NTAL-313 t1_j63rdhe wrote

Can your AI search algo index timestamps for stars and sex acts from a library of 100K+ p0rn videos? DM me if so...

2

anynonus t1_j63rj7d wrote

If the Atlantic Ocean was a bath, it'd be the biggest bath in the world.

1

Zenzayy t1_j63v7rs wrote

Nice axis title, dweeb. Why even post this here?

1

EICONTRACT t1_j64jmwj wrote

Doesn’t Google already give you timestamps for your search as long as it’s chaptered?

1

BradMH88 t1_j64ohh8 wrote

I feel like we’ve all let Reddit down. Look how small it is. It’s time to increase our Reddit participation. This is just embarrassing. I have to imagine there are more random safes or something to generate mini hysteria.

1

Chramir t1_j64xwtj wrote

They made an estimate of how many words there are in every youtube video uploaded. The estimate is the total runtime of all the videos multiplied by the average word count of a conversation per unit of time. The total number of words is then divided by the number of words in an average book to get a size in 'books'.

I don't know, but that just seems kinda iffy. First, youtube videos are rarely a back-and-forth conversation. Second, it's like pointing to a skyscraper and saying it's like a big sandcastle because sand is used in concrete.

Edit: grammar and added the 'word count' estimate explanation.

1

Lyndon91 t1_j650wlw wrote

Don’t get how it makes sense. Is the book equivalent to the video once it’s been transcribed?

1

CeeMX t1_j6528wz wrote

Most likely it's the count of videos compared to the count of books in the library, which is a weird metric: books contain much more content than a video, and on the other hand the amount of data would put YouTube at rank 1 by far.

1

Lirlya t1_j65g2ta wrote

You're missing a hella lot of libraries in your data source.

1

tomiwa1a t1_j69prfr wrote

Good point, here’s how we got this information.

  1. We calculated the number of hours of video uploaded to Youtube every minute from 2007 to 2022 (source: Statista).
  2. We found how many words are spoken per hour of human conversation (source: VirtualSpeech).
  3. We calculated the number of words in the average book (source: Jericho Writers).

Then we did some calculations with those numbers to arrive at 99,338,400 books on Youtube.

You can see the details of those calculations here: https://docs.google.com/spreadsheets/d/1UbekWhTLJKQj6ZLipg1R269CQ8g0ACDbzPRDFN14inc/edit#gid=52223737
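The three steps above chain into a single back-of-the-envelope formula: total hours uploaded, times words spoken per hour, divided by words per book. Below is a minimal sketch of that chain, assuming the upload rate is constant within each year. The function name `youtube_books` and every input number (500 hours per minute, 9,000 words per hour, 90,000 words per book) are hypothetical placeholders chosen for illustration, not the values in the linked spreadsheet.

```python
from __future__ import annotations

# Fermi estimate: hours of video uploaded -> spoken words -> "book equivalents".
# All numbers used below are hypothetical placeholders, NOT the spreadsheet's values.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap days


def youtube_books(
    hours_uploaded_per_minute: dict[int, float],
    words_per_hour: float,
    words_per_book: float,
) -> float:
    """Estimate how many average-length books' worth of speech were uploaded.

    hours_uploaded_per_minute maps a year to the (assumed constant) number of
    hours of video uploaded every minute during that year.
    """
    # Step 1: total hours = sum over years of (hours per minute * minutes per year).
    total_hours = sum(rate * MINUTES_PER_YEAR for rate in hours_uploaded_per_minute.values())
    # Step 2: runtime -> words, assuming conversational speech for the whole runtime.
    total_words = total_hours * words_per_hour
    # Step 3: express the word count in units of one average book.
    return total_words / words_per_book


if __name__ == "__main__":
    # Placeholder inputs only: two years at 500 hours/minute, 150 words/minute
    # of speech (9,000 words/hour), and 90,000 words per book.
    placeholder_rates = {2019: 500.0, 2020: 500.0}
    print(f"{youtube_books(placeholder_rates, 9_000, 90_000):,.0f} books")  # 52,560,000 books
```

Because every step is a multiplication or a division, the result scales linearly with each input: with these placeholder rates, each hour per minute sustained for a full year adds 52,560 books.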

Edit: I also have a question about the last thing you said:

> there’s so much more content than that though

What other content is there?

1

tomiwa1a t1_j6b80iu wrote

I don't think it's fair to say that comparing Youtube to a library is like comparing Mt. Everest to a cow. For one thing, there is actually a pretty clever way to estimate the amount of text on Youtube and compare it to the amount of text in a library.

Maybe, if I explain how we made the graph you'll see that it's more apples to apples than mountains to cows:

  1. We calculated the number of hours of video uploaded to Youtube every minute from 2007 to 2022 (source: Statista).
  2. We found how many words are spoken per hour of human conversation (source: VirtualSpeech).
  3. We calculated the number of words in the average book (source: Jericho Writers).

Then we did some calculations with those numbers to arrive at 99,338,400 books on Youtube.

You can see the details of those calculations here: https://docs.google.com/spreadsheets/d/1UbekWhTLJKQj6ZLipg1R269CQ8g0ACDbzPRDFN14inc/edit#gid=52223737

1

tomiwa1a t1_j6b8wnz wrote

Can you please clarify? What do you mean when you say it isn't clear how the number of books on Youtube is calculated?

If you check this range you can see how we arrived at our numbers:

  1. We calculated the number of hours of video uploaded to Youtube every minute from 2007 to 2022 (source: Statista).
  2. We found how many words are spoken per hour of human conversation (source: VirtualSpeech).
  3. We calculated the number of words in the average book (source: Jericho Writers).

Then we did some calculations with those numbers to arrive at 99,338,400 books on Youtube.

1

tomiwa1a t1_j6ba59m wrote

  1. The other interesting piece is that the Library of Congress was founded in 1800 (though a fire caused it to restart its collection in 1815).

     Youtube was founded in 2005.

     So in just 17 years, Youtube has amassed a collection of information that is 57% the size of the world's largest library, which has been accumulating its collection for over 200 years.

  2. I'm also Canadian. I hadn't heard of it either until we did this report. We probably haven't heard of it because we likely won't need to use any of its resources. Public libraries already do a really good job for most of our day-to-day needs.

  3. Wikipedia's small size makes sense given that contributions are heavily restricted and have such a high bar. Imagine if every Youtube video had to be approved by editors before being uploaded, or every author had to have their books approved by editors before publishing.

1

tomiwa1a t1_j6bagzz wrote

Thanks! The transcripts get added on demand when users request to search a video. It wouldn't make sense to index the entire database given its large size. We're also able to get the transcripts pretty quickly, so there's no need to pre-cache a transcript if no user has asked for it before.


A more detailed overview of how it works can be found here:

  1. https://www.reddit.com/r/OpenAI/comments/10j3gzy/comment/j5jh0wo/?utm_source=share&utm_medium=web2x&context=3
  2. https://atila.ca/blog/tomiwa/atlas
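The on-demand approach described above is essentially a lazy cache: a transcript is fetched the first time someone searches that video and served from memory afterwards, and a search across the whole index only sees videos that someone has already submitted (which is also why the "more cowbell" query further down the thread comes back empty). Here is a minimal sketch under stated assumptions: `OnDemandTranscriptIndex` and the `fetch_transcript` callable are hypothetical names, and plain case-insensitive substring matching stands in for the AI search the thread refers to, which presumably ranks by meaning rather than exact text.

```python
from __future__ import annotations

from collections.abc import Callable

# A transcript is a list of (start_time_seconds, caption_text) segments.
Segment = tuple[float, str]


class OnDemandTranscriptIndex:
    """Lazily fetches transcripts: a video enters the index only when someone
    first asks to search it, and is served from the cache afterwards."""

    def __init__(self, fetch_transcript: Callable[[str], list[Segment]]) -> None:
        # Hypothetical stand-in for whatever transcript source the real system uses.
        self._fetch_transcript = fetch_transcript
        self._transcripts: dict[str, list[Segment]] = {}
        self.fetch_count = 0  # number of fetches actually performed

    def _ensure_indexed(self, video_id: str) -> list[Segment]:
        # Fetch only on the first request for this video; reuse the cache after that.
        if video_id not in self._transcripts:
            self._transcripts[video_id] = self._fetch_transcript(video_id)
            self.fetch_count += 1
        return self._transcripts[video_id]

    def search_video(self, video_id: str, query: str) -> list[float]:
        """Return start times of segments in one video whose text contains the query."""
        needle = query.lower()
        return [start for start, text in self._ensure_indexed(video_id) if needle in text.lower()]

    def search_index(self, query: str) -> list[tuple[str, float]]:
        """Search only videos already in the cache; never triggers a fetch."""
        needle = query.lower()
        return [
            (video_id, start)
            for video_id, segments in self._transcripts.items()
            for start, text in segments
            if needle in text.lower()
        ]


if __name__ == "__main__":
    fake_source = {"vid123": [(0.0, "Welcome back"), (42.0, "I gotta have more cowbell")]}
    index = OnDemandTranscriptIndex(lambda video_id: fake_source[video_id])
    print(index.search_index("cowbell"))            # [] (nobody has submitted vid123 yet)
    print(index.search_video("vid123", "cowbell"))  # [42.0] (first request fetches and caches)
    print(index.search_index("cowbell"))            # [('vid123', 42.0)]
```

The trade-off is the one described in the comment: no up-front cost for the billions of videos nobody searches, at the price of empty results for videos that have never been requested.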
1

tomiwa1a t1_j6bapiz wrote

That happens because unless someone has previously submitted a youtube video containing "I gotta have more cowbell", we won't have it in our index.


>The transcripts get added on demand when users request to search a video. It wouldn't make sense to index the entire database given its large size. We're also able to get the transcripts pretty quickly, so there's no need to pre-cache a transcript if no user has asked for it before.
>
>A more detailed overview of how it works can be found here:

  1. https://www.reddit.com/r/OpenAI/comments/10j3gzy/comment/j5jh0wo/?utm_source=share&utm_medium=web2x&context=3
  2. https://atila.ca/blog/tomiwa/atlas

See: earlier comment

1

tomiwa1a t1_j6bb8e8 wrote

Exactly! This is how it works.

I agree it's not perfect, but remember, Youtube itself is not a library, so any comparison to real libraries will require some degree of approximation. You can think of it as a rough estimate or, to use my preferred term, a Fermi estimate.

1

tomiwa1a t1_j6bbj7o wrote

The y-axis is the number of books. I agree with you, though: that was an oversight on our part. I also don't like it when graphs don't have a labelled y-axis. Next time we'll add one.

1

worriedshuffle t1_j6bmjs3 wrote

Phenomenal calculation. You assume every minute of YouTube contains nonstop speech at the average word rate. Obviously this is false.

Second, in comparing quantity of speech you say nothing about quality. Libraries don’t contain every single book in existence. Most books are trash. YouTube does contain tons of trash.
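The first objection can be made concrete. Because word count is linear in speaking time, the headline figure shrinks in direct proportion to the share of runtime that actually contains speech. A minimal sketch, assuming a hypothetical `speech_fraction` parameter that the thread's calculation does not include; the fractions used below are arbitrary illustrations, not measurements of YouTube.

```python
# Sensitivity of the thread's headline figure to how much of the runtime is speech.
HEADLINE_BOOKS = 99_338_400  # figure quoted in the thread; it treats all runtime as speech


def adjusted_books(speech_fraction: float, baseline_books: float = HEADLINE_BOOKS) -> float:
    """Scale the baseline estimate by the (hypothetical) share of runtime that is speech."""
    if not 0.0 <= speech_fraction <= 1.0:
        raise ValueError("speech_fraction must be between 0 and 1")
    # Words, and therefore "books", are proportional to speaking time.
    return baseline_books * speech_fraction


if __name__ == "__main__":
    # Arbitrary illustrative fractions, not measured values.
    for fraction in (1.0, 0.5, 0.25):
        print(f"{fraction:.0%} speech -> {adjusted_books(fraction):,.0f} books")
    # 100% speech -> 99,338,400 books
    # 50% speech -> 49,669,200 books
    # 25% speech -> 24,834,600 books
```

This addresses only quantity; the second objection, about quality, is not something a word count can capture.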

1