Comments

OfficialWireGrind OP t1_jccva32 wrote

The bar chart counts the occurrences of double letters in all of English Wikipedia's article text.

Data Source: English Wikipedia's April 1st, 2022 article data dump

Tools: Python, Matplotlib
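
The counting code itself was not posted, so the following is only a minimal sketch of how such a tally could be made in Python. The function name `count_double_letters` and the `case_sensitive` flag are hypothetical, and the choice to count overlapping pairs (so "aaa" contributes two "aa") is an assumption, not necessarily what the chart did.

```python
from collections import Counter


def count_double_letters(text, case_sensitive=False):
    """Count adjacent identical ASCII letter pairs (aa, bb, ...) in text.

    Hypothetical helper: overlapping pairs are counted, so "aaa" yields 2.
    """
    if not case_sensitive:
        text = text.lower()
    counts = Counter()
    # Walk over every pair of neighbouring characters.
    for a, b in zip(text, text[1:]):
        if a == b and a.isascii() and a.isalpha():
            counts[a + b] += 1
    return counts


sample = "Hello bookkeeper, World War II"
print(count_double_letters(sample))
print(count_double_letters(sample, case_sensitive=True))
```

With `case_sensitive=True`, capital pairs such as "II" are tallied separately from "ii", so they could be dropped before plotting. The resulting counts could then be passed to Matplotlib's `plt.bar` to draw a chart like the one posted.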

17

solarmelange t1_jcd1o0v wrote

They need to go case sensitive, eliminating double capitals. That is clearly where ii are coming from.

10

popeter45 t1_jcdpu6g wrote

So how many of the LL are just Welsh words/places?

−1

imlookingatarhino t1_jcdqd5v wrote

I'm gonna put so many Q's in articles now. Gotta work on those numbers.

10

Clambulance1 t1_jcdrtfi wrote

A majority of the jj must come from articles about Korean things.

4

PrompteRaith t1_jcdvsht wrote

I would expect XX to be much higher (genetics, the band, etc)

1

sckurvee t1_jcdzwyx wrote

Wikipedia's got more llamas than accuracy.

17

Waxoplax t1_jce8pmx wrote

What does the k in the number stand for?

−1

dhkendall t1_jcfanit wrote

Each occurrence. I’m sure that, for example, the word “shell” appears more than once in Wikipedia, but it only appears once in the list of words in the English vocabulary.

3

T-Dex_the_T-Rex t1_jcffbm2 wrote

Interestingly, there is only one word with three consecutive double letters and only one word with four consecutive double letters: bookkeeper and subbookkeeper, respectively.
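
For anyone who wants to check words like these, here is a small sketch in Python. The function name `longest_double_run` is made up for illustration. It only measures a given word, so it does not prove that these are the only such words; that would need a full word list, which is not included here.

```python
def longest_double_run(word):
    """Return the longest run of back-to-back double letters in word.

    Hypothetical helper: "bookkeeper" contains oo, kk, ee in a row.
    """
    word = word.lower()
    best = 0
    # Try every starting position, then step two letters at a time
    # for as long as each step lands on a matching pair.
    for start in range(len(word) - 1):
        run = 0
        i = start
        while i + 1 < len(word) and word[i] == word[i + 1]:
            run += 1
            i += 2
        best = max(best, run)
    return best


for w in ("shell", "bookkeeper", "subbookkeeper"):
    print(w, longest_double_run(w))
```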

1