kilopeter OP t1_j6b6t8c wrote on January 29, 2023 at 2:06 AM

Yep. "A" is the most frequent letter at both the start and end of names in the dataset I used (girls born in 2021 in the USA).

mikeholczer t1_j6b7fim wrote on January 29, 2023 at 2:10 AM

My point is your interpretation is flawed, because the most likely outcome of it is very far from the actual most likely name.

kilopeter OP t1_j6b9h0h wrote on January 29, 2023 at 2:26 AM

Oh, absolutely: the fact that this Markov assumption yields nonsensical names shows that the sequence of letters in given names is not generated by a first-order Markov process. (The next character depends very much on previous characters, not just the current one.)

But this visualization does accurately present the relative frequencies of character transitions in actual names. Using these frequencies to generate Markov chains of characters and calling the results names is a fun diversion whose results I found entertaining.
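For readers who want to see how such transition frequencies could be tabulated, here is a minimal sketch. It is not the code behind the visualization: the `START`/`END` placeholder symbols, the `transition_probs` helper, and the four-name toy list are all assumptions made for illustration, and each name is counted once rather than weighted by number of births.

```python
from collections import Counter, defaultdict

# Hypothetical placeholder tokens for the start and end of a name.
START, END = "^", "$"

def transition_probs(names):
    """Return {current_token: {next_token: probability}} from a list of names."""
    counts = defaultdict(Counter)
    for name in names:
        tokens = [START, *name.upper(), END]
        # Count every adjacent pair, including START -> first and last -> END.
        for cur, nxt in zip(tokens, tokens[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for cur, row in counts.items()
    }

# Made-up toy input, NOT the 2021 dataset.
toy_names = ["AVA", "EMMA", "MIA", "AMELIA"]
probs = transition_probs(toy_names)

# Each row is a probability distribution over the next token.
for cur, row in sorted(probs.items()):
    print(cur, {nxt: round(p, 2) for nxt, p in sorted(row.items())})
```

Each row of `probs` corresponds to one row of a heatmap like the one in the post: the share of times a given token is followed by each possible next token.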

mikeholczer t1_j6b9was wrote on January 29, 2023 at 2:29 AM

Yeah, I think the display of the data is interesting, I just think what you wrote about it is misleading.

kilopeter OP t1_j6banxn wrote on January 29, 2023 at 2:36 AM

Oh? What part? I specifically qualified my interpretation with "want to reflect typical between-letter patterns of US girl names."

That's the point of using this viz to generate new names: generating character strings with totally realistic letter-to-letter transition probabilities is not enough to yield plausible names, or names which already exist. The generated names are often bizarre or excessively long, yet their character transition probabilities exactly reflect those of the real names in the input dataset.

mikeholczer t1_j6bb50x wrote on January 29, 2023 at 2:39 AM

If one follows your steps, the most common outcome is a single letter, which has no between-letter patterns at all. That clearly doesn’t match the between-letter patterns of the source data.

kilopeter OP t1_j6bbmom wrote on January 29, 2023 at 2:43 AM

It does if you include the placeholder "characters" for the start and end of each name! The most probable "name" A represents three tokens: [name start], A, [name end]. And if you generate many names using the transition matrix, you will indeed observe that the frequency of [name start] -> A and A -> [name end] matches the corresponding frequencies in the source data.

EDIT: on reflection, I agree with you. I should introduce the heatmap as a description of transition probabilities, but should avoid walking the reader through using the transition matrix to generate new "names." I should separate the topic of generating new names using the transition matrix under the (invalid) Markov assumption as a diversion. Thanks for pointing out the flaw in my explanation. I'll edit my top level comment when I have a chance!
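Both sides of this exchange can be illustrated with a small sketch. The transition table below is entirely made up (it is not taken from the heatmap), and `generate`, `greedy`, and the `^`/`$` placeholder symbols are hypothetical names chosen for the example. Following the single most probable transition at each step gives a one-letter "name", while sampling many names reproduces the [name start] -> A frequency of the table.

```python
import random
from collections import Counter

START, END = "^", "$"  # hypothetical start/end placeholder tokens

# Made-up transition probabilities for illustration only.
probs = {
    "^": {"A": 0.6, "M": 0.4},
    "A": {"$": 0.5, "M": 0.3, "A": 0.2},
    "M": {"A": 0.7, "$": 0.3},
}

def generate(probs, rng, max_len=50):
    """Sample one name by walking the chain from START until END."""
    out, cur = [], START
    while len(out) < max_len:
        options = list(probs[cur])
        weights = list(probs[cur].values())
        cur = rng.choices(options, weights=weights)[0]
        if cur == END:
            break
        out.append(cur)
    return "".join(out)

def greedy(probs, max_len=50):
    """Always take the most probable next token."""
    out, cur = [], START
    while len(out) < max_len:
        cur = max(probs[cur], key=probs[cur].get)
        if cur == END:
            break
        out.append(cur)
    return "".join(out)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [generate(probs, rng) for _ in range(20000)]
share_a = Counter(s[0] for s in samples)["A"] / len(samples)

print("greedy name:", greedy(probs))
print("share of sampled names starting with A:", round(share_a, 3))
```

In this toy table the greedy walk stops after one letter, yet the sampled names still start with A in roughly 60% of cases, matching the `^ -> A` entry. Matching pairwise frequencies says nothing about whether whole names look realistic.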

globglogabgalabyeast t1_j6bwjuz wrote on January 29, 2023 at 5:44 AM

Did you already edit it? Cause I never got the impression that you were implying this process would lead to realistic names

kilopeter OP t1_j6ea6zq wrote on January 29, 2023 at 7:09 PM

Nah, I'm only just now getting a chance to edit my top-level comment. Thanks for throwing in your vote! I feel like I can reword the "interpretation" part better to avoid any possible misinterpretation.

Transition probabilities (shown as percentages) between successive letters in the names of girls born in 2021 in the USA [OC]

kilopeter OP t1_j6adleq wrote on January 28, 2023 at 10:23 PM

mikeholczer t1_j6agh1p wrote on January 28, 2023 at 10:43 PM

ghostfaceschiller t1_j6bz394 wrote on January 29, 2023 at 6:11 AM

kilopeter OP t1_j6ecjw9 wrote on January 29, 2023 at 7:24 PM

ghostfaceschiller t1_j6ej1pw wrote on January 29, 2023 at 8:07 PM

kilopeter OP t1_j6eq2tj wrote on January 29, 2023 at 8:49 PM