
inquisitor49 t1_j4tgazw wrote

In transformers, a positional embedding is added to the word embedding. Why doesn't this mess up the word embedding, for example by turning it into the embedding of a different word?
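
For concreteness, this is roughly what that addition looks like at the input layer. It's only a minimal PyTorch sketch with made-up sizes and names, not any particular model's actual code:

```python
# Minimal sketch of a learned-positional-embedding input layer (illustrative
# sizes; real models differ in details such as scaling and dropout).
import torch
import torch.nn as nn

vocab_size, max_len, d_model = 30000, 512, 768

tok_emb = nn.Embedding(vocab_size, d_model)  # one vector per word/token
pos_emb = nn.Embedding(max_len, d_model)     # one vector per position

token_ids = torch.tensor([[17, 4051, 92]])                 # (batch=1, seq_len=3)
positions = torch.arange(token_ids.size(1)).unsqueeze(0)   # [[0, 1, 2]]

x = tok_emb(token_ids) + pos_emb(positions)  # element-wise sum, shape (1, 3, 768)
# Everything downstream only ever sees x, never the "pure" word embedding.
```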

1

mildresponse t1_j4xjmvw wrote

My interpretation is that a word should have a different effective embedding when it appears at a different position (context) in the input. Without a positional embedding, the model would see the same vector for a word no matter where it occurs, so the learned word embeddings would be forced into a kind of positional average. The positional offsets give the model the flexibility to treat the same word differently in different contexts.

Because the embeddings are high-dimensional vectors of floats, I'd guess the risk of degeneracy (i.e. of one word's shifted embedding colliding with another word's) is virtually zero.
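
A quick back-of-the-envelope check of that intuition, treating embeddings as random high-dimensional vectors (a simplification; trained embeddings aren't random, but the geometry is similar):

```python
# Nearest-neighbour check: shift one "word" vector by a "position" vector and
# see whether it is still closest to the original word out of the whole vocab.
import torch

torch.manual_seed(0)
d_model, vocab_size, n_positions = 768, 30000, 512

words = torch.randn(vocab_size, d_model)
positions = torch.randn(n_positions, d_model)

shifted = words[123] + positions[7]
dists = torch.cdist(shifted.unsqueeze(0), words)  # distances to all 30k words
print(dists.argmin().item())  # prints 123: in 768 dimensions the offset almost
                              # never lands the sum closer to a different word
```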

1

cztomsik t1_j5qoc6a wrote

I think it does mess with them to some extent. The ALiBi paper seems like a better solution, since it never adds anything to the word embeddings at all.
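
My rough understanding of the ALiBi idea ("Attention with Linear Biases", Press et al.): positions don't touch the embeddings; each head just gets a distance-proportional penalty added to its attention logits. This is only a sketch of the idea, not the reference implementation:

```python
# Sketch of ALiBi-style biases: a per-head slope times the (negative) distance
# between query position i and key position j, added to the attention logits.
# A causal mask is still applied separately for autoregressive models.
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Geometric head slopes 2^(-8/n), 2^(-16/n), ... (as in the paper, for
    # head counts that are powers of two).
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    # distance[i, j] = j - i, clamped so future (j > i) positions get no bonus;
    # keys further in the past get an increasingly negative bias.
    distance = (pos[None, :] - pos[:, None]).clamp(max=0)
    return slopes[:, None, None] * distance  # shape (n_heads, seq_len, seq_len)

# usage: attn_logits = q @ k.transpose(-2, -1) / d_head ** 0.5 + alibi_bias(8, 128)
```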

1