Submitted by sonudofsilence t3_y19m36 in deeplearning

I would like to take the word embeddings of a text and visualize them all on the same plot (to better understand them). The question is how I should pass the text into the pretrained BERT model. At first, I split the text into sentences and passed each one separately, but I'm not sure this gave the right results.
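For reference, here is a rough sketch (my own illustration, not code from the post) of the per-sentence approach described above: encode each sentence with a pretrained BERT and project the token embeddings to 2D for a single scatter plot. It assumes the Hugging Face "bert-base-uncased" checkpoint, scikit-learn for PCA, and matplotlib; the example sentences are placeholders.

```python
import torch
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = [
    "The bank approved the loan.",
    "We sat on the bank of the river.",
]

tokens, vectors = [], []
with torch.no_grad():
    for sentence in sentences:
        enc = tokenizer(sentence, return_tensors="pt")
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, 768)
        toks = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
        for tok, vec in zip(toks, hidden):
            if tok not in ("[CLS]", "[SEP]"):
                tokens.append(tok)
                vectors.append(vec.numpy())

# Project the 768-dimensional embeddings to 2D so every token lands on one plot.
points = PCA(n_components=2).fit_transform(vectors)
plt.scatter(points[:, 0], points[:, 1])
for (x, y), tok in zip(points, tokens):
    plt.annotate(tok, (x, y))
plt.show()
```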

2

Comments


neuralbeans t1_irw2mza wrote

If you're talking about the contextual embeddings that BERT is known for, then those change depending on the sentence they appear in, so you need to supply the full sentence.
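A small demonstration of that point (my own hedged sketch, again assuming the "bert-base-uncased" checkpoint and made-up example sentences): the same surface word gets a different vector depending on the sentence around it.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence, word):
    """Contextual embedding of the first occurrence of `word` in `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, 768)
    toks = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return hidden[toks.index(word)]

v1 = word_vector("she deposited cash at the bank.", "bank")
v2 = word_vector("they fished from the bank of the river.", "bank")
# Below 1.0: the surrounding sentence changes the vector for the same word.
print(torch.cosine_similarity(v1, v2, dim=0).item())
```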

2

sonudofsilence OP t1_irw4bmr wrote

Yes, that's why I want to pass "all the text" into BERT: for example, a word in one sentence should end up with a vector similar to the same word (with the same meaning) in another sentence. How can I accomplish that, given that BERT's maximum sequence length is 512 tokens?

1

neuralbeans t1_irw4jiw wrote

You're supposed to pass in each sentence separately, as a list of sentences. You do not pass all the sentences as one string.
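In Hugging Face transformers terms, "a list of sentences" means one batched tokenizer call with padding, something like the sketch below (my own illustration with placeholder sentences, not code from the comment):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = [
    "The bank approved the loan.",
    "We walked along the bank of the river this morning.",
]

# One batched call; shorter sentences are padded to the longest one in the batch.
enc = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc).last_hidden_state  # (batch, seq_len, 768)

# The attention mask marks real tokens (1) versus padding (0); keep only the real ones.
for i in range(len(sentences)):
    n_real = int(enc["attention_mask"][i].sum())
    toks = tokenizer.convert_ids_to_tokens(enc["input_ids"][i][:n_real].tolist())
    vecs = hidden[i, :n_real]  # one 768-d vector per real token
    print(toks, tuple(vecs.shape))
```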

1

sonudofsilence OP t1_irw765w wrote

Yes, I know, but that way the embedding of a word will be computed only from the tokens of the sentence it appears in, right?

1

ExchangeStrong196 t1_irw93ux wrote

Yes. To ensure the contextual token embeddings attend to longer stretches of text, you need to use a model that accepts larger sequence lengths. Check out Longformer.
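A minimal sketch of what that looks like, assuming the publicly available "allenai/longformer-base-4096" checkpoint from Hugging Face (the text variable is a placeholder for the full document):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Longformer uses a sliding-window attention pattern instead of full self-attention,
# which is what lets it accept sequences up to 4096 tokens.
tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
model = AutoModel.from_pretrained("allenai/longformer-base-4096")
model.eval()

long_text = "..."  # the full document, far beyond BERT's 512-token limit

enc = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)
with torch.no_grad():
    hidden = model(**enc).last_hidden_state  # (1, seq_len, 768), seq_len up to 4096

print(hidden.shape)
```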

1