MysteryInc152 t1_j83uty8 wrote on February 11, 2023 at 1:13 PM

Only the 17b and 30b models are multimodal. Still pretty good though for sure.

We also have some recent advances that ground frozen language models to images, namely BLIP-2 and FROMAGe.