KingsmanVince

KingsmanVince t1_irvmgnx wrote

In image captioning, to train the model you have to provide text that describes the images. By this definition, "the prompt that makes the image" does FALL IN. One text can produce many images. One image can be described by many texts. Images and texts have a many-to-many relationship.

For example, to capture a picture of a running dog, people can describe the whole process. That's still a caption.

For example, I prompt "running dog". DALL-E 2 draws a running dog for me. Yes, that's a freaking caption.
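
To make the many-to-many point concrete, here is a rough sketch of what (image, text) training pairs could look like. The file names and captions are made up for illustration, and this is not any particular dataset's format; it only shows that a prompt can sit in the caption column like any other text.

```python
from collections import defaultdict

# Hypothetical (image, text) pairs; every file name and caption here is invented.
pairs = [
    ("dog_photo.jpg", "a running dog"),
    ("dog_photo.jpg", "a dog sprinting across a field"),  # same image, second caption
    ("dalle_dog_1.png", "running dog"),  # generation prompt reused as the caption
    ("dalle_dog_2.png", "running dog"),  # same prompt, different generated image
]

# Index the pairs in both directions to expose the many-to-many structure.
captions_by_image = defaultdict(set)
images_by_text = defaultdict(set)
for image, text in pairs:
    captions_by_image[image].add(text)
    images_by_text[text].add(image)

# One image maps to several texts, and one text maps to several images.
print(len(captions_by_image["dog_photo.jpg"]))
print(len(images_by_text["running dog"]))
```

Either way you index it, each row is still just an image with a text that describes it, which is all a captioning model needs for training.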

3