jayalammar OP t1_is9vlpm wrote
Reply to comment by mrflatbush in [R] The Illustrated Stable Diffusion by jayalammar
Thank you!
This caption?
>Larger/better language models have a significant effect on the quality of image generation models. Source: Google Imagen paper by Saharia et al., Figure A.5.
What's the issue?
jayalammar OP t1_ir9btg5 wrote
Reply to comment by ryunuck in [R] The Illustrated Stable Diffusion by jayalammar
This might be closer to what you're looking for: https://huggingface.co/blog/annotated-diffusion
jayalammar OP t1_ir4jz7p wrote
Reply to comment by Domingo01 in [R] The Illustrated Stable Diffusion by jayalammar
My bad, you're right. It's "paradise cosmic beach by vladimir volegov and raphael lacoste". I arbitrarily picked an image from https://lexica.art/.
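If anyone wants to try that prompt themselves, here's a rough sketch using the diffusers library (the checkpoint name is my assumption; any Stable Diffusion v1.x weights should work):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion v1.x checkpoint (checkpoint name is illustrative).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "paradise cosmic beach by vladimir volegov and raphael lacoste"
image = pipe(prompt).images[0]
image.save("paradise_cosmic_beach.png")
```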
jayalammar OP t1_ir2im9w wrote
Reply to comment by new_name_who_dis_ in [R] The Illustrated Stable Diffusion by jayalammar
New Stable Diffusion models have to be trained to utilize the OpenCLIP model. That's because many components in the attention/resnet layers are trained to deal with the representations learned by CLIP, so swapping it out for OpenCLIP would be disruptive.
In that training process, however, OpenCLIP can be frozen, just as CLIP was frozen during the training of Stable Diffusion / LDM.
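To be concrete, "frozen" just means the text encoder's weights receive no gradient updates while the rest of the model trains. A minimal PyTorch/transformers sketch (the model name and optimizer setup are illustrative, not the actual SD training code):

```python
import torch
from transformers import CLIPTextModel

# Load the text encoder (SD v1.x used this CLIP variant).
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Freeze it: no gradients flow to its parameters, so only the
# diffusion model's weights change during training.
for param in text_encoder.parameters():
    param.requires_grad = False
text_encoder.eval()

# Only the UNet (not shown here) would be handed to the optimizer, e.g.:
# optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)
```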
jayalammar OP t1_ir2hmsu wrote
Reply to comment by DigThatData in [R] The Illustrated Stable Diffusion by jayalammar
Much appreciated <3
jayalammar OP t1_isnkcuy wrote
Reply to comment by mrflatbush in [R] The Illustrated Stable Diffusion by jayalammar
Oh, okay, I understand you now. These are actual examples from the dataset: they're the captions those images had in the LAION Aesthetics dataset. https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6.5plus
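If it helps, here's a rough sketch of browsing those caption/image pairs with the datasets library (I'm assuming the uppercase TEXT/URL column names from the dataset card; adjust if the schema differs):

```python
from datasets import load_dataset

# Stream the metadata table so the full dataset isn't downloaded.
ds = load_dataset(
    "ChristophSchuhmann/improved_aesthetics_6.5plus",
    split="train",
    streaming=True,
)

# Print a few caption -> image-URL pairs.
for i, row in enumerate(ds):
    print(row["TEXT"], "->", row["URL"])
    if i >= 4:
        break
```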