tetrisdaemon
tetrisdaemon OP t1_izjp9nc wrote
Reply to comment by calciumcitrate in [R] What the DAAM: Interpreting Stable Diffusion and Uncovering Generation Entanglement by tetrisdaemon
I'm looking into it, but my guess is that it's the CLIP embeddings, so disentanglement might need to happen at that level. Some supporting evidence: even if we set the cross-attention to zero for some words, those words are still reflected in the final image, which indicates that the word representations are mixed in CLIP.
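As a rough illustration of the zeroing experiment described above, here is a minimal NumPy sketch of a single cross-attention step with selected prompt tokens masked out. The function name `cross_attention`, the `zero_tokens` parameter, and the choice to zero the weights after the softmax without renormalizing are all assumptions made for illustration; this is not the paper's or Stable Diffusion's actual code.

```python
import numpy as np

def cross_attention(q, k, v, zero_tokens=()):
    """Scaled dot-product cross-attention with optional per-token zeroing.

    q: [pixels, d] image-side queries; k, v: [tokens, d] text-side keys/values.
    zero_tokens: indices of prompt tokens whose attention weights are zeroed
    (hypothetical parameter, not part of any real library).
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over tokens
    if zero_tokens:
        # Assumption: zero after the softmax, with no renormalization.
        weights[:, list(zero_tokens)] = 0.0
    return weights @ v, weights
```

Note that the remaining tokens' key and value vectors come from the text encoder's contextual embeddings, so zeroing one column of weights does not remove whatever information about that word the other tokens already carry.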
tetrisdaemon OP t1_izjmb5s wrote
Reply to comment by Purplekeyboard in [R] What the DAAM: Interpreting Stable Diffusion and Uncovering Generation Entanglement by tetrisdaemon
This is a good observation. In the paper, we actually tried "{rusty, wooden, metallic} shovel in a clean shed," and it still made the shed rusty. Moving forward, we plan to run the same test on the other ball prompt.
tetrisdaemon OP t1_izjm0ov wrote
Reply to comment by JClub in [R] What the DAAM: Interpreting Stable Diffusion and Uncovering Generation Entanglement by tetrisdaemon
Cool, nicely done repository. Are you referring to the [16, 4096-ish, 77] cross-attention matrices? I maintained a streaming sum over matrices of the same size on a machine with 64GB of RAM (32GB also works) and 24GB of VRAM.
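A streaming sum of this kind can be sketched as follows. The class name `StreamingAttentionSum` and its methods are hypothetical, and the sketch assumes every incoming matrix has the same shape; the point is only that one running total is kept instead of a list of every matrix. At [16, 4096, 77] in float32, a single such matrix is roughly 20 MB.

```python
import numpy as np

class StreamingAttentionSum:
    """Running sum over same-shaped attention matrices (hypothetical helper)."""

    def __init__(self, shape, dtype=np.float32):
        self.total = np.zeros(shape, dtype=dtype)
        self.count = 0

    def add(self, attn):
        # Accumulate in place so memory use stays at one matrix.
        self.total += attn
        self.count += 1

    def mean(self):
        # Average over everything seen so far; guards against an empty stream.
        return self.total / max(self.count, 1)
```

In use, `add` would be called once per captured matrix (for example, once per layer and time step), and only `total` is kept in memory.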
tetrisdaemon OP t1_izi47x8 wrote
Reply to comment by moschles in [R] What the DAAM: Interpreting Stable Diffusion and Uncovering Generation Entanglement by tetrisdaemon
For sure, and also how linguistics can guide Stable Diffusion to produce better images. For example, if we already understand how objects should relate on the language side (e.g., "a giraffe and a zebra" should probably produce two distinct animals, unlike what we observed in the paper), we can twiddle the attention maps so that the giraffe and the zebra are separate.
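One very simple way to picture "twiddling" two attention maps apart is a winner-takes-all split, sketched below. This is purely an illustrative assumption: the function `separate_maps` is hypothetical, and the comment above does not say how the maps would actually be adjusted.

```python
import numpy as np

def separate_maps(map_a, map_b):
    """Give each pixel to whichever token attends to it more (ties go to map_a).

    map_a, map_b: same-shaped 2D attention maps for two prompt tokens,
    e.g. "giraffe" and "zebra". Hypothetical helper for illustration only.
    """
    a_wins = map_a >= map_b
    new_a = np.where(a_wins, map_a, 0.0)
    new_b = np.where(a_wins, 0.0, map_b)
    return new_a, new_b
```

After this step no pixel has non-zero attention for both tokens, which is the kind of separation the giraffe and zebra example asks for.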
tetrisdaemon OP t1_izhrg1k wrote
Reply to comment by Parzival_007 in [R] What the DAAM: Interpreting Stable Diffusion and Uncovering Generation Entanglement by tetrisdaemon
Thanks! We're actively improving it.
tetrisdaemon OP t1_izk7fk0 wrote
Reply to comment by JClub in [R] What the DAAM: Interpreting Stable Diffusion and Uncovering Generation Entanglement by tetrisdaemon
Yeah, moving forward it might help to have a disk caching mode.
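One possible reading of a disk caching mode is to keep the accumulator in a memory-mapped file rather than in RAM. The sketch below uses NumPy's `np.memmap` for this; the function name `accumulate_on_disk` and the file layout are assumptions, not a description of any existing or planned feature.

```python
import os
import tempfile
import numpy as np

def accumulate_on_disk(path, matrices, shape, dtype=np.float32):
    """Sum same-shaped matrices into a disk-backed array (hypothetical helper)."""
    # mode="w+" creates (or overwrites) the file; a new file reads as zeros.
    acc = np.memmap(path, dtype=dtype, mode="w+", shape=shape)
    for attn in matrices:
        acc += attn  # in-place update goes through the mapped file
    acc.flush()  # write pending changes to disk
    del acc
    # Reopen read-only so callers cannot modify the cached result by accident.
    return np.memmap(path, dtype=dtype, mode="r", shape=shape)
```

The operating system pages data in and out as needed, so the accumulator does not have to fit in RAM all at once, at the cost of slower updates.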