
yeolj0o t1_j2hkerk wrote

I was having the exact same thought about comparing segmentation labels. The best "deep learning style" approach I've come up with (and it's still not satisfying) is to run a semantic image synthesis model (e.g., SPADE, OASIS, PITI) on each label set and compare FIDs. A better approach for my case (I work with Cityscapes), where the scene layout is mostly fixed, is to use height priors and compare KL divergence or FSD within each height band (bottom, mid, top) — a sketch of the FID idea is below.
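Roughly, the FID-comparison part looks like this (a sketch, not an exact pipeline: the random tensors are stand-ins for images you'd actually synthesize with SPADE/OASIS/PITI and for your reference images, and in practice you'd want far more than 16 samples per set):

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)

# Placeholder batches of (N, 3, H, W) uint8 images in [0, 255].
# In practice: real_images = your dataset, synth_images = outputs of a
# semantic image synthesis model run on one set of segmentation labels.
real_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
synth_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(synth_images, real=False)

# Lower FID = the images synthesized from this label set sit closer to the
# real image distribution; repeat per label set and compare the scores.
print(fid.compute())
```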


uwashingtongold OP t1_j2idybf wrote

Super interesting! Thanks for the reply. Can you say more about the latter approach? My dataset is a medical segmentation dataset with similar fixed structural properties to Cityscapes. What do you mean by height priors, and how are you comparing divergence metrics to establish similarity?

For reference, though, our paper studies ambiguity in segmentation, so the annotations all have high variance and there are multiple annotations per image.


yeolj0o t1_j2iizj4 wrote

This Cityscapes segmentation paper provides intuition on the height prior: you split an image into three parts according to the height of the pixel coordinates and then measure the pixel-wise class distribution within each part. As you mentioned in your original post, you can then use KL divergence to measure the similarity of the class distributions between two images.
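A minimal sketch of that banded comparison, assuming label maps are 2D integer arrays of class IDs (the band count, smoothing constant, and averaging over bands are my choices, not taken from the paper):

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes D_KL(p || q)

def band_distributions(label_map, num_classes, bands=3):
    """Per-band class distributions, top band first (smoothed to avoid zeros)."""
    h = label_map.shape[0]
    dists = []
    for b in range(bands):
        band = label_map[b * h // bands : (b + 1) * h // bands]
        hist = np.bincount(band.ravel(), minlength=num_classes).astype(float)
        dists.append((hist + 1e-8) / (hist.sum() + 1e-8 * num_classes))
    return dists

def band_kl(map_a, map_b, num_classes):
    """Mean KL divergence between corresponding height bands of two label maps."""
    da = band_distributions(map_a, num_classes)
    db = band_distributions(map_b, num_classes)
    return float(np.mean([entropy(p, q) for p, q in zip(da, db)]))

# Toy example: 5 classes on 128x256 label maps.
rng = np.random.default_rng(0)
a = rng.integers(0, 5, size=(128, 256))
b = rng.integers(0, 5, size=(128, 256))
print(band_kl(a, b, num_classes=5))
```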

For your case (measuring ambiguity), I think measuring the class distribution of a whole image is a bad idea, since local differences may be exactly what you want to capture. Instead, measuring mIoU between two or more annotations could be a good measure: ambiguous annotations should have small overlapping regions and therefore a low mIoU.
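Something like this, assuming each annotation of an image is a 2D integer array of class IDs (`num_classes` and averaging over annotator pairs are assumptions on my part, not a standard protocol):

```python
from itertools import combinations
import numpy as np

def miou(a, b, num_classes):
    """Mean IoU over classes present in at least one of the two maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(a == c, b == c).sum()
        union = np.logical_or(a == c, b == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

def mean_pairwise_miou(annotations, num_classes):
    """Average mIoU over all annotator pairs for one image.

    Low values = annotators disagree a lot = high ambiguity.
    """
    scores = [miou(a, b, num_classes) for a, b in combinations(annotations, 2)]
    return float(np.mean(scores))

# Toy example: three annotators, 4 classes, 64x64 label maps.
rng = np.random.default_rng(0)
anns = [rng.integers(0, 4, size=(64, 64)) for _ in range(3)]
print(mean_pairwise_miou(anns, num_classes=4))
```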
