Submitted by kingdroopa t3_10f7dyr in MachineLearning
BlazeObsidian t1_j4v495i wrote
Autoencoders like VAEs should work better than any other models for image-to-image translation. Maybe you can try different VAE models and compare their performance.

Edit: I was wrong.
kingdroopa OP t1_j4v5t9a wrote
Hmm, interesting! Do you have any papers/articles/sources supporting this claim?
BlazeObsidian t1_j4var74 wrote
Sorry, I was wrong. Modern deep VAEs can match SOTA GAN performance for image super-resolution (https://arxiv.org/abs/2203.09445), but I don't have evidence for recoloring.
But diffusion models have been shown to outperform GANs on multiple image-to-image translation tasks, e.g. https://deepai.org/publication/palette-image-to-image-diffusion-models
You could probably reframe your problem as an image colorization task (https://paperswithcode.com/task/colorization), and the SOTA there is still Palette, linked above.
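To make the colorization framing concrete, here's a minimal sketch (my own illustration, not from the Palette paper) of how a recoloring problem can be turned into (grayscale input, color target) training pairs with numpy. A real colorization pipeline would typically work in Lab space and predict the ab channels, but the idea is the same:

```python
import numpy as np

def make_colorization_pair(rgb):
    """Turn an RGB image (H, W, 3, uint8) into a (grayscale input, color target) pair."""
    rgb = rgb.astype(np.float32) / 255.0
    # Standard luminance weights; a Lab-based pipeline would predict ab channels instead.
    gray = rgb @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return gray[..., None], rgb  # model input, training target

# Tiny synthetic example
img = np.random.randint(0, 256, (8, 8, 3), dtype=np.uint8)
x, y = make_colorization_pair(img)
print(x.shape, y.shape)  # (8, 8, 1) (8, 8, 3)
```

The point is just that any dataset of color images gives you supervised pairs for free, since the grayscale input is derived from the target itself.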
kingdroopa OP t1_j4vbaxk wrote
Thanks :) I noticed Palette uses paired images, whilst mine are a bit unaligned. Would you consider it a paired image set, or unpaired? They look closely similar, but don't share pixel information at the top/bottom of the images.
BlazeObsidian t1_j4vc61q wrote
That depends on the extent of the pixel misalignment, I think. If cropping your images is not a solution and a large portion of your images have this issue, the model wouldn't be able to generate the right pixel information for the misaligned sections. But it's worth giving Palette a try if the misalignment is not significant.
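One quick way to quantify how misaligned the pairs actually are, before committing to crops, is phase correlation. A sketch with numpy (assuming pure translation between the two images, which may not hold for your data):

```python
import numpy as np

def estimate_shift(a, b):
    """Estimate the (dy, dx) translation between two grayscale images via phase correlation."""
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    R = A * np.conj(B)
    R /= np.abs(R) + 1e-8          # normalize to keep only phase information
    corr = np.fft.ifft2(R).real    # peak location encodes the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map wrap-around indices to signed shifts
    h, w = a.shape
    if dy > h // 2: dy -= h
    if dx > w // 2: dx -= w
    return int(dy), int(dx)

# Synthetic check: shift an image by (3, 5) and recover the offset
rng = np.random.default_rng(0)
img = rng.random((64, 64))
shifted = np.roll(img, (3, 5), axis=(0, 1))
print(estimate_shift(shifted, img))  # (3, 5)
```

If the estimated shifts across your dataset are small and consistent, cropping to the shared region should give you a usable paired set; large or inconsistent shifts point toward treating it as unpaired.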