DEATH_STAR_EXTRACTOR t1_itoxcm2 wrote

But how come the first NUWA vr1 from 10 months ago, at only about 900M parameters, can do face prediction like shown, while Imagen Video needs 11B parameters or so to do what it does? I mean, it doesn't look like Imagen Video is that much better. I know it can render words in leaves and all, but I feel NUWA could come out the same if given frame rate improvement, upscaling, and more data and a bigger brain. Yes, there's an evaluation score, but I'm talking about judging by eye.
