Redvolition t1_itlmbug wrote
Reply to comment by red75prime in Given the exponential rate of improvement to prompt based image/video generation, in how many years do you think we'll see entire movies generated from a prompt? by yea_okay_dude
Have you seen the Phenaki demo?
I am not an expert, but from what I am digesting from the papers coming out, you could get to this Q4 2028 scenario with algorithmic improvements alone, without any actual hardware upgrades.
red75prime t1_itlxjbf wrote
Phenaki has the same problem: a limited span of temporal consistency that cannot easily be scaled up. If an object goes off-screen for some time, the model forgets how it should look.
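Roughly, the failure mode is a fixed-size temporal context. Here's a toy sketch (not Phenaki's actual architecture; the `model.predict_next_frame` API is made up for illustration) of why anything that left the scene more than K frames ago can no longer influence generation:

```python
# Toy sketch of autoregressive video generation with a sliding context window.
# Assumption: each new frame is conditioned only on the last K generated frames.
from collections import deque

CONTEXT_K = 16  # hypothetical context length, in frames

def generate_video(model, prompt, num_frames):
    """Generate frames one at a time, conditioning on a bounded window.
    `model.predict_next_frame(prompt, recent_frames)` is a stand-in API."""
    context = deque(maxlen=CONTEXT_K)  # older frames silently fall out
    video = []
    for _ in range(num_frames):
        frame = model.predict_next_frame(prompt, list(context))
        context.append(frame)
        video.append(frame)
    return video

# If a character exits at frame 10 and re-enters at frame 100, the frames that
# defined its appearance are long gone from `context`, so the model has to
# reinvent it -- that's the consistency drift described above.
```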
DEATH_STAR_EXTRACTOR t1_itoxcm2 wrote
But why can the first NUWA (v1) from 10 months ago, at only about 900M parameters, do face prediction like shown, while Imagen Video, at around 11B parameters, can do what it does? I mean, it doesn't look like Imagen Video is that much better. I know it can do words in leaves and all, but I feel NUWA could come out about the same if given frame-rate improvements, upscaling, and more data / a bigger brain. Yes, there's an evaluation score, but I'm talking about judging by eye.