
Redvolition t1_itgk5qq wrote

I voted for 3 to 4 years. Here is the breakdown:

The dates in parentheses refer to when I currently believe each technology will be available as a published, finished, and usable product, rather than as code, papers, beta software, or demos floating around. Also, NeRF just seems like glorified photogrammetry to me, which at best would produce good conventional 3D models, but that seems like a subpar workflow compared to post-processing on top of a crude 3D base or just generating the videos from scratch.

Tell me your own predictions for each category.

Capacity Available

(Q2 2024) Produces realistic and stylized videos at 720p resolution and 24 fps by applying post-processing to crude 3D input. The videos are almost temporally consistent frame to frame, yet require occasional correction. Watch the GTA demo, if you haven't already. It could look like a more polished version of that.

(Q1 2025) Produces realistic and stylized videos at 720p resolution and 24 fps from text or low-entry-barrier software, and the result is nearly indistinguishable from organic production, although with occasional glitches.

(Q3 2026) AI produces realistic and stylized videos at high resolution and frame rate from text or low-entry-barrier software, and the result is truly indistinguishable from organic production. Emerging software allows for fine-tuning of camera position, angle, speed, focal length, depth of field, etc.

(Q4 2027) Dedicated software packages for AI video generation are in full swing, making almost all traditional 3D software as we know it obsolete. Realistic high-resolution videos can already be crafted with the click of a button or a text prompt, but professionals use these packages for finer control.

Temporal and Narrative Consistency

(Q1 2025) Temporal consistency is good frame to frame, yet not perfect, and visual glitches still occur from time to time, requiring some form of manual cleanup. In addition, character and environment stability or coherence across several minutes of video is not yet possible.

(Q1 2026) The videos are temporally consistent frame to frame, without visual flickering or errors, but lack long-term narrative consistency across several minutes of video in elements such as character expressions, mannerisms, fine object details, etc.

(Q3 2027) Perfect visuals from text input, with dedicated software capable of maintaining character and environment stability down to the finest details and coherence across several minutes or hours of video.

Generalization Effectiveness

(Current) Only capable of producing what it has been trained on, and does not generalize to niche or highly specific demands, including advanced or fantastical elements for which an abundance of data does not exist.

(Q1 2025) Generalizes to niche or highly specific demands, such as advanced or fantastical elements for which an abundance of data does not exist, yet the results are subpar compared to organic production.

(Q2 2027) Results are limitless and generalize perfectly to all reasonable demands, from realistic to stylized, fantastical, or surreal.

Computational Resources

(Current) Only supercomputers can generate videos of sufficiently high resolution and frame rate for more than a couple of seconds.

(Q2 2025) High-end personal computers or expensive subscription services need to be employed to achieve sufficiently high resolution and frame rate for more than a couple of seconds.

(Q4 2028) An average to low-end computer or cheap subscription service is capable of generating high-resolution, high-frame-rate videos spanning several minutes.

5

red75prime t1_itk9f2j wrote

> (Q4 2028) An average to low-end computer or cheap subscription service is capable of generating high-resolution, high-frame-rate videos spanning several minutes.

If it takes days to render them, then maybe.

AIs don't yet significantly feed back into the design and physical construction of chip fabrication plants, so by 2028 we'll have one or two 2nm fabs, and the majority of new consumer CPUs and GPUs will be using 3-5nm technology. Hardware costs will not drop significantly either (fabs are costly), so 2028 low-end will be around today's high-end performance-wise (with less RAM and storage).

Anyway, I would shift perfect long-term temporal consistency to 2026-2032, as it depends on integrating working and long-term memory into existing AI architectures, and there's no clear path to that yet.

1

Redvolition t1_itlmbug wrote

Have you seen the Phenaki demo?

I am not an expert, but from what I am digesting from the papers coming out, you could get to this Q4 2028 scenario with algorithmic improvements alone, without any actual hardware upgrades.

1

red75prime t1_itlxjbf wrote

Phenaki has the same problem: a limited span of temporal consistency that cannot easily be scaled up. If an object goes offscreen for some time, the model forgets how it should look.

1

DEATH_STAR_EXTRACTOR t1_itoxcm2 wrote

But why can the first NUWA (v1) from 10 months ago, at only about 900M parameters, do face prediction as shown, while Imagen Video, at around 11B parameters, does what it does? It doesn't look like Imagen Video is that much better. I know it can render words in leaves and such, but I feel the results could come out the same given frame-rate improvements, upscaling, and more data / a bigger model. Yes, there's an evaluation score, but I'm talking about judging by eye.

1