Submitted by ReadSeparate t3_10zcig2 in singularity
We’re all waiting for the day that a GPT-3 scale model is released which integrates text, video, images, and audio. We’ve seen some progress on this front - namely Gato. But nothing has really wowed us yet the way ChatGPT or LaMDA did. PaLM is really the only exception to this rule, but it was images and text only.
I think we all know this is coming soon. I’m wondering if anyone here is aware of any indications that this is actively being worked on, or has any predictions for release dates, especially for a video model.
A model that can take any combination of video, audio, image, and text tokens as input and produce any combination of them as output would most likely be very, very remarkable, making ChatGPT look like a toy in comparison.
Sashinii t1_j82ro4w wrote
They're still being developed. When they're ready, they'll be released to the general public (granted, probably not by the big companies, but there'll be open source versions by Stability AI).