harharveryfunny
harharveryfunny t1_je50vw9 wrote
Reply to [Discussion] IsItBS: asking GPT to reflect x times will create a feedback loop that causes it to scrutinize itself x times? by RedditPolluter
There's no indication that I've seen that it maintains any internal state from one generated word to the next. Therefore the only way it can build upon its own "thoughts" is by generating "step-by-step" output which is fed back into it. Its own output seems to be its only working memory, at least for now (GPT-4), although that's an obvious area for improvement.
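Here's a minimal sketch of the "reflection" loop being discussed: since the model keeps no state between calls, each round of self-scrutiny only works if the previous output is fed back in as part of the prompt. This uses the OpenAI chat API as it existed at the time (openai.ChatCompletion); the model name and the wording of the reflection prompt are illustrative assumptions, not anything OpenAI prescribes.

```python
import openai

def reflect(question: str, rounds: int = 3) -> str:
    messages = [{"role": "user", "content": question}]
    answer = ""
    for _ in range(rounds):
        reply = openai.ChatCompletion.create(
            model="gpt-3.5-turbo", messages=messages
        )
        answer = reply["choices"][0]["message"]["content"]
        # The model's own output becomes its "working memory" for the next pass.
        messages.append({"role": "assistant", "content": answer})
        messages.append({"role": "user",
                         "content": "Reflect on your answer above and improve it."})
    return answer
```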
harharveryfunny t1_jdmd38s wrote
Reply to comment by alrunan in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
>You should read the LLaMA paper.
OK - will do. What specifically did you find interesting (related to scaling or not)?
harharveryfunny t1_jdm3bm4 wrote
It seems most current models don't need the number of parameters that they have. DeepMind did a study on model size vs number of training tokens and concluded that for each doubling of parameter count the number of training tokens also needs to double, and that a model like GPT-3, trained on 300B tokens, would really need to be trained on 3.7T tokens (more than a 10x increase) to take advantage of its size.
To prove their scaling law, DeepMind built the 70B-parameter Chinchilla model, trained it on the predicted optimal 1.4T (!) tokens, and found it to outperform GPT-3.
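A quick back-of-the-envelope sketch of that result: the Chinchilla fit works out to roughly 20 training tokens per parameter (an approximation of the paper's fitted scaling law, not its exact formula).

```python
# Rough Chinchilla rule of thumb: ~20 tokens per parameter for compute-optimal training.
TOKENS_PER_PARAM = 20

def optimal_tokens(n_params: float) -> float:
    return TOKENS_PER_PARAM * n_params

print(f"GPT-3 (175B params): ~{optimal_tokens(175e9) / 1e12:.1f}T tokens "
      f"(actually trained on ~0.3T)")
print(f"Chinchilla (70B params): ~{optimal_tokens(70e9) / 1e12:.1f}T tokens")
```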
harharveryfunny t1_jdj9if0 wrote
Reply to comment by TikiTDO in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
I'm not sure what your point is.
I started by pointing out that there are some use cases (giving face comparison as an example) where you need access to the neural representation of the image (e.g. embeddings), not just object recognition labels.
You seem to want to argue and say that text labels are all you need, but now you've come full circle back to agree with me and say that the model needs that neural representation (embeddings)!
As I said, embeddings are not the same as object labels. An embedding is a point in n-dimensional space. A label is an object name like "cat" or "nose". Encoding an embedding as text (simple enough - just a vector of numbers) doesn't turn it into an object label.
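To make the distinction concrete, here's an illustrative sketch: a label is a discrete name, while an embedding is a point in n-dimensional space you can measure distances in. The vectors below are made up purely for illustration.

```python
import numpy as np

label_a, label_b = "face", "face"            # labels: identical, and that's all they tell you

emb_a = np.array([0.12, -0.40, 0.88, 0.05])  # hypothetical face embeddings
emb_b = np.array([0.10, -0.35, 0.91, 0.02])

cosine = np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
print(f"labels equal: {label_a == label_b}, embedding similarity: {cosine:.3f}")
# A same-person / different-person decision thresholds the similarity;
# no amount of text labels recovers that information.
```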
harharveryfunny t1_jdj5mom wrote
Reply to comment by TikiTDO in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
>it would just be a text-encoded representation of an embedding vector
Once you've decided to input image embeddings into the model, you may as well enter them directly, rather than converted into text.
In any case, embeddings, whether represented as text or not, are not the same as object recognition labels.
harharveryfunny t1_jdic1s3 wrote
Reply to comment by TikiTDO in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
>Obviously having the visual system provide data that the model can use directly is going to be far more effective, but nothing about dense object detection and description is going to be fundamentally incompatible with any level of detail you could extract into an embedding vectror. I'm not saying it would be a smart or effective solution, but it could be done.
I can't see how that could work for something like my face example. You could individually detect facial features, subclassified into hundreds of different eye/mouth/hair/etc/etc variants, and still fail to capture the subtle differences that differentiate one individual from another.
harharveryfunny t1_jdhkn99 wrote
Reply to [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
> GPT-4 with image input can interpret any computer screen
Not necessarily - it depends how they've implemented it. If it's just dense object and text detection, then that's all you're going to get.
For the model to be able to actually "see" the image they would need to feed it into the model at the level of neural net representation, not post-detection object description.
For example, if you wanted the model to gauge whether two photos of someone not in its training set show the same person, then it'd need face embeddings to do that (to gauge distance). They could special-case all sorts of capabilities like this in addition to object detection, but you could always find something they missed.
The back-of-a-napkin hand-drawn website sketch demo is promising, but could have been done via object detection.
In the announcement of GPT-4, OpenAI said they're working with another company on the image/vision tech, and gave a link to an assistive vision company... for that type of use maybe dense labelling is enough.
harharveryfunny t1_jckltrp wrote
> I think it should be possible to replicate even GPT-4 with open source tools something like Bloom + FlashAttention & fine-tune on 32k tokens.
So you mean build a model with a 32K attention window, but somehow initialize it with weights from BLOOM (2K window) and then fine-tune? Are you aware of any attempts to do this sort of thing?
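For what it's worth, here's a rough sketch of one way this kind of window extension is sometimes attempted: keep all the transformer weights and stretch the positional table to the new length, then fine-tune on long sequences. It assumes learned absolute position embeddings and uses scaled-down dimensions; BLOOM itself uses ALiBi, so this is illustrative rather than a recipe for that particular model.

```python
import torch
import torch.nn.functional as F

old_len, new_len, d_model = 2048, 32768, 64     # d_model shrunk for illustration
old_pos = torch.randn(old_len, d_model)         # stand-in for a pretrained position table

# Interpolate along the sequence dimension: (1, d_model, old_len) -> (1, d_model, new_len)
new_pos = F.interpolate(
    old_pos.T.unsqueeze(0), size=new_len, mode="linear", align_corners=False
).squeeze(0).T

assert new_pos.shape == (new_len, d_model)
# new_pos would replace the embedding table before fine-tuning on long sequences.
```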
harharveryfunny t1_jcchnkp wrote
Reply to comment by Alimbiquated in Modern language models refute Chomsky’s approach to language [R] by No_Draft4778
That's a bogus comparison, for a number of reasons such as:
- These models are learning vastly more than language alone
- These models are learning in an extraordinarily difficult way, with *only* "predict next word" feedback and nothing else
- Humans learn in a much more efficient, targeted way, via curiosity-driven knowledge-gap filling
- Humans learn via all sorts of modalities in addition to language. Having already learnt a concept, we only need to be given a name for it once for it to stick
harharveryfunny t1_jca7x9f wrote
Yes - the Transformer is proof by demonstration that you don't need a language-specific architecture to learn language, and also that you can learn language via prediction feedback, which is highly likely how our brain does it too.
Chomsky is still sticking to his innateness position though (with Gary Marcus cheering him on). Perhaps Chomsky will now claim that Broca's area is a Transformer?
harharveryfunny t1_jbjxolz wrote
Reply to comment by EmbarrassedHelp in [D] Why are so many tokens needed to train large language models? by blacklemon67
The LLM name for things like GPT-3 seems to have stuck, which IMO is a bit unfortunate since it's rather misleading. They certainly wouldn't need the amount of data they do if the goal were merely a language model, nor would we need to have progressed past smaller models like GPT-1. The "predict next word" training/feedback may not have changed, but the capabilities people are hoping to induce in these larger/ginormous models are now way beyond language and into the realms of world models, semantics and thought.
harharveryfunny t1_jbjk9nb wrote
Reply to comment by harharveryfunny in [D] Why are so many tokens needed to train large language models? by blacklemon67
Just to follow up, the reason the "interact with the world" approach is far more efficient is that it's largely curiosity-driven - we proactively try to fill gaps in our knowledge rather than just reading a set of encyclopedias and hoping they cover what we need to know. We learn in a much more targeted fashion.
harharveryfunny t1_jbjhmif wrote
Humans don't learn by locking themselves in a room at birth with a set of encyclopedias, or a print-out of the internet. We learn by interaction with the world - perceive/generalize/theorize/experiment, learn from feedback, etc.
It's impressive how well these LLMs perform given what is really a very tough task - build an accurate world model given only "predict next word" feedback - but it's hardly surprising that they need massive amounts of data to compensate for the task being so tough.
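To be clear about how sparse that feedback is, here's a minimal sketch of the objective: the only training signal is cross-entropy between the model's distribution over the next token and the token that actually came next. The shapes and values are stand-ins, not a real model.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 50000, 8
logits = torch.randn(seq_len, vocab_size, requires_grad=True)  # stand-in model outputs per position
targets = torch.randint(0, vocab_size, (seq_len,))             # the tokens that actually came next

loss = F.cross_entropy(logits, targets)
loss.backward()  # in a real model, every weight is updated from this one scalar signal
```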
harharveryfunny t1_javmsab wrote
Reply to comment by Thin_Sky in [D] OpenAI introduces ChatGPT and Whisper APIs (ChatGPT API is 1/10th the cost of GPT-3 API) by minimaxir
It's a leak, but seems to be legitimate.
https://twitter.com/transitive_bs/status/1628118163874516992
harharveryfunny t1_jamab7m wrote
Reply to comment by londons_explorer in [D] OpenAI introduces ChatGPT and Whisper APIs (ChatGPT API is 1/10th the cost of GPT-3 API) by minimaxir
The two pair up very well though - now that there's a natural language API, you could leverage that for speech->text->ChatGPT. From what I've seen of the Whisper demos, it seems to be the best out there by quite a margin. Does anything else perform as well?
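A sketch of that speech -> text -> ChatGPT pairing, using the endpoint names as they stood when these APIs launched (openai-python 0.27 style); the audio file name is a placeholder.

```python
import openai

with open("question.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)

reply = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": transcript["text"]}],
)
print(reply["choices"][0]["message"]["content"])
```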
harharveryfunny t1_jaj8bk2 wrote
Reply to comment by Educational-Net303 in [D] OpenAI introduces ChatGPT and Whisper APIs (ChatGPT API is 1/10th the cost of GPT-3 API) by minimaxir
Could you put any numbers to that?
What are the FLOPs per token of inference for a given prompt length (for a given model)?
What do those FLOPs translate to in terms of run time on Azure's GPUs (V100s?)?
What are the GPU power consumption and data center electricity costs?
Even with these numbers, can we really relate this to their $/token pricing scheme? The pricing page mentions this 90% cost reduction being for the "gpt-3.5-turbo" model vs the earlier text-davinci-003 (?) one - do we even know the architectural details needed to get the FLOPs?
harharveryfunny t1_jairuhd wrote
Reply to [D] OpenAI introduces ChatGPT and Whisper APIs (ChatGPT API is 1/10th the cost of GPT-3 API) by minimaxir
It says they've cut their costs by 90% and are passing that saving on to the user. I'd have to guess that they are making money on this, not just treating it as a loss-leader for other, more expensive models.
The way the API works is that you have to send the entire conversation each time, and the tokens you're billed for include both those you send and the API's response (which you'll likely append to the conversation and send back, getting billed again and again as the conversation progresses). By the time you've hit the 4K-token limit of this API there will have been a bunch of back and forth - you'll have paid a lot more than 4K * 0.2c/1K for the conversation. It's easy to imagine chat-based APIs becoming very widespread and the billable volume becoming huge. OpenAI use Microsoft Azure compute, so Microsoft may see a large spike in usage/profits out of this.
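A rough sketch of why a conversation costs far more than its final length suggests: every call re-sends the whole history, and both prompt and completion tokens are billed. The per-turn token counts below are made-up examples; the price is the launch rate for gpt-3.5-turbo.

```python
PRICE_PER_1K = 0.002          # $ per 1K tokens for gpt-3.5-turbo at launch (0.2c/1K)

def conversation_cost(turns):
    """turns: list of (user_tokens, assistant_tokens) per exchange."""
    history = 0
    billed = 0
    for user_toks, assistant_toks in turns:
        prompt = history + user_toks          # whole conversation re-sent each call
        billed += prompt + assistant_toks     # billed for prompt and response
        history = prompt + assistant_toks
    return billed, billed / 1000 * PRICE_PER_1K

turns = [(200, 400)] * 6                      # six back-and-forth exchanges
tokens, dollars = conversation_cost(turns)
print(f"final context: {sum(sum(t) for t in turns)} tokens, "
      f"billed: {tokens} tokens (${dollars:.4f})")
```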
It'll be interesting to see how this pricing, and that of competitors, evolves. Also interesting are some of OpenAI's annual price plans outlined elsewhere, such as $800K/yr for their 8K-token-limit "DV" model (DaVinci 4.0?) and $1.5M/yr for the 32K-token-limit "DV" model.
harharveryfunny t1_j9bf30y wrote
Reply to comment by tdgros in [D] Something basic I don't understand about Nerfs by alik31239
OP's question seems to be how to get from 2D images to the 3D voxels, no? But anyways if they've got their answer that's good.
Edit: I guess they were talking about camera position for the photos, not mapping to 3D.
harharveryfunny t1_j9b30et wrote
Reply to comment by harharveryfunny in [D] Something basic I don't understand about Nerfs by alik31239
Not sure why this got downvoted given that it's correct. ChatGPT is also quite capable of explaining how this mapping is learnt (using a view-consistency loss that maps from the 3D voxels back to a 2D view and compares to the image).
harharveryfunny t1_j9aydo9 wrote
Here's the key, thanks to ChatGPT:
Data preparation: First, the training data is preprocessed to convert the 2D images and camera poses into a set of 3D points and corresponding colors. Each 2D image is projected onto a 3D point cloud using the corresponding camera pose, resulting in a set of 3D points with associated colors.
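An illustrative sketch of the projection step described above: lift a pixel with known depth back into world coordinates using the camera intrinsics K and pose (R, t). All numbers are placeholders, not values from any real dataset.

```python
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])            # camera intrinsics
R, t = np.eye(3), np.array([0.0, 0.0, -2.0])   # camera pose (world -> camera)

u, v, depth = 400.0, 260.0, 3.0                # pixel coordinates and depth
pixel_h = np.array([u, v, 1.0])                # homogeneous pixel

point_cam = depth * np.linalg.inv(K) @ pixel_h   # pixel -> camera coordinates
point_world = R.T @ (point_cam - t)              # camera -> world coordinates
print(point_world)                               # this 3D point gets the pixel's colour
```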
harharveryfunny t1_j88kg27 wrote
Reply to comment by SatoshiNotMe in [D] Have their been any attempts to create a programming language specifically for machine learning? by throwaway957280
>some of the things that make ML code inscrutable are that (a) every tensor has a shape that you have to guess, and keep track of as it goes through the various layers
That's not inherent to ML though - it's a library design choice to have tensor shapes defined at runtime vs compile time. A while back I wrote my own framework in C++ and chose to go with compile-time shapes, which as well as preventing shape errors is more in keeping with C++'s typing. For a dynamically typed language like Python, runtime-defined types/shapes may seem the more natural choice, but it's still a choice nonetheless.
harharveryfunny t1_j7ly2jz wrote
Reply to comment by bartturner in [N] Google: An Important Next Step On Our AI Journey by EducationalCicada
And then he travelled back in time to go write that paper at U.Montreal ?
Anyways, Schmidhuber was the real inventor ;-)
harharveryfunny t1_j7lvevz wrote
Reply to comment by bartturner in [N] Google: An Important Next Step On Our AI Journey by EducationalCicada
https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf
See page 1 footnote : "Goodfellow did this work as a UdeM student".
harharveryfunny t1_j7lu67f wrote
Reply to comment by bartturner in [N] Google: An Important Next Step On Our AI Journey by EducationalCicada
What underlying technology are you talking about? Are you even familiar with the "Attention" paper and its relevance here? Maybe you think OpenAI use Google's Tensorflow? They don't.
GANs were invented by Ian Goodfellow while he was a student at U.Montreal, before he ever joined Google.
No - TPUs are not key to deploying at scale unless you are targeting Google cloud. Google is a distant 3rd in cloud marketshare behind Microsoft and Amazon. OpenAI of course deploy on Microsoft Azure, not Google.
harharveryfunny t1_je9t0b4 wrote
Reply to [D] Can large language models be applied to language translation? by matthkamis
Just try it! Yes - they do very well.
You don't even need to ask them to translate - just give them a foreign-language source and ask questions about it, or ask for a summary!