farmingvillein
farmingvillein t1_j004cnd wrote
Reply to [D] Why are ChatGPT's initial responses so unrepresentative of the distribution of possibilities that its training data surely offers? by Osemwaro
Yes, it could be a function of RL, or it could simply be how they are sampling from the distribution.
If this is something you truly want to investigate, I'd start by running the same tests with "vanilla" GPT (possibly also avoiding the InstructGPT variant, if you are concerned about RL distortion).
As a bonus, most of the relevant sampling knobs are exposed, so you can make it more or less conservative in terms of how widely it samples from the distribution (this, potentially, is the bigger driver of what you are seeing).
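To make that concrete, here's a minimal sketch of the kind of experiment I mean, using the completions API as of this writing (parameter names from memory--double-check the docs; model ids are just examples):

```python
import openai  # pip install openai

# Sweep the sampling knobs and watch how the output distribution changes.
# "davinci" (base GPT-3) avoids instruction tuning; compare against
# "text-davinci-003" if you want to see the InstructGPT-style behavior.
prompt = "The most surprising thing about machine learning is"

for temperature in (0.2, 0.7, 1.2):
    resp = openai.Completion.create(
        model="davinci",          # base model, no RLHF/instruction tuning
        prompt=prompt,
        max_tokens=40,
        temperature=temperature,  # higher = samples more widely from the distribution
        top_p=1.0,                # nucleus cutoff -- another knob worth sweeping
        n=3,                      # several samples per setting, to see the spread
    )
    for choice in resp.choices:
        print(f"T={temperature}: {choice.text.strip()}")
```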
farmingvillein t1_izy22t6 wrote
Reply to comment by gwern in [D] G. Hinton proposes FF – an alternative to Backprop by mrx-ai
Got it.
I'm going to guess that the author meant that you could stick a black box in the middle and all of the neurons could still be trained (but not the black box itself).
farmingvillein t1_izxuivo wrote
Reply to comment by gwern in [D] G. Hinton proposes FF – an alternative to Backprop by mrx-ai
> It mentions that it can handle non-differentiable blackbox components. I don't quite intuit why
Isn't this just because there is no backward pass being calculated? I.e., you're never taking a global loss and then needing to push a gradient back through every component.
Or am I missing something?
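For what it's worth, here's a stripped-down sketch of how I read the training rule (my own simplification of the paper's idea, not Hinton's reference code): each layer gets a purely local "goodness" objective, so a gradient never has to cross a layer boundary--and hence never has to flow through any black box sitting between layers.

```python
import torch
import torch.nn as nn

class FFLayer(nn.Module):
    """One forward-forward layer, trained with a local 'goodness' objective."""

    def __init__(self, d_in, d_out, lr=0.03, theta=2.0):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.opt = torch.optim.SGD(self.parameters(), lr=lr)
        self.theta = theta  # goodness threshold

    def goodness(self, x):
        return self.linear(x).relu().pow(2).sum(dim=1)  # sum of squared activations

    def train_step(self, x_pos, x_neg):
        # Push goodness above theta for positive data, below theta for negative.
        logits = torch.cat([self.theta - self.goodness(x_pos),
                            self.goodness(x_neg) - self.theta])
        loss = torch.log1p(logits.exp()).mean()  # softplus loss
        self.opt.zero_grad()
        loss.backward()  # gradient stays inside this layer
        self.opt.step()
        # Detach outputs: nothing propagates back to earlier layers (or to
        # any non-differentiable black box sitting in between).
        return (self.linear(x_pos).relu().detach(),
                self.linear(x_neg).relu().detach())
```

A black box between layers then just transforms the detached activations; since no layer ever asks for a gradient through it, differentiability doesn't matter.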
farmingvillein t1_izvq3i8 wrote
Reply to comment by maxToTheJ in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
> I dont see anything about the input being that.
Again, this has absolutely nothing to do with the discussion here, which is about memory outside of the prompt.
Again, how could you possibly claim this is relevant to the discussion? Only an exceptionally deep lack of conceptual understanding could cause you to make that connection.
farmingvillein t1_izvnwdh wrote
Reply to comment by maxToTheJ in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
...the whole twitter thread, and my direct link to OpenAI, are about the upper bound. The 822 number is irrelevant (given that OpenAI itself tells us that the window is much longer), and the fact that you pulled it tells me that you literally don't understand how transformers or the broader technology works, and that you have zero interest in learning. Are you a Markov chain?
farmingvillein t1_izvlogd wrote
Reply to comment by the_timps in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
Honestly, OP has to be a bot or just a loon. Look at, e.g., https://www.reddit.com/r/MachineLearning/comments/zjbsie/d_has_open_ai_said_what_chatgpts_architecture_is/izvks5s/.
He/it is crazily out to lunch--one step above vomiting out random text.
farmingvillein t1_izvlja1 wrote
Reply to comment by maxToTheJ in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
I linked you to a discussion about the context window. You then proceeded to pull a tweet within that thread which was entirely irrelevant. You clearly have no idea about the underlying issue we are discussing (and/or, again, are some sort of bot-hybrid).
farmingvillein t1_izvks5s wrote
Reply to comment by maxToTheJ in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
Are you a bot? The 822 limit has nothing to do with the context window (other than being a lower bound). The tweet thread is talking about an ostensible limit to the prompt description.
farmingvillein t1_izvka14 wrote
Reply to comment by maxToTheJ in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
> is really indicative of a 822 limit
This is not germane to our conversation at all. Do you understand the underlying discussion we are having?
farmingvillein t1_izvicpn wrote
Reply to comment by maxToTheJ in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
> Having a bigger window is a parameter while the context windows implementation in the code is the technique
Do you work at OpenAI? If yes, awesome. If no, how can you make this claim?
OpenAI has released few details about how ChatGPT was built.
farmingvillein t1_izvh1t9 wrote
Reply to comment by maxToTheJ in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
> How do you figure BlenderBot does that?
The BlenderBot paper specifically states that it uses a combination of your standard transformer context window and explicit summarization operations.
> What qualifies as a technique?
Whatever would be needed to replicate the underlying model/system.
It could just be a vanilla transformer n^2 context window, but this seems unlikely--see below.
> Source?
GPT3 (most recent iteration) context window is 2048 tokens; ChatGPT is supposedly ~double (https://help.openai.com/en/articles/6787051-does-chatgpt-remember-what-happened-earlier-in-the-conversation).
This, on its own, would suggest some additional optimizations, as n^2 attention against a context window of (presumably) ~4096 tokens gets very expensive and is generally impractical.
(More generally, it would be surprising to see a scale-up to a window of that size, given the extensive research already extant on scaling up context windows, while breaking the n^2 bottleneck.)
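Quick back-of-the-envelope on why n^2 bites here (my own arithmetic, assuming GPT-3-sized attention--96 heads, fp16 scores--not anything OpenAI has confirmed):

```python
def attn_score_bytes(n_tokens, n_heads=96, bytes_per_el=2):
    # One (n x n) fp16 score matrix per head, per layer, per sequence.
    return n_heads * n_tokens ** 2 * bytes_per_el

for n in (2048, 4096, 8192):
    print(f"{n:>5} tokens -> ~{attn_score_bytes(n) / 2**30:.1f} GiB per layer")
# 2048 -> ~0.8, 4096 -> ~3.0, 8192 -> ~12.0: doubling the window 4x's the cost.
```

(Flash-attention-style kernels avoid materializing these matrices, but the quadratic compute is still there.)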
Further, though, experimentation suggests that the "official" story here is either simply incorrect or missing key additional techniques: under certain experimental conditions, ChatGPT seems to have a window that operates beyond the "official" spec (upwards of another 2x); see, e.g., https://twitter.com/goodside/status/1598882343586238464
As with all things, the answer could simply be "more hardware"--but, given the copious research on handling this scaling issue more elegantly, the best we can say right now is that we don't know, and the probabilistic leaning is that something more sophisticated is going on.
farmingvillein t1_izvcehu wrote
Reply to comment by maxToTheJ in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
> A) The paper tells you all the ingredients.
Maybe, maybe not--expert consensus is probably not. BlenderBot, e.g., uses different techniques to achieve long-term conversational memory. Not clear what techniques ChatGPT is using.
> B) "apparently" means that it isnt a known effect.
General consensus is that there is either a really long context window going on or (more likely) some sort of additional long-term compression technique.
> D) Clearly nobody wants to put in the work to read the blog less the paper
Neither of these addresses the apparently improved long-term conversational memory observed with ChatGPT--unless it turns out to just be a longer context window (which seems unlikely).
Everyone is tea-leaf reading, if/until OpenAI opens the kimono up, but your opinion is directly contrary to the expert consensus.
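To illustrate the kind of compression trick being speculated about (pure guesswork on my part--`summarize` and `count_tokens` here are placeholder functions, not anything OpenAI has described): when the transcript outgrows the window, fold the oldest turns into a running summary and keep that in the prompt instead.

```python
def build_prompt(turns, summarize, count_tokens, max_tokens=4096):
    """Compress the oldest turns into summaries until the prompt fits the window."""
    turns = list(turns)
    while count_tokens("\n".join(turns)) > max_tokens and len(turns) > 1:
        # Fold the two oldest turns into one short summary line.
        turns[:2] = [summarize(turns[0] + "\n" + turns[1])]
    return "\n".join(turns)
```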
farmingvillein t1_izv9qkm wrote
Reply to comment by the_timps in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
> "It's buried in a paper I linked" is not answering someone's question at all.
Lol, yeah, particularly when the answer isn't in the paper.
farmingvillein t1_izv9oug wrote
Reply to comment by maxToTheJ in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
Because the paper does not at all address the (apparently) longer-term context memory that ChatGPT displays.
farmingvillein t1_izv2hb8 wrote
Reply to comment by maxToTheJ in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
OP appears to be asking about the apparent conversational memory, not the general architecture. Your links do not address that.
farmingvillein t1_izi021q wrote
Reply to comment by fasttosmile in [D] Workflows for quickly iterating over ideas without free access to super computers by [deleted]
True, but no one has really come up with a better methodology.
The best you can do is train on smaller data + make sure that you can tell yourself a story about how the new technique will still help when data is scaled up (and then hope that you are right).
(The latter is certainly an argument for staying at least semi-current with the literature, as it will help you get an intuition for what might scale up and what probably won't.)
farmingvillein t1_ixrvyv1 wrote
I tried a few iterations and the results were...unimpressive...to say the least.
Is this configured to do fewer iterative passes (to save $$$), for example? Totally understand if so, given that it is a public/free interface...just trying to rationalize why things are so meh.
farmingvillein t1_ixoyqey wrote
Reply to [R] Getting GPT-3 quality with a model 1000x smaller via distillation plus Snorkel by bradenjh
What is "GT"? The article does not appear to define it.
farmingvillein t1_ixncbjn wrote
Reply to [Discussion] Suggestions on how to annotate X-ray images with only radiology reports available by AJ521432
Check out https://twitter.com/cxbln/status/1595652302123454464:
> 🎉Introducing RoentGen, a generative vision-language foundation model based on #StableDiffusion, fine-tuned on a large chest x-ray and radiology report dataset, and controllable through text prompts!
(Not your full problem, but you may find it helpful!)
More generally, you could probably use the same image-to-text techniques that get used to validate a Stable Diffusion model.
Or, for a really quick-and-dirty solution, you could try using a model like theirs to generate training data (image, text pairs) and train an image => text model (they do a variant of this in the paper); see the sketch below.
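A rough sketch of that quick-and-dirty route, via Hugging Face diffusers (the checkpoint path is hypothetical--RoentGen's weights may not be public; this just shows the shape of the pipeline):

```python
import torch
from diffusers import StableDiffusionPipeline

# Hypothetical report-conditioned checkpoint, a la RoentGen.
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/roentgen-like-checkpoint", torch_dtype=torch.float16
).to("cuda")

reports = [
    "Right lower lobe consolidation, consistent with pneumonia.",
    "No acute cardiopulmonary abnormality.",
]

# Synthesize (image, text) pairs to train an image -> text model on.
pairs = [(pipe(text).images[0], text) for text in reports]
```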
farmingvillein t1_ixie2x1 wrote
Reply to comment by plocco-tocco in [D] Schmidhuber: LeCun's "5 best ideas 2012-22” are mostly from my lab, and older by RobbinDeBank
AGI is in there, somewhere, he promises.
farmingvillein t1_ixidh3b wrote
Reply to comment by [deleted] in [D] Schmidhuber: LeCun's "5 best ideas 2012-22” are mostly from my lab, and older by RobbinDeBank
Why do research? He already discovered everything!
farmingvillein t1_ixictvr wrote
Reply to comment by graphicteadatasci in [R] Human-level play in the game of Diplomacy by combining language models with strategic reasoning — Meta AI by hughbzhang
Hmm. Did you read the full paper?
They didn't create a model that does one thing.
They built a whole host of models, with high levels of hand calibration, each configured for a separate task.
farmingvillein t1_ixgd88a wrote
Reply to comment by Acceptable-Cress-374 in [R] Human-level play in the game of Diplomacy by combining language models with strategic reasoning — Meta AI by hughbzhang
Yeah, understood, but that wasn't really what was going on here (unless you take a really expansive definition).
They were basically doing a ton of hand-calibration of a very large # of models, to achieve the desired end-goal performance--if you read the supplementary materials, you'll see that they did a lot of very fiddly work to select model output thresholds, build training data, etc.
On the one hand, I don't want to sound overly critical of a pretty cool end-product.
On the other, it really looks a lot more like a "product", in the same way that any gaming AI would be, than a singular (or close to it) AI system which is learning to play the game.
farmingvillein t1_ixdp4t8 wrote
Reply to [R] Human-level play in the game of Diplomacy by combining language models with strategic reasoning — Meta AI by hughbzhang
Very neat! Would love to see a version built with fewer filters (secondary models)--i.e., more grounded in a singular, "base" model // less hand-tweaking--but otherwise very cool. (Although wouldn't surprise me if simply upgrading the model size went a long way here.)
farmingvillein t1_j0ejk8l wrote
Reply to [P] Medical question-answering without hallucinating by tmblweeds
Check out // compare with https://crfm.stanford.edu/2022/12/15/pubmedgpt.html, if you haven't yet.