[D] - Has OpenAI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? Submitted by 029187 t3_zjbsie on December 11, 2022 at 10:16 PM in MachineLearning 103 comments 244
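A note on the question above: OpenAI has not published ChatGPT's full architecture, but the standard answer to the "memory" part is that a decoder-only transformer has no state between calls; the client re-sends the running conversation inside each prompt, and the model attends over that context window. A minimal sketch of that idea (the `build_prompt` helper and the turn format are hypothetical, not OpenAI's actual API):

```python
# Hypothetical sketch: the model itself keeps no memory between calls.
# "Remembering" earlier prompts = the client concatenating the whole
# conversation history into each new prompt (bounded by the context window).
history = []  # list of (speaker, text) turns, kept client-side

def build_prompt(history, new_user_message):
    """Concatenate all prior turns plus the new message into one prompt."""
    lines = [f"{speaker}: {text}" for speaker, text in history]
    lines.append(f"User: {new_user_message}")
    lines.append("Assistant:")  # cue the model to continue as the assistant
    return "\n".join(lines)

history.append(("User", "My name is Ada."))
history.append(("Assistant", "Nice to meet you, Ada!"))
prompt = build_prompt(history, "What is my name?")
print(prompt)  # the earlier turn "My name is Ada." is inside the prompt
```

The model can answer "Ada" only because that fact is literally present in the text it is given, not because it stored anything server-side.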
[D] - Why do Attention layers work so well? Don't weights in DNNs already tell the network how much weight/attention to give to a specific input? (High weight = lots of attention, low weight = little attention) Submitted by 029187 t3_xtzmi2 on October 2, 2022 at 8:56 PM in MachineLearning 26 comments 45
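On the question above: the usual answer is that a dense layer's weights are fixed after training, so every input is mixed with the same coefficients, whereas attention computes its mixing coefficients *from the input itself*, so they change per example. A minimal NumPy sketch of that contrast (self-attention with query = key = the input, an illustrative simplification):

```python
import numpy as np

np.random.seed(0)
d = 4

# A dense layer's weight matrix is fixed after training: the same W mixes
# every input it ever sees.
W = np.random.randn(d, d)

def attention_weights(X):
    """Softmax(X X^T / sqrt(d)): mixing coefficients computed from the input."""
    scores = X @ X.T / np.sqrt(d)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    return e / e.sum(axis=-1, keepdims=True)

X1 = np.random.randn(3, d)  # two different inputs of 3 tokens each
X2 = np.random.randn(3, d)
A1, A2 = attention_weights(X1), attention_weights(X2)
print(np.allclose(A1, A2))  # False: attention coefficients depend on the input
```

So "high weight = lots of attention" is true of both, but a static weight assigns the same importance to position *i* for every example, while attention can assign importance to different positions for different examples.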
[Discussion] If we had enough memory to always do full batch gradient descent, would we still need rmsprop/momentum/adam? Submitted by 029187 t3_xt0h2k on October 1, 2022 at 5:01 PM in MachineLearning 20 comments 3
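On the question above: momentum and Adam are not only variance-reduction tricks for minibatches; they also help with ill-conditioned loss surfaces, which full-batch gradients do not fix. A minimal sketch on a deterministic (i.e. "full-batch") quadratic with condition number 100 (the learning rate and momentum values here are illustrative choices, not tuned recommendations):

```python
import numpy as np

# Exact full-batch gradient of f(w) = 0.5 * sum(H * w**2), H = diag(1, 100).
# Plain GD must keep lr < 2/100 for stability, so it crawls along the
# low-curvature direction; heavy-ball momentum damps the zig-zag and
# makes faster progress with the same lr.
H = np.array([1.0, 100.0])

def grad(w):
    return H * w

def run(steps, lr, beta=0.0):
    """Heavy-ball momentum GD; beta=0.0 reduces to plain gradient descent."""
    w, v = np.array([1.0, 1.0]), np.zeros(2)
    for _ in range(steps):
        v = beta * v + grad(w)
        w = w - lr * v
    return np.linalg.norm(w)  # distance from the minimum at the origin

plain = run(200, lr=0.019)
heavy = run(200, lr=0.019, beta=0.9)
print(plain > heavy)  # True: momentum gets closer to the optimum in 200 steps
```

So even with infinite memory and exact gradients, curvature mismatch between directions still motivates momentum, and per-parameter scaling (RMSProp/Adam) attacks the same problem.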