Submitted by bo_peng t3_1135aew in MachineLearning
lostmsu t1_j8stb1b wrote
Love the project, but after reading many papers I've come to realize that the lack of verbosity in formulas is deeply misguided.
Take this picture that explains RWKV attention: https://raw.githubusercontent.com/BlinkDL/RWKV-LM/main/RWKV-formula.png
What are the semantics of i, j, R, u, W, and the function σ? They should be obvious at a glance.