
bo_peng OP t1_jb1po7i wrote

Will the 150 lines help? Please read the code first :)

https://github.com/BlinkDL/ChatRWKV/blob/main/RWKV_in_150_lines.py

This is ALL you need for RWKV inference.

You can also read https://arxiv.org/abs/2302.13939 (SpikeGPT), which is inspired by RWKV and has plenty of explanations :)


_Arsenie_Boca_ t1_jb1wjfi wrote

It does help, but it certainly doesn't make everything clear. I am confident I could run inference on it, but my interest is more academic than practical.

What is the magic number 5 all about? It seems to appear all over the code without explanation.

Are the time-mixing and channel-mixing operations novel, or were they introduced by a citable work?

How does the parallelization during training work?
