Submitted by bo_peng t3_1135aew in MachineLearning
Hi everyone. I am an independent researcher working on RWKV, my pure-RNN language model. I have finished training RWKV-4 14B (FLOPs sponsored by Stability and EleutherAI - thank you!), and it is indeed very scalable. Note that RWKV is also parallelizable during training, so it combines the best of RNNs and transformers.
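In case "pure RNN but parallelizable" sounds contradictory: at inference time, RWKV-4's time-mixing reduces to a per-channel recurrence over an exponentially decayed weighted average of past values (the WKV term), so each new token needs only a small fixed-size state instead of attention over the whole context, while at training time the same quantity can be computed for all timesteps in parallel. Here is a minimal NumPy sketch of that recurrence in its numerically stabilized form - variable names and shapes are mine, and the real (much faster) CUDA kernels are in the repos:

```python
import numpy as np

def wkv_recurrence(k, v, w, u):
    """Sequential (RNN-mode) RWKV time-mixing, one step per token.

    k, v : (T, C) key and value sequences
    w    : (C,)  per-channel decay (positive; past is scaled by e^{-w} each step)
    u    : (C,)  per-channel bonus applied to the current token
    Returns the (T, C) WKV outputs.
    """
    T, C = k.shape
    out = np.empty((T, C))
    # Running numerator / denominator of the decayed weighted average,
    # kept with a running max exponent `p` for numerical stability.
    a = np.zeros(C)          # numerator state
    b = np.zeros(C)          # denominator state
    p = np.full(C, -1e38)    # max exponent seen so far
    for t in range(T):
        # Output: mix the past state with the current token (boosted by u).
        q = np.maximum(p, u + k[t])
        e1 = np.exp(p - q)
        e2 = np.exp(u + k[t] - q)
        out[t] = (e1 * a + e2 * v[t]) / (e1 * b + e2)
        # State update: decay the past by e^{-w}, absorb the current token.
        q = np.maximum(p - w, k[t])
        e1 = np.exp(p - w - q)
        e2 = np.exp(k[t] - q)
        a = e1 * a + e2 * v[t]
        b = e1 * b + e2
        p = q
    return out
```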
The ChatRWKV project (let's build together):
https://github.com/BlinkDL/ChatRWKV
Zero-shot comparison with NeoX / Pythia (same dataset: the Pile) at the same parameter count (14.2B):
[image: zero-shot benchmark comparison]
Generation results (just top-p = 0.85, no repetition penalty) - they look great with my magic prompt (sometimes even better than NeoX 20B):
[images: sample generations from RWKV-4 14B]
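For anyone who wants to reproduce the sampling setup: top-p (nucleus) sampling keeps only the smallest set of tokens whose cumulative probability exceeds p and renormalizes before drawing, with no repetition penalty on top. A minimal sketch (function name and details are mine, not the actual ChatRWKV sampler):

```python
import numpy as np

def sample_top_p(logits, top_p=0.85, temperature=1.0, rng=None):
    """Nucleus (top-p) sampling: draw from the smallest set of tokens
    whose cumulative probability exceeds top_p, renormalized."""
    rng = rng if rng is not None else np.random.default_rng()
    z = logits / temperature
    probs = np.exp(z - np.max(z))          # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]        # token ids, most probable first
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    keep = order[:cutoff]                  # the "nucleus"
    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))
```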
Explanation, fine-tuning, training, and more: https://github.com/BlinkDL/RWKV-LM