A better combination: DSA, mHC, and DSMoE, similar to DeepSeek R2.
Training-loss report: https://wandb.ai/yingjun-xuda/Tiny-R2/reports/train-loss-26-01-20-21-52-23---VmlldzoxNTY4NzA2Nw
zhaoyingjun/Tiny-R2