Thank you for this excellent course! ⭐
This repository contains my personal implementations for the assignments of Stanford's CS336 course, along with answers to some related questions (note: the content is for reference only and may not be entirely accurate).
Large language models have been a hot topic in recent years, and related techniques are frequently discussed (e.g., GRPO, popularized by DeepSeek). While many excellent libraries and tools make it easy to use and train large models, I still had many questions about the underlying details (for instance: how does tokenization actually work in these models, and why does the vocabulary file vocab.json often appear to contain no Chinese tokens?).
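On the vocab.json question: byte-level BPE tokenizers in the GPT-2 family operate on UTF-8 bytes rather than characters, and serialize each byte as a mapped printable character. Chinese text is therefore present in the vocabulary, just not in readable form. A minimal sketch (the `bytes_to_unicode` helper reproduces GPT-2's byte-to-character scheme; the example text is mine):

```python
# Byte-level BPE operates on UTF-8 bytes, not characters. A Chinese character
# is 3 bytes in UTF-8, so it never appears verbatim in vocab.json; only its
# byte-mapped form does.

def bytes_to_unicode():
    """GPT-2's byte -> printable-unicode mapping used when serializing the vocab."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("\u00a1"), ord("\u00ac") + 1))
          + list(range(ord("\u00ae"), ord("\u00ff") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)  # shift non-printable bytes into a printable range
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

text = "中文"
raw = text.encode("utf-8")
print(list(raw))                         # [228, 184, 173, 230, 150, 135]
mapping = bytes_to_unicode()
print("".join(mapping[b] for b in raw))  # 'ä¸Ńæĸĩ' -- this is what vocab.json shows
```

So searching vocab.json for 中 finds nothing, but searching for ä¸Ń (its byte-mapped form) does.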
This course is exceptionally rigorous and well-designed—it guides students through building a large language model from scratch. The amount of coding required even made me wonder how this could be a one-semester course. That said, the process was undoubtedly rewarding and helped me solidify many foundational concepts.
I worked on the assignments intermittently and completed most of the tasks. However, due to limited resources (I lacked a GPU with sufficient memory), I skipped a few parts. For completeness, I have organized and shared the code I developed as a record of my learning. Although the assignments are placed in the same repository, each was originally an independent project, and instructions for each can be found in its respective directory.
| Assignment (official repo) | Highlights | Writeup |
|---|---|---|
| Basics | 1. Byte-pair encoding 2. Tokenizer 3. Transformer 4. Ablations | writeup.md |
| Systems | 1. FlashAttention 2 2. Profiling and benchmarking | writeup.md |
| Scaling | 1. Scaling laws in LLMs | writeup.md |
| Data | 1. The Common Crawl dataset | writeup.md |
| Alignment and Reasoning RL | 1. SFT 2. GRPO 3. DPO | writeup.md, writeup_sup.md |
Below is a screenshot of all test cases passing; it is really satisfying to see.



