Open
75 commits
6a7ff53
Q3_HIFI added
Nov 27, 2025
431fa1e
Update Q3_HIFI outliers count for accuracy improvement
geoffmunn Nov 29, 2025
7b5e058
Refactor quantization with optional quant_weights
geoffmunn Nov 29, 2025
13184ab
Add quantize_row_q3_hifi_ref function declaration
geoffmunn Nov 29, 2025
a91b6c8
Fix syntax error in ggml.c
geoffmunn Nov 29, 2025
1fb4f16
Add GGML_TYPE_Q3_HIFI case to ops.cpp
geoffmunn Nov 29, 2025
739f7d6
Add quantize_row_q3_hifi function declaration
geoffmunn Nov 29, 2025
7d003b2
Add LLAMA_FTYPE_MOSTLY_Q3_HIFI to llama.h
geoffmunn Nov 29, 2025
d0dcce9
Add Q3_HIFI type support in llama model loader
geoffmunn Nov 29, 2025
2e8e69a
Add support for GGML_TYPE_Q3_HIFI in llama-quant
geoffmunn Nov 29, 2025
3cf3235
Add Q3_HIFI quantization option
geoffmunn Nov 29, 2025
2a23338
Add comparison of Q3 quantization formats
geoffmunn Nov 29, 2025
10b2019
Add complete guide for Importance Matrix (imatrix) files
geoffmunn Nov 29, 2025
11c85c4
Add high-fidelity quantization function
geoffmunn Nov 29, 2025
ac8003e
Implement Q3_HIFI type in ggml-cpu.c
geoffmunn Nov 29, 2025
f4b5ecb
Revise Q3 quantization formats comparison document
geoffmunn Nov 29, 2025
d302e6d
Add GGML_API qualifier to dequantize_row_q3_hifi
geoffmunn Dec 3, 2025
230ee25
Add NEON-optimized dequantization for Q3_HIFI
geoffmunn Dec 3, 2025
f2a2d97
Implement AVX2 dequantization for Q3_HIFI
geoffmunn Dec 3, 2025
7d6a887
Update dequantize.cuh
geoffmunn Dec 3, 2025
c2b5957
Update ggml-metal.metal
geoffmunn Dec 3, 2025
27e8f1b
Create dequant_q3_hifi.comp
geoffmunn Dec 3, 2025
2025109
First round of optimisations, speed is 5.6x slower
GeoffApples Dec 11, 2025
ae313c5
Results updated
GeoffApples Dec 11, 2025
cc7c51d
ql/qh block structure updated
GeoffApples Dec 11, 2025
31200f1
Speed improvements made. 84% of base model.
GeoffApples Dec 11, 2025
40181d8
Hybrid tensor speed improvements
GeoffApples Dec 11, 2025
560865f
More CPU architecture support
GeoffApples Dec 11, 2025
e54de2c
Loop unrolling for small speed improvement
GeoffApples Dec 11, 2025
eeada9d
float casts for more speed improvements
GeoffApples Dec 11, 2025
1fb41ec
HIFI names consolidated
GeoffApples Dec 11, 2025
07eab7b
More GPU support improvements
GeoffApples Dec 11, 2025
5e74059
CUDA support added
GeoffApples Dec 11, 2025
ee314fd
Apple metal support
GeoffApples Dec 11, 2025
530b372
More GPU support
GeoffApples Dec 11, 2025
d834494
Conversion script updated
GeoffApples Dec 11, 2025
a7d56ac
Q3_HIFI tests added
GeoffApples Dec 11, 2025
0ca15bd
Merge pull request #1 from GeoffApples/Q3_HIFI_1.7B_fast
geoffmunn Dec 11, 2025
6ff0291
Vulkan shaders added
GeoffApples Dec 12, 2025
d7fb478
Merge pull request #2 from GeoffApples/Q3_HIFI_1.7B_fast
geoffmunn Dec 12, 2025
0189dd8
Syntax error fixed
GeoffApples Dec 12, 2025
697e328
Merge pull request #3 from GeoffApples/Q3_HIFI_1.7B_fast
geoffmunn Dec 12, 2025
8a4f2d4
Missing Q3_HIFI constants added
GeoffApples Dec 12, 2025
d8ae285
GPU disabled (bad results)
GeoffApples Dec 13, 2025
9344bfe
Latest speed improvements
GeoffApples Dec 13, 2025
c5bf27f
All 3 metrics now exceed Q3_K_M
GeoffApples Dec 13, 2025
1cf26dc
Documentation updated
GeoffApples Dec 13, 2025
9b58d82
Merge pull request #4 from GeoffApples/Q3_HIFI_1.7B_fast
geoffmunn Dec 13, 2025
0baa2c8
Q3_HIFI_A now the official version
GeoffApples Dec 13, 2025
bc8ba8a
Merge pull request #5 from GeoffApples/Q3_HIFI_1.7B_fast
geoffmunn Dec 13, 2025
2d4d0b3
Speed benchmark script added
GeoffApples Dec 14, 2025
a177f2c
Merge pull request #6 from GeoffApples/Q3_HIFI_1.7B_fast
geoffmunn Dec 14, 2025
bc3c5cf
Merge pull request #7 from ggml-org/master
geoffmunn Dec 14, 2025
0e6f3aa
Merge branch 'Q3_HIFI' into master
geoffmunn Dec 14, 2025
9971857
Merge pull request #8 from geoffmunn/master
geoffmunn Dec 14, 2025
42b6477
Old files removed
Dec 21, 2025
5792ab4
Cross-model documentation added
Dec 21, 2025
8b72146
Validation errors fixed
Dec 21, 2025
daf0e20
Whitespace fixed
Dec 21, 2025
bf0d021
Whitespace fixes
Dec 21, 2025
f79424e
Whitespace fixes
Dec 21, 2025
abcb4cc
Whitespace fixes
Dec 21, 2025
7724f7b
Whitespace changes
Dec 21, 2025
a6bb077
Whitespace fixes
Dec 21, 2025
9bae334
Whitespace fixes
Dec 21, 2025
dce3e67
Whitespace fixes
Dec 21, 2025
3e3f931
Whitespace fixes
Dec 21, 2025
972d662
Whitespace fixes
Dec 21, 2025
20390e2
Whitespace fixes
Dec 21, 2025
4851a00
print statements changed to logging()
Dec 21, 2025
9be1c3d
Extra blank line removed
Dec 21, 2025
c42d48f
Merge pull request #9 from geoffmunn/Q3_HIFI
geoffmunn Dec 21, 2025
dbf9a9a
Documentation moved
Dec 21, 2025
2c4049e
GGML_TYPE_Q3_HIFI now value 12
Dec 21, 2025
e4fd98f
GGML_TYPE_Q3_HIFI moved to end, numbers re-ordered
Dec 21, 2025
15 changes: 15 additions & 0 deletions .gitignore
@@ -137,3 +137,18 @@ poetry.toml
 /.windsurf/
 # emscripten
 a.out.*
+wikitext-2-raw/wikitext-2-raw/wiki.test.raw
+wikitext-2-raw/wikitext-2-raw/wiki.train.raw
+wikitext-2-raw/wikitext-2-raw/wiki.valid.raw
+Qwen3-1.7B/.gitattributes
+Qwen3-1.7B/config.json
+Qwen3-1.7B/generation_config.json
+Qwen3-1.7B/LICENSE
+Qwen3-1.7B/merges.txt
+Qwen3-1.7B/model-00001-of-00002.safetensors
+Qwen3-1.7B/model-00002-of-00002.safetensors
+Qwen3-1.7B/model.safetensors.index.json
+Qwen3-1.7B/README.md
+Qwen3-1.7B/tokenizer_config.json
+Qwen3-1.7B/tokenizer.json
+Qwen3-1.7B/vocab.json