Quantized metal. #1594

Closed

wants to merge 11 commits into from

Conversation

@Narsil (Collaborator) commented Jan 15, 2024

Working quantized state for candle.

High-level overview:

  • Introduce QStorage to split the Metal and CPU ops (see the sketch after this list).
  • Quantize/dequantize still run on the CPU, even when asked to run on Metal. Kernels exist, but they led to different quantization/dequantization results, and GGML removed those on-device kernels too (I guess because of the differences).
    I think we should keep the API surface for on-Metal quantize/dequantize so we can easily implement them later; they are part of the ggml API imho.
  • Added a bunch of tests, using test_device! in order to get the same testing behavior as for regular Tensor ops.
  • quantized.metal is a direct copy of ggml's ggml-metal.metal. This choice was made so that further development can move faster and the bugs mentioned below can be imported more easily. All the glue logic is in candle_metal_kernels.
  • Introduces a new GgmlDType in candle_metal_kernels. Ggml uses different kernels depending on the matmul size and hardware capability; that wasn't implemented here, but it could be with the current API.
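A minimal, hypothetical sketch of the QStorage split described above. The names QStorage, QMetalStorage, QuantizedType, and GgmlDType come from this PR, but the fields and methods below are illustrative stand-ins, not the actual candle code (a Vec<u8> takes the place of the real metal::Buffer):

```rust
// Illustrative only: QStorage dispatches between a typed CPU representation
// and an untyped Metal buffer, with a GgmlDType tag carried alongside so the
// right kernel can be selected at runtime.

#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum GgmlDType { Q4_0, Q8_0, Q4K, Q8K } // subset, for illustration

trait QuantizedType {
    fn dtype(&self) -> GgmlDType;
}

struct QMetalStorage {
    buffer: Vec<u8>,   // stand-in for an untyped metal::Buffer (raw device bytes)
    dtype: GgmlDType,  // tells the kernel glue which quantization layout the bytes use
}

enum QStorage {
    // CPU blocks stay strongly typed behind the QuantizedType trait.
    Cpu(Box<dyn QuantizedType>),
    // Metal data is untyped bytes plus a runtime dtype tag.
    Metal(QMetalStorage),
}

impl QStorage {
    fn dtype(&self) -> GgmlDType {
        match self {
            QStorage::Cpu(s) => s.dtype(),
            QStorage::Metal(s) => s.dtype,
        }
    }
}
```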

Noteworthy bugs already discovered (not fixed in this PR, since they do not belong here):

  • Q2K Metal -> Bugged (also present in GGML).
  • Q4K CPU -> Bugged (present previously; the new tests catch it, see the sketch below).
  • Q5K CPU -> Bugged (present previously).
  • Q8_1 Both -> Never really implemented, it seems.
  • Q8K Metal -> Never implemented in Metal.
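A hedged sketch (not code from this PR) of what one of the per-device roundtrip tests could look like. The QTensor::quantize / dequantize signatures and the four-argument (cpu/cuda/metal) test_device! form are assumptions here:

```rust
use candle_core::quantized::{GgmlDType, QTensor};
use candle_core::{Device, Result, Tensor};

fn quantized_roundtrip(device: &Device) -> Result<()> {
    // 256 elements = 8 full Q8_0 blocks of 32; Q8_0 roundtrips with small error.
    let src = Tensor::arange(0f32, 256f32, device)?;
    let q = QTensor::quantize(&src, GgmlDType::Q8_0)?;
    let back = q.dequantize(device)?;
    // Mean absolute error should stay well below the Q8_0 quantization step.
    let err = (&src - &back)?.abs()?.sum_all()?.to_scalar::<f32>()?;
    assert!(err / 256.0 < 1.0, "roundtrip error too large: {err}");
    Ok(())
}

// One macro invocation generates a #[test] per backend, so the same body runs
// on CPU, CUDA, and Metal, which is how per-backend bugs like Q4K CPU surface.
candle_core::test_device!(
    quantized_roundtrip,
    quantized_roundtrip_cpu,
    quantized_roundtrip_gpu,
    quantized_roundtrip_metal
);
```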

Nicolas Patry and others added 9 commits January 15, 2024 17:42
- Add a device param, wherever needed.
- Create new QMetal storage thing that implements QuantizedType.
- Update everywhere needed.

Fix Python.

Fixing examples.

Fix: fmt + clippy + stub.

Moving everything around.

Only missing the actual implems.

Fixing everything + adding dequantized kernels.

More work.

Fixing matmul.

Fmt + Clippy

Some clippy fixes.

Working state.

Q2K Metal -> Bugged (also present in GGML).
Q4K CPU -> Bugged (present previously, new tests catch it).
Q5K CPU -> Bugged (present previously).
Q8_1 Both -> Never really implemented, it seems.
Q8K Metal -> Never implemented in Metal.

Fixing Q2K bug (present in ggml).
@LaurentMazare (Collaborator) commented:

All the GgmlDType bits seem like a refactoring that is probably orthogonal to Metal? Could this be split into a separate PR? Also, all the fences bits seem orthogonal too and could be extracted.

@LaurentMazare (Collaborator) commented:

Also, blck_size is not a typo; it matches the llama.cpp nomenclature on purpose.

@Narsil (Collaborator, Author) commented Jan 17, 2024

> All the GgmlDType bits seem like a refactoring that is probably orthogonal to Metal? Could this be split into a separate PR? Also, all the fences bits seem orthogonal too and could be extracted.

No, it's not; it's core to this change. The reason is that a lot of the previous code used GgmlType (the block type, not the dtype) as a generic. That doesn't work for the Metal bit, since the buffers that actually store the data are untyped (unlike Vec), so we need to restructure things. (I know we could use PhantomData, but that seems like an anti-pattern here, and overall the code looks much simpler this way.)
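A minimal sketch of the contrast, with hypothetical type names (QTensorGeneric / QTensorErased) and illustrative kernel-name strings; the real code lives in candle's quantized modules:

```rust
#[allow(non_camel_case_types)]
#[derive(Clone, Copy)]
enum GgmlDType { Q4_0, Q8_0 } // illustrative subset

// Before: the block type is a compile-time generic, which requires a typed
// Vec<T> to hold the data. Fine for the CPU backend, where blocks are structs.
struct QTensorGeneric<T /* : GgmlType */> {
    data: Vec<T>,
}

// After: a Metal buffer is just bytes, so the quantization layout has to be
// tracked as a runtime value instead of a type parameter.
struct QTensorErased {
    data: Vec<u8>,     // stand-in for an untyped metal::Buffer
    dtype: GgmlDType,  // runtime tag replaces the generic parameter
}

impl QTensorErased {
    fn matmul_kernel_name(&self) -> &'static str {
        // Kernel dispatch keys off the runtime dtype rather than monomorphized code.
        match self.dtype {
            GgmlDType::Q4_0 => "kernel_mul_mat_q4_0_f32",
            GgmlDType::Q8_0 => "kernel_mul_mat_q8_0_f32",
        }
    }
}
```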

For blck_size, I'm OK with respecting the llama.cpp convention, but only up to a point; here it seems almost silly not to write it out in full.

@Narsil (Collaborator, Author) commented Jan 17, 2024

Merged #1523 directly.

@Narsil closed this Jan 17, 2024