[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
benchmark deployment tool evaluation pruning quantization post-training-quantization awq large-language-models llm vllm smoothquant mixtral internlm2 lvlm llama3 omniquant quarot lightllm spinquant
Updated Jan 3, 2025 - Python