feat: add GGUF/GGML model support via llama-cpp-python #3826
Closed
crawfordxx wants to merge 1 commit into lm-sys:main from
Conversation
Add a new model worker backend that uses llama-cpp-python to load and serve quantized GGUF/GGML models. This enables running large language models efficiently on CPU or with partial GPU offloading.

- Add gguf_worker.py following the same pattern as mlx_worker/vllm_worker
- Support streaming and non-streaming text generation
- Support configurable GPU offloading (n_gpu_layers), context size, and batch size
- Add llama-cpp-python as an optional dependency in pyproject.toml
- Add integration documentation in docs/gguf_integration.md

Closes lm-sys#2410
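For context, this is roughly the llama-cpp-python API such a worker wraps. This is a hedged sketch, not code from the PR; the model path matches the Usage example below, and the parameter values are illustrative.

```python
# Minimal sketch of the llama-cpp-python calls a worker like this wraps.
# The model path and parameter values below are illustrative only.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=-1,  # -1 offloads all layers to the GPU; 0 runs CPU-only
    n_ctx=4096,       # context window size
    n_batch=512,      # prompt-processing batch size
)

# Non-streaming generation; pass stream=True to iterate over chunks instead.
out = llm.create_completion("Q: What is the GGUF format? A:", max_tokens=64)
print(out["choices"][0]["text"])
```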
Summary
Adds a new model worker backend (`gguf_worker.py`) that uses llama-cpp-python to load and serve quantized GGUF/GGML models through FastChat.

- `fastchat/serve/gguf_worker.py`: follows the same worker interface pattern as `mlx_worker.py` and `vllm_worker.py` (a rough sketch of that shape follows this list)
- `llama-cpp-python>=0.2.0` added to `pyproject.toml` under the `[gguf]` extra
- `docs/gguf_integration.md` with setup and usage instructions
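Since the new file itself isn't reproduced here, the following is only a rough skeleton of the shape such a worker typically takes, assuming FastChat's `BaseModelWorker` base class and the `generate_stream` convention used by `vllm_worker.py`. All argument names, defaults, and signatures are illustrative, not the PR's actual code.

```python
# Illustrative skeleton only, not the PR's code. Assumes FastChat's
# BaseModelWorker and the streaming convention used by the existing workers
# (JSON payloads terminated by a NUL byte); exact signatures may differ.
import json
import uuid

from llama_cpp import Llama

from fastchat.serve.base_model_worker import BaseModelWorker


class GGUFWorker(BaseModelWorker):
    def __init__(self, controller_addr, worker_addr, model_path, model_names,
                 conv_template, n_gpu_layers=0, n_ctx=4096, n_batch=512,
                 limit_worker_concurrency=5):
        super().__init__(controller_addr, worker_addr, str(uuid.uuid4())[:8],
                         model_path, model_names, limit_worker_concurrency,
                         conv_template=conv_template)
        # Load the quantized GGUF model; n_gpu_layers controls how many
        # layers are offloaded to the GPU (-1 offloads all of them).
        self.model = Llama(model_path=model_path, n_gpu_layers=n_gpu_layers,
                           n_ctx=n_ctx, n_batch=n_batch)

    def generate_stream(self, params):
        # Stream partial completions, growing the text with each chunk.
        text = ""
        for chunk in self.model.create_completion(
            params["prompt"],
            max_tokens=params.get("max_new_tokens", 256),
            temperature=params.get("temperature", 0.7),
            stream=True,
        ):
            text += chunk["choices"][0]["text"]
            yield json.dumps({"text": text, "error_code": 0}).encode() + b"\0"
```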
Features

- Streaming and non-streaming text generation
- Runs on CPU or with partial/full GPU offloading (`--n-gpu-layers`)
- Configurable context size (`--n-ctx`) and batch size (`--n-batch`); see the flag-mapping sketch after this list
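A hedged sketch of how these flags might map onto the `Llama` constructor parameters. The real `gguf_worker.py` presumably also parses the usual FastChat worker options (controller address, host, port), which are omitted here.

```python
# Illustrative flag parsing only; the actual gguf_worker.py may differ and
# will also need the standard FastChat worker options (addresses, ports, ...).
import argparse

from llama_cpp import Llama

parser = argparse.ArgumentParser()
parser.add_argument("--model-path", type=str, required=True)
parser.add_argument("--model-names", type=lambda s: s.split(","))
parser.add_argument("--conv-template", type=str, default=None)
parser.add_argument("--n-gpu-layers", type=int, default=0)
parser.add_argument("--n-ctx", type=int, default=4096)
parser.add_argument("--n-batch", type=int, default=512)
args = parser.parse_args()

model = Llama(
    model_path=args.model_path,
    n_gpu_layers=args.n_gpu_layers,
    n_ctx=args.n_ctx,
    n_batch=args.n_batch,
)
```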
Usage

```bash
pip install "llama-cpp-python>=0.2.0"

python3 -m fastchat.serve.gguf_worker \
    --model-path ./models/llama-2-7b-chat.Q4_K_M.gguf \
    --model-names llama-2-7b-chat \
    --conv-template llama-2 \
    --n-gpu-layers -1
```

Closes #2410
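Once the worker is up, it can be sanity-checked through FastChat's standard serving stack (a controller plus `openai_api_server`, which are not part of this PR). A hedged example using the pre-1.0 `openai` client style that the FastChat docs use, assuming default ports:

```python
# End-to-end check via FastChat's OpenAI-compatible API server; assumes
# fastchat.serve.controller and fastchat.serve.openai_api_server are running
# on their default ports alongside this worker.
import openai  # pre-1.0 client style, as in the FastChat docs

openai.api_key = "EMPTY"
openai.api_base = "http://localhost:8000/v1"

resp = openai.ChatCompletion.create(
    model="llama-2-7b-chat",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```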