feat: add fastllm worker for high-performance inference #3828
Open
crawfordxx wants to merge 1 commit into lm-sys:main from
Conversation
Add a new model worker backend based on fastllm (https://github.com/ztxz16/fastllm), a high-performance LLM inference engine with strong CPU acceleration.

- Add `fastchat/serve/fastllm_worker.py` following the `BaseModelWorker` pattern
- Support streaming generation with `temperature`, `top_p`, `top_k`, `repeat_penalty`
- Support HuggingFace and `.flm` model formats
- Support int4/int8/float16/float32 quantization via the `--dtype` flag
- fastllm is an optional dependency
- Add `docs/fastllm_integration.md` with setup instructions

Closes lm-sys#2521
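Since fastllm is an optional dependency, here is a hedged install sketch. fastllm is a C++ engine built with CMake; the exact steps (and any extra flags) should be taken from the upstream README and may vary by platform:

```bash
# Hedged sketch: build fastllm from source per the upstream README
# (https://github.com/ztxz16/fastllm); exact steps may differ by platform.
git clone https://github.com/ztxz16/fastllm
cd fastllm
mkdir build && cd build
cmake .. && make -j   # upstream docs mention -DUSE_CUDA=ON for GPU builds
```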
Summary
Add a new model worker backend based on fastllm, addressing #2521.
- `fastchat/serve/fastllm_worker.py`: follows the `BaseModelWorker` pattern (consistent with `vllm_worker.py`, `mlx_worker.py`, etc.)
- Streaming generation with `temperature`, `top_p`, `top_k`, `repetition_penalty`, `max_new_tokens`, and stop strings
- `--dtype` flag supports `float16`, `float32`, `int8`, `int4`
- `--threads` flag for CPU-bound inference
- `docs/fastllm_integration.md` with installation and usage instructions

Usage
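A minimal launch sketch, assuming FastChat's standard controller/worker topology. `--model-path` follows the convention of the other workers; `--dtype` and `--threads` are the flags added in this PR, and the paths below are placeholders:

```bash
# Start the FastChat controller (standard FastChat component).
python3 -m fastchat.serve.controller

# In a second shell, launch the fastllm worker. The model path may be a
# HuggingFace model directory or a converted .flm file.
python3 -m fastchat.serve.fastllm_worker \
    --model-path /path/to/model \
    --dtype int4 \
    --threads 8
```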
Test Plan
- Verified streaming and non-streaming generation via `/worker_generate_stream` and `/worker_generate` (a smoke-test sketch follows below)
- Tested all `--dtype` options (float16, int8, int4)

Closes #2521
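As a quick smoke test, the streaming endpoint can be exercised directly. Port 21002 is FastChat's default worker port and the payload fields mirror the parameters listed above; both are illustrative and should be adjusted to the actual deployment:

```bash
# Hedged sketch: hit the worker's streaming endpoint directly.
curl -s http://localhost:21002/worker_generate_stream \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, how are you?", "temperature": 0.7, "top_p": 0.9, "max_new_tokens": 64}'
```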