Releases: WorldFlowAI/semblend
v0.3.0 — Fuzzy Chunk Matching + Benchmark Framework
What's New
Fuzzy Chunk Matching
- Confidence-gated fuzzy alignment recovers 100% KV reuse on shifted-prefix scenarios (vs 0% with exact-only matching)
- PQ segment store for memory-efficient chunk comparison (32x compression, ~137MB at 100K donors)
- Three-tier decision: fast_reuse / verified_reuse / recompute
- Configurable per-chunk confidence scoring (overlap + positional coherence + position delta decay)
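The decision logic above can be sketched as follows. This is a minimal illustration, not SemBlend's actual implementation: the scoring formula and the thresholds (`fast_thresh`, `verify_thresh`) are hypothetical placeholders for the configurable values the release describes.

```python
def chunk_confidence(overlap: float, coherence: float, position_delta: int,
                     decay: float = 0.05) -> float:
    """Illustrative confidence score combining token overlap, positional
    coherence, and a penalty that decays with how far the chunk shifted
    from its original position."""
    return overlap * coherence * (1.0 / (1.0 + decay * position_delta))

def reuse_decision(conf: float, fast_thresh: float = 0.9,
                   verify_thresh: float = 0.6) -> str:
    """Map a per-chunk confidence to one of the three tiers."""
    if conf >= fast_thresh:
        return "fast_reuse"       # reuse donor KV directly
    if conf >= verify_thresh:
        return "verified_reuse"   # reuse, but with verification
    return "recompute"            # confidence too low; recompute the chunk
```

An exact match (full overlap, no shift) lands in `fast_reuse`; a partially shifted chunk degrades smoothly through `verified_reuse` down to `recompute`.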
Benchmark Framework
- Paper reproduction suite (benchmarks/suite/reproduce.py)
- Tiered validation runner (exact replay, cross-instruction, reorder, multi-turn, RAG template)
- Pre-flight verification (GPU type, patched LMCache, fuzzy matching)
- Bootstrap 95% CIs on all results
- Log parser for ground-truth SemBlend hit detection
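The bootstrap CIs reported on all results can be computed with a standard percentile bootstrap. A minimal sketch, not the suite's actual code (function name and defaults are illustrative):

```python
import random

def bootstrap_ci(samples, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `samples`.

    Resamples with replacement, computes the mean of each resample, and
    takes the alpha/2 and 1 - alpha/2 quantiles of the resampled means.
    """
    rng = random.Random(seed)
    n = len(samples)
    means = sorted(
        sum(rng.choice(samples) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

With `alpha=0.05` this yields the 95% interval used in the results tables.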
Results (A100, Qwen2.5-7B-AWQ)
- TriviaQA: 26.0% hit (paper: 24.8%)
- Cross-instruction: 87-100% hit, 2.15-2.42x speedup
- Fuzzy shifted-prefix: 100% hit, 2.25x speedup
- Quality PPL: ≤1.007 (paper bound: ≤1.065)
SemBlend v0.2.0
New Integrations
- TRT-LLM: SemBlendBackend, KV cache layout adapter, model engine hooks, SemanticCacheLookupProvider + PostPrefixLoadHook upstream ABCs, turnkey launcher (semblend-trtllm)
- Dynamo KVBM: SemBlendKvIndexerWrapper, SemBlendEventPublisher, SemanticKvIndexer Rust crate implementing Dynamo's KvIndexerInterface
- dynamo-semblend Rust crate: 16 tests, SIMD cosine search, embedding sidecar client
Embedding Improvements
- Full-document parallel segmented embedding — 100% document coverage via overlapping 512-token windows with mean pooling
- MiniLM GPU auto-detection — uses last available GPU to avoid contending with inference model
- Removed sentence sorting — segmented mean pooling is inherently order-invariant (0.996 cosine for reordered docs). Sorting was fragile (broke on code/markdown) and hurt cross-instruction similarity.
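The segmented embedding scheme can be sketched as below. This is a simplified illustration under stated assumptions: `embed_fn` stands in for the MiniLM encoder, and the window/stride values mirror the 512-token overlapping windows described above; the real pipeline's API differs.

```python
import numpy as np

def embed_document(tokens, embed_fn, window=512, stride=256):
    """Embed a document by mean-pooling overlapping token windows.

    Each window is embedded independently (parallelizable), then the
    segment vectors are averaged and L2-normalized. Mean pooling over
    segments is order-invariant at the segment level, which is why
    sentence sorting is unnecessary.
    """
    if len(tokens) <= window:
        return embed_fn(tokens)
    # Overlapping windows; the trailing window may be shorter than `window`.
    segments = [tokens[i:i + window] for i in range(0, len(tokens), stride)]
    vecs = np.stack([embed_fn(seg) for seg in segments])
    pooled = vecs.mean(axis=0)
    return pooled / np.linalg.norm(pooled)
```

Because every token falls inside at least one window, coverage is 100% regardless of document length.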
Benchmark Results (SGLang, Qwen2.5-7B-Instruct, A10G)
| Dataset | v0.1.1 (with sorting) | v0.2.0 (no sorting) | Delta |
|---|---|---|---|
| TriviaQA | 3.5% hit | 22.6% hit | +19.1pp |
| SCBench | 4.0% | 13.6% | +9.6pp |
| WikiText103 | 10.0% | 15.7% | +5.7pp |
| LongEval | 9.7% | 15.2% | +5.5pp |
| NarrativeQA | 16.7% | 17.4% | +0.7pp |
On-hit speedups: LongEval avg 10.36x (max 27.88x), NarrativeQA avg 2.03x, TriviaQA avg 1.61x
Install
pip install semblend # core
pip install semblend[vllm] # + vLLM/LMCache
pip install semblend[sglang] # + SGLang
pip install semblend[trtllm] # + TRT-LLM
pip install semblend[embedder] # + sentence-transformers
CacheBlend Note
For selective layer recomputation (CacheBlend), vLLM requires PR #37339.
Test Coverage
- 117 Python tests (TRT-LLM: 54, Dynamo: 14, core/SGLang/vLLM: 49)
- 16 Rust tests (dynamo-semblend)
- 15 Rust tests (semrouter)