- Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention] (Python; updated Mar 5, 2025)
- Context engineering toolkit for LLMs — pack, cache, debug, red-team, and orchestrate context windows. Council of Experts, adversarial testing, immune system, context compiler, drift detection, multi-agent entanglement. TypeScript + Python.
- Edge-optimized OpenCUA-7B computer-use agent evaluated on OSWorld, exploring systematic vLLM inference optimizations across CPU and GPU, including precision tuning, image history management, speculative decoding, and prefix caching.
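The repositories above all build on the same core idea: when many requests share an initial run of tokens (for example, a common system prompt), the model state computed for that shared prefix can be stored once and reused, so only the tokens after the match need fresh computation. A minimal sketch of that lookup structure is below, using a trie keyed by token IDs; the class and the string "kv_for_123" standing in for real KV-cache state are illustrative assumptions, not any specific library's API.

```python
class PrefixCache:
    """Toy prefix cache: a trie over token IDs, where a node may hold a
    handle to the (mock) model state computed for that token prefix."""

    def __init__(self):
        self.root = {"children": {}, "state": None}

    def insert(self, tokens, state):
        """Record that `state` was computed for the prefix `tokens`."""
        node = self.root
        for t in tokens:
            node = node["children"].setdefault(
                t, {"children": {}, "state": None}
            )
        node["state"] = state

    def longest_prefix(self, tokens):
        """Return (matched_len, state) for the longest cached prefix of
        `tokens`, or (0, None) if nothing matches."""
        node, best_len, best_state = self.root, 0, None
        for i, t in enumerate(tokens):
            node = node["children"].get(t)
            if node is None:
                break
            if node["state"] is not None:
                best_len, best_state = i + 1, node["state"]
        return best_len, best_state


cache = PrefixCache()
cache.insert([1, 2, 3], "kv_for_123")  # e.g. a shared system prompt
n, state = cache.longest_prefix([1, 2, 3, 4, 5])
print(n, state)  # only tokens after position n need fresh computation
```

Running this prints `3 kv_for_123`: the new request reuses the cached state for its first three tokens and computes only the remaining two. Real systems (such as vLLM's automatic prefix caching, or Marconi's variant for hybrid attention/SSM models) manage this at the granularity of fixed-size token blocks and add eviction policies, but the lookup logic is the same in spirit.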