- Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention] (Python; updated Mar 5, 2025)
- Context engineering toolkit for LLMs — pack, cache, debug, red-team, and orchestrate context windows. Council of Experts, adversarial testing, immune system, context compiler, drift detection, multi-agent entanglement. TypeScript + Python.
- Edge-optimized OpenCUA-7B computer-use agent evaluated on OSWorld, exploring systematic vLLM inference optimizations across CPU and GPU, including precision tuning, image history management, speculative decoding, and prefix caching.
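The repositories above all build on the same core idea: when many requests share an initial run of tokens (for example, a common system prompt), the model state computed for that shared prefix can be stored once and reused, so only the tokens after the match need fresh computation. A minimal sketch of that lookup structure is below, using a trie keyed by token IDs; the class and the string "kv_for_123" standing in for real KV-cache state are illustrative assumptions, not any specific library's API.

```python
class PrefixCache:
    """Toy prefix cache: a trie over token IDs, where a node may hold a
    handle to the (mock) model state computed for that token prefix."""

    def __init__(self):
        self.root = {"children": {}, "state": None}

    def insert(self, tokens, state):
        """Record that `state` was computed for the prefix `tokens`."""
        node = self.root
        for t in tokens:
            node = node["children"].setdefault(
                t, {"children": {}, "state": None}
            )
        node["state"] = state

    def longest_prefix(self, tokens):
        """Return (matched_len, state) for the longest cached prefix of
        `tokens`, or (0, None) if nothing matches."""
        node, best_len, best_state = self.root, 0, None
        for i, t in enumerate(tokens):
            node = node["children"].get(t)
            if node is None:
                break
            if node["state"] is not None:
                best_len, best_state = i + 1, node["state"]
        return best_len, best_state


cache = PrefixCache()
cache.insert([1, 2, 3], "kv_for_123")  # e.g. a shared system prompt
n, state = cache.longest_prefix([1, 2, 3, 4, 5])
print(n, state)  # only tokens after position n need fresh computation
```

Running this prints `3 kv_for_123`: the new request reuses the cached state for its first three tokens and computes only the remaining two. Real systems (such as vLLM's automatic prefix caching, or Marconi's variant for hybrid attention/SSM models) manage this at the granularity of fixed-size token blocks and add eviction policies, but the lookup logic is the same in spirit.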