A fault-tolerant LLM routing system that decouples inference from AWS Bedrock by routing prefill and decode tasks through SQS, with a graceful drain sidecar for zero-downtime scaling.
Updated Mar 24, 2026 - Python
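The description above names two ideas: routing prefill and decode tasks through separate queues, and draining a worker gracefully before it is scaled in. The following is a minimal sketch of that pattern, not the repository's actual code. It assumes in-memory `queue.Queue` objects as stand-ins for SQS queues, and the names `QUEUES`, `route`, and `DrainableWorker` are hypothetical.

```python
import queue
import threading

# Assumption: in-memory stand-ins for two SQS queues, one per inference phase.
QUEUES = {"prefill": queue.Queue(), "decode": queue.Queue()}


def route(task):
    """Put a task on the queue that matches its 'phase' field."""
    phase = task["phase"]
    if phase not in QUEUES:
        raise ValueError(f"unknown phase: {phase!r}")
    QUEUES[phase].put(task)


class DrainableWorker:
    """Hypothetical worker that stops taking new tasks once drain() is called."""

    def __init__(self, phase, handler):
        self.q = QUEUES[phase]
        self.handler = handler
        # An Event lets another thread (e.g. a sidecar) request the drain.
        self.draining = threading.Event()
        self.done = []

    def drain(self):
        """Signal that no further tasks should be pulled from the queue."""
        self.draining.set()

    def step(self):
        """Process at most one task; return False if draining or queue is empty."""
        if self.draining.is_set():
            return False
        try:
            task = self.q.get_nowait()
        except queue.Empty:
            return False
        self.done.append(self.handler(task))
        return True
```

Once `drain()` has been called, `step()` returns `False` and leaves any remaining tasks on the queue, where another worker could pick them up. With a real SQS queue, unreceived messages would similarly remain available to other consumers.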
Extracted Features contains R code to chunk and disaggregate text data.