Python: ADR: Unifying Context Management with ContextMiddleware (Python) #3609
eavanvalkenburg wants to merge 10 commits into microsoft:main
Pull request overview
Adds a new ADR proposing a unified ContextMiddleware abstraction for Python to replace ContextProvider, ChatMessageStore, and AgentThread, using an onion/wrapper middleware pipeline pattern.
Changes:
- Introduces a proposed ContextMiddleware + SessionContext + pipeline design for composable context engineering.
- Describes a StorageContextMiddleware approach for loading/storing conversation history and optional auditing.
- Outlines migration impact and a phased implementation/testing plan.
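The onion/wrapper pattern mentioned in the overview can be sketched roughly as follows. This is a minimal illustration, not the ADR's implementation: `SessionContext`, `ContextMiddleware`, `run_pipeline`, `invoke_model`, and the `trace` field are hypothetical stand-ins that only show how each layer runs code before and after the inner layers.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List


@dataclass
class SessionContext:
    """Hypothetical stand-in for the ADR's SessionContext."""

    input_messages: List[str]
    trace: List[str] = field(default_factory=list)  # illustration only


NextHandler = Callable[[SessionContext], Awaitable[None]]


class ContextMiddleware:
    """Hypothetical base class; each instance wraps the rest of the pipeline."""

    def __init__(self, source_id: str) -> None:
        self.source_id = source_id

    async def process(self, context: SessionContext, next: NextHandler) -> None:
        context.trace.append(f"{self.source_id}:before")
        await next(context)  # inner middleware, then the model call
        context.trace.append(f"{self.source_id}:after")


def _wrap(middleware: ContextMiddleware, inner: NextHandler) -> NextHandler:
    """Return a handler that runs `middleware` around `inner`."""

    async def handler(context: SessionContext) -> None:
        await middleware.process(context, inner)

    return handler


async def run_pipeline(
    middleware: List[ContextMiddleware],
    context: SessionContext,
    terminal: NextHandler,
) -> None:
    """Chain middleware so the first entry becomes the outermost layer."""
    handler = terminal
    for item in reversed(middleware):
        handler = _wrap(item, handler)
    await handler(context)


async def invoke_model(context: SessionContext) -> None:
    """Placeholder for the model call at the center of the onion."""
    context.trace.append("model")


if __name__ == "__main__":
    ctx = SessionContext(input_messages=["hello"])
    layers = [ContextMiddleware("memory"), ContextMiddleware("rag")]
    asyncio.run(run_pipeline(layers, ctx, invoke_model))
    print(ctx.trace)
```

With this chaining, the first middleware in the list sees the request first and the response last, which is why ordering matters once more than one middleware is configured.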
| --- | ||
| # These are optional elements. Feel free to remove any of them. | ||
| status: proposed | ||
| contact: eavanvalkenburg | ||
| date: 2026-02-02 | ||
| deciders: eavanvalkenburg, markwallace-microsoft, sphenry, alliscode, johanst, brettcannon | ||
| consulted: taochenosu, moonbox3, dmytrostruk, giles17 | ||
| --- | ||
|
|
||
| # Unifying Context Management with ContextMiddleware |
This ADR is using the placeholder number “00XX” in the filename/path. Per docs/decisions/README.md (step 1), ADRs should be named with the next sequential number (currently 0015-…). Please rename the file accordingly and update any references to the old filename.
| return session | ||
| ``` | ||
|
|
The SessionContext docstring refers to add_context_messages(), but the class defines add_messages() as the API for adding context messages. Please update the docstring to match the actual method name to avoid confusion.
| metadata: dict[str, Any] | None = None, | ||
| ): | ||
| self.session_id = session_id | ||
| self.service_session_id = service_session_id | ||
| self.input_messages = input_messages | ||
| self.context_messages: dict[str, list[ChatMessage]] = context_messages or {} | ||
| self.instructions: list[str] = instructions or [] | ||
| self.tools: list[ToolProtocol] = tools or [] |
In the code sample, ContextMiddlewareFactory / ContextMiddlewareConfig reference ContextMiddleware before the ContextMiddleware class is defined. As written, this would raise a NameError at runtime unless you use from __future__ import annotations or quote the type / move these aliases below the class definition.
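One way to apply the "move these aliases below the class definition" option is sketched here. Module-level aliases are ordinary assignments evaluated at import time, so placing them after the class avoids the undefined name. The class body is a hypothetical stand-in, and the factory signature (taking an optional session id) is an assumption.

```python
from typing import Callable, Optional, Union


class ContextMiddleware:
    """Hypothetical stand-in for the ADR's base class."""

    def __init__(self, source_id: str) -> None:
        self.source_id = source_id


# Defined after the class, so the name ContextMiddleware already exists
# when these assignments are evaluated.
# Assumption: a factory receives the session id and returns a middleware.
ContextMiddlewareFactory = Callable[[Optional[str]], ContextMiddleware]
ContextMiddlewareConfig = Union[ContextMiddleware, ContextMiddlewareFactory]
```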
| tools: The tools to add | ||
| """ | ||
| for tool in tools: | ||
| # Add source attribution to tool metadata | ||
| if hasattr(tool, 'metadata') and isinstance(tool.metadata, dict): | ||
| tool.metadata["context_source"] = source_id | ||
| self.tools.extend(tools) | ||
|
|
||
| # --- Methods for reading context --- |
The example in the ContextMiddleware docstring has incorrect indentation/structure: the “POST-PROCESSING” block appears nested under the factory function, not inside process(), which makes the example invalid and hard to follow. Please fix the snippet structure so the post-processing example is shown within process() after await next(context).
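A sketch of the structure this comment asks for, with pre-processing and post-processing both inside `process()` and separated by `await next(context)`. `AuditMiddleware`, `fake_model`, and the string-based messages are hypothetical simplifications, not the ADR's types.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, List, Tuple


@dataclass
class SessionContext:
    """Hypothetical stand-in with only the fields this sketch needs."""

    input_messages: List[str]
    context_messages: Dict[str, List[str]] = field(default_factory=dict)
    response_messages: List[str] = field(default_factory=list)

    def add_messages(self, source_id: str, messages: List[str]) -> None:
        self.context_messages.setdefault(source_id, []).extend(messages)


NextHandler = Callable[[SessionContext], Awaitable[None]]


class AuditMiddleware:
    """Hypothetical middleware showing where each phase lives."""

    def __init__(self, source_id: str) -> None:
        self.source_id = source_id
        self.audit_log: List[Tuple[List[str], List[str]]] = []

    async def process(self, context: SessionContext, next: NextHandler) -> None:
        # PRE-PROCESSING: contribute context before the model is invoked.
        context.add_messages(self.source_id, ["Answer concisely."])

        await next(context)

        # POST-PROCESSING: still inside process(), after `await next(context)`.
        # The responses are copied and read here, not modified.
        self.audit_log.append(
            (list(context.input_messages), list(context.response_messages))
        )


async def fake_model(context: SessionContext) -> None:
    """Placeholder for the rest of the pipeline and the model call."""
    context.response_messages.append("42")


if __name__ == "__main__":
    middleware = AuditMiddleware("audit")
    ctx = SessionContext(input_messages=["What is 6 x 7?"])
    asyncio.run(middleware.process(ctx, fake_model))
    print(middleware.audit_log)
```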
| 3. Response messages (if include_response=True) | ||
|
|
||
| Args: | ||
| include_input: If True, append input_messages after context | ||
| include_response: If True, append response_messages at the end |
The ContextMiddleware.process() docstring references context.history_messages, but SessionContext does not define a history_messages attribute (history appears to be stored under context_messages keyed by source_id). Please update the docstring to reference the correct API/field so implementers know where to read loaded history from.
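To illustrate where loaded history would be read from, the sketch below keeps history under `context_messages` keyed by `source_id` and combines it with input and response messages through a `get_messages` helper. The helper name, its keyword-only flags, and their `False` defaults are assumptions made for this example; only the ordering (context, then input, then response) is taken from the excerpt above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SessionContext:
    """Hypothetical stand-in; messages are plain strings for brevity."""

    input_messages: List[str]
    context_messages: Dict[str, List[str]] = field(default_factory=dict)
    response_messages: List[str] = field(default_factory=list)

    def add_messages(self, source_id: str, messages: List[str]) -> None:
        self.context_messages.setdefault(source_id, []).extend(messages)

    def get_messages(
        self, *, include_input: bool = False, include_response: bool = False
    ) -> List[str]:
        """Return context messages, optionally followed by input and response."""
        combined: List[str] = []
        # 1. Context messages, grouped by the source that added them.
        for source_messages in self.context_messages.values():
            combined.extend(source_messages)
        # 2. Input messages, appended after context.
        if include_input:
            combined.extend(self.input_messages)
        # 3. Response messages, appended at the end.
        if include_response:
            combined.extend(self.response_messages)
        return combined


ctx = SessionContext(input_messages=["What did I ask earlier?"])
ctx.add_messages("memory", ["user: hi", "assistant: hello"])
ctx.add_messages("rag", ["Context: greeting policy"])

# History loaded by the storage middleware is read by its source id.
history = ctx.context_messages["memory"]
```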
| UserWarning | ||
| ) | ||
|
|
||
| async def session_created(self, session_id: str | None) -> None: | ||
| """Notify all middleware that a session was created.""" | ||
| for middleware in self._middleware: | ||
| await middleware.session_created(session_id) | ||
|
|
||
| async def execute(self, context: SessionContext) -> None: | ||
| """Execute the middleware pipeline.""" |
In the AgentSession sample, _ensure_default_storage() uses len(self._context_pipeline) and calls self._context_pipeline.prepend(...), but the provided ContextMiddlewarePipeline sample does not define __len__ or prepend. Please either add these APIs to the pipeline sample or adjust the sample logic (e.g., check self._context_pipeline is None / expose middleware length) to keep the ADR’s code consistent.
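The first option in this comment, adding the missing APIs to the pipeline sample, could look roughly like the following. The class body is a hypothetical reduction of `ContextMiddlewarePipeline` that keeps only what `_ensure_default_storage()` would need.

```python
from typing import Iterable, List, Optional


class ContextMiddlewarePipeline:
    """Hypothetical reduced pipeline exposing __len__ and prepend."""

    def __init__(self, middleware: Optional[Iterable[object]] = None) -> None:
        self._middleware: List[object] = list(middleware or [])

    def __len__(self) -> int:
        # Lets callers write `len(pipeline)` or `if not pipeline:`.
        return len(self._middleware)

    def prepend(self, middleware: object) -> None:
        # Inserted at index 0, so it becomes the outermost layer.
        self._middleware.insert(0, middleware)


pipeline = ContextMiddlewarePipeline(["rag"])
pipeline.prepend("memory")
```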
| Default storage behavior (applied at runtime, not init): | ||
| - If service_session_id is set: service handles storage, no default added | ||
| - If options.store=True: user expects service storage, no default added | ||
| - If no service_session_id AND store is not True AND no pipeline: | ||
| InMemoryStorageMiddleware is automatically added | ||
|
|
In the ChatAgent sample, context_middleware is typed as Sequence[ContextMiddleware], but earlier in the ADR the configuration type is ContextMiddlewareConfig = ContextMiddleware | ContextMiddlewareFactory and ContextMiddlewarePipeline.from_config() expects configs (instances or factories). Please update the sample signature/type to Sequence[ContextMiddlewareConfig] (or similar) so it matches the proposed API.
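A sketch of the suggested signature, where the agent accepts instances or factories and the pipeline resolves them. Everything here is a hypothetical stand-in: in particular, the `session_id` parameter on `from_config` and the factory signature are assumptions, since the excerpt does not show them.

```python
from typing import Callable, List, Optional, Sequence, Union


class ContextMiddleware:
    """Hypothetical stand-in for the ADR's base class."""

    def __init__(self, source_id: str) -> None:
        self.source_id = source_id


ContextMiddlewareFactory = Callable[[Optional[str]], ContextMiddleware]
ContextMiddlewareConfig = Union[ContextMiddleware, ContextMiddlewareFactory]


class ContextMiddlewarePipeline:
    def __init__(self, middleware: Sequence[ContextMiddleware]) -> None:
        self.middleware: List[ContextMiddleware] = list(middleware)

    @classmethod
    def from_config(
        cls,
        configs: Sequence[ContextMiddlewareConfig],
        session_id: Optional[str] = None,  # assumed parameter
    ) -> "ContextMiddlewarePipeline":
        resolved: List[ContextMiddleware] = []
        for config in configs:
            if isinstance(config, ContextMiddleware):
                resolved.append(config)  # already an instance
            else:
                resolved.append(config(session_id))  # factory: build per session
        return cls(resolved)


class ChatAgent:
    """Hypothetical reduced agent showing only the parameter type."""

    def __init__(
        self,
        context_middleware: Optional[Sequence[ContextMiddlewareConfig]] = None,
    ) -> None:
        self.context_middleware = list(context_middleware or [])


agent = ChatAgent(
    context_middleware=[
        ContextMiddleware("memory"),
        lambda session_id: ContextMiddleware(f"rag-{session_id}"),
    ]
)
pipeline = ContextMiddlewarePipeline.from_config(agent.context_middleware, "s1")
```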
| await next(context) | ||
|
|
||
| # Post-processing | ||
| await self.store_interaction(context.input_messages, context.response_messages) |
Can you modify response_messages here? If you do, do those modifications get returned to the callers, both middleware higher in the stack and the user?
I think the intent is (and I'll clarify this) that this is not the place to alter the responses, only to read them and act on them; AgentMiddleware can be used to modify responses.
| InMemoryStorageMiddleware("memory"), | ||
| RAGContextMiddleware("rag"), |
If the user only wanted to use the user input for a RAG search, rather than the input plus the chat history, how do they filter to only the user input?
Each middleware's setup has controls over which messages it uses, and since the messages from other context providers are stored separately, this is fully configurable.
Is there an example of that?
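As one possible answer to this question, the sketch below builds the retrieval query from the current user input only and ignores history that other middleware stored under `context_messages`. `InputOnlyRAGMiddleware`, its keyword-matching `retrieve` method, and the simplified `ChatMessage` are hypothetical and are not taken from the ADR.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, List


@dataclass
class ChatMessage:
    """Hypothetical simplified message type."""

    role: str
    text: str


@dataclass
class SessionContext:
    """Hypothetical stand-in with per-source context messages."""

    input_messages: List[ChatMessage]
    context_messages: Dict[str, List[ChatMessage]] = field(default_factory=dict)

    def add_messages(self, source_id: str, messages: List[ChatMessage]) -> None:
        self.context_messages.setdefault(source_id, []).extend(messages)


NextHandler = Callable[[SessionContext], Awaitable[None]]


class InputOnlyRAGMiddleware:
    """Hypothetical RAG middleware that searches on user input only."""

    def __init__(self, source_id: str, documents: Dict[str, str]) -> None:
        self.source_id = source_id
        self.documents = documents  # keyword -> document text (toy index)

    def retrieve(self, query: str) -> List[str]:
        lowered = query.lower()
        return [doc for keyword, doc in self.documents.items() if keyword in lowered]

    async def process(self, context: SessionContext, next: NextHandler) -> None:
        # Only context.input_messages feeds the query; history loaded by other
        # middleware sits under context.context_messages and is not read here.
        query = " ".join(m.text for m in context.input_messages if m.role == "user")
        docs = self.retrieve(query)
        if docs:
            context.add_messages(
                self.source_id, [ChatMessage("system", "Context: " + " ".join(docs))]
            )
        await next(context)


async def noop(context: SessionContext) -> None:
    """Placeholder for the rest of the pipeline."""


if __name__ == "__main__":
    index = {"cats": "Cats sleep a lot.", "dogs": "Dogs like walks."}
    ctx = SessionContext(input_messages=[ChatMessage("user", "Tell me about dogs")])
    asyncio.run(InputOnlyRAGMiddleware("rag", index).process(ctx, noop))
    print(ctx.context_messages["rag"])
```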
| - No `service_session_id` (service not managing storage) | ||
| - `options.store` is not `True` (user not expecting service storage) | ||
| - Pipeline is empty or None |
If the user only supplies regular context middleware and no storage context middleware, does the user get no chat history storage?
So we can go two routes here, both only for when there is no service_session_id and store==False:
- If the pipeline is present but does not have StorageMiddleware -> Add InMemoryStorage
- If there is NO middleware(pipeline) -> Add pipeline with InMemoryStorage
The first might be easier for getting started, but the second is clearer: if you already have a pipeline with a middleware, the order suddenly matters, and we do not want to make assumptions about ordering. We should therefore clarify that in that case the user has to add storage themselves. This makes sure the simple case always works, while people who start adding their own middleware are expected to know what they want. That is why I think approach two is the way to go.
I agree that 1 is problematic for the same reasons as you listed.
so we should clarify that in that case the user should do this themselves
At the same time, I think this ^ is a pit of failure, since just adding RAG middleware to a functional agent would make that agent break, and the user has to go and read the documentation to figure out why.
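The second route discussed in this thread can be written as a small decision function. This is a sketch of the rule as stated in the comments, with a hypothetical function name and parameters; the last assertion shows the "pit of failure" case, where a user who adds only RAG middleware gets no default storage.

```python
from typing import Optional, Sequence


def should_add_default_storage(
    service_session_id: Optional[str],
    store: Optional[bool],
    middleware: Optional[Sequence[object]],
) -> bool:
    """Return True only when nothing else is responsible for storage."""
    if service_session_id is not None:
        return False  # the service manages storage
    if store is True:
        return False  # the user expects service-side storage
    # Approach two: add in-memory storage only when no middleware is configured.
    return not middleware


# Simple case: nothing configured, so in-memory storage is added.
assert should_add_default_storage(None, None, None) is True
assert should_add_default_storage(None, False, []) is True

# Service-managed cases: never add a default.
assert should_add_default_storage("svc-123", None, None) is False
assert should_add_default_storage(None, True, None) is False

# User supplied only a RAG middleware: no default storage is added.
assert should_add_default_storage(None, None, ["rag"]) is False
```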
| - No composability for context providers | ||
| - Inconsistent with middleware pattern used elsewhere | ||
|
|
||
| ### Option 2: ContextMiddleware (Chosen) |
Thanks for this ADR! The unified ContextMiddleware pattern is a nice improvement for context organization and attribution.
One question: would this design support implementing context compaction strategies?
For example, a common need is letting agents run for arbitrarily long periods by automatically compacting/replacing the message history when a max context budget is hit. E.g., an agent makes 10s or 100s of tool calls and model calls in succession and may need to compact context between those calls.
This is challenging to do today because:
- ChatMessageStore.list_messages() is only called once at the start of agent.run(), not during the tool loop
- ChatMiddleware operates on a copy of messages, so modifications don't persist across tool loop iterations
see some related notes here.
With the new ContextMiddleware, would it be possible to:
- Have middleware run during tool loop iterations (not just at the invocation boundary)?
- Allow a compaction strategy to truly replace the context mid-execution?
If this is out of scope for this ADR, it might be worth noting as a future consideration, or confirming whether the new architecture would make such a feature easier to add later (or whether there is some other recommended pattern for this).
@victordibia thanks for the reply. I added a discussion section for this topic. I'm leaning toward a CompactionStrategy abstraction that can be used both for and after storage and for the function call loop (with no requirement to use the same one in different places, as they serve different purposes), but check the discussion in the latest update here.
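A rough sketch of what a compaction abstraction applied inside a tool loop might look like. The `CompactionStrategy` protocol shape, the `compact` method name, `KeepLastN`, and `run_tool_loop` are all assumptions for illustration; the ADR's discussion section may define something different.

```python
from typing import List, Protocol


class CompactionStrategy(Protocol):
    """Hypothetical interface: take messages, return a compacted list."""

    def compact(self, messages: List[str]) -> List[str]:
        ...


class KeepLastN:
    """Toy strategy that keeps only the most recent messages."""

    def __init__(self, max_messages: int) -> None:
        if max_messages < 1:
            raise ValueError("max_messages must be at least 1")
        self.max_messages = max_messages

    def compact(self, messages: List[str]) -> List[str]:
        if len(messages) <= self.max_messages:
            return list(messages)
        return messages[-self.max_messages:]


def run_tool_loop(
    messages: List[str], strategy: CompactionStrategy, steps: int
) -> List[str]:
    """Simulate a function-call loop that compacts after every iteration."""
    working = list(messages)  # do not mutate the caller's list
    for step in range(steps):
        working.append(f"tool result {step}")
        # The compacted list replaces the working context mid-execution.
        working = strategy.compact(working)
    return working


final = run_tool_loop(["user: go"], KeepLastN(3), steps=5)
```

Because the same interface could be called at storage time or inside the loop, different strategy instances can be used in each place, as the comment suggests.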
…dback
- Add Option 3: ContextHooks with before_run/after_run pattern
- Add detailed pros/cons for both wrapper and hooks approaches
- Add Open Discussion section on context compaction strategies
- Clarify response_messages is read-only (use AgentMiddleware for modifications)
- Add SimpleRAG examples showing input-only filtering
- Clarify default storage only added when NO middleware configured
- Add RAGWithBuffer examples for self-managed history
- Rename hook methods to before_run/after_run
e005fc3 to
761f2bf
- Add class hierarchy clarification for both options
- Merge detailed design sections (side-by-side comparison)
- Move detailed design before decision outcome
- Move compaction discussion after decision
- Add .NET implementation comparison (feature equivalence)
- Update .NET method names to match actual implementation
- Rename hook methods to before_run/after_run
- Fix storage context table for injected context

- Note that class and method names are open for discussion
- Add alternative method naming options table
- Include invoking/invoked as option matching current Python and .NET

…filtering
- Remove smart mode for load_messages (now explicit bool, default True)
- Add attribution marker in additional_properties for message filtering
- Update validation to warn on multiple or zero storage loaders
- Add note about ChatReducer naming from .NET
- Note that attribution should not be propagated to storage
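The attribution marker mentioned in the last commit could work along these lines. This is a sketch under stated assumptions: the key name `attribution`, the helper functions, and the simplified `ChatMessage` are hypothetical. It shows tagging messages with their source, filtering by source, and stripping the marker before messages are handed to storage.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

ATTRIBUTION_KEY = "attribution"  # hypothetical key name


@dataclass
class ChatMessage:
    """Hypothetical simplified message with additional_properties."""

    role: str
    text: str
    additional_properties: Dict[str, Any] = field(default_factory=dict)


def tag(messages: List[ChatMessage], source_id: str) -> List[ChatMessage]:
    """Mark each message with the middleware that contributed it."""
    for message in messages:
        message.additional_properties[ATTRIBUTION_KEY] = source_id
    return messages


def from_source(messages: List[ChatMessage], source_id: str) -> List[ChatMessage]:
    """Filter messages down to those added by one source."""
    return [
        m for m in messages
        if m.additional_properties.get(ATTRIBUTION_KEY) == source_id
    ]


def strip_attribution(messages: List[ChatMessage]) -> List[ChatMessage]:
    """Return copies without the marker, suitable for handing to storage."""
    return [
        ChatMessage(
            m.role,
            m.text,
            {k: v for k, v in m.additional_properties.items() if k != ATTRIBUTION_KEY},
        )
        for m in messages
    ]


combined = tag([ChatMessage("user", "hi")], "memory") + tag(
    [ChatMessage("system", "Context: docs")], "rag"
)
rag_only = from_source(combined, "rag")
to_store = strip_attribution(combined)
```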
Summary
This ADR proposes unifying `ContextProvider`, `ChatMessageStore`, and `AgentThread` into a single `ContextMiddleware` concept for the Python SDK.
Problem
Currently, developers doing 'Context Engineering' must understand multiple abstractions:
Proposed Solution
A unified `ContextMiddleware` using the onion/wrapper pattern (like the existing `AgentMiddleware`):
```python
class RAGMiddleware(ContextMiddleware):
    async def process(self, context: SessionContext, next) -> None:
        # Pre-processing: add context
        docs = await self.retrieve_documents(context.input_messages[-1].text)
        context.add_messages(self.source_id, [ChatMessage.system(f'Context: {docs}')])
        # Continue the pipeline (inner middleware, then the model call)
        await next(context)

agent = ChatAgent(
    chat_client=client,
    context_middleware=[
        InMemoryStorageMiddleware('memory'),
        RAGMiddleware('rag'),
    ]
)
```
Key Decisions
Migration Impact
See the full ADR in `docs/decisions/00XX-python-context-middleware.md` for detailed design, code examples, and implementation workplan.
Related Issues
This ADR addresses the following issues: