feat: support forwarding extra headers from provider data to upstream endpoints #5100
Lucifergene wants to merge 7 commits into llamastack:main
Conversation
… endpoints

Add an `extra_headers` field to provider data that allows callers to pass arbitrary HTTP headers through Llama Stack to upstream model endpoints. When a caller includes `extra_headers` in the `X-LlamaStack-Provider-Data` header JSON payload, those headers are forwarded as `default_headers` on the AsyncOpenAI client to the upstream provider. A shared security blocklist (`providers/utils/headers.py`) filters out hop-by-hop, framing, auth, and proxy/origin headers to prevent request smuggling, credential override, and origin spoofing. This enables use cases like forwarding subscription identifiers (e.g., `X-MAAS-SUBSCRIPTION`) or other custom metadata to model providers.

Changes:
- Add `extra_headers: dict[str, str] | None` to `VLLMProviderDataValidator`
- Add `_get_extra_headers_from_provider_data()` to `OpenAIMixin`
- Pass `default_headers` to AsyncOpenAI in the `client` property
- Add shared `providers/utils/headers.py` with `BLOCKED_HEADERS` and `filter_extra_headers()` utility
- Add comprehensive unit tests for header filtering and mixin integration

Signed-off-by: Avik Kundu <47265560+Lucifergene@users.noreply.github.com>
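The blocklist-and-filter utility described above might look roughly like the sketch below. This is not the PR's actual code; the names `BLOCKED_HEADERS` and `filter_extra_headers()` come from the PR description, and the header set mirrors the categories it names, but the implementation details are assumed.

```python
# Sketch of providers/utils/headers.py as described in the PR text.
# The exact contents of the real module may differ.
BLOCKED_HEADERS: frozenset[str] = frozenset({
    # hop-by-hop / framing
    "host", "content-type", "content-length", "transfer-encoding",
    "connection", "upgrade", "te", "trailer", "cookie", "set-cookie",
    # auth
    "authorization",
    # proxy / origin
    "x-forwarded-for", "x-forwarded-host", "x-forwarded-proto",
    "x-forwarded-prefix", "x-real-ip", "cf-connecting-ip", "true-client-ip",
})


def filter_extra_headers(extra_headers: dict[str, str]) -> dict[str, str]:
    """Drop blocked headers; comparison is case-insensitive,
    since HTTP header names are."""
    return {
        name: value
        for name, value in extra_headers.items()
        if name.lower() not in BLOCKED_HEADERS
    }


print(filter_extra_headers({
    "X-MAAS-SUBSCRIPTION": "free-tier",
    "Authorization": "Bearer should-not-pass",
}))
```

A caller-supplied `Authorization` or `Host` entry is silently dropped rather than rejected, so a request with mixed safe and unsafe headers still goes through with the safe subset.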
> """Shared header-filtering utilities for provider data forwarding.
> The safety passthrough provider (remote/safety/passthrough/config.py) maintains
@Lucifergene can you consolidate both of them? I don't think it makes sense to have two separate sets of headers.
hmm yeah, I am reading some comments on #5077, and I think we discussed not doing this and instead just doing the passthrough provider route at the weekly community meeting. I vote that we pump the brakes on this until we get input from @mattf, who I think was part of that conversation? @skamenan7 as well, to make sure this doesn't conflict with his work
@cdoern Isn't there already the precedent of using …

Edit: Although I see that this PR isn't actually using that parameter.
Hey @cdoern, happy to add some context here. The discussion in #4607 was specifically about @mattf's isolation concern: external providers loaded as modules into the same process can read PROVIDER_DATA_VAR and potentially steal credentials intended for other providers. @mattf's guidance there was that real isolation requires a process boundary, which is what the passthrough route provides. PR #5004 (safety passthrough, already merged) and #5040 (inference passthrough, tracked in the linked issue) are both aimed at that: deployer-controlled forward_headers config, default-deny, separate process. That's the isolation story from that thread.

My read is that this PR is the complementary piece. remote::vllm and other OpenAIMixin providers are trusted, in-tree providers; they don't have the external-provider isolation problem. The gap is just that callers can't pass per-request headers like X-MAAS-SUBSCRIPTION to them at runtime, which seems orthogonal to the isolation decision. That said, @mattf can advise/clarify more. Happy to coordinate either way.
@Lucifergene I opened this PR for #5040: #5134. It has a common utility to be used by all passthrough providers; currently safety and inference are using it, with others coming in follow-up PRs. Please see if you can reuse any code by calling the utility, with one caveat: it is still pending reviews and might change a little before it lands. cc: @leseb
for the record my position is twofold -
as for generic headers marked "extra_headers" getting passed through, that's kinda cool. it'd be nice if it were orthogonal to the header filtering, however - it's clear how to filter on "vllm_api_token", but how do you filter on generic "extra_headers" to prevent sending them to all backends? @Lucifergene will you make the issue you're solving more concrete? various backends ask for projects as well as auth tokens. is X-MAAS-SUBSCRIPTION specific to a backend you need to work with? maybe it's worth having an adapter for it, which would allow passing maas_api_token and maas_project headers, which could also be filtered on.
@mattf The case @Lucifergene is looking at would apply for API services that might wrap some model deployment. E.g. I could have some gateway API that adds chargebacks to some vLLM deployment. You would still need vLLM as a provider, but there might be custom headers (or body) that are needed by the wrapper API.
Why would llamastack need to filter out headers that are passed with this mechanism? I would think it's the responsibility of the person making the request to make sure they're controlling what gets passed in to …

You would want to avoid forwarding headers passed into the Responses API, but in this case the user is explicitly passing in a header they want given to the model provider.
if you still have the opportunity, reconsider allowing a token to cross chargeback boundaries. it'll be simpler for users - they can just use the appropriate key for the appropriate domain / project / pod / team / org / whatev. they won't have to figure out how to thread extra info about their project through call chains.
i agree with you, see #4607. i think the motivation is users may blast all their tokens on each of their requests. to be consistent w/ the ongoing work to narrow the headers that get passed along, you'll probably need to have "vllm_extra_headers" instead of a generic "extra_headers".
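A sketch of the namespaced variant suggested here, where each provider reads only its own `<provider>_extra_headers` key from the provider-data payload so headers are never broadcast to every backend. The function name and the `bedrock_extra_headers` key are illustrative, not from the PR.

```python
import json


def extra_headers_for(provider_key: str, provider_data_json: str) -> dict[str, str]:
    """Return only the extra headers namespaced to one provider.

    Illustrative helper: each adapter would call this with its own key,
    so a vLLM adapter never sees headers intended for another backend.
    """
    data = json.loads(provider_data_json)
    return data.get(f"{provider_key}_extra_headers", {})


# A caller's X-LlamaStack-Provider-Data payload with per-provider keys.
payload = json.dumps({
    "vllm_extra_headers": {"X-MAAS-SUBSCRIPTION": "free-tier"},
    "bedrock_extra_headers": {"X-Team": "ml-platform"},  # hypothetical second backend
})

print(extra_headers_for("vllm", payload))
```

A provider with no namespaced entry simply gets an empty dict, which keeps the default-deny posture of the ongoing header-narrowing work.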
@mattf I'm working with @Lucifergene on this, I have an alternative PR for what you're describing: #5217 (comment). Please let me know if that's what you were thinking when you get the chance!
something like that could work. however, this whole need is very suspect. are there other providers that use headers to change the scope of a token? |
Closing this PR to move forward with a single approach and avoid confusion. Work continued in PR #5217 |
What does this PR do?
Add an `extra_headers` field to provider data that allows callers to pass arbitrary HTTP headers through Llama Stack to upstream model provider endpoints.

When a caller includes `extra_headers` in the `X-LlamaStack-Provider-Data` JSON payload, those headers are forwarded as `default_headers` on the `AsyncOpenAI` client to the upstream provider. A shared security blocklist filters out hop-by-hop, framing, auth, and proxy/origin headers to prevent request smuggling, credential override, and origin spoofing.

This builds on the `model_validation` work from #5013 / #5014 by @NickGagan, which enabled per-request API key injection via `X-LlamaStack-Provider-Data`. This PR extends that same mechanism to support forwarding arbitrary custom headers alongside the API key.

Closes #5077
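The backward-compatibility guard mentioned in the change list might look like the sketch below. The stub classes stand in for provider-data validators; the real mixin reads request-scoped provider data rather than taking it as an argument, so this is an illustration of the `getattr` guard, not the PR's actual code.

```python
from typing import Any


class FakeProviderData:
    """Stands in for a validator that has the new extra_headers field."""
    extra_headers = {"X-MAAS-SUBSCRIPTION": "free-tier"}


class LegacyProviderData:
    """Simulates a validator that predates the extra_headers field."""
    pass


def get_extra_headers_from_provider_data(provider_data: Any) -> dict[str, str]:
    # getattr guard: validators without the field fall back to no headers,
    # so older providers keep working unchanged.
    headers = getattr(provider_data, "extra_headers", None)
    return dict(headers) if headers else {}


print(get_extra_headers_from_provider_data(FakeProviderData()))
print(get_extra_headers_from_provider_data(LegacyProviderData()))
```

The empty-dict fallback means the `default_headers` argument to the client can always be passed, whether or not the caller supplied anything.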
Changes:
- Add `extra_headers: dict[str, str] | None` to `VLLMProviderDataValidator` (vLLM only; other providers can opt in later)
- Add `_get_extra_headers_from_provider_data()` to `OpenAIMixin` with a `getattr` guard for backward compatibility
- Pass `default_headers` to `AsyncOpenAI()` in the `client` property
- Add shared `providers/utils/headers.py` with a `BLOCKED_HEADERS` frozenset and `filter_extra_headers()` utility

Security:
The `BLOCKED_HEADERS` list is a superset of the safety passthrough provider's existing blocklist. It adds auth and proxy headers because `extra_headers` is caller-controlled (per-request), unlike the deployer-controlled `forward_headers` config:

- Hop-by-hop / framing: `host`, `content-type`, `content-length`, `transfer-encoding`, `connection`, `upgrade`, `te`, `trailer`, `cookie`, `set-cookie`
- Auth: `authorization`
- Proxy / origin: `x-forwarded-for`, `x-forwarded-host`, `x-forwarded-proto`, `x-forwarded-prefix`, `x-real-ip`, `cf-connecting-ip`, `true-client-ip`

Example usage:
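A sketch of what a caller might send, assuming the header and field names from the PR description (the values, and the deliberately-blocked `Authorization` entry, are illustrative):

```python
import json

# Provider-data payload: "extra_headers" is the new field; the
# Authorization entry is included only to show it gets filtered out
# by the blocklist before reaching the upstream endpoint.
provider_data = {
    "extra_headers": {
        "X-MAAS-SUBSCRIPTION": "free-tier",
        "Authorization": "Bearer should-not-pass",
    }
}

# The whole payload travels as one JSON-encoded HTTP header.
request_headers = {
    "Content-Type": "application/json",
    "X-LlamaStack-Provider-Data": json.dumps(provider_data),
}

print(request_headers["X-LlamaStack-Provider-Data"])
```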
The upstream request to the vLLM endpoint will include `X-MAAS-SUBSCRIPTION: free-tier`, while `Authorization` from the blocked list is filtered out.