
feat: support forwarding extra headers from provider data to upstream endpoints#5100

Closed
Lucifergene wants to merge 7 commits into llamastack:main from Lucifergene:extra-headers

Conversation

@Lucifergene Lucifergene commented Mar 11, 2026

What does this PR do?

Add an extra_headers field to provider data that allows callers to pass arbitrary HTTP headers through Llama Stack to upstream model provider endpoints.

When a caller includes extra_headers in the X-LlamaStack-Provider-Data JSON payload, those headers are forwarded as default_headers on the AsyncOpenAI client to the upstream provider. A shared security blocklist filters out hop-by-hop, framing, auth, and proxy/origin headers to prevent request smuggling, credential override, and origin spoofing.

This builds on the model_validation work from #5013 / #5014 by @NickGagan, which enabled per-request API key injection via X-LlamaStack-Provider-Data. This PR extends that same mechanism to support forwarding arbitrary custom headers alongside the API key.

Closes #5077

Changes:

  • Add extra_headers: dict[str, str] | None to VLLMProviderDataValidator (vLLM only; other providers can opt in later)
  • Add _get_extra_headers_from_provider_data() to OpenAIMixin with getattr guard for backward compatibility
  • Pass default_headers to AsyncOpenAI() in the client property
  • Add shared providers/utils/headers.py with BLOCKED_HEADERS frozenset and filter_extra_headers() utility
  • Add comprehensive unit tests for both the shared utility and the mixin integration
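The getattr guard mentioned above can be sketched roughly as follows. This is a hypothetical simplification: the method name `_get_extra_headers_from_provider_data()` comes from the PR description, but the surrounding class and the provider-data objects are illustrative stand-ins (the real mixin reads validated per-request provider data, and the client property would then pass the result as `default_headers` to `AsyncOpenAI()`):

```python
# Hypothetical sketch of the OpenAIMixin change described above; the class
# and provider-data objects are simplified stand-ins for illustration.
from types import SimpleNamespace


class OpenAIMixinSketch:
    def __init__(self, provider_data):
        self._provider_data = provider_data

    def get_request_provider_data(self):
        # Stand-in for reading the validated X-LlamaStack-Provider-Data payload.
        return self._provider_data

    def _get_extra_headers_from_provider_data(self) -> dict[str, str]:
        provider_data = self.get_request_provider_data()
        # getattr guard: validators that predate the extra_headers field
        # (or requests that omit it) fall back to an empty dict.
        return getattr(provider_data, "extra_headers", None) or {}


# A validator without the new field keeps working unchanged:
legacy = OpenAIMixinSketch(SimpleNamespace(vllm_api_token="key"))
print(legacy._get_extra_headers_from_provider_data())  # {}

# With the field present, the headers are returned for use as default_headers:
current = OpenAIMixinSketch(
    SimpleNamespace(
        vllm_api_token="key",
        extra_headers={"X-MAAS-SUBSCRIPTION": "free-tier"},
    )
)
print(current._get_extra_headers_from_provider_data())
```

The guard is what keeps the change backward compatible: providers whose validators never declare `extra_headers` behave exactly as before.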

Security:

The BLOCKED_HEADERS list is a superset of the safety passthrough provider's existing blocklist. It adds auth and proxy headers because extra_headers is caller-controlled (per-request), unlike the deployer-controlled forward_headers config:

  • Hop-by-hop / framing: host, content-type, content-length, transfer-encoding, connection, upgrade, te, trailer, cookie, set-cookie
  • Auth: authorization
  • Proxy / origin: x-forwarded-for, x-forwarded-host, x-forwarded-proto, x-forwarded-prefix, x-real-ip, cf-connecting-ip, true-client-ip

Example usage:

curl -X POST http://localhost:8321/v1/chat/completions \
  -H 'X-LlamaStack-Provider-Data: {"vllm_api_token": "key", "extra_headers": {"X-MAAS-SUBSCRIPTION": "free-tier"}}' \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello"}]}'

The upstream request to the vLLM endpoint will include X-MAAS-SUBSCRIPTION: free-tier, while any header on the blocklist, such as Authorization, is filtered out.
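To make the flow concrete, here is a small self-contained walkthrough of the request above, using an abbreviated illustrative blocklist rather than the full one:

```python
import json

# Parse the X-LlamaStack-Provider-Data payload from the curl example above,
# then filter extra_headers before they would be attached to the upstream
# client. The blocklist here is abbreviated for illustration.
BLOCKED = {"authorization", "host", "cookie"}

provider_data = json.loads(
    '{"vllm_api_token": "key",'
    ' "extra_headers": {"X-MAAS-SUBSCRIPTION": "free-tier",'
    ' "Authorization": "Bearer sneaky"}}'
)
upstream_headers = {
    k: v
    for k, v in provider_data.get("extra_headers", {}).items()
    if k.lower() not in BLOCKED
}
print(upstream_headers)  # {'X-MAAS-SUBSCRIPTION': 'free-tier'}
```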

Signed-off-by: Avik Kundu <47265560+Lucifergene@users.noreply.github.com>
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 11, 2026
@Lucifergene
Author

/cc @franciscojavierarceo @NickGagan


A Collaborator left a review comment on this code:

> """Shared header-filtering utilities for provider data forwarding.
> The safety passthrough provider (remote/safety/passthrough/config.py) maintains

@Lucifergene can you consolidate both of them? I don't think it makes sense to have two separate sets of headers.

@cdoern
Collaborator

cdoern commented Mar 11, 2026

hmm yeah, I am reading some comments on #5077, and I think we discussed not doing this and instead just doing the passthrough provider route at the weekly community meeting. I vote that we pump the brakes on this until we get input from @mattf, who I think was part of that conversation? @skamenan7 as well, to make sure this doesn't conflict with his work

@NickGagan
Contributor

NickGagan commented Mar 11, 2026

@cdoern Isn't there already the precedent of using extra_body for these OpenAIMixin models? Isn't this really just another param that we're surfacing from the OpenAI chat completions client up through LLS?

Edit: Although I see that this PR isn't actually using that parameter.

@skamenan7
Contributor

Hey @cdoern, happy to add some context here. The discussion in #4607 was specifically about @mattf's isolation concern: external providers loaded as modules into the same process can read PROVIDER_DATA_VAR and potentially steal credentials intended for other providers. @mattf's guidance there was that real isolation requires a process boundary, which is what the passthrough route provides.

PR #5004 (safety passthrough, already merged) and #5040 (inference passthrough, tracked in the linked issue) are both aimed at that — deployer-controlled forward_headers config, default-deny, separate process. That's the isolation story from that thread.

My read is that this PR is the complementary piece. remote::vllm and other OpenAIMixin providers are trusted, in-tree providers — they don't have the external provider isolation problem. The gap is just that callers can't pass per-request headers like X-MAAS-SUBSCRIPTION to them at runtime, which seems orthogonal to the isolation decision.

That said, @mattf can advise/clarify more. Happy to coordinate either way.

@skamenan7
Contributor

@Lucifergene I opened this PR for 5040: #5134. It has a common utility to be used by all passthrough providers; currently safety and inference are using it, with others coming in follow-up PRs. Please see if you can reuse any code by calling the utility, but one caveat: it is still pending reviews and might change a little before it lands. cc: @leseb

@mattf
Collaborator

mattf commented Mar 16, 2026

for the record my position is twofold -

  • you can't load untrusted code into stack's process and expect stack to be anything but compromised, thus the need for process isolation
  • you can't hand any secrets to code you do not trust and expect a good outcome. filtering the set of secrets you pass helps reduce the scope of the security issue, but doesn't resolve it

as for generic headers marked "extra_headers" getting passed through, that's kinda cool. it'd be nice if it were orthogonal to the header filtering, however - it's clear how to filter on "vllm_api_token" but how do you filter on generic "extra_headers" to prevent sending them to all backends?

@Lucifergene will you make the issue you're solving more concrete? various backends ask for projects as well as auth tokens. is X-MAAS-SUBSCRIPTION specific to a backend you need to work with? maybe it's worth having an adapter for it, which would allow passing maas_api_token and maas_project headers, which could also be filtered on.

@NickGagan
Contributor

@mattf The case @Lucifergene is looking at would apply for API services that might wrap some model deployment.

E.g. I could have some gateway API that adds chargebacks to some vLLM deployment. You would still need vLLM as a provider, but there might be custom headers (or body) that are needed by the wrapper API.

@NickGagan
Contributor

NickGagan commented Mar 16, 2026

Why would llamastack need to filter out headers that are passed with this mechanism? I would think it's the responsibility of the person making the request to make sure they're controlling what gets passed in to X-LlamaStack-Provider-Data.extra_headers.

You would want to avoid forwarding headers passed into responses API, but in this case the user is explicitly passing in a header they want given to the model provider.

X-LlamaStack-Provider-Data: {"vllm_api_token": "key", "extra_headers": {"X-MAAS-SUBSCRIPTION": "free-tier"}}'

@mattf
Collaborator

mattf commented Mar 16, 2026

@NickGagan -

I could have some gateway API that adds chargebacks to some vLLM deployment.

if you still have the opportunity, reconsider allowing a token to cross chargeback boundaries.

it'll be simpler for users - they can just use the appropriate key for the appropriate domain / project / pod / team / org / whatev. they won't have to figure out how to thread extra info about their project through call chains.

I would think it's the responsibility of the person making the request to make sure they're controlling what gets passed in to X-LlamaStack-Provider-Data.

i agree with you, see #4607. i think the motivation is users may blast all their tokens on each of their requests.

to be consistent w/ the ongoing work to narrow the headers that get passed along, you'll probably need to have "vllm_extra_headers" instead of a generic "extra_headers".
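One way to read that suggestion is a provider-scoped field name, so that generic headers can never fan out to every backend. The sketch below is hypothetical (names are illustrative, not from any PR): each adapter only picks up the field namespaced to it.

```python
# Hypothetical sketch of provider-scoped extra headers, per the suggestion
# above: the vllm adapter reads only "vllm_extra_headers", so headers meant
# for one backend never reach another. Function and field names are
# illustrative assumptions.
def extra_headers_for_provider(provider_data: dict, provider_key: str) -> dict[str, str]:
    # Only the field namespaced to this provider is consulted.
    return provider_data.get(f"{provider_key}_extra_headers") or {}


data = {
    "vllm_api_token": "key",
    "vllm_extra_headers": {"X-MAAS-SUBSCRIPTION": "free-tier"},
}
print(extra_headers_for_provider(data, "vllm"))      # {'X-MAAS-SUBSCRIPTION': 'free-tier'}
print(extra_headers_for_provider(data, "together"))  # {}
```

This keeps the header forwarding filterable in the same way as provider-specific keys like vllm_api_token: the field name itself identifies the destination backend.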

@NickGagan
Contributor

@mattf I'm working with @Lucifergene on this, I have an alternative PR for what you're describing: #5217 (comment).

Please let me know if that's what you were thinking when you get the chance!

@mattf
Collaborator

mattf commented Mar 20, 2026

@mattf I'm working with @Lucifergene on this, I have an alternative PR for what you're describing: #5217 (comment).

Please let me know if that's what you were thinking when you get the chance!

something like that could work. however, this whole need is very suspect. are there other providers that use headers to change the scope of a token?

@Lucifergene
Author

Closing this PR to move forward with a single approach and avoid confusion.

Work continued in PR #5217

Labels

CLA Signed This label is managed by the Meta Open Source bot.


Development

Successfully merging this pull request may close these issues.

Allow forwarding arbitrary HTTP headers from provider data to upstream model endpoints

6 participants