
feat: support forwarding extra headers from provider data to upstream endpoints#5100

Closed
Lucifergene wants to merge 7 commits into llamastack:main from Lucifergene:extra-headers

Conversation

@Lucifergene Lucifergene commented Mar 11, 2026

What does this PR do?

Add an extra_headers field to provider data that allows callers to pass arbitrary HTTP headers through Llama Stack to upstream model provider endpoints.

When a caller includes extra_headers in the X-LlamaStack-Provider-Data JSON payload, those headers are forwarded as default_headers on the AsyncOpenAI client to the upstream provider. A shared security blocklist filters out hop-by-hop, framing, auth, and proxy/origin headers to prevent request smuggling, credential override, and origin spoofing.

This builds on the model_validation work from #5013 / #5014 by @NickGagan, which enabled per-request API key injection via X-LlamaStack-Provider-Data. This PR extends that same mechanism to support forwarding arbitrary custom headers alongside the API key.

Closes #5077

Changes:

  • Add extra_headers: dict[str, str] | None to VLLMProviderDataValidator (vLLM only; other providers can opt in later)
  • Add _get_extra_headers_from_provider_data() to OpenAIMixin with getattr guard for backward compatibility
  • Pass default_headers to AsyncOpenAI() in the client property
  • Add shared providers/utils/headers.py with BLOCKED_HEADERS frozenset and filter_extra_headers() utility
  • Add comprehensive unit tests for both the shared utility and the mixin integration
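The getattr guard mentioned above can be sketched roughly as follows. This is a hypothetical simplification: the method name `_get_extra_headers_from_provider_data()` comes from the PR description, but the surrounding class and the provider-data objects are illustrative stand-ins (the real mixin reads validated per-request provider data, and the client property would then pass the result as `default_headers` to `AsyncOpenAI()`):

```python
# Hypothetical sketch of the OpenAIMixin change described above; the class
# and provider-data objects are simplified stand-ins for illustration.
from types import SimpleNamespace


class OpenAIMixinSketch:
    def __init__(self, provider_data):
        self._provider_data = provider_data

    def get_request_provider_data(self):
        # Stand-in for reading the validated X-LlamaStack-Provider-Data payload.
        return self._provider_data

    def _get_extra_headers_from_provider_data(self) -> dict[str, str]:
        provider_data = self.get_request_provider_data()
        # getattr guard: validators that predate the extra_headers field
        # (or requests that omit it) fall back to an empty dict.
        return getattr(provider_data, "extra_headers", None) or {}


# A validator without the new field keeps working unchanged:
legacy = OpenAIMixinSketch(SimpleNamespace(vllm_api_token="key"))
print(legacy._get_extra_headers_from_provider_data())  # {}

# With the field present, the headers are returned for use as default_headers:
current = OpenAIMixinSketch(
    SimpleNamespace(
        vllm_api_token="key",
        extra_headers={"X-MAAS-SUBSCRIPTION": "free-tier"},
    )
)
print(current._get_extra_headers_from_provider_data())
```

The guard is what keeps the change backward compatible: providers whose validators never declare `extra_headers` behave exactly as before.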

Security:

The BLOCKED_HEADERS list is a superset of the safety passthrough provider's existing blocklist. It adds auth and proxy headers because extra_headers is caller-controlled (per-request), unlike the deployer-controlled forward_headers config:

  • Hop-by-hop / framing: host, content-type, content-length, transfer-encoding, connection, upgrade, te, trailer, cookie, set-cookie
  • Auth: authorization
  • Proxy / origin: x-forwarded-for, x-forwarded-host, x-forwarded-proto, x-forwarded-prefix, x-real-ip, cf-connecting-ip, true-client-ip

Example usage:

curl -X POST http://localhost:8321/v1/chat/completions \
  -H 'X-LlamaStack-Provider-Data: {"vllm_api_token": "key", "extra_headers": {"X-MAAS-SUBSCRIPTION": "free-tier"}}' \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello"}]}'

The upstream request to the vLLM endpoint will include X-MAAS-SUBSCRIPTION: free-tier, while any header on the blocklist, such as Authorization, is filtered out.
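To make the flow concrete, here is a small self-contained walkthrough of the request above, using an abbreviated illustrative blocklist rather than the full one:

```python
import json

# Parse the X-LlamaStack-Provider-Data payload from the curl example above,
# then filter extra_headers before they would be attached to the upstream
# client. The blocklist here is abbreviated for illustration.
BLOCKED = {"authorization", "host", "cookie"}

provider_data = json.loads(
    '{"vllm_api_token": "key",'
    ' "extra_headers": {"X-MAAS-SUBSCRIPTION": "free-tier",'
    ' "Authorization": "Bearer sneaky"}}'
)
upstream_headers = {
    k: v
    for k, v in provider_data.get("extra_headers", {}).items()
    if k.lower() not in BLOCKED
}
print(upstream_headers)  # {'X-MAAS-SUBSCRIPTION': 'free-tier'}
```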

Signed-off-by: Avik Kundu <47265560+Lucifergene@users.noreply.github.com>
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 11, 2026
@Lucifergene
Author

/cc @franciscojavierarceo @NickGagan


A Collaborator left a review comment on this code:

> """Shared header-filtering utilities for provider data forwarding.
> The safety passthrough provider (remote/safety/passthrough/config.py) maintains

@Lucifergene can you consolidate both of them? I don't think it makes sense to have two separate sets of headers.

@cdoern
Collaborator

cdoern commented Mar 11, 2026

hmm yeah, I am reading some comments on #5077, and I think we discussed not doing this and instead just doing the passthrough provider route at the weekly community meeting. I vote that we pump the brakes on this until we get input from @mattf, who I think was part of that conversation? @skamenan7 as well, to make sure this doesn't conflict with his work

@NickGagan
Contributor

NickGagan commented Mar 11, 2026

@cdoern Isn't there already the precedent of using extra_body for these OpenAIMixin models? Isn't this really just another param that we're surfacing from the OpenAI chat completions client up through LLS?

Edit: Although I see that this PR isn't actually using that parameter.

@skamenan7
Contributor

Hey @cdoern, happy to add some context here. The discussion in #4607 was specifically about @mattf's isolation concern: external providers loaded as modules into the same process can read PROVIDER_DATA_VAR and potentially steal credentials intended for other providers. @mattf's guidance there was that real isolation requires a process boundary, which is what the passthrough route provides.

PR #5004 (safety passthrough, already merged) and #5040 (inference passthrough, tracked in the linked issue) are both aimed at that — deployer-controlled forward_headers config, default-deny, separate process. That's the isolation story from that thread.

My read is that this PR is the complementary piece. remote::vllm and other OpenAIMixin providers are trusted, in-tree providers — they don't have the external provider isolation problem. The gap is just that callers can't pass per-request headers like X-MAAS-SUBSCRIPTION to them at runtime, which seems orthogonal to the isolation decision.

That said, @mattf can advise/clarify more. Happy to coordinate either way.

@skamenan7
Contributor

@Lucifergene I opened this PR for 5040: #5134. It has a common utility to be used by all passthrough providers; currently safety and inference are using it, with others coming in follow-up PRs. Please see if you can reuse any code by calling the utility, but one caveat: it is still pending reviews and might change a little before it lands. cc: @leseb

@mattf
Collaborator

mattf commented Mar 16, 2026

for the record my position is twofold -

  • you can't load untrusted code into stack's process and expect stack to be anything but compromised, thus the need for process isolation
  • you can't hand any secrets to code you do not trust and expect a good outcome. filtering the set of secrets you pass helps reduce the scope of the security issue, but doesn't resolve it

as for generic headers marked "extra_headers" getting passed through, that's kinda cool. it'd be nice if it were orthogonal to the header filtering, however - it's clear how to filter on "vllm_api_token" but how do you filter on generic "extra_headers" to prevent sending them to all backends?

@Lucifergene will you make the issue you're solving more concrete? various backends ask for projects as well as auth tokens. is X-MAAS-SUBSCRIPTION specific to a backend you need to work with? maybe it's worth having an adapter for it, which would allow passing maas_api_token and maas_project headers, which could also be filtered on.

@NickGagan
Contributor

@mattf The case @Lucifergene is looking at would apply for API services that might wrap some model deployment.

E.g. I could have some gateway API that adds chargebacks to some vLLM deployment. You would still need vLLM as a provider, but there might be custom headers (or body) that are needed by the wrapper API.

@NickGagan
Contributor

NickGagan commented Mar 16, 2026

Why would llamastack need to filter out headers that are passed with this mechanism? I would think it's the responsibility of the person making the request to make sure they're controlling what gets passed in to X-LlamaStack-Provider-Data.extra_headers.

You would want to avoid forwarding headers passed into responses API, but in this case the user is explicitly passing in a header they want given to the model provider.

X-LlamaStack-Provider-Data: {"vllm_api_token": "key", "extra_headers": {"X-MAAS-SUBSCRIPTION": "free-tier"}}'

@mattf
Collaborator

mattf commented Mar 16, 2026

@NickGagan -

I could have some gateway API that adds chargebacks to some vLLM deployment.

if you still have the opportunity, reconsider allowing a token to cross chargeback boundaries.

it'll be simpler for users - they can just use the appropriate key for the appropriate domain / project / pod / team / org / whatev. they won't have to figure out how to thread extra info about their project through call chains.

I would think it's the responsibility of the person making the request to make sure they're controlling what gets passed in to X-LlamaStack-Provider-Data.

i agree with you, see #4607. i think the motivation is users may blast all their tokens on each of their requests.

to be consistent w/ the ongoing work to narrow the headers that get passed along, you'll probably need to have "vllm_extra_headers" instead of a generic "extra_headers".
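One way to read that suggestion is a provider-scoped field name, so that generic headers can never fan out to every backend. The sketch below is hypothetical (names are illustrative, not from any PR): each adapter only picks up the field namespaced to it.

```python
# Hypothetical sketch of provider-scoped extra headers, per the suggestion
# above: the vllm adapter reads only "vllm_extra_headers", so headers meant
# for one backend never reach another. Function and field names are
# illustrative assumptions.
def extra_headers_for_provider(provider_data: dict, provider_key: str) -> dict[str, str]:
    # Only the field namespaced to this provider is consulted.
    return provider_data.get(f"{provider_key}_extra_headers") or {}


data = {
    "vllm_api_token": "key",
    "vllm_extra_headers": {"X-MAAS-SUBSCRIPTION": "free-tier"},
}
print(extra_headers_for_provider(data, "vllm"))      # {'X-MAAS-SUBSCRIPTION': 'free-tier'}
print(extra_headers_for_provider(data, "together"))  # {}
```

This keeps the header forwarding filterable in the same way as provider-specific keys like vllm_api_token: the field name itself identifies the destination backend.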

@NickGagan
Contributor

@mattf I'm working with @Lucifergene on this, I have an alternative PR for what you're describing: #5217 (comment).

Please let me know if that's what you were thinking when you get the chance!

@mattf
Collaborator

mattf commented Mar 20, 2026

@mattf I'm working with @Lucifergene on this, I have an alternative PR for what you're describing: #5217 (comment).

Please let me know if that's what you were thinking when you get the chance!

something like that could work. however, this whole need is very suspect. are there other providers that use headers to change the scope of a token?

@Lucifergene
Author

Closing this PR to move forward with a single approach and avoid confusion.

Work continued in PR #5217

Labels

CLA Signed This label is managed by the Meta Open Source bot.


Development

Successfully merging this pull request may close these issues.

Allow forwarding arbitrary HTTP headers from provider data to upstream model endpoints

6 participants