Problem
The `openai` driver in Prism sends requests to the `/responses` endpoint. When using OpenAI-compatible inference backends such as vLLM behind a LiteLLM proxy, this fails because neither vLLM nor LiteLLM supports the `/responses` endpoint.
The `/responses` endpoint is a relatively new, stateful OpenAI feature. vLLM has been discussing support in vllm-project/vllm#14721, but implementation does not appear to be on the near-term roadmap.
This means there is currently no way to use open-source models on vLLM through Prism's openai driver.
Why /chat/completions is sufficient
The `/responses` endpoint's main advantage is server-side conversation state management. However, laravel/ai (the main consumer of Prism) already handles this client-side:
- The `agent_conversations` and `agent_conversation_messages` tables store full conversation history
- The `RememberConversation` middleware and `ConversationStore` manage persistence
- Previous messages are loaded and sent with each request via the agent's `messages()` method
Since conversation context is already managed in the database, the stateless `/chat/completions` endpoint provides equivalent functionality without requiring backend support for `/responses`.
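To make this concrete, a stateless `/chat/completions` request simply carries the full stored history in its `messages` array on every call. A sketch of such a payload (the model name and message contents below are hypothetical, not taken from an actual request):

```json
{
  "model": "my-vllm-model",
  "messages": [
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "user", "content": "First question"},
    {"role": "assistant", "content": "First answer"},
    {"role": "user", "content": "Follow-up question"}
  ],
  "stream": true
}
```

Since laravel/ai already reloads this history from `agent_conversation_messages` before each turn, rebuilding the payload client-side reproduces the conversational continuity that `/responses` would otherwise provide server-side.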
Current workaround and its issues
The only workaround is using the `groq` driver, which sends requests to `/chat/completions`. Despite its name, the `groq` driver is not Groq-specific: it sends standard OpenAI-compatible chat/completions requests. However, it is not a drop-in replacement:
- **Broken streaming with missing usage data**: LiteLLM does not return usage data in `StreamEnd` events, causing a `TypeError` in Prism's `PrismUsage` bridge. This breaks the generator before any post-iteration callbacks can fire.
- **Potential feature gaps**: The `groq` driver may not support the full set of features available in the `openai` driver, since it targets a specific provider.
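The streaming failure comes down to the final streamed chunk: an OpenAI-style backend includes a `usage` object there, while the LiteLLM proxy in this setup omits it, so any code that reads token counts unconditionally fails. A sketch of the two final chunks (field names follow the OpenAI chat-completions schema; the values are illustrative, not captured output):

```jsonc
// Final streamed chunk from an OpenAI-style backend (usage present)
{"choices": [], "usage": {"prompt_tokens": 120, "completion_tokens": 45, "total_tokens": 165}}

// Final streamed chunk as proxied in this setup (usage absent)
{"choices": []}
```

Note that the OpenAI API only emits the usage chunk when the request sets `"stream_options": {"include_usage": true}`; whether the proxy honors that flag here is untested, so treating `usage` as optional on the consuming side may be the more robust fix.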
Proposal
Add a configuration option or a dedicated driver that uses the `/chat/completions` endpoint instead of `/responses` for OpenAI-compatible backends. For example:

```php
'providers' => [
    'litellm' => [
        'driver' => 'openai',
        'endpoint' => 'chat/completions', // opt into the stateless endpoint
        'url' => env('LITELLM_BASE_URL'),
        'key' => '...',
    ],
],
```

Alternatively, a driver like `openai-compatible` that targets `/chat/completions` would solve this for the entire ecosystem of OpenAI-compatible backends (vLLM, LiteLLM, Ollama, LocalAI, etc.).
Environment
- prism-php/prism: v0.99.19
- laravel/ai: v0.1.3
- Backend: vLLM via LiteLLM proxy
- Workaround: `groq` driver (partially working; requires application-level fixes for streaming and persistence bugs)
This issue was written with assistance from Claude Code — human reviewed, minor inaccuracies may exist.