
OpenAI driver forces /responses endpoint — no /chat/completions fallback for vLLM and LiteLLM backends #900

@aimbit-ni

Description


Problem

The openai driver in Prism sends requests to the /responses endpoint. When using OpenAI-compatible inference backends such as vLLM behind a LiteLLM proxy, this fails because neither vLLM nor LiteLLM supports the /responses endpoint.

The /responses endpoint is a relatively new, stateful OpenAI feature. vLLM has been discussing support in vllm-project/vllm#14721, but implementation does not appear to be on the near-term roadmap.

This means there is currently no way to use open-source models on vLLM through Prism's openai driver.

Why /chat/completions is sufficient

The /responses endpoint's main advantage is server-side conversation state management. However, laravel/ai (the main consumer of Prism) already handles this client-side:

  • The agent_conversations and agent_conversation_messages tables store full conversation history
  • The RememberConversation middleware and ConversationStore manage persistence
  • Previous messages are loaded and sent with each request via the agent's messages() method

Since conversation context is already managed in the database, the stateless /chat/completions endpoint provides equivalent functionality without requiring backend support for /responses.
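To illustrate, here is a minimal PHP sketch of what a stateless /chat/completions request body looks like when history is replayed client-side. The row shape and model name are hypothetical, not Prism's or laravel/ai's actual internals:

```php
<?php
// Sketch: with conversation state kept client-side (as laravel/ai already
// does via its conversation tables), each stateless /chat/completions call
// simply replays the stored history in the messages array.
$history = [
    ['role' => 'user', 'content' => 'What serves the model?'],
    ['role' => 'assistant', 'content' => 'vLLM, behind a LiteLLM proxy.'],
];

$payload = [
    'model' => 'my-open-model', // hypothetical model name
    'messages' => array_merge(
        $history,
        [['role' => 'user', 'content' => 'Why does /responses fail?']]
    ),
];

// The backend needs no server-side session: everything it must know
// about the conversation travels in this one request body.
echo json_encode($payload), PHP_EOL;
```

Because the full context rides along with every request, no backend support for /responses is required.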

Current workaround and its issues

The only workaround is using the groq driver, which sends requests to /chat/completions. Despite its name, the groq driver is not Groq-specific: it sends standard OpenAI-compatible chat/completions requests. However, it is not a drop-in replacement:

  1. Broken streaming with missing usage data — LiteLLM does not return usage data in StreamEnd events, causing a TypeError in Prism's PrismUsage bridge. This breaks the generator before any post-iteration callbacks can fire.

  2. Potential feature gaps — The groq driver may not support the full set of features available in the openai driver since it targets a specific provider.
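For reference, the streaming failure in point 1 could be avoided with a defensive fallback when the final chunk carries no usage block. A rough PHP sketch, using illustrative class and function names rather than Prism's real ones:

```php
<?php
// Hypothetical sketch of a usage-mapping guard: tolerate backends (like
// LiteLLM) that omit usage data on the final stream chunk, instead of
// letting a TypeError kill the generator. Names are illustrative only.
final class PrismUsage
{
    public function __construct(
        public readonly int $promptTokens,
        public readonly int $completionTokens,
    ) {}
}

function usageFromChunk(?array $rawUsage): PrismUsage
{
    // Fall back to zeros when the backend sends no usage block at all.
    return new PrismUsage(
        $rawUsage['prompt_tokens'] ?? 0,
        $rawUsage['completion_tokens'] ?? 0,
    );
}

$usage = usageFromChunk(null);
echo $usage->promptTokens + $usage->completionTokens, PHP_EOL; // 0
```

Whether this belongs in the driver or the usage bridge is a design question for the maintainers; the point is simply that missing usage data should degrade gracefully rather than abort the stream.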

Proposal

Add a configuration option or a dedicated driver that uses the /chat/completions endpoint instead of /responses for OpenAI-compatible backends. For example:

'providers' => [
    'litellm' => [
        'driver' => 'openai',
        'endpoint' => 'chat/completions', // opt into the stateless endpoint
        'url' => env('LITELLM_BASE_URL'),
        'key' => '...',
    ],
],

Alternatively, a driver like openai-compatible that targets /chat/completions would solve this for the entire ecosystem of OpenAI-compatible backends (vLLM, LiteLLM, Ollama, LocalAI, etc.).
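Such a driver could be configured much like the example above; the driver name below is hypothetical:

```php
'providers' => [
    'vllm' => [
        'driver' => 'openai-compatible', // hypothetical driver targeting /chat/completions
        'url' => env('VLLM_BASE_URL'),
        'key' => env('VLLM_API_KEY'),
    ],
],
```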

Environment

  • prism-php/prism: v0.99.19
  • laravel/ai: v0.1.3
  • Backend: vLLM via LiteLLM proxy
  • Workaround: groq driver (partially working, requires application-level fixes for streaming and persistence bugs)

This issue was written with assistance from Claude Code — human reviewed, minor inaccuracies may exist.
