
OpenAI driver forces /responses endpoint — no /chat/completions fallback for vLLM and LiteLLM backends #900

@aimbit-ni

Description


Problem

The openai driver in Prism sends requests to the /responses endpoint. When using OpenAI-compatible inference backends such as vLLM behind a LiteLLM proxy, this fails because neither vLLM nor LiteLLM supports the /responses endpoint.

The /responses endpoint is a relatively new, stateful OpenAI feature. vLLM has been discussing support in vllm-project/vllm#14721, but implementation does not appear to be on the near-term roadmap.

This means there is currently no way to use open-source models on vLLM through Prism's openai driver.

Why /chat/completions is sufficient

The /responses endpoint's main advantage is server-side conversation state management. However, laravel/ai (the main consumer of Prism) already handles this client-side:

  • The agent_conversations and agent_conversation_messages tables store full conversation history
  • The RememberConversation middleware and ConversationStore manage persistence
  • Previous messages are loaded and sent with each request via the agent's messages() method

Since conversation context is already managed in the database, the stateless /chat/completions endpoint provides equivalent functionality without requiring backend support for /responses.
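To illustrate, here is a minimal PHP sketch of what a stateless /chat/completions request body looks like when history is replayed client-side. The row shape and model name are hypothetical, not Prism's or laravel/ai's actual internals:

```php
<?php
// Sketch: with conversation state kept client-side (as laravel/ai already
// does via its conversation tables), each stateless /chat/completions call
// simply replays the stored history in the messages array.
$history = [
    ['role' => 'user', 'content' => 'What serves the model?'],
    ['role' => 'assistant', 'content' => 'vLLM, behind a LiteLLM proxy.'],
];

$payload = [
    'model' => 'my-open-model', // hypothetical model name
    'messages' => array_merge(
        $history,
        [['role' => 'user', 'content' => 'Why does /responses fail?']]
    ),
];

// The backend needs no server-side session: everything it must know
// about the conversation travels in this one request body.
echo json_encode($payload), PHP_EOL;
```

Because the full context rides along with every request, no backend support for /responses is required.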

Current workaround and its issues

The only workaround is using the groq driver, which sends requests to /chat/completions. Despite its name, the groq driver is not Groq-specific: it sends standard OpenAI-compatible chat/completions requests. However, it is not a drop-in replacement:

  1. Broken streaming with missing usage data — LiteLLM does not return usage data in StreamEnd events, causing a TypeError in Prism's PrismUsage bridge. This breaks the generator before any post-iteration callbacks can fire.

  2. Potential feature gaps — The groq driver may not support the full set of features available in the openai driver since it targets a specific provider.
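For reference, the streaming failure in point 1 could be avoided with a defensive fallback when the final chunk carries no usage block. A rough PHP sketch, using illustrative class and function names rather than Prism's real ones:

```php
<?php
// Hypothetical sketch of a usage-mapping guard: tolerate backends (like
// LiteLLM) that omit usage data on the final stream chunk, instead of
// letting a TypeError kill the generator. Names are illustrative only.
final class PrismUsage
{
    public function __construct(
        public readonly int $promptTokens,
        public readonly int $completionTokens,
    ) {}
}

function usageFromChunk(?array $rawUsage): PrismUsage
{
    // Fall back to zeros when the backend sends no usage block at all.
    return new PrismUsage(
        $rawUsage['prompt_tokens'] ?? 0,
        $rawUsage['completion_tokens'] ?? 0,
    );
}

$usage = usageFromChunk(null);
echo $usage->promptTokens + $usage->completionTokens, PHP_EOL; // 0
```

Whether this belongs in the driver or the usage bridge is a design question for the maintainers; the point is simply that missing usage data should degrade gracefully rather than abort the stream.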

Proposal

Add a configuration option or a dedicated driver that uses the /chat/completions endpoint instead of /responses for OpenAI-compatible backends. For example:

'providers' => [
    'litellm' => [
        'driver' => 'openai',
        'endpoint' => 'chat/completions', // opt into the stateless endpoint
        'url' => env('LITELLM_BASE_URL'),
        'key' => '...',
    ],
],

Alternatively, a driver like openai-compatible that targets /chat/completions would solve this for the entire ecosystem of OpenAI-compatible backends (vLLM, LiteLLM, Ollama, LocalAI, etc.).
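Such a driver could be configured much like the example above; the driver name below is hypothetical:

```php
'providers' => [
    'vllm' => [
        'driver' => 'openai-compatible', // hypothetical driver targeting /chat/completions
        'url' => env('VLLM_BASE_URL'),
        'key' => env('VLLM_API_KEY'),
    ],
],
```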

Environment

  • prism-php/prism: v0.99.19
  • laravel/ai: v0.1.3
  • Backend: vLLM via LiteLLM proxy
  • Workaround: groq driver (partially working, requires application-level fixes for streaming and persistence bugs)

This issue was written with assistance from Claude Code — human reviewed, minor inaccuracies may exist.
