Hi,
In monologue streaming generation, the model generates 80 ms of audio in about 95 ms on an NVIDIA A30 GPU with bf16 enabled.
- Is there a way to reduce the latency so that generation can run in real time?
- What is the real-time factor (RTF) on an L20 GPU?
- Does the model support serving with vLLM or NVIDIA TensorRT-LLM?
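For reference, the real-time factor implied by the numbers above can be computed as generation time divided by audio duration; a minimal sketch (the 80 ms / 95 ms figures are the ones quoted above, everything else is illustrative):

```python
# Real-time factor (RTF) = wall-clock generation time / audio duration.
# RTF < 1.0 means faster than real time; RTF > 1.0 means the model
# cannot keep up with playback.
audio_duration_ms = 80.0     # audio produced per step (from the report above)
generation_time_ms = 95.0    # measured wall-clock time on the A30

rtf = generation_time_ms / audio_duration_ms
print(f"RTF = {rtf:.3f}")    # > 1.0, i.e. slower than real time on the A30
```

So on the A30 the RTF is roughly 1.19, which is why the question about a faster GPU (L20) and optimized serving backends comes up.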