Hi,
In monologue streaming generation, the model generates 80 ms of audio in about 95 ms on an NVIDIA A30 GPU with bf16 enabled.
- Is there a way to reduce the latency so that generation can run in real time?
- What is the real-time factor (RTF) on an L20 GPU?
- Does the model support serving with vLLM or NVIDIA TensorRT-LLM?
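For reference, the real-time factor implied by the numbers above can be computed as generation time divided by audio duration; a minimal sketch (the 80 ms / 95 ms figures are the ones quoted above, everything else is illustrative):

```python
# Real-time factor (RTF) = wall-clock generation time / audio duration.
# RTF < 1.0 means faster than real time; RTF > 1.0 means the model
# cannot keep up with playback.
audio_duration_ms = 80.0     # audio produced per step (from the report above)
generation_time_ms = 95.0    # measured wall-clock time on the A30

rtf = generation_time_ms / audio_duration_ms
print(f"RTF = {rtf:.3f}")    # > 1.0, i.e. slower than real time on the A30
```

So on the A30 the RTF is roughly 1.19, which is why the question about a faster GPU (L20) and optimized serving backends comes up.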