TTS/STT: Gemini-TTS Model Integration with unified API #574

Open
Prajna1999 wants to merge 64 commits into main from feature/unified-api-tts

Conversation

@Prajna1999 (Collaborator) commented Feb 2, 2026

Summary

Target issue is #528
Integrate Gemini-pro-2.5-preview-tts models with the unified API

Checklist

Before submitting a pull request, please ensure that you complete these tasks:

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is tested and has test cases.

Notes

  1. Currently tested with only Hindi and English as output languages.
  2. The ad hoc main function used to test the TTS scripts will be deleted before the final push.

Summary by CodeRabbit

  • New Features

    • Added Text-to-Speech support (Google Gemini) with configurable model, voice, language, and output formats (WAV/MP3/OGG).
    • Audio conversion options (MP3, OGG) with format selection and MIME-aware outputs.
  • Improvements

    • Better audio input validation and accurate audio size calculations.
    • Consolidated input resolution and temporary-audio handling for text and audio inputs.
  • Chores

    • Added pydub dependency for audio processing.
  • Tests

    • Extensive STT and TTS test coverage added.

Prajna1999 and others added 28 commits January 20, 2026 21:57
…-stt-new' into feature/unified-api-stt-new

merge upstream
@Prajna1999 force-pushed the feature/unified-api-tts branch from af56d56 to c9be67a on February 12, 2026 04:15
@coderabbitai bot left a comment

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/app/services/llm/jobs.py (2)

345-382: ⚠️ Potential issue | 🟠 Major

Output guardrails assume TextOutput — will break for TTS/audio responses.

Line 347 accesses response.response.output.content.value. For TTS responses, output is AudioOutput where content.value is base64-encoded audio data. Passing that to call_guardrails as text is meaningless, and writing safe_text back (Lines 360-362, 374) corrupts the audio payload.

Apply the same type-guarding pattern as for input guardrails:

Proposed fix
         if response:
             if output_guardrails:
-                output_text = response.response.output.content.value
-                safe_output = call_guardrails(output_text, output_guardrails, job_id)
+                from app.models.llm.response import TextOutput
+                if not isinstance(response.response.output, TextOutput):
+                    logger.warning("[execute_job] Output guardrails skipped for non-text output type")
+                else:
+                    output_text = response.response.output.content.value
+                    safe_output = call_guardrails(output_text, output_guardrails, job_id)
+                    # ... rest of guardrail handling indented under else
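Spelled out in full, the guarded flow might look like this (a sketch based on the names in the diff above; the elided rephrase handling stays as in the current code):

from app.models.llm.response import TextOutput

if response and output_guardrails:
    output = response.response.output
    if not isinstance(output, TextOutput):
        # AudioOutput.content.value is base64 audio; text guardrails don't apply
        logger.warning("[execute_job] Output guardrails skipped for non-text output type")
    else:
        output_text = output.content.value
        safe_output = call_guardrails(output_text, output_guardrails, job_id)
        # ... existing rephrase/safe_text handling, now applied only to text outputs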

364-368: ⚠️ Potential issue | 🟠 Major

error=request.query.input passes a Pydantic model object as the error string.

Line 366 sets error=request.query.input, which is a TextInput/AudioInput model — not a string. This will serialize oddly (or raise) when APIResponse.failure_response processes it, and it leaks the raw user input (potentially sensitive) into the error payload.

This should likely be a descriptive error message instead:

Proposed fix
                     if safe_output["data"]["rephrase_needed"] == True:
                         callback_response = APIResponse.failure_response(
-                            error=request.query.input,
+                            error="Output guardrail flagged content for rephrasing",
                             metadata=request.request_metadata,
                         )
🤖 Fix all issues with AI agents
In `@backend/app/alembic/versions/046_add_llm_call_table.py`:
- Around line 66-71: The provider column on the llm_call table is created with
the initial comment but then immediately altered; instead, set the final comment
directly in the create_table call by updating the provider column's comment to
"AI provider as sent by user (e.g. openai, openai-native, google)" (fixing the
"-native" typo), then remove the redundant sa.alter_column(...) block in
upgrade() that targets llm_call.provider as well as its reversal in downgrade();
keep everything else in create_table, upgrade(), and downgrade() unchanged.

In `@backend/app/api/routes/config/version.py`:
- Around line 25-30: The endpoint functions (e.g., create_version) are missing
return type annotations; update create_version and the other handlers in this
file to include explicit return type hints that match the endpoint's actual
returned type (for example the appropriate Pydantic model or FastAPI
Response/JSONResponse type used elsewhere in the codebase), e.g., replace the
bare def create_version(...) with a signature that specifies the correct return
type and do the same for the other handler functions in this module (use the
same model/class names used by other routes to stay consistent).
- Around line 25-41: Add explicit return type hints to all endpoint functions
create_version, list_versions, get_version, and delete_version (e.g., ->
Response or the FastAPI/Starlette response model used in this project) and
update the function signatures accordingly; in create_version, keep the call to
ConfigVersionCrud.create_from_partial_or_raise(version_create=version_create)
but add a comment/docstring line and API documentation entry stating this
endpoint only supports partial version creation (will return HTTP 400 with
message "Cannot create partial version: no existing version found. Use full
config for initial version.") and that initial versions must be created via the
config creation endpoint so callers understand the intentional breaking change.

In `@backend/app/crud/config/version.py`:
- Around line 291-311: The duplicate "get latest version" SQL in
_validate_config_type_unchanged should be replaced by calling the existing
helper _get_latest_version() to keep code DRY: inside
_validate_config_type_unchanged call latest_version = self._get_latest_version()
(or equivalent) and then proceed with the existing type-check logic using
latest_version; ensure you preserve the current behavior when latest_version is
None and raise the same HTTPException if types don't match (and keep references
to ConfigVersion, version_create, and self.session logic intact).

In `@backend/app/crud/llm.py`:
- Around line 39-47: The current AudioInput branch sets size_bytes using
len(query_input.content.value), which measures the base64 string length; instead,
decode the base64 to get the actual byte count (e.g. size_bytes =
len(base64.b64decode(query_input.content.value))) and update the JSON generation
in the AudioInput branch (the block handling isinstance(query_input, AudioInput))
accordingly. Add the necessary import for the base64 module and consider wrapping
the decode in a try/except in case the input is malformed. A sketch of this
calculation follows this block.
- Around line 76-99: The code uses getattr/hasattr on completion_config.params
which breaks when params is a dict (e.g., NativeCompletionConfig); update the
logic in the block that computes completion_type and model to handle both dict
and object params: check if completion_config.params is a dict and use
params.get("type", "text") and params.get("model", "") in that case, otherwise
keep using getattr(completion_config.params, "type", "text") and
getattr/completion_config.params.model via hasattr for model; adjust the
assignment of completion_type and model accordingly (refer to completion_config,
completion_type, model, and resolved_config in the diff).

In `@backend/app/models/llm/request.py`:
- Around line 488-492: The updated_at Field currently uses default_factory=now
which only sets the timestamp at object creation; modify the Field definition
for updated_at to use sa_column with an onupdate clause (e.g.,
sa_column=Column(..., onupdate=now)) so SQLAlchemy will auto-update the
timestamp on record modifications, or alternately ensure every update path in
the CRUD layer explicitly sets updated_at = now() before persisting; locate the
updated_at Field in request.py and implement the sa_column onupdate approach (or
add the explicit update in functions that perform updates) to guarantee
updated_at changes on every update.
- Around line 3-11: The file imports Field and SQLModel twice; consolidate them
by removing the duplicate import and using a single import from sqlmodel that
includes Field, SQLModel, Index, and text (so references to Field and SQLModel
in functions/classes like any model definitions remain valid); ensure other
imports from sqlmodel (Index, text) are preserved in that single import and
delete the redundant earlier import to resolve the F811 duplicate-import error.

In `@backend/app/models/llm/response.py`:
- Around line 31-32: Remove None from the discriminated union so LLMOutput is
Annotated[TextOutput | AudioOutput, Field(discriminator="type")] (keep
TextOutput and AudioOutput as variants), delete the unused imports base64 and
model_validator, and change the model field that currently uses LLMOutput to
accept Optional[LLMOutput] (i.e., make the field Optional/nullable outside the
discriminated union) so validation works with Pydantic v2.
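To make the response.py item above concrete, the discriminated-union shape it describes looks roughly like this (a sketch; the type/value field names are illustrative, not copied from the PR):

from typing import Annotated, Literal, Optional
from pydantic import BaseModel, Field

class TextOutput(BaseModel):
    type: Literal["text"] = "text"
    value: str

class AudioOutput(BaseModel):
    type: Literal["audio"] = "audio"
    value: str  # base64-encoded audio

# None stays out of the union; nullability lives on the field instead
LLMOutput = Annotated[TextOutput | AudioOutput, Field(discriminator="type")]

class LLMResponseBody(BaseModel):
    output: Optional[LLMOutput] = None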

In `@backend/app/services/llm/mappers.py`:
- Around line 98-102: The google mapper is assigning google_params["model"] =
kaapi_params.get("model") unconditionally which can set model to None; change it
to only add "model" to google_params when kaapi_params contains a non-null model
(mirror the OpenAI mapper's conditional behavior used around model in that file)
so that google_params only includes the model key when kaapi_params.get("model")
is truthy/non-null.

In `@backend/app/services/llm/providers/gai.py`:
- Around line 94-95: The uploaded gemini_file created by
self.client.files.upload(file=resolved_input) is never deleted; modify the flow
around generate_content (the method that consumes gemini_file) to ensure cleanup
by calling self.client.files.delete(name=gemini_file.name) in a finally block so
the uploaded object is removed regardless of success or error; locate the upload
call (gemini_file = self.client.files.upload(...)) and wrap the subsequent
generate_content invocation in try/finally to always delete gemini_file via
self.client.files.delete.
- Around line 239-271: The code in GoogleAIProvider._execute_tts lets
unsupported response_format values fall through: it may log a successful
conversion and set response_mime_type to the requested format while
actual_format and raw_audio_bytes remain WAV; fix by validating/handling
unsupported formats before conversion (or after the conversion switch).
Specifically, check response_format against allowed values ("wav","mp3","ogg")
and if it's unsupported return an explicit error or default to "wav" (updating
response_mime_type and actual_format accordingly), and move or guard the "Audio
conversion successful" log so it only runs when a conversion actually occurred
(use actual_format or the convert_error flags and raw_audio_bytes to decide).

In `@backend/app/services/llm/providers/PLAN.md`:
- Line 88: Update the phrase "JAVA like" to the hyphenated form "JAVA-like" in
the sentence "Do not write over abstracted code. Focus on readability than
pristine JAVA-like code." so the documentation uses the consistent hyphenated
form; locate and edit that exact sentence in PLAN.md.
- Line 84: Fix the spelling typo in PLAN.md by replacing "thorugh" with
"through" in the sentence that directs the reader to review the implementation;
while editing, ensure the sentence still references the correct implementation
to review (the _execute_stt function in providers.gai.py and the relevant models
in the app/models/llm module) so the documentation remains accurate and clear.

In `@backend/app/services/llm/providers/registry.py`:
- Around line 70-72: The logger.error call in registry.py needs a function-name
prefix in square brackets; capture the runtime function name (e.g., via import
inspect and inspect.currentframe().f_code.co_name) and change the message to
include that prefix before the existing text (e.g.,
logger.error(f"[{inspect.currentframe().f_code.co_name}] Failed to initialize
{provider_type} client: {e}", exc_info=True)); keep the same exception re-raise
(RuntimeError) unchanged.
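As a reference for the crud/llm.py sizing item above, a minimal sketch of the decoded-size calculation (the helper name is illustrative, not part of the PR):

import base64

def audio_size_bytes(b64_value: str) -> int:
    """Return the decoded audio size in bytes, not the base64 string length."""
    try:
        return len(base64.b64decode(b64_value))
    except Exception:
        return 0  # malformed base64; the caller decides how to surface this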
🧹 Nitpick comments (11)
backend/app/alembic/versions/046_add_llm_call_table.py (1)

22-138: Consider adding an index on project_id for project-scoped queries.

get_llm_call_by_id in backend/app/crud/llm.py filters by project_id, and project-scoped lookups are a common access pattern. A partial index similar to the existing ones would help:

op.create_index(
    "idx_llm_call_project_id",
    "llm_call",
    ["project_id"],
    unique=False,
    postgresql_where=sa.text("deleted_at IS NULL"),
)
backend/app/models/llm/response.py (1)

7-8: Unused imports: base64 and model_validator.

Neither base64 nor model_validator is referenced anywhere in this file. Remove them to keep the module clean.

Suggested fix
-import base64
-from pydantic import model_validator
 from sqlmodel import SQLModel, Field
backend/app/crud/llm.py (1)

179-191: Mutating the caller's content dict in-place.

Line 187 adds audio_size_bytes to the passed-in content dict. If the caller retains a reference, they'll see this side effect. Consider working on a copy:

Suggested fix
     if content is not None:
+        content = content.copy()
         # For audio outputs (AudioOutput model): calculate size metadata from base64 content
backend/app/crud/config/version.py (1)

110-120: Leaking internal validation details to API consumers.

str(e) from a Pydantic ValidationError can be very verbose and expose internal schema names/field paths. Consider sanitizing or summarizing the error before returning it in the HTTP response.

Suggested fix
         except ValidationError as e:
             logger.error(
                 f"[ConfigVersionCrud.create_from_partial] Validation failed | "
                 f"{{'config_id': '{self.config_id}', 'error': '{str(e)}'}}"
             )
             raise HTTPException(
                 status_code=400,
-                detail=f"Invalid config after merge: {str(e)}",
+                detail="Invalid config after merge. Check that all required fields are present and valid.",
             )
backend/app/services/llm/jobs.py (2)

130-136: Missing type hint for task_instance parameter.

As per coding guidelines, all function parameters should have type hints.

 def execute_job(
     request_data: dict,
     project_id: int,
     organization_id: int,
     job_id: str,
     task_id: str,
-    task_instance,
+    task_instance: Any,
 ) -> dict:

253-255: Remove empty else: pass block.

This serves no purpose and adds noise.

                     request.request_metadata.setdefault("warnings", []).extend(warnings)
-                else:
-                    pass
backend/app/services/llm/providers/gai.py (4)

4-8: ThinkingConfig is imported but unused.

The import on line 7 is dead code (the usage at line 107 is commented out). Remove it to satisfy clean imports.

 from google.genai.types import (
     GenerateContentResponse,
     GenerateContentConfig,
-    ThinkingConfig,
 )

126-128: f-strings without placeholders — use plain strings.

Static analysis (Ruff F541) correctly flags lines 127 and 281. These are f-strings with no interpolation.

-            logger.warning(
-                f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
-            )
+            logger.warning(
+                "[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
+            )
-            logger.warning(
-                f"[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros"
-            )
+            logger.warning(
+                "[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros"
+            )

Also applies to: 280-282


34-35: Redundant self.client assignment.

super().__init__(client) in BaseProvider.__init__ already sets self.client = client (see base.py). Line 35 is unnecessary.

         super().__init__(client)
-        self.client = client

206-207: Avoid inline import inside the method body.

from google.genai import types is imported inline at line 207 but google.genai.types members are already imported at the top of the file (lines 4-8). Either import types at the top level or use the already-imported GenerateContentConfig directly (it's types.GenerateContentConfig). This also avoids repeated import overhead on every TTS call.

+from google.genai import types  # move to top-level imports
backend/app/models/llm/request.py (1)

180-182: Provider field description is outdated.

The description reads "LLM provider (openai)" but the Literal now accepts both "openai" and "google".

     provider: Literal["openai", "google"] = Field(
-        ..., description="LLM provider (openai)"
+        ..., description="LLM provider (e.g., openai, google)"
     )

Comment on lines 66 to 71
        sa.Column(
            "provider",
            sa.String(),
            nullable=False,
            comment="AI provider: openai, google, anthropic",
        ),

🛠️ Refactor suggestion | 🟠 Major

Redundant alter_column on llm_call.provider — set the final comment directly in create_table.

The migration creates llm_call.provider with comment "AI provider: openai, google, anthropic" at Line 70, then immediately alters it at Lines 161-168 to "AI provider as sent by user (e.g openai, -native, google)". Just use the final comment in the create_table call. Also note the typo: "-native" should likely be "openai-native".

Suggested fix

In the create_table call, replace the provider column comment:

         sa.Column(
             "provider",
             sa.String(),
             nullable=False,
-            comment="AI provider: openai, google, anthropic",
+            comment="AI provider as sent by user (e.g. openai, openai-native, google)",
         ),

Then remove the redundant alter_column block (Lines 161-168) from upgrade() and the corresponding reversal (Lines 182-188) from downgrade().

Also applies to: 161-168

🤖 Prompt for AI Agents
In `@backend/app/alembic/versions/046_add_llm_call_table.py` around lines 66 - 71,
The provider column on the llm_call table is created with the initial comment
but then immediately altered; instead, set the final comment directly in the
create_table call by updating the provider column's comment to "AI provider as
sent by user (e.g. openai, openai-native, google)" (fixing the "-native" typo),
then remove the redundant sa.alter_column(...) block in upgrade() that targets
llm_call.provider as well as its reversal in downgrade(); keep everything else
in create_table, upgrade(), and downgrade() unchanged.

Comment on lines 291 to 311
    def _validate_config_type_unchanged(
        self, version_create: ConfigVersionCreate
    ) -> None:
        """
        Validate that the config type (text/stt/tts) in the new version matches
        the type from the latest existing version.
        Raises HTTPException if types don't match.
        """
        # Get the latest version
        stmt = (
            select(ConfigVersion)
            .where(
                and_(
                    ConfigVersion.config_id == self.config_id,
                    ConfigVersion.deleted_at.is_(None),
                )
            )
            .order_by(ConfigVersion.version.desc())
            .limit(1)
        )
        latest_version = self.session.exec(stmt).first()

🛠️ Refactor suggestion | 🟠 Major

Duplicated "get latest version" query — reuse _get_latest_version().

_validate_config_type_unchanged (Lines 300-311) duplicates the exact same query as _get_latest_version() (Lines 157-168). Reuse the helper to keep things DRY.

Suggested fix
     def _validate_config_type_unchanged(
         self, version_create: ConfigVersionCreate
     ) -> None:
-        # Get the latest version
-        stmt = (
-            select(ConfigVersion)
-            .where(
-                and_(
-                    ConfigVersion.config_id == self.config_id,
-                    ConfigVersion.deleted_at.is_(None),
-                )
-            )
-            .order_by(ConfigVersion.version.desc())
-            .limit(1)
-        )
-        latest_version = self.session.exec(stmt).first()
+        latest_version = self._get_latest_version()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    def _validate_config_type_unchanged(
-        self, version_create: ConfigVersionCreate
-    ) -> None:
-        """
-        Validate that the config type (text/stt/tts) in the new version matches
-        the type from the latest existing version.
-        Raises HTTPException if types don't match.
-        """
-        # Get the latest version
-        stmt = (
-            select(ConfigVersion)
-            .where(
-                and_(
-                    ConfigVersion.config_id == self.config_id,
-                    ConfigVersion.deleted_at.is_(None),
-                )
-            )
-            .order_by(ConfigVersion.version.desc())
-            .limit(1)
-        )
-        latest_version = self.session.exec(stmt).first()
+    def _validate_config_type_unchanged(
+        self, version_create: ConfigVersionCreate
+    ) -> None:
+        """
+        Validate that the config type (text/stt/tts) in the new version matches
+        the type from the latest existing version.
+        Raises HTTPException if types don't match.
+        """
+        latest_version = self._get_latest_version()
🤖 Prompt for AI Agents
In `@backend/app/crud/config/version.py` around lines 291 - 311, The duplicate
"get latest version" SQL in _validate_config_type_unchanged should be replaced
by calling the existing helper _get_latest_version() to keep code DRY: inside
_validate_config_type_unchanged call latest_version = self._get_latest_version()
(or equivalent) and then proceed with the existing type-check logic using
latest_version; ensure you preserve the current behavior when latest_version is
None and raise the same HTTPException if types don't match (and keep references
to ConfigVersion, version_create, and self.session logic intact).

1. **Instruction Priority:** If you set `speaking_rate=2.0` and also put "Speak very slowly" in the `system_instruction`, the model may produce erratic results. Use **natural language** for "tone" and **programmatic fields** for "fixed pacing."
2. **Streaming:** The `google-genai` SDK supports streaming audio bytes for real-time applications via the `models.generate_content_stream` method, but `audio_encoding` must be `LINEAR16` or `PCM` for minimum latency.
3. **SynthID:** Note that all output audio is watermarked with Google's SynthID for safety tracking.
4. You are required to go thorugh the implementation of _execute_stt function inside the app/services/llm/providers.gai.py and relevant models inside app/models/llm as well.

⚠️ Potential issue | 🟡 Minor

Fix spelling typo (“thorugh” → “through”).
Minor documentation error.

🧰 Tools
🪛 LanguageTool

[grammar] ~84-~84: Ensure spelling is correct
Context: ...ety tracking. 4. You are required to go thorugh the implementation of _execute_stt func...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
In `@backend/app/services/llm/providers/PLAN.md` at line 84, Fix the spelling typo
in PLAN.md by replacing "thorugh" with "through" in the sentence that directs
the reader to review the implementation; while editing, ensure the sentence
still references the correct implementation to review (the _execute_stt function
in providers.gai.py and the relevant models in the app/models/llm module) so the
documentation remains accurate and clear.
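For context on the streaming note in the PLAN excerpt above, a TTS request with the google-genai SDK looks roughly like this (a sketch; the model name, voice, and chunk handling are illustrative and not taken from this PR):

from google import genai
from google.genai import types

client = genai.Client()  # assumes an API key in the environment

config = types.GenerateContentConfig(
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
        )
    ),
)

# Stream raw PCM chunks instead of waiting for the full clip
for chunk in client.models.generate_content_stream(
    model="gemini-2.5-pro-preview-tts",
    contents="Namaste! This is a streaming TTS test.",
    config=config,
):
    part = chunk.candidates[0].content.parts[0]
    if part.inline_data:
        handle_pcm_bytes(part.inline_data.data)  # hypothetical consumer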

5. Make sure to follow the celery task queue structure and pydantic models idioms and database schema.
6. The generated audio is not saved in the database, so store some metadata for now.
7. Make sure to follow the TTS config types, (KaapiLLMConfig and Native LLM Configs and relevant models for config and versions).
8. Do not write over abstracted code. Focus on readability than pristine JAVA-like code.

⚠️ Potential issue | 🟡 Minor

Hyphenate “JAVA-like”.
Use “JAVA-like” for readability and consistency.

🧰 Tools
🪛 LanguageTool

[grammar] ~88-~88: Use a hyphen to join words.
Context: ...nfig and versions). 8. Do not write over abstracted code. Focus on readability th...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
In `@backend/app/services/llm/providers/PLAN.md` at line 88, Update the phrase
"JAVA like" to the hyphenated form "JAVA-like" in the sentence "Do not write
over abstracted code. Focus on readability than pristine JAVA-like code." so the
documentation uses the consistent hyphenated form; locate and edit that exact
sentence in PLAN.md.

Comment on lines 70 to 72
    except Exception as e:
        logger.error(f"Failed to initialize {provider_type} client: {e}", exc_info=True)
        raise RuntimeError(f"Could not connect to {provider_type} services.")

🛠️ Refactor suggestion | 🟠 Major

Log message missing [function_name] prefix.

Per coding guidelines, log messages should be prefixed with the function name in square brackets.

Suggested fix
-        logger.error(f"Failed to initialize {provider_type} client: {e}", exc_info=True)
+        logger.error(f"[get_llm_provider] Failed to initialize {provider_type} client: {e}", exc_info=True)

As per coding guidelines: Prefix all log messages with the function name in square brackets.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    except Exception as e:
-        logger.error(f"Failed to initialize {provider_type} client: {e}", exc_info=True)
-        raise RuntimeError(f"Could not connect to {provider_type} services.")
+    except Exception as e:
+        logger.error(f"[get_llm_provider] Failed to initialize {provider_type} client: {e}", exc_info=True)
+        raise RuntimeError(f"Could not connect to {provider_type} services.")
🤖 Prompt for AI Agents
In `@backend/app/services/llm/providers/registry.py` around lines 70 - 72, The
logger.error call in registry.py needs a function-name prefix in square
brackets; capture the runtime function name (e.g., via import inspect and
inspect.currentframe().f_code.co_name) and change the message to include that
prefix before the existing text (e.g.,
logger.error(f"[{inspect.currentframe().f_code.co_name}] Failed to initialize
{provider_type} client: {e}", exc_info=True)); keep the same exception re-raise
(RuntimeError) unchanged.
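Illustrating the prompt above, the runtime-derived prefix would look like this (a sketch; whether to prefer the hard-coded name from the suggested fix is a style call):

import inspect
import logging

logger = logging.getLogger(__name__)

def get_llm_provider(provider_type: str):
    try:
        ...  # client initialization
    except Exception as e:
        fn = inspect.currentframe().f_code.co_name  # "get_llm_provider"
        logger.error(f"[{fn}] Failed to initialize {provider_type} client: {e}", exc_info=True)
        raise RuntimeError(f"Could not connect to {provider_type} services.")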

model: str
voice: str
language: str
response_format: Literal["mp3", "wav", "ogg"] | None = "wav"
Collaborator:

Why was a description added above but not here?

Collaborator (Author):

The keys look self-explanatory, hence I did not add extra sentences.

},
)

organization_id: int = Field(
Collaborator:

Do we need the org ID in the table if the project ID is already there and we can derive the org ID from it?

Collaborator (Author):

Yes, we discussed that. We decided not to remove it for the time being because it is easier to fetch the project and org from the table without multiple joins.


        director_notes = gemini_params.get("director_notes")
        # Build Gemini TTS config
        from google.genai import types
Collaborator:

Please place this import at the top of the file, not here.

import logging

import openai
from openai import OpenAI
Collaborator:

I feel like naming this file "oai" is confusing; "openai" is better.

Collaborator (Author):

Yes, that was the case, but it was creating circular dependency issues, i.e. importing the OpenAI SDK from the openai library and re-exporting it at the top level from openai.py, so I shortened the name. We can think about better naming in a separate PR.

@@ -0,0 +1,87 @@
import base64
Collaborator:

This file should be in a dedicated audio folder inside services/llm.

    job = session.get(Job, job_id)
    if not job:
        # Log all jobs to see what's in the database
        from sqlmodel import select
Collaborator:

Please move this import up.

Collaborator:

@Prajna1999 this comment was not addressed

@coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (2)
backend/app/services/llm/providers/gai.py (1)

283-286: f-string without placeholders on line 285.

Static analysis (Ruff F541) flags this. Remove the f prefix.

Proposed fix
-            logger.warning(
-                f"[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros"
-            )
+            logger.warning(
+                "[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros"
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 283 - 286, The
logger.warning call inside GoogleAIProvider._execute_tts uses an unnecessary
f-string (no placeholders); remove the leading "f" so the message is a plain
string literal (update the logger.warning invocation that currently contains
"[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros").
Ensure no other f-strings without placeholders remain in the _execute_tts
method.
backend/app/services/llm/mappers.py (1)

102-106: Use the already-validated model variable instead of re-fetching from dict.

Line 102 already extracts and validates model = kaapi_params.get("model"), but line 106 calls kaapi_params.get("model") again. Use the local variable for consistency and to avoid redundant dict lookups.

Proposed fix
     model = kaapi_params.get("model")
     if not model:
         return {}, ["Missing required 'model' parameter"]
 
-    google_params["model"] = kaapi_params.get("model")
+    google_params["model"] = model
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/mappers.py` around lines 102 - 106, The code
validates model with model = kaapi_params.get("model") but then redundantly
re-reads kaapi_params.get("model") when setting google_params["model"]; update
that assignment to use the already-validated local variable model instead of
re-fetching from kaapi_params to avoid redundant lookups and ensure consistency
(refer to the variables model, kaapi_params, and google_params in the mapper
function).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/models/llm/request.py`:
- Around line 69-76: The AudioContent model's mime_type is annotated as
non-nullable str but defaults to None; update the type annotation for the
mime_type field on the AudioContent class to allow None (e.g., make mime_type of
type str | None) so it matches the default and the downstream usage (see places
referencing query_input.content.mime_type or "audio/wav") and adjust the
Field(...) declaration to keep the same default None and description.

In `@backend/app/services/llm/providers/gai.py`:
- Line 222: The code uses an undefined symbol types.GenerateContentConfig
causing a NameError; replace the incorrect reference with the already-imported
GenerateContentConfig by changing the line that sets config (currently using
types.GenerateContentConfig) to use GenerateContentConfig(**config_kwargs),
ensuring the GenerateContentConfig symbol (imported at top) is used instead of
the nonexistent types module.
- Around line 239-275: The "Audio conversion successful" log is emitted even
when no conversion occurs; inside GoogleAIProvider._execute_tts, move or
duplicate the success log so it only runs after a real conversion, i.e., place
the logger.info call into the response_format == "mp3" branch after
convert_pcm_to_mp3 succeeds and into the response_format == "ogg" branch after
convert_pcm_to_ogg succeeds (use convert_pcm_to_mp3, convert_pcm_to_ogg,
actual_format and converted_bytes to build the message), and remove or guard the
existing unconditional success log so unsupported-format fallback (where
response_format is reset to "wav") does not log success.

In `@backend/app/utils.py`:
- Around line 426-444: The temp-file leak occurs in resolve_audio_base64 when
base64 decoding succeeds but writing to NamedTemporaryFile fails; fix by
tracking temp_path (initialize to None before the try), and in the outer except
for the write error check if temp_path is set and the file exists then remove it
(os.remove) before returning the error; ensure resolve_audio_base64 imports os
and sets temp_path only after tmp.name is assigned so cleanup is safe, or
alternatively use a try/finally around the NamedTemporaryFile write to always
unlink on error while still returning the error string.
- Around line 447-474: The function resolve_input is missing a type annotation
for its query_input parameter; update its signature to type-hint query_input as
the request module's QueryInput (or appropriate union of TextInput|AudioInput)
using a forward/string annotation or by importing QueryInput under a
TYPE_CHECKING guard to avoid circular imports, and keep the existing return type
tuple[str, str | None]; ensure references in the body (TextInput, AudioInput,
resolve_audio_base64) remain unchanged.

---

Duplicate comments:
In `@backend/app/crud/llm.py`:
- Around line 68-91: The code incorrectly uses getattr/hasattr on
completion_config.params, which may be a dict; update the completion_type and
model resolution to handle both shapes: if completion_config.params is a dict,
use params.get("type", "text") and params.get("model", ""); otherwise use
getattr(params, "type", "text") and getattr(params, "model", ""). Adjust
references to completion_config, completion_type, model, and params accordingly
so both dict-based (NativeCompletionConfig) and object-based params work (a
sketch follows this block).

In `@backend/app/models/llm/request.py`:
- Around line 4-8: There are duplicate imports of Field and SQLModel (two
sqlmodel import lines); remove the redundant import and consolidate all symbols
imported from sqlmodel into a single import statement so only one import
supplies Field, SQLModel, Index, and text; keep the other imports
(Discriminator, model_validator, HttpUrl, datetime, JSONB) as-is and ensure no
name collisions remain in request.py.

In `@backend/app/services/llm/providers/gai.py`:
- Around line 96-97: The uploaded Gemini file (assigned to gemini_file via
self.client.files.upload(file=resolved_input)) is never deleted, causing storage
accumulation; wrap the upload + generate_content flow in try/finally and in the
finally block call the provider client file-delete API (e.g.,
self.client.files.delete(...) using the gemini_file identifier or name) to
remove the uploaded artifact regardless of success/failure, ensuring any
exceptions from delete are caught/logged so cleanup does not mask the original
error.

---

Nitpick comments:
In `@backend/app/services/llm/mappers.py`:
- Around line 102-106: The code validates model with model =
kaapi_params.get("model") but then redundantly re-reads
kaapi_params.get("model") when setting google_params["model"]; update that
assignment to use the already-validated local variable model instead of
re-fetching from kaapi_params to avoid redundant lookups and ensure consistency
(refer to the variables model, kaapi_params, and google_params in the mapper
function).

In `@backend/app/services/llm/providers/gai.py`:
- Around line 283-286: The logger.warning call inside
GoogleAIProvider._execute_tts uses an unnecessary f-string (no placeholders);
remove the leading "f" so the message is a plain string literal (update the
logger.warning invocation that currently contains
"[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros").
Ensure no other f-strings without placeholders remain in the _execute_tts
method.
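The dict-vs-object params item above (backend/app/crud/llm.py) boils down to a branch like this (a sketch; the field defaults are assumptions):

params = completion_config.params
if isinstance(params, dict):
    # NativeCompletionConfig carries params as a plain dict
    completion_type = params.get("type", "text")
    model = params.get("model", "")
else:
    # Kaapi-style configs carry params as a Pydantic model
    completion_type = getattr(params, "type", "text")
    model = getattr(params, "model", "")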

Comment on lines 69 to 76
 class AudioContent(SQLModel):
     format: Literal["base64"] = "base64"
-    value: str = Field(..., min_length=1, description="Base64 encoded audio")
+    value: str = Field(..., description="Base64 encoded audio")
     # keeping the mime_type liberal here, since does not affect transcription type
-    mime_type: str | None = Field(
+    mime_type: str = Field(
         None,
         description="MIME type of the audio (e.g., audio/wav, audio/mp3, audio/ogg)",
     )

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the actual code in request.py at the specified lines
cat -n backend/app/models/llm/request.py | sed -n '69,76p'

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 482


🏁 Script executed:

# Check how mime_type is used downstream, especially in utils.py
rg -n "mime_type" --type=py -C3

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 16516


🏁 Script executed:

# Specifically check the utils.py line 466 mentioned in the comment
if [ -f backend/app/utils.py ] || [ -f backend/utils.py ] || [ -f app/utils.py ]; then
  find . -name "utils.py" -path "*/backend/*" | head -1 | xargs sed -n '460,472p' -
fi

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 57


🏁 Script executed:

# Let's also check if there are any type checking or validation issues
# Check how AudioContent is instantiated or used
rg -n "AudioContent" --type=py -C2

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 3900


🏁 Script executed:

# Check SQLModel/Pydantic version and configuration
find . -name "pyproject.toml" -o -name "requirements.txt" | head -5 | xargs grep -l "sqlmodel\|pydantic" 2>/dev/null

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 99


🏁 Script executed:

# Check if there's any model validation configuration or model_config in the request.py file
cat -n backend/app/models/llm/request.py | head -100

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 3402


🏁 Script executed:

# Check if there are any validation errors or issues reported in tests related to AudioContent
rg -n "AudioContent" --type=py backend/app/tests/ -C2 | grep -A5 -B5 "error\|exception\|validate"

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1053


🏁 Script executed:

# Check the pyproject.toml to see Pydantic/SQLModel versions
cat backend/pyproject.toml | grep -A5 -B5 "pydantic\|sqlmodel"

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 699


🏁 Script executed:

# Test if AudioContent can actually be instantiated with None for mime_type
# and check if there are any validation issues in the codebase
rg -n "AudioContent\(" --type=py | grep -v "mime_type=" | head -20

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 229


🏁 Script executed:

# Check for any Pydantic ConfigDict or model_config settings that might affect validation
rg -n "model_config\|ConfigDict" --type=py backend/app/models/llm/

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 57


🏁 Script executed:

# Let's check if there are any instances where AudioContent is created without mime_type explicitly
rg -n "AudioContent\(" --type=py -C1

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1109


🏁 Script executed:

# Check if there's any instance where mime_type might be set to None after instantiation
rg -n "mime_type\s*=" --type=py -C2 | grep -v test | head -40

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 2447


Fix type annotation for mime_type field to match actual usage pattern.

mime_type: str = Field(None, ...) declares a non-nullable str type but defaults to None. Pydantic v2 will accept None at instantiation, but the type annotation is misleading. Downstream code at utils.py line 466 explicitly handles the None case with query_input.content.mime_type or "audio/wav", confirming None is an expected value.

Change to: mime_type: str | None = Field(None, ...)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 69 - 76, The AudioContent
model's mime_type is annotated as non-nullable str but defaults to None; update
the type annotation for the mime_type field on the AudioContent class to allow
None (e.g., make mime_type of type str | None) so it matches the default and the
downstream usage (see places referencing query_input.content.mime_type or
"audio/wav") and adjust the Field(...) declaration to keep the same default None
and description.

        if director_notes:
            config_kwargs["system_instruction"] = director_notes

        config = types.GenerateContentConfig(**config_kwargs)

⚠️ Potential issue | 🔴 Critical

types is undefined — this will crash at runtime with NameError.

Line 222 references types.GenerateContentConfig, but types is never imported. GenerateContentConfig is already imported directly on line 7. This is a guaranteed runtime crash for every TTS request.

Proposed fix
-        config = types.GenerateContentConfig(**config_kwargs)
+        config = GenerateContentConfig(**config_kwargs)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        config = types.GenerateContentConfig(**config_kwargs)
+        config = GenerateContentConfig(**config_kwargs)
🧰 Tools
🪛 Ruff (0.15.0)

[error] 222-222: Undefined name types

(F821)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` at line 222, The code uses an
undefined symbol types.GenerateContentConfig causing a NameError; replace the
incorrect reference with the already-imported GenerateContentConfig by changing
the line that sets config (currently using types.GenerateContentConfig) to use
GenerateContentConfig(**config_kwargs), ensuring the GenerateContentConfig
symbol (imported at top) is used instead of the nonexistent types module.

Comment on lines +239 to +275
        # Post-process audio format conversion if needed
        # Gemini TTS natively outputs 24kHz 16-bit PCM (WAV format)
        actual_format = "wav"  # Native Gemini TTS output format
        encoded_content = base64.b64encode(raw_audio_bytes or b"").decode("ascii")

        if response_format and response_format != "wav":
            # Need to convert from WAV to requested format
            logger.info(
                f"[GoogleAIProvider._execute_tts] Converting audio from WAV to {response_format}"
            )

            if response_format == "mp3":
                converted_bytes, convert_error = convert_pcm_to_mp3(raw_audio_bytes)
                if convert_error:
                    return None, f"Failed to convert audio to MP3: {convert_error}"
                encoded_content = base64.b64encode(converted_bytes or b"").decode(
                    "ascii"
                )
                actual_format = "mp3"

            elif response_format == "ogg":
                converted_bytes, convert_error = convert_pcm_to_ogg(raw_audio_bytes)
                if convert_error:
                    return None, f"Failed to convert audio to OGG: {convert_error}"
                encoded_content = base64.b64encode(converted_bytes or b"").decode(
                    "ascii"
                )
                actual_format = "ogg"
            else:
                logger.warning(
                    f"[GoogleAIProvider._execute_tts] Unsupported response_format '{response_format}', returning native WAV"
                )
                response_format = "wav"
            logger.info(
                f"[GoogleAIProvider._execute_tts] Audio conversion successful: {actual_format.upper()} ({len(raw_audio_bytes)} bytes)"
            )
        response_mime_type = f"audio/{response_format}"

⚠️ Potential issue | 🟡 Minor

"Audio conversion successful" log is misleading for the unsupported-format fallback path.

When response_format is unsupported (e.g., "flac"), line 268-271 logs a warning and resets to "wav", but the code then falls through to line 272-274 and logs "Audio conversion successful" even though no conversion occurred. Move the success log inside the mp3/ogg branches, or guard it.

Also, response_mime_type on line 275 now correctly uses response_format (which was reset to "wav" in the fallback), so the mime type issue from the prior review is resolved.

Proposed fix
             elif response_format == "ogg":
                 converted_bytes, convert_error = convert_pcm_to_ogg(raw_audio_bytes)
                 if convert_error:
                     return None, f"Failed to convert audio to OGG: {convert_error}"
                 encoded_content = base64.b64encode(converted_bytes or b"").decode(
                     "ascii"
                 )
                 actual_format = "ogg"
             else:
                 logger.warning(
                     f"[GoogleAIProvider._execute_tts] Unsupported response_format '{response_format}', returning native WAV"
                 )
                 response_format = "wav"
-            logger.info(
-                f"[GoogleAIProvider._execute_tts] Audio conversion successful: {actual_format.upper()} ({len(raw_audio_bytes)} bytes)"
-            )
+
+            if actual_format != "wav":
+                logger.info(
+                    f"[GoogleAIProvider._execute_tts] Audio conversion successful: {actual_format.upper()} ({len(raw_audio_bytes)} bytes)"
+                )
         response_mime_type = f"audio/{response_format}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 239 - 275, The "Audio
conversion successful" log is emitted even when no conversion occurs; inside
GoogleAIProvider._execute_tts, move or duplicate the success log so it only runs
after a real conversion, i.e., place the logger.info call into the
response_format == "mp3" branch after convert_pcm_to_mp3 succeeds and into the
response_format == "ogg" branch after convert_pcm_to_ogg succeeds (use
convert_pcm_to_mp3, convert_pcm_to_ogg, actual_format and converted_bytes to
build the message), and remove or guard the existing unconditional success log
so unsupported-format fallback (where response_format is reset to "wav") does
not log success.
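The convert_pcm_to_mp3/convert_pcm_to_ogg helpers referenced here live elsewhere in the PR; with the new pydub dependency, such a conversion typically looks like this (a sketch assuming Gemini's documented 24kHz 16-bit mono PCM; ffmpeg must be installed for MP3 export):

import io
from pydub import AudioSegment

def convert_pcm_to_mp3(pcm_bytes: bytes) -> tuple[bytes | None, str | None]:
    """Convert raw 24kHz 16-bit mono PCM to MP3. Returns (data, error)."""
    try:
        segment = AudioSegment(
            data=pcm_bytes,
            sample_width=2,    # 16-bit samples
            frame_rate=24000,  # Gemini TTS native sample rate
            channels=1,
        )
        buf = io.BytesIO()
        segment.export(buf, format="mp3")
        return buf.getvalue(), None
    except Exception as e:
        return None, str(e)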

Comment on lines +426 to +444
def resolve_audio_base64(data: str, mime_type: str) -> tuple[str, str | None]:
    """Decode base64 audio and write to temp file. Returns (file_path, error)."""
    try:
        audio_bytes = base64.b64decode(data)
    except Exception as e:
        return "", f"Invalid base64 audio data: {str(e)}"

    ext = get_file_extension(mime_type)
    try:
        with tempfile.NamedTemporaryFile(
            suffix=ext, delete=False, prefix="audio_"
        ) as tmp:
            tmp.write(audio_bytes)
            temp_path = tmp.name

        logger.info(f"[resolve_audio_base64] Wrote audio to temp file: {temp_path}")
        return temp_path, None
    except Exception as e:
        return "", f"Failed to write audio to temp file: {str(e)}"

⚠️ Potential issue | 🟡 Minor

Temp files are never cleaned up on the error path inside resolve_audio_base64.

If base64.b64decode succeeds but the subsequent NamedTemporaryFile write fails partway through, the temp file (created with delete=False) may be left behind. The current except on line 443 returns an error string, but any partially-created file is leaked. This is a minor concern since NamedTemporaryFile creation and a single write are unlikely to fail partially, but worth noting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/utils.py` around lines 426 - 444, The temp-file leak occurs in
resolve_audio_base64 when base64 decoding succeeds but writing to
NamedTemporaryFile fails; fix by tracking temp_path (initialize to None before
the try), and in the outer except for the write error check if temp_path is set
and the file exists then remove it (os.remove) before returning the error;
ensure resolve_audio_base64 imports os and sets temp_path only after tmp.name is
assigned so cleanup is safe, or alternatively use a try/finally around the
NamedTemporaryFile write to always unlink on error while still returning the
error string.
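A minimal sketch of the cleanup pattern the prompt describes (the function name here is illustrative; the same try/except unlink idea applies to the uploaded Gemini file flagged earlier):

import os
import tempfile

def write_temp_audio(audio_bytes: bytes, ext: str) -> tuple[str, str | None]:
    """Write audio to a temp file; unlink it if the write fails midway."""
    temp_path: str | None = None
    try:
        with tempfile.NamedTemporaryFile(
            suffix=ext, delete=False, prefix="audio_"
        ) as tmp:
            temp_path = tmp.name
            tmp.write(audio_bytes)
        return temp_path, None
    except Exception as e:
        if temp_path is not None and os.path.exists(temp_path):
            os.remove(temp_path)  # don't leak the partially-written file
        return "", f"Failed to write audio to temp file: {e}"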

Comment on lines 447 to 474
def resolve_input(query_input) -> tuple[str, str | None]:
    """Resolve discriminated union input to content string.

    Args:
        query_input: The input from QueryParams (TextInput or AudioInput)

    Returns:
        (content_string, None) on success - for text returns content value, for audio returns temp file path
        ("", error_message) on failure
    """
    # Import here to avoid circular imports
    from app.models.llm.request import TextInput, AudioInput

    try:
        if isinstance(query_input, TextInput):
            return query_input.content.value, None

        elif isinstance(query_input, AudioInput):
            # AudioInput content is base64-encoded audio
            mime_type = query_input.content.mime_type or "audio/wav"
            return resolve_audio_base64(query_input.content.value, mime_type)

        else:
            return "", f"Unknown input type: {type(query_input)}"

    except Exception as e:
        logger.error(f"[resolve_input] Failed to resolve input: {e}", exc_info=True)
        return "", f"Failed to resolve input: {str(e)}"

🛠️ Refactor suggestion | 🟠 Major

Missing type hint on query_input parameter.

Per coding guidelines, all function parameters must have type hints. Since the local import already brings in TextInput and AudioInput, the annotation should use QueryInput (or the union type) from the request module. You can use a string annotation or TYPE_CHECKING to avoid circular imports at runtime.

Proposed fix
+from __future__ import annotations
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from app.models.llm.request import QueryInput
+
 ...
 
-def resolve_input(query_input) -> tuple[str, str | None]:
+def resolve_input(query_input: QueryInput) -> tuple[str, str | None]:

Alternatively, if from __future__ import annotations would be too broad a change for this file, use a string literal:

-def resolve_input(query_input) -> tuple[str, str | None]:
+def resolve_input(query_input: "TextInput | AudioInput") -> tuple[str, str | None]:

As per coding guidelines: "Always add type hints to all function parameters and return values in Python code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/utils.py` around lines 447 - 474, The function resolve_input is
missing a type annotation for its query_input parameter; update its signature to
type-hint query_input as the request module's QueryInput (or appropriate union
of TextInput|AudioInput) using a forward/string annotation or by importing
QueryInput under a TYPE_CHECKING guard to avoid circular imports, and keep the
existing return type tuple[str, str | None]; ensure references in the body
(TextInput, AudioInput, resolve_audio_base64) remain unchanged.

    job = session.get(Job, job_id)
    if not job:
        # Log all jobs to see what's in the database
        from sqlmodel import select
Collaborator:

@Prajna1999 this comment was not addressed

return load_description(Path(filename))


# Input resolver functions moved from app.services.llm.input_resolver
Collaborator:

In my opinion, the common utils file is supposed to have generic functions, and all the audio-related functions here seem domain-specific; but if you and others think this is fine, then I guess it's okay for now.

@coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/services/llm/jobs.py (1)

446-453: ⚠️ Potential issue | 🟠 Major

error=request.query.input passes a Pydantic model where str is expected.

APIResponse.failure_response has error: str | list. Passing the QueryInput Pydantic object will either cause a Pydantic v2 coercion to its str() representation (producing a useless internal model dump as the error message) or a validation error. The analogous block in the input-guardrails path (dead code at line 189) correctly passes safe_input["data"]["safe_text"].

🐛 Proposed fix
-                        callback_response = APIResponse.failure_response(
-                            error=request.query.input,
-                            metadata=request.request_metadata,
-                        )
+                        callback_response = APIResponse.failure_response(
+                            error=safe_output["data"]["safe_text"],
+                            metadata=request.request_metadata,
+                        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 446 - 453, The failure branch
uses APIResponse.failure_response(error=request.query.input) which passes a
Pydantic QueryInput object where a string is expected; change it to pass the
safe text string (like safe_input["data"]["safe_text"] used in the
input-guardrails path) so error is a plain str or list of str, then call
handle_job_error(job_id, request.callback_url, callback_response) as before;
update the block that checks safe_output["data"]["rephrase_needed"] to build
callback_response with error set to the safe text rather than
request.query.input.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/services/llm/jobs.py`:
- Around line 169-204: Remove the unreachable pre-session guardrails block that
begins with the "if input_guardrails:" check (the code that calls
call_guardrails(...) and manipulates safe_input/request.query.input) because
input_guardrails is only populated later in the session; instead rely on the
existing working guardrails implementation already present later (the
session-based block that uses list_validators_config and call_guardrails).
Delete the entire pre-session block to eliminate dead code and the undefined
symbol input_query and ensure only the session-based guardrail handling is
executed.
- Around line 411-412: The isinstance check is using the imported function name
resolve_input instead of the resolved value; replace the check to test the
resolved_input variable (i.e., use isinstance(resolved_input, AudioInput)) and
call cleanup_temp_file(resolved_input) so audio temp files are actually cleaned
up. Locate the block around the resolve_input usage in jobs.py where
resolved_input is set and update the conditional to reference resolved_input and
pass that object to cleanup_temp_file.
- Around line 22-24: Remove the stray closing parenthesis and the stale import:
delete the lone ")" on line 24 and replace the import of call_guardrails with
the correct active guardrails symbol; remove or change the import statement
referencing call_guardrails and ensure the file only imports
run_guardrails_validation (the function actually used in the active guardrails
path) so that the dead pre-session reference to call_guardrails is eliminated
and the module parses cleanly.
- Around line 466-475: Remove the orphaned redundant else block that assigns
response.response.output.content.value = safe_output["error"] and constructs
callback_response using APIResponse.failure_response, then calls
handle_job_error; this block references undefined safe_output and duplicates
logic already handled in the inner else, so delete that entire orphaned else
section (the duplicated assignment and return calling handle_job_error) to
restore valid Python syntax and avoid NameError while keeping the existing error
handling in the inner else branch intact.

---

Outside diff comments:
In `@backend/app/services/llm/jobs.py`:
- Around line 446-453: The failure branch uses
APIResponse.failure_response(error=request.query.input) which passes a Pydantic
QueryInput object where a string is expected; change it to pass the safe text
string (like safe_input["data"]["safe_text"] used in the input-guardrails path)
so error is a plain str or list of str, then call handle_job_error(job_id,
request.callback_url, callback_response) as before; update the block that checks
safe_output["data"]["rephrase_needed"] to build callback_response with error set
to the safe text rather than request.query.input.

---

Duplicate comments:
In `@backend/app/models/llm/request.py`:
- Around line 4-8: Remove the duplicate imports of Field and SQLModel by
consolidating into a single import statement that includes all sqlmodel symbols
used (Field, SQLModel, Index, text) and keep the other imports (Discriminator,
model_validator, HttpUrl, datetime, JSONB) separate; update the import block so
only one "from sqlmodel import ..." line exists to eliminate the F811 duplicate
import for Field/SQLModel.
- Around line 73-76: The mime_type field's type annotation disagrees with its
None default — update the annotation for mime_type back to Optional[str] (and
import Optional from typing if missing) so the Field(None, ...) default is
valid; ensure downstream usage like query_input.content.mime_type or "audio/wav"
continues to work with a nullable mime_type (a minimal sketch follows this list).

In `@backend/app/services/llm/jobs.py`:
- Around line 209-221: The code contains leftover debug scaffolding after the
session.get(Job, job_id) lookup: remove the inline import of select, the
recent-jobs query (all_jobs), and the verbose "Recent jobs in DB" logger entry;
instead, keep a concise error log in the execute_job flow (the logger.error
following the session.get(Job, job_id) check) that only reports job_id and a
clear message that the job was not found, without querying or printing other DB
rows; ensure any removal preserves control flow around
logger.info(f"[execute_job] Attempting to fetch job | job_id={job_id}") and the
subsequent handling of the missing job.
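For the mime_type item above, a minimal sketch of the described fix. The class name is a stand-in inferred from the prompt, not the real model:

from typing import Optional

from sqlmodel import Field, SQLModel


class AudioContentDemo(SQLModel):  # stand-in; the real model's name differs
    value: str
    # Optional[str] agrees with the Field(None, ...) default, and the
    # downstream fallback still works: content.mime_type or "audio/wav"
    mime_type: Optional[str] = Field(None, description="MIME type of the audio")


content = AudioContentDemo(value="...")
print(content.mime_type or "audio/wav")  # -> audio/wav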

Comment on lines 22 to 24
)
from app.services.llm.guardrails import call_guardrails
)

⚠️ Potential issue | 🔴 Critical

Stray ) on line 24 causes a parse error — CI is failing.

The from app.models.llm.request import (...) block closes correctly on line 22. Line 23 adds a standalone call_guardrails import, then line 24 has an orphaned ) that makes the file unparseable (confirmed by the CI failure: Cannot parse file: 24:0: )).

Additionally, call_guardrails (line 23) is stale: the dead pre-session block references it (line 175), while the active guardrails path uses run_guardrails_validation (already imported via lines 26–29).

🐛 Proposed fix
     TextInput,
     AudioInput,
 )
-from app.services.llm.guardrails import call_guardrails
-)
 from app.models.llm.response import TextOutput
🧰 Tools
🪛 GitHub Actions: Kaapi CI

[error] 24-24: Black formatting failed. Cannot parse file: 24:0: ).

🪛 Ruff (0.15.1)

[warning] 24-24: Expected a statement

(invalid-syntax)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 22 - 24, Remove the stray
closing parenthesis and the stale import: delete the lone ")" on line 24 and
replace the import of call_guardrails with the correct active guardrails symbol;
remove or change the import statement referencing call_guardrails and ensure the
file only imports run_guardrails_validation (the function actually used in the
active guardrails path) so that the dead pre-session reference to
call_guardrails is eliminated and the module parses cleanly.

Comment on lines 169 to 204
    if input_guardrails:
        if not isinstance(request.query.input, TextInput):
            logger.warning(
                "[execute_job] Input guardrails skipped for non-text input type"
            )
        else:
            safe_input = call_guardrails(input_query, input_guardrails, job_id)

        logger.info(
            f"[execute_job] Input guardrail validation | success={safe_input['success']}."
        )

        if safe_input.get("bypassed"):
            logger.info("[execute_job] Guardrails bypassed (service unavailable)")

        elif safe_input["success"]:
            # Update the text value within the QueryInput structure
            request.query.input.content.value = safe_input["data"]["safe_text"]

            if safe_input["data"]["rephrase_needed"]:
                callback_response = APIResponse.failure_response(
                    error=safe_input["data"]["safe_text"],
                    metadata=request.request_metadata,
                )
                return handle_job_error(
                    job_id, request.callback_url, callback_response
                )
        else:
            # Update the text value with error message
            request.query.input.content.value = safe_input["error"]

            callback_response = APIResponse.failure_response(
                error=safe_input["error"],
                metadata=request.request_metadata,
            )
            return handle_job_error(job_id, request.callback_url, callback_response)

⚠️ Potential issue | 🟠 Major

Dead pre-session guardrails block — unreachable and references undefined input_query.

At line 169, input_guardrails is still [] (initialized at line 160); the actual population via list_validators_config happens at lines 253–259 inside the session. The if input_guardrails: check is therefore always False and this entire block is unreachable.

Beyond being dead code, it has two additional defects that would surface if it ever ran:

  • Line 175: call_guardrails(input_query, ...): input_query is never defined in this scope (NameError).
  • Lines 177–178: safe_input['success'] is accessed outside the TextInput branch, so safe_input would be undefined for non-text inputs.

The actual working implementation is at lines 261–300. Remove lines 169–204 entirely.
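For contrast, the shape the surviving session-based block needs is roughly the following sketch; the run_guardrails_validation call and its arguments are assumed from the imports mentioned above, not copied from the file:

if input_guardrails and isinstance(request.query.input, TextInput):
    safe_input = run_guardrails_validation(
        request.query.input.content.value, input_guardrails, job_id
    )
    # safe_input is only ever read inside this branch, so it can
    # never be undefined for non-text inputs.
    if safe_input["success"]:
        request.query.input.content.value = safe_input["data"]["safe_text"]
elif input_guardrails:
    logger.warning("[execute_job] Input guardrails skipped for non-text input type")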

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 169 - 204, Remove the
unreachable pre-session guardrails block that begins with the "if
input_guardrails:" check (the code that calls call_guardrails(...) and
manipulates safe_input/request.query.input) because input_guardrails is only
populated later in the session; instead rely on the existing working guardrails
implementation already present later (the session-based block that uses
list_validators_config and call_guardrails). Delete the entire pre-session block
to eliminate dead code and the undefined symbol input_query and ensure only the
session-based guardrail handling is executed.

Comment on lines 411 to 412
if resolved_input and isinstance(resolve_input, AudioInput):
    cleanup_temp_file(resolved_input)

⚠️ Potential issue | 🔴 Critical

Temp-file cleanup never executes — resolve_input is the imported function, not the input object.

resolve_input on line 411 refers to the resolve_input function imported at line 32, not the resolved string resolved_input or the request's input object. isinstance(<function>, AudioInput) is always False, so cleanup_temp_file is never called and every audio request leaks a temp file. The past review fix targeted this exact line but the wrong variable was used.

🐛 Proposed fix
-            if resolved_input and isinstance(resolve_input, AudioInput):
+            if resolved_input and isinstance(request.query.input, AudioInput):
                 cleanup_temp_file(resolved_input)
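Beyond the one-word fix, a try/finally shape would make this class of leak harder to reintroduce. A sketch, assuming resolve_input returns the temp-file path for audio inputs:

resolved_input = None
try:
    resolved_input = resolve_input(request.query.input)
    # ... provider call and response handling ...
finally:
    # Cleanup runs on success and on every error path alike.
    if resolved_input and isinstance(request.query.input, AudioInput):
        cleanup_temp_file(resolved_input)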
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 411 - 412, The isinstance
check is using the imported function name resolve_input instead of the resolved
value; replace the check to test the resolved_input variable (i.e., use
isinstance(resolved_input, AudioInput)) and call
cleanup_temp_file(resolved_input) so audio temp files are actually cleaned up.
Locate the block around the resolve_input usage in jobs.py where resolved_input
is set and update the conditional to reference resolved_input and pass that
object to cleanup_temp_file.

Comment on lines 466 to 475
        else:
            response.response.output.content.value = safe_output["error"]

            callback_response = APIResponse.failure_response(
                error=safe_output["error"],
                metadata=request.request_metadata,
            )
            return handle_job_error(
                job_id, request.callback_url, callback_response
            )

⚠️ Potential issue | 🔴 Critical

Orphaned else: block is invalid Python syntax and references undefined safe_output.

Line 466 introduces a second else: at the same indentation (8 spaces) as line 421's else:, which already closes the if not isinstance(response.response.output, TextOutput): conditional. Python allows only one else per conditional, making this a syntax error confirmed by Ruff (lines 466–467: "Expected a statement").

Additionally, if this block somehow executed (e.g., in the not isinstance branch), safe_output (set at line 423 inside the else at line 421) would be undefined, causing a NameError. This block is a copy-paste artifact of the inner else: at line 455 and should be removed entirely.

🐛 Proposed fix — remove the orphaned block
                     else:
                         response.response.output.content.value = safe_output["error"]

                         callback_response = APIResponse.failure_response(
                             error=safe_output["error"],
                             metadata=request.request_metadata,
                         )
                         return handle_job_error(
                             job_id, request.callback_url, callback_response
                         )

-            else:
-                response.response.output.content.value = safe_output["error"]
-
-                callback_response = APIResponse.failure_response(
-                    error=safe_output["error"],
-                    metadata=request.request_metadata,
-                )
-                return handle_job_error(
-                    job_id, request.callback_url, callback_response
-                )

         callback_response = APIResponse.success_response(
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            else:
-                response.response.output.content.value = safe_output["error"]
-
-                callback_response = APIResponse.failure_response(
-                    error=safe_output["error"],
-                    metadata=request.request_metadata,
-                )
-                return handle_job_error(
-                    job_id, request.callback_url, callback_response
-                )
🧰 Tools
🪛 Ruff (0.15.1)

[warning] 466-466: Expected a statement

(invalid-syntax)


[warning] 466-466: Expected a statement

(invalid-syntax)


[warning] 466-467: Expected a statement

(invalid-syntax)


[warning] 467-467: Unexpected indentation

(invalid-syntax)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 466 - 475, Remove the orphaned
redundant else block that assigns response.response.output.content.value =
safe_output["error"] and constructs callback_response using
APIResponse.failure_response, then calls handle_job_error; this block references
undefined safe_output and duplicates logic already handled in the inner else, so
delete that entire orphaned else section (the duplicated assignment and return
calling handle_job_error) to restore valid Python syntax and avoid NameError
while keeping the existing error handling in the inner else branch intact.

@Prajna1999 Prajna1999 moved this to In Progress in Kaapi-dev Feb 19, 2026
@Prajna1999 Prajna1999 moved this from In Progress to In Review in Kaapi-dev Feb 19, 2026

Labels

enhancement New feature or request

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

4 participants