TTS/STT: Gemini-TTS Model Integration with unified API #574

Open
Prajna1999 wants to merge 64 commits into main from feature/unified-api-tts

Conversation

@Prajna1999 (Collaborator) commented Feb 2, 2026

Summary

Target issue is #528
Integrate Gemini-pro-2.5-preview-tts models with the unified API

Checklist

Before submitting a pull request, please ensure that you complete these tasks:

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is tested and has test cases.

Notes

  1. Currently tested with only Hindi and English as output languages.
  2. The ad hoc main function used to test the TTS scripts will be deleted before the final push.

Summary by CodeRabbit

  • New Features

    • Added Text-to-Speech support (Google Gemini) with configurable model, voice, language, and output formats (WAV/MP3/OGG).
    • Audio conversion options (MP3, OGG) with format selection and MIME-aware outputs.
  • Improvements

    • Better audio input validation and accurate audio size calculations.
    • Consolidated input resolution and temporary-audio handling for text and audio inputs.
  • Chores

    • Added pydub dependency for audio processing.
  • Tests

    • Extensive STT and TTS test coverage added.

Prajna1999 and others added 28 commits January 20, 2026 21:57
…-stt-new' into feature/unified-api-stt-new

merge upstream
@Prajna1999 force-pushed the feature/unified-api-tts branch from af56d56 to c9be67a on February 12, 2026 04:15
@coderabbitai bot left a comment

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/app/services/llm/jobs.py (2)

345-382: ⚠️ Potential issue | 🟠 Major

Output guardrails assume TextOutput — will break for TTS/audio responses.

Line 347 accesses response.response.output.content.value. For TTS responses, output is AudioOutput where content.value is base64-encoded audio data. Passing that to call_guardrails as text is meaningless, and writing safe_text back (Lines 360-362, 374) corrupts the audio payload.

Apply the same type-guarding pattern as for input guardrails:

Proposed fix
         if response:
             if output_guardrails:
-                output_text = response.response.output.content.value
-                safe_output = call_guardrails(output_text, output_guardrails, job_id)
+                from app.models.llm.response import TextOutput
+                if not isinstance(response.response.output, TextOutput):
+                    logger.warning("[execute_job] Output guardrails skipped for non-text output type")
+                else:
+                    output_text = response.response.output.content.value
+                    safe_output = call_guardrails(output_text, output_guardrails, job_id)
+                    # ... rest of guardrail handling indented under else
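Spelled out in full, the guarded flow might look like this (a sketch based on the names in the diff above; the elided rephrase handling stays as in the current code):

from app.models.llm.response import TextOutput

if response and output_guardrails:
    output = response.response.output
    if not isinstance(output, TextOutput):
        # AudioOutput.content.value is base64 audio; text guardrails don't apply
        logger.warning("[execute_job] Output guardrails skipped for non-text output type")
    else:
        output_text = output.content.value
        safe_output = call_guardrails(output_text, output_guardrails, job_id)
        # ... existing rephrase/safe_text handling, now applied only to text outputs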

364-368: ⚠️ Potential issue | 🟠 Major

error=request.query.input passes a Pydantic model object as the error string.

Line 366 sets error=request.query.input, which is a TextInput/AudioInput model — not a string. This will serialize oddly (or raise) when APIResponse.failure_response processes it, and it leaks the raw user input (potentially sensitive) into the error payload.

This should likely be a descriptive error message instead:

Proposed fix
                     if safe_output["data"]["rephrase_needed"] == True:
                         callback_response = APIResponse.failure_response(
-                            error=request.query.input,
+                            error="Output guardrail flagged content for rephrasing",
                             metadata=request.request_metadata,
                         )
🤖 Fix all issues with AI agents
In `@backend/app/alembic/versions/046_add_llm_call_table.py`:
- Around line 66-71: The provider column on the llm_call table is created with
the initial comment but then immediately altered; instead, set the final comment
directly in the create_table call by updating the provider column's comment to
"AI provider as sent by user (e.g. openai, openai-native, google)" (fixing the
"-native" typo), then remove the redundant sa.alter_column(...) block in
upgrade() that targets llm_call.provider as well as its reversal in downgrade();
keep everything else in create_table, upgrade(), and downgrade() unchanged.

In `@backend/app/api/routes/config/version.py`:
- Around line 25-30: The endpoint functions (e.g., create_version) are missing
return type annotations; update create_version and the other handlers in this
file to include explicit return type hints that match the endpoint's actual
returned type (for example the appropriate Pydantic model or FastAPI
Response/JSONResponse type used elsewhere in the codebase), e.g., replace the
bare def create_version(...) with a signature that specifies the correct return
type and do the same for the other handler functions in this module (use the
same model/class names used by other routes to stay consistent).
- Around line 25-41: Add explicit return type hints to all endpoint functions
create_version, list_versions, get_version, and delete_version (e.g., ->
Response or the FastAPI/Starlette response model used in this project) and
update the function signatures accordingly; in create_version, keep the call to
ConfigVersionCrud.create_from_partial_or_raise(version_create=version_create)
but add a comment/docstring line and API documentation entry stating this
endpoint only supports partial version creation (will return HTTP 400 with
message "Cannot create partial version: no existing version found. Use full
config for initial version.") and that initial versions must be created via the
config creation endpoint so callers understand the intentional breaking change.

In `@backend/app/crud/config/version.py`:
- Around line 291-311: The duplicate "get latest version" SQL in
_validate_config_type_unchanged should be replaced by calling the existing
helper _get_latest_version() to keep code DRY: inside
_validate_config_type_unchanged call latest_version = self._get_latest_version()
(or equivalent) and then proceed with the existing type-check logic using
latest_version; ensure you preserve the current behavior when latest_version is
None and raise the same HTTPException if types don't match (and keep references
to ConfigVersion, version_create, and self.session logic intact).

In `@backend/app/crud/llm.py`:
- Around line 39-47: The current AudioInput branch sets size_bytes using
len(query_input.content.value), which measures the base64 string length; instead,
decode the base64 to get the actual byte count (e.g. size_bytes =
len(base64.b64decode(query_input.content.value))) and update the JSON generation
in the AudioInput branch (the block handling isinstance(query_input, AudioInput))
accordingly. Add the necessary import for the base64 module and consider wrapping
the decode in a try/except in case the input is malformed. A sketch of this
calculation follows this block.
- Around line 76-99: The code uses getattr/hasattr on completion_config.params
which breaks when params is a dict (e.g., NativeCompletionConfig); update the
logic in the block that computes completion_type and model to handle both dict
and object params: check if completion_config.params is a dict and use
params.get("type", "text") and params.get("model", "") in that case, otherwise
keep using getattr(completion_config.params, "type", "text") and
getattr/completion_config.params.model via hasattr for model; adjust the
assignment of completion_type and model accordingly (refer to completion_config,
completion_type, model, and resolved_config in the diff).

In `@backend/app/models/llm/request.py`:
- Around line 488-492: The updated_at Field currently uses default_factory=now
which only sets the timestamp at object creation; modify the Field definition
for updated_at to use sa_column with an onupdate clause (e.g.,
sa_column=Column(..., onupdate=now)) so SQLAlchemy will auto-update the
timestamp on record modifications, or alternately ensure every update path in
the CRUD layer explicitly sets updated_at = now() before persisting; locate the
updated_at Field in request.py and implement the sa_column onupdate approach (or
add the explicit update in functions that perform updates) to guarantee
updated_at changes on every update.
- Around line 3-11: The file imports Field and SQLModel twice; consolidate them
by removing the duplicate import and using a single import from sqlmodel that
includes Field, SQLModel, Index, and text (so references to Field and SQLModel
in functions/classes like any model definitions remain valid); ensure other
imports from sqlmodel (Index, text) are preserved in that single import and
delete the redundant earlier import to resolve the F811 duplicate-import error.

In `@backend/app/models/llm/response.py`:
- Around line 31-32: Remove None from the discriminated union so LLMOutput is
Annotated[TextOutput | AudioOutput, Field(discriminator="type")] (keep
TextOutput and AudioOutput as variants), delete the unused imports base64 and
model_validator, and change the model field that currently uses LLMOutput to
accept Optional[LLMOutput] (i.e., make the field Optional/nullable outside the
discriminated union) so validation works with Pydantic v2.
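To make the response.py item above concrete, the discriminated-union shape it describes looks roughly like this (a sketch; the type/value field names are illustrative, not copied from the PR):

from typing import Annotated, Literal, Optional
from pydantic import BaseModel, Field

class TextOutput(BaseModel):
    type: Literal["text"] = "text"
    value: str

class AudioOutput(BaseModel):
    type: Literal["audio"] = "audio"
    value: str  # base64-encoded audio

# None stays out of the union; nullability lives on the field instead
LLMOutput = Annotated[TextOutput | AudioOutput, Field(discriminator="type")]

class LLMResponseBody(BaseModel):
    output: Optional[LLMOutput] = None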

In `@backend/app/services/llm/mappers.py`:
- Around line 98-102: The google mapper is assigning google_params["model"] =
kaapi_params.get("model") unconditionally which can set model to None; change it
to only add "model" to google_params when kaapi_params contains a non-null model
(mirror the OpenAI mapper's conditional behavior used around model in that file)
so that google_params only includes the model key when kaapi_params.get("model")
is truthy/non-null.

In `@backend/app/services/llm/providers/gai.py`:
- Around line 94-95: The uploaded gemini_file created by
self.client.files.upload(file=resolved_input) is never deleted; modify the flow
around generate_content (the method that consumes gemini_file) to ensure cleanup
by calling self.client.files.delete(name=gemini_file.name) in a finally block so
the uploaded object is removed regardless of success or error; locate the upload
call (gemini_file = self.client.files.upload(...)) and wrap the subsequent
generate_content invocation in try/finally to always delete gemini_file via
self.client.files.delete.
- Around line 239-271: The code in GoogleAIProvider._execute_tts lets
unsupported response_format values fall through: it may log a successful
conversion and set response_mime_type to the requested format while
actual_format and raw_audio_bytes remain WAV; fix by validating/handling
unsupported formats before conversion (or after the conversion switch).
Specifically, check response_format against allowed values ("wav","mp3","ogg")
and if it's unsupported return an explicit error or default to "wav" (updating
response_mime_type and actual_format accordingly), and move or guard the "Audio
conversion successful" log so it only runs when a conversion actually occurred
(use actual_format or the convert_error flags and raw_audio_bytes to decide).

In `@backend/app/services/llm/providers/PLAN.md`:
- Line 88: Update the phrase "JAVA like" to the hyphenated form "JAVA-like" in
the sentence "Do not write over abstracted code. Focus on readability than
pristine JAVA-like code." so the documentation uses the consistent hyphenated
form; locate and edit that exact sentence in PLAN.md.
- Line 84: Fix the spelling typo in PLAN.md by replacing "thorugh" with
"through" in the sentence that directs the reader to review the implementation;
while editing, ensure the sentence still references the correct implementation
to review (the _execute_stt function in providers.gai.py and the relevant models
in the app/models/llm module) so the documentation remains accurate and clear.

In `@backend/app/services/llm/providers/registry.py`:
- Around line 70-72: The logger.error call in registry.py needs a function-name
prefix in square brackets; capture the runtime function name (e.g., via import
inspect and inspect.currentframe().f_code.co_name) and change the message to
include that prefix before the existing text (e.g.,
logger.error(f"[{inspect.currentframe().f_code.co_name}] Failed to initialize
{provider_type} client: {e}", exc_info=True)); keep the same exception re-raise
(RuntimeError) unchanged.
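As a reference for the crud/llm.py sizing item above, a minimal sketch of the decoded-size calculation (the helper name is illustrative, not part of the PR):

import base64

def audio_size_bytes(b64_value: str) -> int:
    """Return the decoded audio size in bytes, not the base64 string length."""
    try:
        return len(base64.b64decode(b64_value))
    except Exception:
        return 0  # malformed base64; the caller decides how to surface this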
🧹 Nitpick comments (11)
backend/app/alembic/versions/046_add_llm_call_table.py (1)

22-138: Consider adding an index on project_id for project-scoped queries.

get_llm_call_by_id in backend/app/crud/llm.py filters by project_id, and project-scoped lookups are a common access pattern. A partial index similar to the existing ones would help:

op.create_index(
    "idx_llm_call_project_id",
    "llm_call",
    ["project_id"],
    unique=False,
    postgresql_where=sa.text("deleted_at IS NULL"),
)
backend/app/models/llm/response.py (1)

7-8: Unused imports: base64 and model_validator.

Neither base64 nor model_validator is referenced anywhere in this file. Remove them to keep the module clean.

Suggested fix
-import base64
-from pydantic import model_validator
 from sqlmodel import SQLModel, Field
backend/app/crud/llm.py (1)

179-191: Mutating the caller's content dict in-place.

Line 187 adds audio_size_bytes to the passed-in content dict. If the caller retains a reference, they'll see this side effect. Consider working on a copy:

Suggested fix
     if content is not None:
+        content = content.copy()
         # For audio outputs (AudioOutput model): calculate size metadata from base64 content
backend/app/crud/config/version.py (1)

110-120: Leaking internal validation details to API consumers.

str(e) from a Pydantic ValidationError can be very verbose and expose internal schema names/field paths. Consider sanitizing or summarizing the error before returning it in the HTTP response.

Suggested fix
         except ValidationError as e:
             logger.error(
                 f"[ConfigVersionCrud.create_from_partial] Validation failed | "
                 f"{{'config_id': '{self.config_id}', 'error': '{str(e)}'}}"
             )
             raise HTTPException(
                 status_code=400,
-                detail=f"Invalid config after merge: {str(e)}",
+                detail="Invalid config after merge. Check that all required fields are present and valid.",
             )
backend/app/services/llm/jobs.py (2)

130-136: Missing type hint for task_instance parameter.

As per coding guidelines, all function parameters should have type hints.

 def execute_job(
     request_data: dict,
     project_id: int,
     organization_id: int,
     job_id: str,
     task_id: str,
-    task_instance,
+    task_instance: Any,
 ) -> dict:

253-255: Remove empty else: pass block.

This serves no purpose and adds noise.

                     request.request_metadata.setdefault("warnings", []).extend(warnings)
-                else:
-                    pass
backend/app/services/llm/providers/gai.py (4)

4-8: ThinkingConfig is imported but unused.

The import on line 7 is dead code (the usage at line 107 is commented out). Remove it to satisfy clean imports.

 from google.genai.types import (
     GenerateContentResponse,
     GenerateContentConfig,
-    ThinkingConfig,
 )

126-128: f-strings without placeholders — use plain strings.

Static analysis (Ruff F541) correctly flags lines 127 and 281. These are f-strings with no interpolation.

-            logger.warning(
-                f"[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
-            )
+            logger.warning(
+                "[GoogleAIProvider._execute_stt] Response missing usage_metadata, using zeros"
+            )
-            logger.warning(
-                f"[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros"
-            )
+            logger.warning(
+                "[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros"
+            )

Also applies to: 280-282


34-35: Redundant self.client assignment.

super().__init__(client) in BaseProvider.__init__ already sets self.client = client (see base.py). Line 35 is unnecessary.

         super().__init__(client)
-        self.client = client

206-207: Avoid inline import inside the method body.

from google.genai import types is imported inline at line 207 but google.genai.types members are already imported at the top of the file (lines 4-8). Either import types at the top level or use the already-imported GenerateContentConfig directly (it's types.GenerateContentConfig). This also avoids repeated import overhead on every TTS call.

+from google.genai import types  # move to top-level imports
backend/app/models/llm/request.py (1)

180-182: Provider field description is outdated.

The description reads "LLM provider (openai)" but the Literal now accepts both "openai" and "google".

     provider: Literal["openai", "google"] = Field(
-        ..., description="LLM provider (openai)"
+        ..., description="LLM provider (e.g., openai, google)"
     )

Comment on lines 66 to 71
        sa.Column(
            "provider",
            sa.String(),
            nullable=False,
            comment="AI provider: openai, google, anthropic",
        ),

🛠️ Refactor suggestion | 🟠 Major

Redundant alter_column on llm_call.provider — set the final comment directly in create_table.

The migration creates llm_call.provider with comment "AI provider: openai, google, anthropic" at Line 70, then immediately alters it at Lines 161-168 to "AI provider as sent by user (e.g openai, -native, google)". Just use the final comment in the create_table call. Also note the typo: "-native" should likely be "openai-native".

Suggested fix

In the create_table call, replace the provider column comment:

         sa.Column(
             "provider",
             sa.String(),
             nullable=False,
-            comment="AI provider: openai, google, anthropic",
+            comment="AI provider as sent by user (e.g. openai, openai-native, google)",
         ),

Then remove the redundant alter_column block (Lines 161-168) from upgrade() and the corresponding reversal (Lines 182-188) from downgrade().

Also applies to: 161-168

🤖 Prompt for AI Agents
In `@backend/app/alembic/versions/046_add_llm_call_table.py` around lines 66 - 71,
The provider column on the llm_call table is created with the initial comment
but then immediately altered; instead, set the final comment directly in the
create_table call by updating the provider column's comment to "AI provider as
sent by user (e.g. openai, openai-native, google)" (fixing the "-native" typo),
then remove the redundant sa.alter_column(...) block in upgrade() that targets
llm_call.provider as well as its reversal in downgrade(); keep everything else
in create_table, upgrade(), and downgrade() unchanged.

Comment on lines 291 to 311
    def _validate_config_type_unchanged(
        self, version_create: ConfigVersionCreate
    ) -> None:
        """
        Validate that the config type (text/stt/tts) in the new version matches
        the type from the latest existing version.
        Raises HTTPException if types don't match.
        """
        # Get the latest version
        stmt = (
            select(ConfigVersion)
            .where(
                and_(
                    ConfigVersion.config_id == self.config_id,
                    ConfigVersion.deleted_at.is_(None),
                )
            )
            .order_by(ConfigVersion.version.desc())
            .limit(1)
        )
        latest_version = self.session.exec(stmt).first()

🛠️ Refactor suggestion | 🟠 Major

Duplicated "get latest version" query — reuse _get_latest_version().

_validate_config_type_unchanged (Lines 300-311) duplicates the exact same query as _get_latest_version() (Lines 157-168). Reuse the helper to keep things DRY.

Suggested fix
     def _validate_config_type_unchanged(
         self, version_create: ConfigVersionCreate
     ) -> None:
-        # Get the latest version
-        stmt = (
-            select(ConfigVersion)
-            .where(
-                and_(
-                    ConfigVersion.config_id == self.config_id,
-                    ConfigVersion.deleted_at.is_(None),
-                )
-            )
-            .order_by(ConfigVersion.version.desc())
-            .limit(1)
-        )
-        latest_version = self.session.exec(stmt).first()
+        latest_version = self._get_latest_version()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    def _validate_config_type_unchanged(
-        self, version_create: ConfigVersionCreate
-    ) -> None:
-        """
-        Validate that the config type (text/stt/tts) in the new version matches
-        the type from the latest existing version.
-        Raises HTTPException if types don't match.
-        """
-        # Get the latest version
-        stmt = (
-            select(ConfigVersion)
-            .where(
-                and_(
-                    ConfigVersion.config_id == self.config_id,
-                    ConfigVersion.deleted_at.is_(None),
-                )
-            )
-            .order_by(ConfigVersion.version.desc())
-            .limit(1)
-        )
-        latest_version = self.session.exec(stmt).first()
+    def _validate_config_type_unchanged(
+        self, version_create: ConfigVersionCreate
+    ) -> None:
+        """
+        Validate that the config type (text/stt/tts) in the new version matches
+        the type from the latest existing version.
+        Raises HTTPException if types don't match.
+        """
+        latest_version = self._get_latest_version()
🤖 Prompt for AI Agents
In `@backend/app/crud/config/version.py` around lines 291 - 311, The duplicate
"get latest version" SQL in _validate_config_type_unchanged should be replaced
by calling the existing helper _get_latest_version() to keep code DRY: inside
_validate_config_type_unchanged call latest_version = self._get_latest_version()
(or equivalent) and then proceed with the existing type-check logic using
latest_version; ensure you preserve the current behavior when latest_version is
None and raise the same HTTPException if types don't match (and keep references
to ConfigVersion, version_create, and self.session logic intact).

1. **Instruction Priority:** If you set `speaking_rate=2.0` and also put "Speak very slowly" in the `system_instruction`, the model may produce erratic results. Use **natural language** for "tone" and **programmatic fields** for "fixed pacing."
2. **Streaming:** The `google-genai` SDK supports streaming audio bytes for real-time applications via the `models.generate_content_stream` method, but `audio_encoding` must be `LINEAR16` or `PCM` for minimum latency.
3. **SynthID:** Note that all output audio is watermarked with Google's SynthID for safety tracking.
4. You are required to go thorugh the implementation of _execute_stt function inside the app/services/llm/providers.gai.py and relevant models inside app/models/llm as well.

⚠️ Potential issue | 🟡 Minor

Fix spelling typo (“thorugh” → “through”).
Minor documentation error.

🧰 Tools
🪛 LanguageTool

[grammar] ~84-~84: Ensure spelling is correct
Context: ...ety tracking. 4. You are required to go thorugh the implementation of _execute_stt func...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
In `@backend/app/services/llm/providers/PLAN.md` at line 84, Fix the spelling typo
in PLAN.md by replacing "thorugh" with "through" in the sentence that directs
the reader to review the implementation; while editing, ensure the sentence
still references the correct implementation to review (the _execute_stt function
in providers.gai.py and the relevant models in the app/models/llm module) so the
documentation remains accurate and clear.
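For context on the streaming note in the PLAN excerpt above, a TTS request with the google-genai SDK looks roughly like this (a sketch; the model name, voice, and chunk handling are illustrative and not taken from this PR):

from google import genai
from google.genai import types

client = genai.Client()  # assumes an API key in the environment

config = types.GenerateContentConfig(
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
        )
    ),
)

# Stream raw PCM chunks instead of waiting for the full clip
for chunk in client.models.generate_content_stream(
    model="gemini-2.5-pro-preview-tts",
    contents="Namaste! This is a streaming TTS test.",
    config=config,
):
    part = chunk.candidates[0].content.parts[0]
    if part.inline_data:
        handle_pcm_bytes(part.inline_data.data)  # hypothetical consumer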

5. Make sure to follow the celery task queue structure and pydantic models idioms and database schema.
6. The generated audio is not saved in the database, so store some metadata for now.
7. Make sure to follow the TTS config types, (KaapiLLMConfig and Native LLM Configs and relevant models for config and versions).
8. Do not write over abstracted code. Focus on readability than pristine JAVA-like code.

⚠️ Potential issue | 🟡 Minor

Hyphenate “JAVA-like”.
Use “JAVA-like” for readability and consistency.

🧰 Tools
🪛 LanguageTool

[grammar] ~88-~88: Use a hyphen to join words.
Context: ...nfig and versions). 8. Do not write over abstracted code. Focus on readability th...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
In `@backend/app/services/llm/providers/PLAN.md` at line 88, Update the phrase
"JAVA like" to the hyphenated form "JAVA-like" in the sentence "Do not write
over abstracted code. Focus on readability than pristine JAVA-like code." so the
documentation uses the consistent hyphenated form; locate and edit that exact
sentence in PLAN.md.

Comment on lines 70 to 72
    except Exception as e:
        logger.error(f"Failed to initialize {provider_type} client: {e}", exc_info=True)
        raise RuntimeError(f"Could not connect to {provider_type} services.")

🛠️ Refactor suggestion | 🟠 Major

Log message missing [function_name] prefix.

Per coding guidelines, log messages should be prefixed with the function name in square brackets.

Suggested fix
-        logger.error(f"Failed to initialize {provider_type} client: {e}", exc_info=True)
+        logger.error(f"[get_llm_provider] Failed to initialize {provider_type} client: {e}", exc_info=True)

As per coding guidelines: Prefix all log messages with the function name in square brackets.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    except Exception as e:
-        logger.error(f"Failed to initialize {provider_type} client: {e}", exc_info=True)
-        raise RuntimeError(f"Could not connect to {provider_type} services.")
+    except Exception as e:
+        logger.error(f"[get_llm_provider] Failed to initialize {provider_type} client: {e}", exc_info=True)
+        raise RuntimeError(f"Could not connect to {provider_type} services.")
🤖 Prompt for AI Agents
In `@backend/app/services/llm/providers/registry.py` around lines 70 - 72, The
logger.error call in registry.py needs a function-name prefix in square
brackets; capture the runtime function name (e.g., via import inspect and
inspect.currentframe().f_code.co_name) and change the message to include that
prefix before the existing text (e.g.,
logger.error(f"[{inspect.currentframe().f_code.co_name}] Failed to initialize
{provider_type} client: {e}", exc_info=True)); keep the same exception re-raise
(RuntimeError) unchanged.
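Illustrating the prompt above, the runtime-derived prefix would look like this (a sketch; whether to prefer the hard-coded name from the suggested fix is a style call):

import inspect
import logging

logger = logging.getLogger(__name__)

def get_llm_provider(provider_type: str):
    try:
        ...  # client initialization
    except Exception as e:
        fn = inspect.currentframe().f_code.co_name  # "get_llm_provider"
        logger.error(f"[{fn}] Failed to initialize {provider_type} client: {e}", exc_info=True)
        raise RuntimeError(f"Could not connect to {provider_type} services.")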

model: str
voice: str
language: str
response_format: Literal["mp3", "wav", "ogg"] | None = "wav"
Collaborator:

Why was a description added above but not here?

Collaborator (Author):

The keys look self-explanatory, hence I did not add extra sentences.

},
)

organization_id: int = Field(
Collaborator:

Do we need the org ID in the table if the project ID is already there and we can derive the org ID from it?

Collaborator (Author):

Yes, we discussed that. We decided not to remove it for the time being because it is easier to fetch the project and org from the table without multiple joins.


        director_notes = gemini_params.get("director_notes")
        # Build Gemini TTS config
        from google.genai import types
Collaborator:

Please place this import at the top of the file, not here.

import logging

import openai
from openai import OpenAI
Collaborator:

I feel like naming this file "oai" is confusing; "openai" is better.

Collaborator (Author):

Yes, that was the case, but it was creating circular dependency issues, i.e. importing the OpenAI SDK from the openai library and re-exporting it at the top level from openai.py, so I shortened the name. We can think about better naming in a separate PR.

@@ -0,0 +1,87 @@
import base64
Collaborator:

This file should be in a dedicated audio folder inside services/llm.

    job = session.get(Job, job_id)
    if not job:
        # Log all jobs to see what's in the database
        from sqlmodel import select
Collaborator:

Please move this import up.

Collaborator:

@Prajna1999 this comment was not addressed

@coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (2)
backend/app/services/llm/providers/gai.py (1)

283-286: f-string without placeholders on line 285.

Static analysis (Ruff F541) flags this. Remove the f prefix.

Proposed fix
-            logger.warning(
-                f"[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros"
-            )
+            logger.warning(
+                "[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros"
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 283 - 286, The
logger.warning call inside GoogleAIProvider._execute_tts uses an unnecessary
f-string (no placeholders); remove the leading "f" so the message is a plain
string literal (update the logger.warning invocation that currently contains
"[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros").
Ensure no other f-strings without placeholders remain in the _execute_tts
method.
backend/app/services/llm/mappers.py (1)

102-106: Use the already-validated model variable instead of re-fetching from dict.

Line 102 already extracts and validates model = kaapi_params.get("model"), but line 106 calls kaapi_params.get("model") again. Use the local variable for consistency and to avoid redundant dict lookups.

Proposed fix
     model = kaapi_params.get("model")
     if not model:
         return {}, ["Missing required 'model' parameter"]
 
-    google_params["model"] = kaapi_params.get("model")
+    google_params["model"] = model
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/mappers.py` around lines 102 - 106, The code
validates model with model = kaapi_params.get("model") but then redundantly
re-reads kaapi_params.get("model") when setting google_params["model"]; update
that assignment to use the already-validated local variable model instead of
re-fetching from kaapi_params to avoid redundant lookups and ensure consistency
(refer to the variables model, kaapi_params, and google_params in the mapper
function).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/models/llm/request.py`:
- Around line 69-76: The AudioContent model's mime_type is annotated as
non-nullable str but defaults to None; update the type annotation for the
mime_type field on the AudioContent class to allow None (e.g., make mime_type of
type str | None) so it matches the default and the downstream usage (see places
referencing query_input.content.mime_type or "audio/wav") and adjust the
Field(...) declaration to keep the same default None and description.

In `@backend/app/services/llm/providers/gai.py`:
- Line 222: The code uses an undefined symbol types.GenerateContentConfig
causing a NameError; replace the incorrect reference with the already-imported
GenerateContentConfig by changing the line that sets config (currently using
types.GenerateContentConfig) to use GenerateContentConfig(**config_kwargs),
ensuring the GenerateContentConfig symbol (imported at top) is used instead of
the nonexistent types module.
- Around line 239-275: The "Audio conversion successful" log is emitted even
when no conversion occurs; inside GoogleAIProvider._execute_tts, move or
duplicate the success log so it only runs after a real conversion, i.e., place
the logger.info call into the response_format == "mp3" branch after
convert_pcm_to_mp3 succeeds and into the response_format == "ogg" branch after
convert_pcm_to_ogg succeeds (use convert_pcm_to_mp3, convert_pcm_to_ogg,
actual_format and converted_bytes to build the message), and remove or guard the
existing unconditional success log so unsupported-format fallback (where
response_format is reset to "wav") does not log success.

In `@backend/app/utils.py`:
- Around line 426-444: The temp-file leak occurs in resolve_audio_base64 when
base64 decoding succeeds but writing to NamedTemporaryFile fails; fix by
tracking temp_path (initialize to None before the try), and in the outer except
for the write error check if temp_path is set and the file exists then remove it
(os.remove) before returning the error; ensure resolve_audio_base64 imports os
and sets temp_path only after tmp.name is assigned so cleanup is safe, or
alternatively use a try/finally around the NamedTemporaryFile write to always
unlink on error while still returning the error string.
- Around line 447-474: The function resolve_input is missing a type annotation
for its query_input parameter; update its signature to type-hint query_input as
the request module's QueryInput (or appropriate union of TextInput|AudioInput)
using a forward/string annotation or by importing QueryInput under a
TYPE_CHECKING guard to avoid circular imports, and keep the existing return type
tuple[str, str | None]; ensure references in the body (TextInput, AudioInput,
resolve_audio_base64) remain unchanged.

---

Duplicate comments:
In `@backend/app/crud/llm.py`:
- Around line 68-91: The code incorrectly uses getattr/hasattr on
completion_config.params, which may be a dict; update the completion_type and
model resolution to handle both shapes: if completion_config.params is a dict,
use params.get("type", "text") and params.get("model", ""); otherwise use
getattr(params, "type", "text") and getattr(params, "model", ""). Adjust
references to completion_config, completion_type, model, and params accordingly
so both dict-based (NativeCompletionConfig) and object-based params work (a
sketch follows this block).

In `@backend/app/models/llm/request.py`:
- Around line 4-8: There are duplicate imports of Field and SQLModel (two
sqlmodel import lines); remove the redundant import and consolidate all symbols
imported from sqlmodel into a single import statement so only one import
supplies Field, SQLModel, Index, and text; keep the other imports
(Discriminator, model_validator, HttpUrl, datetime, JSONB) as-is and ensure no
name collisions remain in request.py.

In `@backend/app/services/llm/providers/gai.py`:
- Around line 96-97: The uploaded Gemini file (assigned to gemini_file via
self.client.files.upload(file=resolved_input)) is never deleted, causing storage
accumulation; wrap the upload + generate_content flow in try/finally and in the
finally block call the provider client file-delete API (e.g.,
self.client.files.delete(...) using the gemini_file identifier or name) to
remove the uploaded artifact regardless of success/failure, ensuring any
exceptions from delete are caught/logged so cleanup does not mask the original
error.

---

Nitpick comments:
In `@backend/app/services/llm/mappers.py`:
- Around line 102-106: The code validates model with model =
kaapi_params.get("model") but then redundantly re-reads
kaapi_params.get("model") when setting google_params["model"]; update that
assignment to use the already-validated local variable model instead of
re-fetching from kaapi_params to avoid redundant lookups and ensure consistency
(refer to the variables model, kaapi_params, and google_params in the mapper
function).

In `@backend/app/services/llm/providers/gai.py`:
- Around line 283-286: The logger.warning call inside
GoogleAIProvider._execute_tts uses an unnecessary f-string (no placeholders);
remove the leading "f" so the message is a plain string literal (update the
logger.warning invocation that currently contains
"[GoogleAIProvider._execute_tts] Response missing usage_metadata, using zeros").
Ensure no other f-strings without placeholders remain in the _execute_tts
method.
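The dict-vs-object params item above (backend/app/crud/llm.py) boils down to a branch like this (a sketch; the field defaults are assumptions):

params = completion_config.params
if isinstance(params, dict):
    # NativeCompletionConfig carries params as a plain dict
    completion_type = params.get("type", "text")
    model = params.get("model", "")
else:
    # Kaapi-style configs carry params as a Pydantic model
    completion_type = getattr(params, "type", "text")
    model = getattr(params, "model", "")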

Comment on lines 69 to 76
 class AudioContent(SQLModel):
     format: Literal["base64"] = "base64"
-    value: str = Field(..., min_length=1, description="Base64 encoded audio")
+    value: str = Field(..., description="Base64 encoded audio")
     # keeping the mime_type liberal here, since does not affect transcription type
-    mime_type: str | None = Field(
+    mime_type: str = Field(
         None,
         description="MIME type of the audio (e.g., audio/wav, audio/mp3, audio/ogg)",
     )

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the actual code in request.py at the specified lines
cat -n backend/app/models/llm/request.py | sed -n '69,76p'

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 482


🏁 Script executed:

# Check how mime_type is used downstream, especially in utils.py
rg -n "mime_type" --type=py -C3

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 16516


🏁 Script executed:

# Specifically check the utils.py line 466 mentioned in the comment
if [ -f backend/app/utils.py ] || [ -f backend/utils.py ] || [ -f app/utils.py ]; then
  find . -name "utils.py" -path "*/backend/*" | head -1 | xargs sed -n '460,472p' -
fi

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 57


🏁 Script executed:

# Let's also check if there are any type checking or validation issues
# Check how AudioContent is instantiated or used
rg -n "AudioContent" --type=py -C2

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 3900


🏁 Script executed:

# Check SQLModel/Pydantic version and configuration
find . -name "pyproject.toml" -o -name "requirements.txt" | head -5 | xargs grep -l "sqlmodel\|pydantic" 2>/dev/null

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 99


🏁 Script executed:

# Check if there's any model validation configuration or model_config in the request.py file
cat -n backend/app/models/llm/request.py | head -100

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 3402


🏁 Script executed:

# Check if there are any validation errors or issues reported in tests related to AudioContent
rg -n "AudioContent" --type=py backend/app/tests/ -C2 | grep -A5 -B5 "error\|exception\|validate"

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1053


🏁 Script executed:

# Check the pyproject.toml to see Pydantic/SQLModel versions
cat backend/pyproject.toml | grep -A5 -B5 "pydantic\|sqlmodel"

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 699


🏁 Script executed:

# Test if AudioContent can actually be instantiated with None for mime_type
# and check if there are any validation issues in the codebase
rg -n "AudioContent\(" --type=py | grep -v "mime_type=" | head -20

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 229


🏁 Script executed:

# Check for any Pydantic ConfigDict or model_config settings that might affect validation
rg -n "model_config\|ConfigDict" --type=py backend/app/models/llm/

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 57


🏁 Script executed:

# Let's check if there are any instances where AudioContent is created without mime_type explicitly
rg -n "AudioContent\(" --type=py -C1

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1109


🏁 Script executed:

# Check if there's any instance where mime_type might be set to None after instantiation
rg -n "mime_type\s*=" --type=py -C2 | grep -v test | head -40

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 2447


Fix type annotation for mime_type field to match actual usage pattern.

mime_type: str = Field(None, ...) declares a non-nullable str type but defaults to None. Pydantic v2 will accept None at instantiation, but the type annotation is misleading. Downstream code at utils.py line 466 explicitly handles the None case with query_input.content.mime_type or "audio/wav", confirming None is an expected value.

Change to: mime_type: str | None = Field(None, ...)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/llm/request.py` around lines 69 - 76, The AudioContent
model's mime_type is annotated as non-nullable str but defaults to None; update
the type annotation for the mime_type field on the AudioContent class to allow
None (e.g., make mime_type of type str | None) so it matches the default and the
downstream usage (see places referencing query_input.content.mime_type or
"audio/wav") and adjust the Field(...) declaration to keep the same default None
and description.

        if director_notes:
            config_kwargs["system_instruction"] = director_notes

        config = types.GenerateContentConfig(**config_kwargs)

⚠️ Potential issue | 🔴 Critical

types is undefined — this will crash at runtime with NameError.

Line 222 references types.GenerateContentConfig, but types is never imported. GenerateContentConfig is already imported directly on line 7. This is a guaranteed runtime crash for every TTS request.

Proposed fix
-        config = types.GenerateContentConfig(**config_kwargs)
+        config = GenerateContentConfig(**config_kwargs)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        config = types.GenerateContentConfig(**config_kwargs)
+        config = GenerateContentConfig(**config_kwargs)
🧰 Tools
🪛 Ruff (0.15.0)

[error] 222-222: Undefined name types

(F821)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` at line 222, The code uses an
undefined symbol types.GenerateContentConfig causing a NameError; replace the
incorrect reference with the already-imported GenerateContentConfig by changing
the line that sets config (currently using types.GenerateContentConfig) to use
GenerateContentConfig(**config_kwargs), ensuring the GenerateContentConfig
symbol (imported at top) is used instead of the nonexistent types module.

Comment on lines +239 to +275
        # Post-process audio format conversion if needed
        # Gemini TTS natively outputs 24kHz 16-bit PCM (WAV format)
        actual_format = "wav"  # Native Gemini TTS output format
        encoded_content = base64.b64encode(raw_audio_bytes or b"").decode("ascii")

        if response_format and response_format != "wav":
            # Need to convert from WAV to requested format
            logger.info(
                f"[GoogleAIProvider._execute_tts] Converting audio from WAV to {response_format}"
            )

            if response_format == "mp3":
                converted_bytes, convert_error = convert_pcm_to_mp3(raw_audio_bytes)
                if convert_error:
                    return None, f"Failed to convert audio to MP3: {convert_error}"
                encoded_content = base64.b64encode(converted_bytes or b"").decode(
                    "ascii"
                )
                actual_format = "mp3"

            elif response_format == "ogg":
                converted_bytes, convert_error = convert_pcm_to_ogg(raw_audio_bytes)
                if convert_error:
                    return None, f"Failed to convert audio to OGG: {convert_error}"
                encoded_content = base64.b64encode(converted_bytes or b"").decode(
                    "ascii"
                )
                actual_format = "ogg"
            else:
                logger.warning(
                    f"[GoogleAIProvider._execute_tts] Unsupported response_format '{response_format}', returning native WAV"
                )
                response_format = "wav"
            logger.info(
                f"[GoogleAIProvider._execute_tts] Audio conversion successful: {actual_format.upper()} ({len(raw_audio_bytes)} bytes)"
            )
        response_mime_type = f"audio/{response_format}"

⚠️ Potential issue | 🟡 Minor

"Audio conversion successful" log is misleading for the unsupported-format fallback path.

When response_format is unsupported (e.g., "flac"), line 268-271 logs a warning and resets to "wav", but the code then falls through to line 272-274 and logs "Audio conversion successful" even though no conversion occurred. Move the success log inside the mp3/ogg branches, or guard it.

Also, response_mime_type on line 275 now correctly uses response_format (which was reset to "wav" in the fallback), so the mime type issue from the prior review is resolved.

Proposed fix
             elif response_format == "ogg":
                 converted_bytes, convert_error = convert_pcm_to_ogg(raw_audio_bytes)
                 if convert_error:
                     return None, f"Failed to convert audio to OGG: {convert_error}"
                 encoded_content = base64.b64encode(converted_bytes or b"").decode(
                     "ascii"
                 )
                 actual_format = "ogg"
             else:
                 logger.warning(
                     f"[GoogleAIProvider._execute_tts] Unsupported response_format '{response_format}', returning native WAV"
                 )
                 response_format = "wav"
-            logger.info(
-                f"[GoogleAIProvider._execute_tts] Audio conversion successful: {actual_format.upper()} ({len(raw_audio_bytes)} bytes)"
-            )
+
+            if actual_format != "wav":
+                logger.info(
+                    f"[GoogleAIProvider._execute_tts] Audio conversion successful: {actual_format.upper()} ({len(raw_audio_bytes)} bytes)"
+                )
         response_mime_type = f"audio/{response_format}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/providers/gai.py` around lines 239 - 275, The "Audio
conversion successful" log is emitted even when no conversion occurs; inside
GoogleAIProvider._execute_tts, move or duplicate the success log so it only runs
after a real conversion, i.e., place the logger.info call into the
response_format == "mp3" branch after convert_pcm_to_mp3 succeeds and into the
response_format == "ogg" branch after convert_pcm_to_ogg succeeds (use
convert_pcm_to_mp3, convert_pcm_to_ogg, actual_format and converted_bytes to
build the message), and remove or guard the existing unconditional success log
so unsupported-format fallback (where response_format is reset to "wav") does
not log success.
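The convert_pcm_to_mp3/convert_pcm_to_ogg helpers referenced here live elsewhere in the PR; with the new pydub dependency, such a conversion typically looks like this (a sketch assuming Gemini's documented 24kHz 16-bit mono PCM; ffmpeg must be installed for MP3 export):

import io
from pydub import AudioSegment

def convert_pcm_to_mp3(pcm_bytes: bytes) -> tuple[bytes | None, str | None]:
    """Convert raw 24kHz 16-bit mono PCM to MP3. Returns (data, error)."""
    try:
        segment = AudioSegment(
            data=pcm_bytes,
            sample_width=2,    # 16-bit samples
            frame_rate=24000,  # Gemini TTS native sample rate
            channels=1,
        )
        buf = io.BytesIO()
        segment.export(buf, format="mp3")
        return buf.getvalue(), None
    except Exception as e:
        return None, str(e)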

Comment on lines +426 to +444
def resolve_audio_base64(data: str, mime_type: str) -> tuple[str, str | None]:
    """Decode base64 audio and write to temp file. Returns (file_path, error)."""
    try:
        audio_bytes = base64.b64decode(data)
    except Exception as e:
        return "", f"Invalid base64 audio data: {str(e)}"

    ext = get_file_extension(mime_type)
    try:
        with tempfile.NamedTemporaryFile(
            suffix=ext, delete=False, prefix="audio_"
        ) as tmp:
            tmp.write(audio_bytes)
            temp_path = tmp.name

        logger.info(f"[resolve_audio_base64] Wrote audio to temp file: {temp_path}")
        return temp_path, None
    except Exception as e:
        return "", f"Failed to write audio to temp file: {str(e)}"

⚠️ Potential issue | 🟡 Minor

Temp files are never cleaned up on the error path inside resolve_audio_base64.

If base64.b64decode succeeds but the subsequent NamedTemporaryFile write fails partway through, the temp file (created with delete=False) may be left behind. The current except on line 443 returns an error string, but any partially-created file is leaked. This is a minor concern since NamedTemporaryFile creation and a single write are unlikely to fail partially, but worth noting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/utils.py` around lines 426 - 444, The temp-file leak occurs in
resolve_audio_base64 when base64 decoding succeeds but writing to
NamedTemporaryFile fails; fix by tracking temp_path (initialize to None before
the try), and in the outer except for the write error check if temp_path is set
and the file exists then remove it (os.remove) before returning the error;
ensure resolve_audio_base64 imports os and sets temp_path only after tmp.name is
assigned so cleanup is safe, or alternatively use a try/finally around the
NamedTemporaryFile write to always unlink on error while still returning the
error string.
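A minimal sketch of the cleanup pattern the prompt describes (the function name here is illustrative; the same try/except unlink idea applies to the uploaded Gemini file flagged earlier):

import os
import tempfile

def write_temp_audio(audio_bytes: bytes, ext: str) -> tuple[str, str | None]:
    """Write audio to a temp file; unlink it if the write fails midway."""
    temp_path: str | None = None
    try:
        with tempfile.NamedTemporaryFile(
            suffix=ext, delete=False, prefix="audio_"
        ) as tmp:
            temp_path = tmp.name
            tmp.write(audio_bytes)
        return temp_path, None
    except Exception as e:
        if temp_path is not None and os.path.exists(temp_path):
            os.remove(temp_path)  # don't leak the partially-written file
        return "", f"Failed to write audio to temp file: {e}"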

Comment on lines 447 to 474
def resolve_input(query_input) -> tuple[str, str | None]:
    """Resolve discriminated union input to content string.

    Args:
        query_input: The input from QueryParams (TextInput or AudioInput)

    Returns:
        (content_string, None) on success - for text returns content value, for audio returns temp file path
        ("", error_message) on failure
    """
    # Import here to avoid circular imports
    from app.models.llm.request import TextInput, AudioInput

    try:
        if isinstance(query_input, TextInput):
            return query_input.content.value, None

        elif isinstance(query_input, AudioInput):
            # AudioInput content is base64-encoded audio
            mime_type = query_input.content.mime_type or "audio/wav"
            return resolve_audio_base64(query_input.content.value, mime_type)

        else:
            return "", f"Unknown input type: {type(query_input)}"

    except Exception as e:
        logger.error(f"[resolve_input] Failed to resolve input: {e}", exc_info=True)
        return "", f"Failed to resolve input: {str(e)}"

🛠️ Refactor suggestion | 🟠 Major

Missing type hint on query_input parameter.

Per coding guidelines, all function parameters must have type hints. Since the local import already brings in TextInput and AudioInput, the annotation should use QueryInput (or the union type) from the request module. You can use a string annotation or TYPE_CHECKING to avoid circular imports at runtime.

Proposed fix
+from __future__ import annotations
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from app.models.llm.request import QueryInput
+
 ...
 
-def resolve_input(query_input) -> tuple[str, str | None]:
+def resolve_input(query_input: QueryInput) -> tuple[str, str | None]:

Alternatively, if from __future__ import annotations would be too broad a change for this file, use a string literal:

-def resolve_input(query_input) -> tuple[str, str | None]:
+def resolve_input(query_input: "TextInput | AudioInput") -> tuple[str, str | None]:

As per coding guidelines: "Always add type hints to all function parameters and return values in Python code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/utils.py` around lines 447 - 474, The function resolve_input is
missing a type annotation for its query_input parameter; update its signature to
type-hint query_input as the request module's QueryInput (or appropriate union
of TextInput|AudioInput) using a forward/string annotation or by importing
QueryInput under a TYPE_CHECKING guard to avoid circular imports, and keep the
existing return type tuple[str, str | None]; ensure references in the body
(TextInput, AudioInput, resolve_audio_base64) remain unchanged.

    job = session.get(Job, job_id)
    if not job:
        # Log all jobs to see what's in the database
        from sqlmodel import select
Collaborator:

@Prajna1999 this comment was not addressed

return load_description(Path(filename))


# Input resolver functions moved from app.services.llm.input_resolver
Collaborator:

In my opinion, the common utils file is supposed to have generic functions, and all the audio-related functions here seem domain-specific; but if you and others think this is fine, then I guess it's okay for now.

@coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/services/llm/jobs.py (1)

446-453: ⚠️ Potential issue | 🟠 Major

error=request.query.input passes a Pydantic model where str is expected.

APIResponse.failure_response has error: str | list. Passing the QueryInput Pydantic object will either cause a Pydantic v2 coercion to its str() representation (producing a useless internal model dump as the error message) or a validation error. The analogous block in the input-guardrails path (dead code at line 189) correctly passes safe_input["data"]["safe_text"].

🐛 Proposed fix
-                        callback_response = APIResponse.failure_response(
-                            error=request.query.input,
-                            metadata=request.request_metadata,
-                        )
+                        callback_response = APIResponse.failure_response(
+                            error=safe_output["data"]["safe_text"],
+                            metadata=request.request_metadata,
+                        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 446 - 453, The failure branch
uses APIResponse.failure_response(error=request.query.input) which passes a
Pydantic QueryInput object where a string is expected; change it to pass the
safe text string (like safe_input["data"]["safe_text"] used in the
input-guardrails path) so error is a plain str or list of str, then call
handle_job_error(job_id, request.callback_url, callback_response) as before;
update the block that checks safe_output["data"]["rephrase_needed"] to build
callback_response with error set to the safe text rather than
request.query.input.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/services/llm/jobs.py`:
- Around line 169-204: Remove the unreachable pre-session guardrails block that
begins with the "if input_guardrails:" check (the code that calls
call_guardrails(...) and manipulates safe_input/request.query.input) because
input_guardrails is only populated later in the session; instead rely on the
existing working guardrails implementation already present later (the
session-based block that uses list_validators_config and call_guardrails).
Delete the entire pre-session block to eliminate dead code and the undefined
symbol input_query and ensure only the session-based guardrail handling is
executed.
- Around line 411-412: The isinstance check is using the imported function name
resolve_input instead of the resolved value; replace the check to test the
resolved_input variable (i.e., use isinstance(resolved_input, AudioInput)) and
call cleanup_temp_file(resolved_input) so audio temp files are actually cleaned
up. Locate the block around the resolve_input usage in jobs.py where
resolved_input is set and update the conditional to reference resolved_input and
pass that object to cleanup_temp_file.
- Around line 22-24: Remove the stray closing parenthesis and the stale import:
delete the lone ")" on line 24 and replace the import of call_guardrails with
the correct active guardrails symbol; remove or change the import statement
referencing call_guardrails and ensure the file only imports
run_guardrails_validation (the function actually used in the active guardrails
path) so that the dead pre-session reference to call_guardrails is eliminated
and the module parses cleanly.
- Around line 466-475: Remove the orphaned redundant else block that assigns
response.response.output.content.value = safe_output["error"] and constructs
callback_response using APIResponse.failure_response, then calls
handle_job_error; this block references undefined safe_output and duplicates
logic already handled in the inner else, so delete that entire orphaned else
section (the duplicated assignment and return calling handle_job_error) to
restore valid Python syntax and avoid NameError while keeping the existing error
handling in the inner else branch intact.

---

Outside diff comments:
In `@backend/app/services/llm/jobs.py`:
- Around line 446-453: The failure branch uses
APIResponse.failure_response(error=request.query.input) which passes a Pydantic
QueryInput object where a string is expected; change it to pass the safe text
string (like safe_input["data"]["safe_text"] used in the input-guardrails path)
so error is a plain str or list of str, then call handle_job_error(job_id,
request.callback_url, callback_response) as before; update the block that checks
safe_output["data"]["rephrase_needed"] to build callback_response with error set
to the safe text rather than request.query.input.

---

Duplicate comments:
In `@backend/app/models/llm/request.py`:
- Around line 4-8: Remove the duplicate imports of Field and SQLModel by
consolidating into a single import statement that includes all sqlmodel symbols
used (Field, SQLModel, Index, text) and keep the other imports (Discriminator,
model_validator, HttpUrl, datetime, JSONB) separate; update the import block so
only one "from sqlmodel import ..." line exists to eliminate the F811 duplicate
import for Field/SQLModel.
- Around line 73-76: The mime_type field's type annotation disagrees with its
None default — update the annotation for mime_type back to Optional[str] (and
import Optional from typing if missing) so the Field(None, ...) default is
valid; ensure downstream usage like query_input.content.mime_type or "audio/wav"
continues to work with a nullable mime_type (a minimal sketch follows this list).

In `@backend/app/services/llm/jobs.py`:
- Around line 209-221: The code contains leftover debug scaffolding after the
session.get(Job, job_id) lookup: remove the inline import of select, the
recent-jobs query (all_jobs), and the verbose "Recent jobs in DB" logger entry;
instead, keep a concise error log in the execute_job flow (the logger.error
following the session.get(Job, job_id) check) that only reports job_id and a
clear message that the job was not found, without querying or printing other DB
rows; ensure any removal preserves control flow around
logger.info(f"[execute_job] Attempting to fetch job | job_id={job_id}") and the
subsequent handling of the missing job.
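For the mime_type item above, a minimal sketch of the described fix. The class name is a stand-in inferred from the prompt, not the real model:

from typing import Optional

from sqlmodel import Field, SQLModel


class AudioContentDemo(SQLModel):  # stand-in; the real model's name differs
    value: str
    # Optional[str] agrees with the Field(None, ...) default, and the
    # downstream fallback still works: content.mime_type or "audio/wav"
    mime_type: Optional[str] = Field(None, description="MIME type of the audio")


content = AudioContentDemo(value="...")
print(content.mime_type or "audio/wav")  # -> audio/wav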

Comment on lines 22 to 24
)
from app.services.llm.guardrails import call_guardrails
)

⚠️ Potential issue | 🔴 Critical

Stray ) on line 24 causes a parse error — CI is failing.

The from app.models.llm.request import (...) block closes correctly on line 22. Line 23 adds a standalone call_guardrails import, then line 24 has an orphaned ) that makes the file unparseable (confirmed by the CI failure: Cannot parse file: 24:0: )).

Additionally, call_guardrails (line 23) is stale: the dead pre-session block references it (line 175), while the active guardrails path uses run_guardrails_validation (already imported via lines 26–29).

🐛 Proposed fix
     TextInput,
     AudioInput,
 )
-from app.services.llm.guardrails import call_guardrails
-)
 from app.models.llm.response import TextOutput
🧰 Tools
🪛 GitHub Actions: Kaapi CI

[error] 24-24: Black formatting failed. Cannot parse file: 24:0: ).

🪛 Ruff (0.15.1)

[warning] 24-24: Expected a statement

(invalid-syntax)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 22 - 24, Remove the stray
closing parenthesis and the stale import: delete the lone ")" on line 24 and
replace the import of call_guardrails with the correct active guardrails symbol;
remove or change the import statement referencing call_guardrails and ensure the
file only imports run_guardrails_validation (the function actually used in the
active guardrails path) so that the dead pre-session reference to
call_guardrails is eliminated and the module parses cleanly.

Comment on lines 169 to 204
    if input_guardrails:
        if not isinstance(request.query.input, TextInput):
            logger.warning(
                "[execute_job] Input guardrails skipped for non-text input type"
            )
        else:
            safe_input = call_guardrails(input_query, input_guardrails, job_id)

        logger.info(
            f"[execute_job] Input guardrail validation | success={safe_input['success']}."
        )

        if safe_input.get("bypassed"):
            logger.info("[execute_job] Guardrails bypassed (service unavailable)")

        elif safe_input["success"]:
            # Update the text value within the QueryInput structure
            request.query.input.content.value = safe_input["data"]["safe_text"]

            if safe_input["data"]["rephrase_needed"]:
                callback_response = APIResponse.failure_response(
                    error=safe_input["data"]["safe_text"],
                    metadata=request.request_metadata,
                )
                return handle_job_error(
                    job_id, request.callback_url, callback_response
                )
        else:
            # Update the text value with error message
            request.query.input.content.value = safe_input["error"]

            callback_response = APIResponse.failure_response(
                error=safe_input["error"],
                metadata=request.request_metadata,
            )
            return handle_job_error(job_id, request.callback_url, callback_response)

⚠️ Potential issue | 🟠 Major

Dead pre-session guardrails block — unreachable and references undefined input_query.

At line 169, input_guardrails is still [] (initialized at line 160); the actual population via list_validators_config happens at lines 253–259 inside the session. The if input_guardrails: check is therefore always False and this entire block is unreachable.

Beyond being dead code, it has two additional defects that would surface if it ever ran:

  • Line 175: call_guardrails(input_query, ...): input_query is never defined in this scope (NameError).
  • Lines 177–178: safe_input['success'] is accessed outside the TextInput branch, so safe_input would be undefined for non-text inputs.

The actual working implementation is at lines 261–300. Remove lines 169–204 entirely.
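For contrast, the shape the surviving session-based block needs is roughly the following sketch; the run_guardrails_validation call and its arguments are assumed from the imports mentioned above, not copied from the file:

if input_guardrails and isinstance(request.query.input, TextInput):
    safe_input = run_guardrails_validation(
        request.query.input.content.value, input_guardrails, job_id
    )
    # safe_input is only ever read inside this branch, so it can
    # never be undefined for non-text inputs.
    if safe_input["success"]:
        request.query.input.content.value = safe_input["data"]["safe_text"]
elif input_guardrails:
    logger.warning("[execute_job] Input guardrails skipped for non-text input type")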

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 169 - 204, Remove the
unreachable pre-session guardrails block that begins with the "if
input_guardrails:" check (the code that calls call_guardrails(...) and
manipulates safe_input/request.query.input) because input_guardrails is only
populated later in the session; instead rely on the existing working guardrails
implementation already present later (the session-based block that uses
list_validators_config and call_guardrails). Delete the entire pre-session block
to eliminate dead code and the undefined symbol input_query and ensure only the
session-based guardrail handling is executed.

Comment on lines 411 to 412
if resolved_input and isinstance(resolve_input, AudioInput):
    cleanup_temp_file(resolved_input)

⚠️ Potential issue | 🔴 Critical

Temp-file cleanup never executes — resolve_input is the imported function, not the input object.

resolve_input on line 411 refers to the resolve_input function imported at line 32, not the resolved string resolved_input or the request's input object. isinstance(<function>, AudioInput) is always False, so cleanup_temp_file is never called and every audio request leaks a temp file. The past review fix targeted this exact line but the wrong variable was used.

🐛 Proposed fix
-            if resolved_input and isinstance(resolve_input, AudioInput):
+            if resolved_input and isinstance(request.query.input, AudioInput):
                 cleanup_temp_file(resolved_input)
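Beyond the one-word fix, a try/finally shape would make this class of leak harder to reintroduce. A sketch, assuming resolve_input returns the temp-file path for audio inputs:

resolved_input = None
try:
    resolved_input = resolve_input(request.query.input)
    # ... provider call and response handling ...
finally:
    # Cleanup runs on success and on every error path alike.
    if resolved_input and isinstance(request.query.input, AudioInput):
        cleanup_temp_file(resolved_input)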
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 411 - 412, The isinstance
check is using the imported function name resolve_input instead of the resolved
value; replace the check to test the resolved_input variable (i.e., use
isinstance(resolved_input, AudioInput)) and call
cleanup_temp_file(resolved_input) so audio temp files are actually cleaned up.
Locate the block around the resolve_input usage in jobs.py where resolved_input
is set and update the conditional to reference resolved_input and pass that
object to cleanup_temp_file.

Comment on lines 466 to 475
        else:
            response.response.output.content.value = safe_output["error"]

            callback_response = APIResponse.failure_response(
                error=safe_output["error"],
                metadata=request.request_metadata,
            )
            return handle_job_error(
                job_id, request.callback_url, callback_response
            )

⚠️ Potential issue | 🔴 Critical

Orphaned else: block is invalid Python syntax and references undefined safe_output.

Line 466 introduces a second else: at the same indentation (8 spaces) as line 421's else:, which already closes the if not isinstance(response.response.output, TextOutput): conditional. Python allows only one else per conditional, making this a syntax error confirmed by Ruff (lines 466–467: "Expected a statement").

Additionally, if this block somehow executed (e.g., in the not isinstance branch), safe_output (set at line 423 inside the else at line 421) would be undefined, causing a NameError. This block is a copy-paste artifact of the inner else: at line 455 and should be removed entirely.

🐛 Proposed fix — remove the orphaned block
                     else:
                         response.response.output.content.value = safe_output["error"]

                         callback_response = APIResponse.failure_response(
                             error=safe_output["error"],
                             metadata=request.request_metadata,
                         )
                         return handle_job_error(
                             job_id, request.callback_url, callback_response
                         )

-            else:
-                response.response.output.content.value = safe_output["error"]
-
-                callback_response = APIResponse.failure_response(
-                    error=safe_output["error"],
-                    metadata=request.request_metadata,
-                )
-                return handle_job_error(
-                    job_id, request.callback_url, callback_response
-                )

         callback_response = APIResponse.success_response(
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            else:
-                response.response.output.content.value = safe_output["error"]
-
-                callback_response = APIResponse.failure_response(
-                    error=safe_output["error"],
-                    metadata=request.request_metadata,
-                )
-                return handle_job_error(
-                    job_id, request.callback_url, callback_response
-                )
🧰 Tools
🪛 Ruff (0.15.1)

[warning] 466-466: Expected a statement

(invalid-syntax)


[warning] 466-466: Expected a statement

(invalid-syntax)


[warning] 466-467: Expected a statement

(invalid-syntax)


[warning] 467-467: Unexpected indentation

(invalid-syntax)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/llm/jobs.py` around lines 466 - 475, Remove the orphaned
redundant else block that assigns response.response.output.content.value =
safe_output["error"] and constructs callback_response using
APIResponse.failure_response, then calls handle_job_error; this block references
undefined safe_output and duplicates logic already handled in the inner else, so
delete that entire orphaned else section (the duplicated assignment and return
calling handle_job_error) to restore valid Python syntax and avoid NameError
while keeping the existing error handling in the inner else branch intact.

@Prajna1999 Prajna1999 moved this to In Progress in Kaapi-dev Feb 19, 2026
@Prajna1999 Prajna1999 moved this from In Progress to In Review in Kaapi-dev Feb 19, 2026

Labels

enhancement New feature or request

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

4 participants