Conversation
98f26ba to
d1e4a65
Compare
There was a problem hiding this comment.
Pull request overview
This PR migrates session tracking from the X-Session-ID header to an x-session-id cookie, centralizing request parsing and adding utilities to resolve/create users & sessions based on that cookie across API endpoints and services.
Changes:
- Introduce
extract_session_cookie()/extract_origin_from_request()helpers and apply them across routers + middleware. - Add
resolve_user_and_session()utility and a new/user_and_sessionendpoint that sets the session cookie. - Update data-collection plumbing, tests, and environment/configuration to support cookie-based session IDs and an
ENVsetting.
Reviewed changes
Copilot reviewed 21 out of 24 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| src/app/user/utils/utils.py | Adds async helper to resolve/create user+session from a session UUID. |
| src/app/user/api/router.py | Adds cookie-setting endpoint and refactors bookmark routes to use cookie-derived identity. |
| src/app/tutor/api/router.py | Switches session ID sourcing from header to cookie. |
| src/app/search/api/router.py | Switches session ID sourcing from header to cookie. |
| src/app/api/api_v1/endpoints/chat.py | Switches session ID sourcing from header to cookie for data-collection. |
| src/app/api/api_v1/endpoints/metric.py | Switches session ID sourcing from header to cookie for syllabus download tracking. |
| src/app/middleware/monitor_requests.py | Switches request monitoring session ID from header to cookie. |
| src/app/shared/utils/requests.py | New shared request helpers for cookie + origin extraction. |
| src/app/shared/domain/constants.py | Defines cookie name and TTL constant for session cookie. |
| src/app/services/sql_db/queries_user.py | Adjusts session lookup signature and adds new exception/logging behavior. |
| src/app/services/data_collection.py | Changes session_id typing to UUID and removes string-to-UUID conversion. |
| src/app/api/api_v1/api.py | Updates user router import wiring. |
| src/app/core/config.py | Adds required ENV setting for environment detection. |
| src/app/tests/test_utils.py | Adds unit tests for resolve_user_and_session. |
| src/app/tests/api/api_v1/test_user.py | Updates user API tests to use cookies and new bookmark routes. |
| src/app/tests/api/api_v1/test_search.py | Updates search API tests to send session via Cookie header. |
| pytest.ini | Sets ENV=test for test runs. |
| k8s/welearn-api/values.dev.yaml | Provides ENV config for dev deployment. |
| k8s/welearn-api/values.staging.yaml | Provides ENV config for staging deployment. |
| k8s/welearn-api/values.prod.yaml | Provides ENV config for production deployment. |
| .env.example | Documents ENV for local development. |
Comments suppressed due to low confidence (2)
src/app/user/api/router.py:135
- The route path contains a literal ":document_id" segment, but
document_idis not declared as a path parameter (FastAPI uses{document_id}), so it will be treated as a query param and the URL will literally include:document_id. Either change the route to use a proper path param (/bookmarks/{document_id}) or remove:document_idfrom the path and keepdocument_idas a query parameter.
src/app/user/api/router.py:162 - Same issue as above: the route path includes a literal
:document_idsegment. This is easy to confuse with a real path parameter and will likely break clients expecting/bookmarks/<uuid>. Consider switching to/bookmarks/{document_id}(path param) or/bookmarkswithdocument_idonly as a query parameter.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if user: | ||
| return user.id | ||
| else: | ||
| logger.error(f"User with id {user_id} not found,") |
There was a problem hiding this comment.
The error log message has a trailing comma ("not found,"). If this is meant to be a full sentence, remove the trailing comma to keep logs clean and consistent.
| logger.error(f"User with id {user_id} not found,") | |
| logger.error(f"User with id {user_id} not found") |
| response = client.post( | ||
| f"{settings.API_V1_STR}/user/:user_id/bookmarks/:document_id", | ||
| params={"user_id": user_id, "document_id": document_id}, | ||
| f"{settings.API_V1_STR}/user/bookmarks/:document_id", | ||
| params={"document_id": document_id}, | ||
| headers={"X-API-Key": "test"}, | ||
| cookies={"x-session-id": str(session_id)}, | ||
| ) |
There was a problem hiding this comment.
This test calls /user/bookmarks/:document_id (literal :document_id) and passes document_id via query params. If the intent is to have a path parameter, the route should be /user/bookmarks/{document_id} and the test should call /user/bookmarks/<uuid>. If the intent is a query parameter, remove :document_id from the URL to avoid confusion.
| with session_maker() as s: | ||
| session = s.execute(select(Session).where(Session.id == session_id)).first() | ||
| if session: | ||
| logger.info( | ||
| "Valid session. user_id=%s session_id=%s", | ||
| session[0].inferred_user_id, | ||
| session_id, | ||
| ) | ||
| return session[0].inferred_user_id |
There was a problem hiding this comment.
get_user_from_session_id currently treats any session row as valid and does not check Session.end_at > now (unlike get_or_create_session_sync). This can make expired sessions resolve to a user indefinitely, especially problematic now that cookies can persist long-term. Consider filtering by end_at in the query (and/or deleting/invalidating expired sessions).
|
|
||
|
|
||
| SESSION_COOKIE_NAME = "x-session-id" | ||
| SESSION_TTL_SECONDS = 60 * 60 * 24 * 400 # 400 days |
There was a problem hiding this comment.
SESSION_TTL_SECONDS is set to ~400 days, but get_or_create_session_sync currently creates DB sessions with end_at = now + 24h. This mismatch will cause long-lived cookies to reference expired DB sessions (and can lead to frequent session re-creation / unexpected identity behavior). Consider aligning cookie TTL with DB session validity (or renaming this to a long-lived user-tracking cookie and adjusting the DB model accordingly).
| SESSION_TTL_SECONDS = 60 * 60 * 24 * 400 # 400 days | |
| SESSION_TTL_SECONDS = 60 * 60 * 24 # 1 day |
| else: | ||
| logger.warning( | ||
| f"No X-Session-ID header provided for {request.url.path}" | ||
| f"No X-Session-ID cookie provided for {request.url.path}" |
There was a problem hiding this comment.
The warning message still refers to "X-Session-ID" even though the code now reads the session from a cookie (x-session-id). Consider updating the log message to reference the actual cookie name (ideally using SESSION_COOKIE_NAME) to avoid confusion during ops/debugging.
| f"No X-Session-ID cookie provided for {request.url.path}" | |
| f"No x-session-id cookie provided for {request.url.path}" |
| AZURE_API_VERSION: str | ||
|
|
||
| LLM_MODEL_NAME: str | ||
| ENV: str |
There was a problem hiding this comment.
ENV is now a required Settings field. Because settings = Settings() is instantiated at import time, missing ENV will prevent the app from starting (ValidationError). If you want backward compatibility / safer local runs, consider giving ENV a default (e.g., "development") and/or validating it as a constrained enum.
| ENV: str | |
| ENV: str = "development" |
| user_id = await run_in_threadpool(get_user_from_session_id, session_uuid) | ||
|
|
||
| if not user_id: | ||
| logger.info("No user found. Creating new user and session.") | ||
| user_id = await run_in_threadpool(get_or_create_user_sync, None, referer) | ||
| session_uuid = await run_in_threadpool( | ||
| get_or_create_session_sync, user_id, None, host, referer | ||
| ) |
There was a problem hiding this comment.
resolve_user_and_session creates a new user+session whenever get_user_from_session_id returns falsy. When used from endpoints like bookmarks, an invalid/unknown cookie will silently create a brand-new user and then read/write bookmarks on that new user, which is likely unintended. Consider splitting this into two flows (e.g., resolve_existing_user_from_session that 401s on missing/invalid session vs. an explicit "create user/session" endpoint).
| else: | ||
| HTTPException( | ||
| status_code=404, detail=f"Session with id {session_id} not found" | ||
| ) |
There was a problem hiding this comment.
get_user_from_session_id instantiates an HTTPException in the else branch but never raises it, so the exception is effectively ignored and the function falls through returning None. Either raise the exception or remove it and just return None (and let callers decide how to handle missing sessions).
| else: | |
| HTTPException( | |
| status_code=404, detail=f"Session with id {session_id} not found" | |
| ) |
| def get_or_create_user_sync( | ||
| user_id: uuid.UUID | None = None, referer: str | None = None | ||
| ) -> uuid.UUID: | ||
| with session_maker() as s: | ||
| if user_id: | ||
| user = s.execute( | ||
| select(InferredUser.id).where(InferredUser.id == user_id) | ||
| ).first() | ||
| if user: | ||
| return user.id | ||
| else: | ||
| logger.error(f"User with id {user_id} not found,") | ||
| raise HTTPException( | ||
| status_code=404, detail=f"User with id {user_id} not found" | ||
| ) |
There was a problem hiding this comment.
get_or_create_user_sync is a low-level DB helper but it now raises fastapi.HTTPException. This couples the SQL layer to the web layer and can also be mishandled by callers that catch broad exceptions (turning a 404 into a 500). Prefer raising a domain/ValueError here and translate to HTTPException in the API layer.
Description
Why?
How?
Screenshots (if appropriate):
Types of changes
Checklist: