Conversation
feat: v0 tab completion
Hi @wjiayis, thanks for the update. I've created a new issue to address the token expiration problem. Regarding the latency issue, do we have visibility into which part of the pipeline contributes most to the high latency? For example, a rough breakdown across the main stages?

I'll take a look at this PR later this evening as well.
@Junyi-99 I haven't gotten to the latency breakdown yet, but I've settled everything else and I'm gonna work on this next. Thanks for helping to review when convenient, I'll update my findings when I have them too!
@wjiayis Got it, thanks for the update. Looking forward to your findings. |
### Root Cause

There's a ~20s latency in the inline-suggestion loop, and >99% of the latency comes from waiting for the LLM to start responding. This issue arises because I'm passing in a large (but realistic) bibliography (the bibliography of …).

### Solution

I think it's reasonable to expect that a regular user's max latency tolerance is ~2s. I'll implement the following 3 solutions to achieve that.

### Model Selection
### Prompt Caching

Since the bibliography remains generally constant and takes up the bulk of the prompt, I'll use OpenAI's prompt caching, which is advertised to reduce latency by up to 80% and input token costs by up to 90%.
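For the cache to actually hit, the large, stable bibliography needs to sit at the start of the prompt so the shared prefix matches across requests, with only the short, changing sentence at the end. A minimal sketch with the OpenAI Node SDK; the backend here is Go, so this only illustrates the message ordering, and the model name and message wording are placeholders rather than what this PR uses:

```ts
import OpenAI from "openai";

const openai = new OpenAI();

// Sketch: keep the large, rarely-changing bibliography at the front of the
// prompt so OpenAI's automatic prompt caching can reuse the shared prefix;
// only the user's current sentence varies between requests.
async function suggestCitationKeys(bibliography: string, sentence: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // placeholder model for illustration
    messages: [
      { role: "system", content: `You suggest BibTeX citation keys.\n\nBibliography:\n${bibliography}` },
      { role: "user", content: `Suggest up to 3 citation keys for: ${sentence}` },
    ],
  });
  return completion.choices[0]?.message?.content ?? "";
}
```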
### Prompt Refinement

I'll remove info-sparse fields (e.g. …) from the bibliography entries.

cc: @Junyi-99
With regards to latency, just changing the model fixed the bulk of the issue. I still implemented prompt caching and prompt refinement. I noticed that when using OpenRouter, the first few auto-completions (more than one) have 6-7s latency, and subsequent auto-completions have 1-2s latency, as a result of prompt caching. I feel that this is good enough UX, and a first "no-reply" LLM query is probably not necessary, as it complicates the solution. Nevertheless, feel free to let me know if you prefer it being implemented.
### 1. Local Storage Not Updated

Access tokens and refresh tokens are only saved to local storage on login; subsequent token refreshes do not update local storage.

**Example:** A user refreshes to a new token and exits Overleaf; the next time he re-opens PaperDebugger, it uses the old refresh token.

**Proposed solution:** Update authStore whenever tokens are set.

### 2. Race Conditions When Refreshing

PaperDebugger often calls multiple endpoints at the same time, which results in a race condition if the token needs to be refreshed.

**Example:** `v2/chats/models` and `v2/chats/conversations` are called at the same time and the access token needs refreshing, so the refresh endpoint is called twice. On some occasions, the frontend uses the 2nd refresh token received, which differs from the one stored in the backend. This can easily be reproduced by setting the JWT expiration in the backend to a very short time.

**Proposed solution:** Use a promise for `refresh()`.

Unsure if this fixes the exact problem in #110
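On the promise-based fix for issue 2: a common pattern is to share a single in-flight refresh promise across concurrent callers, so the refresh endpoint is hit at most once at a time. A minimal sketch, assuming a hypothetical refresh endpoint and token shape rather than PaperDebugger's actual auth API:

```ts
// Hypothetical helper: concurrent callers all await the same in-flight refresh,
// so only one request to the refresh endpoint is made at a time.
let refreshInFlight: Promise<string> | null = null;

async function refreshAccessToken(refreshToken: string): Promise<string> {
  if (!refreshInFlight) {
    refreshInFlight = fetch("/auth/refresh", { // placeholder endpoint
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ refreshToken }),
    })
      .then(async (res) => {
        if (!res.ok) throw new Error(`refresh failed: ${res.status}`);
        const { accessToken, refreshToken: newRefreshToken } = await res.json();
        // Persist both tokens so local storage stays in sync (issue 1 above).
        localStorage.setItem("accessToken", accessToken);
        localStorage.setItem("refreshToken", newRefreshToken);
        return accessToken as string;
      })
      .finally(() => {
        // Allow a future refresh once this one settles (success or failure).
        refreshInFlight = null;
      });
  }
  return refreshInFlight;
}
```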
Pull request overview
Implements a first version of citation-aware tab completion by detecting `\cite{` in the editor, fetching suggested BibTeX keys from a new backend endpoint, and suppressing Overleaf's default autocomplete/tab behavior to allow accepting inline suggestions.
Changes:
- Frontend: Adds a “citation suggestions” beta setting, detects `\cite{` triggers, fetches citation keys, and intercepts Tab to accept inline suggestions.
- Backend: Adds a `GetCitationKeys` RPC/HTTP endpoint, extracts and token-reduces `.bib` content, and queries an LLM for up to ~3 relevant citation keys.
- Plumbing: Updates generated proto clients/servers and ignores `CLAUDE.md`.
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| webapp/_webapp/src/views/settings/sections/beta-feature-settings.tsx | Renames the completion toggle UI to “citation suggestions”. |
| webapp/_webapp/src/views/settings/index.tsx | Renders the beta features section in Settings. |
| webapp/_webapp/src/query/api.ts | Adds getCitationKeys() API wrapper for the new v2 endpoint. |
| webapp/_webapp/src/pkg/gen/apiclient/chat/v2/chat_pb.ts | Generated TS client types/service updated for GetCitationKeys. |
| webapp/_webapp/src/libs/inline-suggestion.ts | Adds `\cite{` trigger handling and suppresses Overleaf autocomplete/tab behavior when suggestions are active. |
| proto/chat/v2/chat.proto | Adds GetCitationKeys RPC + request/response messages. |
| pkg/gen/api/chat/v2/chat_grpc.pb.go | Generated gRPC server/client updated with GetCitationKeys. |
| pkg/gen/api/chat/v2/chat.pb.gw.go | Generated grpc-gateway bindings for /_pd/api/v2/chats/citation-keys. |
| pkg/gen/api/chat/v2/chat.pb.go | Generated Go proto types updated with new messages and RPC. |
| internal/services/toolkit/client/get_citation_keys.go | Implements bibliography extraction and LLM prompt to suggest citation keys. |
| internal/api/chat/get_citation_keys.go | Implements the ChatServerV2 handler for GetCitationKeys. |
| .gitignore | Ignores CLAUDE.md. |
```proto
// A comma-separated string of keys, or empty if none found
string citation_keys = 1;
```
Returning citation keys as a comma-separated string makes the API harder to use correctly (parsing, escaping, empty handling) and is inconsistent with other protos that use `repeated string` for lists. Consider changing `citation_keys` to `repeated string citation_keys = 1;` and letting clients join with commas as needed.
Suggested change:

```diff
-// A comma-separated string of keys, or empty if none found
-string citation_keys = 1;
+// A list of suggested citation keys, empty if none found
+repeated string citation_keys = 1;
```
```ts
const textBefore = state.doc.sliceString(0, cursorPos - triggerText.length);
const lastSentence = textBefore
  .split(/(?<=[.!?])\s+/)
  .filter((s) => s.trim().length > 0) // filter out empty sentences
  .slice(-1)[0];
```
If there is no sentence-ending punctuation before the trigger, lastSentence becomes the entire prefix of the document (potentially very large), which can increase latency/cost and exceed token limits. Consider capping the context (e.g. last N chars/words) and/or using a more robust sentence extraction that falls back to a bounded window.
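A possible shape for that fallback, sketched below in the same spirit as the existing extraction; the 500-character cap is an arbitrary illustration, not a value from this PR:

```ts
// Sketch: extract the last sentence before the \cite{ trigger, but never more
// than MAX_CONTEXT_CHARS, so a document without sentence-ending punctuation
// cannot blow up prompt size, latency, or cost.
const MAX_CONTEXT_CHARS = 500; // arbitrary cap for illustration

function extractContext(textBefore: string): string {
  const sentences = textBefore
    .split(/(?<=[.!?])\s+/)
    .filter((s) => s.trim().length > 0);
  // Fall back to the raw prefix when no sentence boundary exists.
  const last = sentences.length > 0 ? sentences[sentences.length - 1] : textBefore;
  // Keep only the tail of an overly long "sentence" (e.g. a preamble with no punctuation).
  return last.length > MAX_CONTEXT_CHARS ? last.slice(-MAX_CONTEXT_CHARS) : last;
}
```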
```diff
@@ -0,0 +1,142 @@
+package client
...
+// TODO: This file should not place in the client package.
```
This new file includes a TODO noting it should not live in the client package. Leaving known-misplaced code in-tree makes future refactors harder (imports, package boundaries). Please move this logic into an appropriate package (e.g. a citation/bibliography helper or toolkit subpackage) before merging, and keep client focused on LLM client orchestration.
```go
// TODO: This file should not place in the client package.
```
@Junyi-99 I'm planning to eventually move this file to wherever `internal/services/toolkit/client/get_conversation_title_v2.go` will be moved to, since both files contain high-order business logic that doesn't belong in the `client` folder. Shall I leave this file here for now?
I got Copilot to review the PR. I didn't really look through its suggestions. Could you resolve Copilot's comments and integrate any valid/viable suggestions? I think Jun Yi is busy these few days; I'll review later this week if it's still pending review! Thanks for your hard work, Jiayi!
@Junyi-99 actually, do we have a recommended PR-to-main workflow? Should we only review PRs to main and allow freely merging to the staging/dev branch for testing in a simulated prod env? Or should there be a preliminary review before merging to staging too?
```go
func (a *AIClientV2) GetCitationKeys(ctx context.Context, sentence string, userId bson.ObjectID, projectId string, llmProvider *models.LLMProviderConfig) (string, error) {
	bibliography, err := a.GetBibliographyForCitation(ctx, userId, projectId)
```
New citation-key functionality is added here but there are existing tests for other toolkit client behaviors (e.g. conversation title). Please add unit tests for GetBibliographyForCitation/GetCitationKeys (at minimum: field-exclusion behavior and prompt formatting / empty-citation handling) to prevent regressions.
```ts
export const getCitationKeys = async (data: PlainMessage<GetCitationKeysRequest>) => {
  const response = await apiclientV2.post(`/chats/citation-keys`, data);
  return fromJson(GetCitationKeysResponseSchema, response);
}
```
This API is invoked as the user types; failures (401, network, etc.) should not spam global error toasts. Pass `{ ignoreErrorToast: true }` to `apiclientV2.post` here and let the caller decide how to surface errors (the completion code already logs and returns `""`). Also, the file consistently terminates exported const assignments with `;`, so add it here to match lint/style.
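The suggested change would look roughly like the sketch below; it reuses the imports already shown in the snippet above and assumes `apiclientV2.post` accepts a per-request options object with `ignoreErrorToast`, as the review comment implies:

```ts
export const getCitationKeys = async (data: PlainMessage<GetCitationKeysRequest>) => {
  // Suppress the global error toast; the inline-suggestion caller already
  // logs failures and falls back to an empty suggestion string.
  const response = await apiclientV2.post(`/chats/citation-keys`, data, { ignoreErrorToast: true });
  return fromJson(GetCitationKeysResponseSchema, response);
};
```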
```go
citationKeys, err := s.aiClientV2.GetCitationKeys(
	ctx,
	req.GetSentence(),
	actor.ID,
	req.GetProjectId(),
```
`GetCitationKeysRequest` includes an optional `model_slug`, but this handler doesn't read `req.GetModelSlug()` (and the downstream call currently hardcodes a model). Either plumb the requested model through to the client call, or remove the field from the proto to avoid a misleading API surface.
```go
citationKeys, err := s.aiClientV2.GetCitationKeys(
	ctx,
	req.GetSentence(),
	actor.ID,
	req.GetProjectId(),
```
Please validate required inputs (at least `sentence` and `project_id`) and return a clear bad-request error when missing, consistent with other chat handlers (e.g. `shared.ErrBadRequest("title is required")`). Otherwise empty values will fail deeper in services with less actionable errors.
```ts
  if (styleEl) return;
  styleEl = document.createElement("style");
  styleEl.textContent = `.cm-tooltip { display: none !important; }`;
  document.head.appendChild(styleEl);
}
```
The injected CSS hides all .cm-tooltip elements globally while active, which can suppress unrelated tooltips (lint, hover help, etc.) elsewhere on the page and across multiple editors. Scope the selector to the specific editor instance/container and/or a more specific autocomplete tooltip class so only Overleaf’s dropdown is suppressed.
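One way to scope it, sketched below: tag the specific editor container and target CodeMirror's autocomplete tooltip class (`.cm-tooltip-autocomplete`) rather than every `.cm-tooltip` on the page. The container lookup and data attribute are illustrative assumptions, not code from this PR:

```ts
let styleEl: HTMLStyleElement | null = null;

// Sketch: suppress only the autocomplete tooltip inside one editor container,
// leaving lint/hover tooltips and other editors on the page untouched.
function hideAutocompleteTooltip(editorContainer: HTMLElement) {
  if (styleEl) return;
  // Mark the container so the rule cannot leak to unrelated CodeMirror instances.
  editorContainer.dataset.pdSuppressAutocomplete = "true";
  styleEl = document.createElement("style");
  styleEl.textContent = `[data-pd-suppress-autocomplete="true"] .cm-tooltip-autocomplete { display: none !important; }`;
  document.head.appendChild(styleEl);
}
```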
#33
In short, it'll
- detect the citation trigger (`\cite{`)
- pass in `.bib` files as raw text, and remove irrelevant fields to save tokens
- query an LLM (`gpt-5.2` for now) to get at most 3 citation keys

Overall latency is 1-2s (tested using OpenAI API).
There are some outstanding issues