
feat: v0 tab completion #110

Open
wjiayis wants to merge 39 commits into staging from feat/tab-completion

Conversation

@wjiayis (Member) commented Feb 4, 2026

#33

In short, it'll

  1. [Frontend] Recognize that the user is trying to add a citation (the trigger text is \cite{; see the sketch below)
  2. [Frontend] Temporarily suppress default Overleaf dropdown suggestions
  3. [Frontend] Get the last sentence as context for LLM
  4. [Backend] Fetch the bibliography from .bib files as raw text, and remove irrelevant fields to save tokens
  5. [Backend] Query a fast LLM (hardcoded to gpt-5.2 for now) to get at most 3 citation keys
  6. [Frontend] Suppress default Overleaf tab-completion to allow users to accept citation suggestions

Overall latency is 1-2s (tested using OpenAI API).
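For context, a minimal sketch of the trigger detection and context extraction (steps 1 and 3 above); the function and variable names here are illustrative, not the PR's actual identifiers:

```typescript
// Illustrative sketch only; the real logic lives in webapp/_webapp/src/libs/inline-suggestion.ts.
const TRIGGER = "\\cite{";

// Returns the context sentence if the user has just typed the trigger, otherwise null.
function getCitationContext(docText: string, cursorPos: number): string | null {
  const before = docText.slice(0, cursorPos);
  if (!before.endsWith(TRIGGER)) return null; // step 1: detect the trigger

  const textBeforeTrigger = before.slice(0, -TRIGGER.length);
  const sentences = textBeforeTrigger
    .split(/(?<=[.!?])\s+/) // same sentence split the frontend uses
    .filter((s) => s.trim().length > 0);
  return sentences.at(-1) ?? null; // step 3: last sentence as LLM context
}
```

Step 2 (suppressing Overleaf's own dropdown) and step 6 (intercepting Tab) are handled separately in the editor extension.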

There are some outstanding issues:

  1. No citation suggestions are shown if the user is not logged in -> won't fix
  2. Suggestions are not precise -> in my V1 iteration I'll use XtraMCP to improve bib handling

@wjiayis mentioned this pull request Feb 4, 2026
@Junyi-99 (Member) commented Feb 5, 2026

Hi @wjiayis, thanks for the update. I’ve created a new issue to address the token expiration problem.

Regarding the latency issue, do we have visibility into which part of the pipeline contributes most to the high latency? For example, a rough breakdown across:

frontend → backend → LLM provider (reasoning + response)

I’ll take a look at this PR later this evening as well.

@Junyi-99 linked an issue Feb 5, 2026 that may be closed by this pull request
@wjiayis (Member Author) commented Feb 5, 2026

@Junyi-99 I haven't gotten to the latency breakdown yet, but I've settled everything else and I'm gonna work on this next. Thanks for helping to review when convenient, I'll update my findings when I have them too!

@Junyi-99 (Member) commented Feb 5, 2026

@wjiayis Got it, thanks for the update. Looking forward to your findings.

@wjiayis (Member Author) commented Feb 5, 2026

Root Cause

There's a ~20s latency in the inline-suggestion loop, and >99% of it comes from waiting for the LLM to start responding. This issue arises because I'm passing in a large (but realistic) bibliography (the bibliography of PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing itself), and gpt-5-nano takes a while to parse it.

Solution

I think it's reasonable to expect that a regular user's max latency tolerance is ~2s. I'll implement the following 3 solutions to achieve that.

Model Selection

gpt-5-nano takes a long time to process the long bibliography. Just swapping it out for gpt-5.2 brings latency down to 2-4s, but gpt-5.2 is expensive to call. I'll improve both latency and cost with the next 2 solutions.

Prompt Caching

Since the bibliography remains generally constant and takes up the bulk of the prompt, I'll use OpenAI's prompt caching, which is advertised to reduce latency by up to 80% and input token costs by up to 90%.

  1. Place the bibliography at the start of the prompt (prompt caching uses exact prefix matching; see the layout sketch below)
  2. Run a "no-reply" LLM query at the start of each session and whenever the database reloads, and configure it to cache for 24h
  3. Each time \cite{ is triggered, the cached bibliography is used -> lower latency
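To make that concrete, here is a sketch of the intended prompt layout (the real request is built in the Go backend; the message wording is illustrative). The large, stable bibliography forms an exact prefix the provider can cache, while only the short sentence at the end varies per completion:

```typescript
type ChatMessage = { role: "system" | "user"; content: string };

// Sketch of the prompt layout only: stable prefix first, varying suffix last.
function buildCitationPrompt(bibliography: string, sentence: string): ChatMessage[] {
  return [
    {
      role: "system",
      // Stable prefix: identical across requests for the same project,
      // so repeated calls can hit the provider's prompt cache.
      content:
        "You suggest BibTeX citation keys from the following bibliography.\n\n" + bibliography,
    },
    {
      role: "user",
      // Varying suffix: only this part changes on each \cite{ trigger.
      content: `Suggest at most 3 citation keys for this sentence:\n${sentence}`,
    },
  ];
}
```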

Prompt Refinement

I'll remove info-sparse fields (e.g. doi, url, pages) and retain only info-rich fields (e.g. title, booktitle), to reduce the total size of the bibliography by (hopefully) at least 40%.
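As a rough sketch of the field filtering (the actual implementation lives in the Go backend, in internal/services/toolkit/client/get_citation_keys.go; the dropped-field list and regex here are illustrative):

```typescript
// Illustrative: strip info-sparse single-line BibTeX fields to shrink the prompt.
const DROPPED_FIELDS = ["doi", "url", "pages"]; // assumed list; tune as needed

function stripSparseFields(bibtex: string): string {
  // Matches lines like `  doi = {10.1234/abc},` for any dropped field.
  const pattern = new RegExp(`^\\s*(${DROPPED_FIELDS.join("|")})\\s*=\\s*[^\\n]*$`, "gim");
  return bibtex.replace(pattern, "").replace(/\n{2,}/g, "\n");
}
```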

cc: @Junyi-99

@kah-seng mentioned this pull request Feb 5, 2026
@wjiayis (Member Author) commented Feb 7, 2026

With regard to latency, just changing the model fixed the bulk of the issue. I still implemented prompt caching and prompt refinement.

I realized that when using OpenRouter, the first few auto-completions (more than one) have 6-7s latency, and subsequent auto-completions have 1-2s latency, as a result of prompt caching. I feel that this is good enough UX, and an initial "no-reply" LLM query is probably not necessary, as it complicates the solution. Nevertheless, feel free to let me know if you'd prefer it to be implemented.

@wjiayis marked this pull request as ready for review February 7, 2026 04:50
@wjiayis requested review from 4ndrelim and Junyi-99 February 7, 2026 04:53
@wjiayis self-assigned this Feb 7, 2026
@wjiayis added the enhancement label Feb 7, 2026
4ndrelim pushed a commit that referenced this pull request Feb 7, 2026
### 1. Local Storage Not Updated
Access tokens and refresh tokens are only saved to local storage on login; subsequent token refreshes do not update local storage.

**Example:** The user refreshes to a new token and exits Overleaf; the next time they re-open PaperDebugger, it uses the old refresh token.

**Proposed solution:** Update authStore whenever tokens are set.

### 2. Race Conditions When Refreshing
PaperDebugger often calls multiple endpoints at the same time, which
results in a race condition if the token needs to be refreshed.
**Example:** `v2/chats/models` and `v2/chats/conversations` are called at the same time and the access token needs refreshing, so the refresh endpoint is called twice. On some occasions, the frontend uses the 2nd refresh token received, which differs from the one stored in the backend. This can be easily reproduced by setting the JWT expiration in the backend to a very short time.

**Proposed solution:** Use a promise for `refresh()`.
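A single-flight `refresh()` along those lines might look like this (a sketch; `refreshTokens()` and `authStore` are placeholders for the app's real API client and auth store):

```typescript
// Placeholder wiring for the sketch; the real app has its own API client and auth store.
declare function refreshTokens(): Promise<{ accessToken: string; refreshToken: string }>;
declare const authStore: { setTokens(accessToken: string, refreshToken: string): void };

let refreshInFlight: Promise<void> | null = null;

// Concurrent callers share one in-flight refresh, so the refresh endpoint
// is hit at most once per expiry and only one refresh token is ever issued.
export function refresh(): Promise<void> {
  if (!refreshInFlight) {
    refreshInFlight = doRefresh().finally(() => {
      refreshInFlight = null; // allow the next expiry to trigger a new refresh
    });
  }
  return refreshInFlight;
}

async function doRefresh(): Promise<void> {
  const { accessToken, refreshToken } = await refreshTokens();
  // Also addresses issue 1: persist the new pair on every refresh, not just on login.
  authStore.setTokens(accessToken, refreshToken);
}
```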


Unsure if this fixes the exact problem in #110
@4ndrelim requested a review from Copilot February 9, 2026 13:29
Copilot AI (Contributor) left a comment

Pull request overview

Implements a first version of citation-aware tab completion by detecting \cite{ in the editor, fetching suggested BibTeX keys from a new backend endpoint, and suppressing Overleaf’s default autocomplete/tab behavior to allow accepting inline suggestions.

Changes:

  • Frontend: Adds a “citation suggestions” beta setting, detects \cite{ triggers, fetches citation keys, and intercepts Tab to accept inline suggestions.
  • Backend: Adds GetCitationKeys RPC/HTTP endpoint, extracts and token-reduces .bib content, and queries an LLM for up to ~3 relevant citation keys.
  • Plumbing: Updates generated proto clients/servers and ignores CLAUDE.md.

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| webapp/_webapp/src/views/settings/sections/beta-feature-settings.tsx | Renames the completion toggle UI to “citation suggestions”. |
| webapp/_webapp/src/views/settings/index.tsx | Renders the beta features section in Settings. |
| webapp/_webapp/src/query/api.ts | Adds getCitationKeys() API wrapper for the new v2 endpoint. |
| webapp/_webapp/src/pkg/gen/apiclient/chat/v2/chat_pb.ts | Generated TS client types/service updated for GetCitationKeys. |
| webapp/_webapp/src/libs/inline-suggestion.ts | Adds \cite{ trigger handling and suppresses Overleaf autocomplete/tab behavior when suggestions are active. |
| proto/chat/v2/chat.proto | Adds GetCitationKeys RPC + request/response messages. |
| pkg/gen/api/chat/v2/chat_grpc.pb.go | Generated gRPC server/client updated with GetCitationKeys. |
| pkg/gen/api/chat/v2/chat.pb.gw.go | Generated grpc-gateway bindings for /_pd/api/v2/chats/citation-keys. |
| pkg/gen/api/chat/v2/chat.pb.go | Generated Go proto types updated with new messages and RPC. |
| internal/services/toolkit/client/get_citation_keys.go | Implements bibliography extraction and LLM prompt to suggest citation keys. |
| internal/api/chat/get_citation_keys.go | Implements the ChatServerV2 handler for GetCitationKeys. |
| .gitignore | Ignores CLAUDE.md. |


Comment on lines +253 to +254
// A comma-separated string of keys, or empty if none found
string citation_keys = 1;
Copilot AI Feb 9, 2026

Returning citation keys as a comma-separated string makes the API harder to use correctly (parsing, escaping, empty handling) and is inconsistent with other protos that use repeated string for lists. Consider changing citation_keys to repeated string citation_keys = 1; and let clients join with commas as needed.

Suggested change
// A comma-separated string of keys, or empty if none found
string citation_keys = 1;
// A list of suggested citation keys, empty if none found
repeated string citation_keys = 1;

Comment on lines +48 to +52
const textBefore = state.doc.sliceString(0, cursorPos - triggerText.length);
const lastSentence = textBefore
.split(/(?<=[.!?])\s+/)
.filter((s) => s.trim().length > 0) // filter out empty sentences
.slice(-1)[0];
Copilot AI Feb 9, 2026

If there is no sentence-ending punctuation before the trigger, lastSentence becomes the entire prefix of the document (potentially very large), which can increase latency/cost and exceed token limits. Consider capping the context (e.g. last N chars/words) and/or using a more robust sentence extraction that falls back to a bounded window.
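One bounded-window fallback along those lines (a sketch; the 300-character cap is an arbitrary choice):

```typescript
// Sketch: never send more than maxChars of context, even without sentence punctuation.
function extractContext(textBeforeTrigger: string, maxChars = 300): string {
  const window = textBeforeTrigger.slice(-maxChars); // hard cap on context size
  const sentences = window
    .split(/(?<=[.!?])\s+/)
    .filter((s) => s.trim().length > 0);
  // With no sentence boundary in the window, the split yields the capped window itself.
  return sentences.at(-1) ?? "";
}
```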

@@ -0,0 +1,142 @@
package client

// TODO: This file should not place in the client package.
Copilot AI Feb 9, 2026

This new file includes a TODO noting it should not live in the client package. Leaving known-misplaced code in-tree makes future refactors harder (imports, package boundaries). Please move this logic into an appropriate package (e.g. a citation/bibliography helper or toolkit subpackage) before merging, and keep client focused on LLM client orchestration.

Suggested change
// TODO: This file should not place in the client package.

@wjiayis (Member Author)

@Junyi-99 I'm planning to eventually move this file to wherever internal/services/toolkit/client/get_conversation_title_v2.go will be moved to, since both files contain high-order business logic that doesn't belong to the client folder. Shall I leave this file here for now?

@4ndrelim (Member) Feb 9, 2026

I got Copilot to review the PR. I didn't really look through its suggestions. Could you resolve Copilot's comments and integrate any valid / viable suggestions? I think Jun Yi is busy these few days; I'll review later this week if it's still pending review! Thanks for your hard work, Jiayi!

@Junyi-99 Actually, do we have a recommended PR-to-main workflow? Should we only review PRs to main and allow free merging to the staging / dev branch for testing in a simulated prod env? Or should there be a preliminary review before merging to staging too?

Comment on lines +110 to +112
func (a *AIClientV2) GetCitationKeys(ctx context.Context, sentence string, userId bson.ObjectID, projectId string, llmProvider *models.LLMProviderConfig) (string, error) {
bibliography, err := a.GetBibliographyForCitation(ctx, userId, projectId)

Copilot AI Feb 9, 2026

New citation-key functionality is added here but there are existing tests for other toolkit client behaviors (e.g. conversation title). Please add unit tests for GetBibliographyForCitation/GetCitationKeys (at minimum: field-exclusion behavior and prompt formatting / empty-citation handling) to prevent regressions.

Comment on lines +150 to +153
export const getCitationKeys = async (data: PlainMessage<GetCitationKeysRequest>) => {
const response = await apiclientV2.post(`/chats/citation-keys`, data);
return fromJson(GetCitationKeysResponseSchema, response);
}
Copilot AI Feb 9, 2026

This API is invoked as the user types; failures (401, network, etc.) should not spam global error toasts. Pass { ignoreErrorToast: true } to apiclientV2.post here and let the caller decide how to surface errors (the completion code already logs and returns ""). Also, the file consistently terminates exported const assignments with ;—add it here to match lint/style.

Comment on lines +29 to +33
citationKeys, err := s.aiClientV2.GetCitationKeys(
ctx,
req.GetSentence(),
actor.ID,
req.GetProjectId(),
Copilot AI Feb 9, 2026

GetCitationKeysRequest includes an optional model_slug, but this handler doesn’t read req.GetModelSlug() (and the downstream call currently hardcodes a model). Either plumb the requested model through to the client call, or remove the field from the proto to avoid a misleading API surface.

Comment on lines +29 to +33
citationKeys, err := s.aiClientV2.GetCitationKeys(
ctx,
req.GetSentence(),
actor.ID,
req.GetProjectId(),
Copilot AI Feb 9, 2026

Please validate required inputs (at least sentence and project_id) and return a clear bad-request error when missing, consistent with other chat handlers (e.g. shared.ErrBadRequest("title is required")). Otherwise empty values will fail deeper in services with less actionable errors.

Comment on lines +607 to +611
if (styleEl) return;
styleEl = document.createElement("style");
styleEl.textContent = `.cm-tooltip { display: none !important; }`;
document.head.appendChild(styleEl);
}
Copilot AI Feb 9, 2026

The injected CSS hides all .cm-tooltip elements globally while active, which can suppress unrelated tooltips (lint, hover help, etc.) elsewhere on the page and across multiple editors. Scope the selector to the specific editor instance/container and/or a more specific autocomplete tooltip class so only Overleaf’s dropdown is suppressed.
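One way to scope it, assuming the extension has a handle on the editor's DOM container (the marker class is a placeholder; `.cm-tooltip-autocomplete` is, to the best of my knowledge, the class CodeMirror's autocomplete tooltip carries):

```typescript
// Sketch: limit the suppression to a single editor container instead of every .cm-tooltip.
function suppressAutocompleteIn(editorDom: HTMLElement): () => void {
  editorDom.classList.add("pd-suppress-autocomplete"); // placeholder marker class
  const style = document.createElement("style");
  style.textContent =
    ".pd-suppress-autocomplete .cm-tooltip-autocomplete { display: none !important; }";
  document.head.appendChild(style);
  // Cleanup callback so tooltips are only hidden while suggestions are active.
  return () => {
    style.remove();
    editorDom.classList.remove("pd-suppress-autocomplete");
  };
}
```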


Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] Tab-Completion

3 participants