Embed Chatbot into the Pheno Docs Expanded Website #2
shaykraitman wants to merge 19 commits into PhenoAI:feature/shay-chatbot-integration
…etabolomics dataset
- Fixed PNG image paths in 10 notebooks to include subdirectory prefix
- Darkened category link colors in sidebar for better visibility
- Updated Population and Events dataset descriptions to be more concise
- Added Untargeted Metabolomics to Multi-Omics category and sidebar navigation

- Removed docs/ from .gitignore to enable GitHub Pages serving
- Rebuilt site with all latest changes including:
  - Fixed image paths in notebooks
  - Darkened category link colors
  - Updated dataset descriptions
  - Added Untargeted Metabolomics dataset

- Added 2 new publications: Nature Gluformer paper and NutriMatch
- Updated publication metadata with authors from CrossRef/arXiv APIs
- Synced 45 relevant publications from Notion
- Updated publication images and content
- Re-rendered website with latest publications

- Fixed bug where weblink was replaced with bare DOI
- Added support for bare DOI identifiers in CrossRef API calls
- Now correctly displays authors for Nature Gluformer and NutriMatch papers
- All publication links now use full URLs instead of bare DOIs
- Updated 2 new publications with complete metadata

…d/knowledge-base-context.txt file
Pull request overview
Embeds a client-side chatbot widget into the generated Pheno Docs Expanded site and updates deployment/docs to use an AWS Lambda backend URL.
Changes:
- Adds an inline chatbot widget (HTML/CSS/JS) to multiple built `docs/*.html` pages.
- Updates `deploy.sh` to read `BACKEND_URL` from `.env` and inject it into the chatbot widget during build.
- Expands documentation with Lambda setup and updated architecture/deploy instructions.
Reviewed changes
Copilot reviewed 18 out of 208 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| docs/data_format.html | Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets. |
| docs/category_study_core.html | Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets. |
| docs/category_sensors_app_logging.html | Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets. |
| docs/category_medical_records_surveys.html | Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets. |
| docs/category_medical_imaging.html | Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets. |
| docs/about.html | Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets. |
| deploy.sh | Switches deployment config from API key injection to BACKEND_URL injection + skips context regeneration when present. |
| README.md | Updates repo docs for Lambda-backed chatbot architecture, deployment, and testing. |
| LAMBDA_SETUP.md | Adds Lambda backend setup summary and operational instructions. |

    function renderMarkdown(content) {
        let html = content;

        // First, convert markdown links [text](url) to HTML <a> tags
        html = html.replace(/\[([^\]]+)\]\(([^)]+)\)/g, '<a href="$2" target="_blank" rel="noopener noreferrer">$1</a>');

        // Convert double line breaks to paragraphs
        // Split by double newlines first
        const paragraphs = html.split(/\n\n+/);

The chatbot renders backend-provided content via innerHTML without sanitizing/escaping. If the backend (or any upstream model output) returns HTML, this can lead to XSS in the GitHub Pages frontend. Consider either (1) strictly escaping all user/backend text and only allowing a very small safe subset (e.g., links) via DOM construction, or (2) sanitizing html with a proven sanitizer (e.g., DOMPurify) before assigning to innerHTML.

    // Render markdown links as HTML
    contentDiv.innerHTML = renderMarkdown(content);

The chatbot renders backend-provided content via innerHTML without sanitizing/escaping. If the backend (or any upstream model output) returns HTML, this can lead to XSS in the GitHub Pages frontend. Consider either (1) strictly escaping all user/backend text and only allowing a very small safe subset (e.g., links) via DOM construction, or (2) sanitizing html with a proven sanitizer (e.g., DOMPurify) before assigning to innerHTML.
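
Option (1) could look roughly like the sketch below: escape all text first, then re-introduce only links whose URLs pass a check. This is an illustration under stated assumptions, not the widget's actual code. `escapeHtml` and `renderSafeMarkdown` are hypothetical names, and limiting links to `http`/`https` is an assumed policy. Paragraph handling is left out for brevity.

```javascript
// Hypothetical helpers illustrating "escape first, then allow only links".

// Escape the characters that are significant in HTML text and attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Convert [text](url) to <a> only when the URL starts with http:// or https://
// (an assumed policy). Everything else is emitted as escaped text.
function renderSafeMarkdown(content) {
  const linkPattern = /\[([^\]]+)\]\(([^)\s]+)\)/g;
  let html = '';
  let lastIndex = 0;
  let match;
  while ((match = linkPattern.exec(content)) !== null) {
    const [whole, label, url] = match;
    // Text between the previous link and this one is always escaped.
    html += escapeHtml(content.slice(lastIndex, match.index));
    if (/^https?:\/\//i.test(url)) {
      html += '<a href="' + escapeHtml(url) +
        '" target="_blank" rel="noopener noreferrer">' +
        escapeHtml(label) + '</a>';
    } else {
      // Unsupported scheme (e.g. javascript:): keep the raw markdown as text.
      html += escapeHtml(whole);
    }
    lastIndex = match.index + whole.length;
  }
  html += escapeHtml(content.slice(lastIndex));
  return html;
}
```

With this shape, markup in a backend response would be displayed as text instead of being parsed by the browser. A maintained sanitizer such as DOMPurify, as the review suggests, remains the more robust choice if richer formatting is needed.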

    // Configuration
    const CONFIG = {
        BACKEND_URL: 'https://y365pk3iuvzvmudftiamclgqiu0oofmj.lambda-url.eu-west-1.on.aws/api/chat' // Will be injected during build

The code comment states the backend URL will be injected during build, but the built docs/*.html contains a hard-coded Lambda URL. This defeats deploy.sh's placeholder-based injection approach and risks shipping an environment-specific URL to production. Use a placeholder (e.g., __BACKEND_URL__) in the widget source so deploy.sh can replace it, or ensure Quarto includes the widget source file (not a copied, already-injected version) when rendering.
Suggested change:

    - BACKEND_URL: 'https://y365pk3iuvzvmudftiamclgqiu0oofmj.lambda-url.eu-west-1.on.aws/api/chat' // Will be injected during build
    + BACKEND_URL: '__BACKEND_URL__' // Will be injected during build


    <!-- Simple Chatbot Widget for GitHub Pages -->
    <!-- Add this to your _quarto.yml or include in your pages -->

    <style>
    /* Chat button */
    .pheno-chat-button {
        position: fixed;

The chatbot widget’s full CSS and JS are inlined into each generated docs/*.html page, which increases page size and duplicates code across the site. To reduce download/parse cost and simplify updates, move the widget CSS/JS into shared assets (e.g., styles.css + a single JS file) and include them once globally via Quarto (include-after-body / include-in-header) rather than duplicating per-page.

    </div>
    <div id="quarto-search" class="" title="Search"></div>
    - <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarCollapse" aria-controls="navbarCollapse" aria-expanded="false" aria-label="Toggle navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
    + <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarCollapse" aria-controls="navbarCollapse" role="menu" aria-expanded="false" aria-label="Toggle navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">

Some ARIA roles added here are not appropriate for the elements’ behavior and can confuse assistive technologies. For example, the navbar toggler is a `<button>` and should not have `role="menu"`, and the expandable sidebar control implemented as an `<a>` should not have `role="navigation"` (it behaves like a button). Prefer removing incorrect role attributes (native semantics are best) or using `role="button"` only when necessary, while keeping correct `aria-expanded`/`aria-controls`.

    </button>
    <nav class="quarto-page-breadcrumbs" aria-label="breadcrumb"><ol class="breadcrumb"></ol></nav>
    - <a class="flex-grow-1" role="button" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
    + <a class="flex-grow-1" role="navigation" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">

Some ARIA roles added here are not appropriate for the elements’ behavior and can confuse assistive technologies. For example, the navbar toggler is a `<button>` and should not have `role="menu"`, and the expandable sidebar control implemented as an `<a>` should not have `role="navigation"` (it behaves like a button). Prefer removing incorrect role attributes (native semantics are best) or using `role="button"` only when necessary, while keeping correct `aria-expanded`/`aria-controls`.

    chatInput.addEventListener('keypress', (e) => {
        if (e.key === 'Enter' && !e.shiftKey) {
            e.preventDefault();
            handleSend();
        }
    });

keypress is deprecated in modern browsers and may behave inconsistently across IME/input methods. Use keydown/keyup for Enter handling instead, and ensure the handler doesn’t interfere with text composition events if needed.
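
A minimal sketch of the `keydown` variant follows, assuming the widget's existing `chatInput` element and `handleSend` function. `shouldSendOnKeydown` and `attachSendHandler` are hypothetical names introduced here. The composition check relies on the standard `KeyboardEvent.isComposing` property, with the legacy `keyCode === 229` signal as a fallback.

```javascript
// Hypothetical predicate: true when a keydown event should submit the message.
function shouldSendOnKeydown(event) {
  // Skip Enter presses that belong to an IME composition, such as confirming
  // a candidate. keyCode 229 is the legacy signal used during composition.
  if (event.isComposing || event.keyCode === 229) {
    return false;
  }
  // Plain Enter sends; Shift+Enter is left alone so it can insert a newline.
  return event.key === 'Enter' && !event.shiftKey;
}

// Hypothetical wiring; chatInput and handleSend are assumed to be the
// widget's existing input element and send function.
function attachSendHandler(chatInput, handleSend) {
  chatInput.addEventListener('keydown', (e) => {
    if (shouldSendOnKeydown(e)) {
      e.preventDefault();
      handleSend();
    }
  });
}
```

Keeping the decision in a separate predicate makes it possible to test the Enter logic without a browser.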

    # Replace placeholder with actual backend URL
    sed "s|__BACKEND_URL__|$BACKEND_URL|g" "$WIDGET_FILE" > "$WIDGET_TEMP"

The sed replacement is not safe for arbitrary URLs: characters like & (and some backslash sequences) in $BACKEND_URL can be interpreted by sed and lead to incorrect output or partial replacements. Recommend escaping the replacement string before passing it to sed, or switching to a tool that safely handles literal replacements (e.g., a small Python one-liner doing .replace() on the file contents).

    # Restore placeholder in widget (so we don't commit the URL)
    echo "🔒 Step 5: Restoring placeholder in widget..."
    - sed "s|$OPENROUTER_API_KEY|__OPENROUTER_API_KEY__|g" "$WIDGET_FILE" > "$WIDGET_TEMP"
    + sed "s|$BACKEND_URL|__BACKEND_URL__|g" "$WIDGET_FILE" > "$WIDGET_TEMP"

The sed replacement is not safe for arbitrary URLs: characters like & (and some backslash sequences) in $BACKEND_URL can be interpreted by sed and lead to incorrect output or partial replacements. Recommend escaping the replacement string before passing it to sed, or switching to a tool that safely handles literal replacements (e.g., a small Python one-liner doing .replace() on the file contents).
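
The review mentions a Python one-liner; the same literal-replacement idea is sketched here in JavaScript to match the widget's language. `replaceLiteral` and `injectBackendUrl` are hypothetical helpers, not part of `deploy.sh`. The sketch uses `split`/`join` because `String.prototype.replace` gives special meaning to patterns such as `$&` in the replacement string.

```javascript
const fs = require('fs');

// Literal placeholder replacement: no regex and no replacement patterns.
function replaceLiteral(text, placeholder, value) {
  // split/join treats both arguments as plain strings, so characters such as
  // '&', '|', '\' or '$' in the value are copied through unchanged.
  return text.split(placeholder).join(value);
}

// Hypothetical file-level helper: read the widget source, substitute the
// placeholder, and write the result to a separate target file.
function injectBackendUrl(sourcePath, targetPath, backendUrl) {
  const source = fs.readFileSync(sourcePath, 'utf8');
  const output = replaceLiteral(source, '__BACKEND_URL__', backendUrl);
  fs.writeFileSync(targetPath, output);
}
```

Writing to a separate target file leaves the placeholder in the widget source untouched, which would also make the later "restore placeholder" step unnecessary under this approach.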

    # Create knowledge base context (only if it doesn't exist or if manually created)
    CONTEXT_FILE="pheno_knowledge_base_expanded/knowledge-base-context.txt"
    if [ ! -f "$CONTEXT_FILE" ]; then
        echo "📚 Step 2: Creating knowledge base context (file doesn't exist)..."
        ./create-knowledge-base.sh
        echo ""
    else
        echo "📚 Step 2: Using existing knowledge base context (skipping regeneration)"
        echo " File: $CONTEXT_FILE"
        echo " Size: $(du -h "$CONTEXT_FILE" | cut -f1)"
        echo " (To regenerate, delete the file and run deploy again)"
        echo ""

Skipping knowledge-base regeneration when the context file exists can easily ship stale context after documentation updates (the most common case). Consider regenerating by default, or at least regenerating when source files are newer than $CONTEXT_FILE (e.g., via timestamp checks), with an option/flag to skip regeneration when desired.
Suggested change:

    - # Create knowledge base context (only if it doesn't exist or if manually created)
    - CONTEXT_FILE="pheno_knowledge_base_expanded/knowledge-base-context.txt"
    - if [ ! -f "$CONTEXT_FILE" ]; then
    -     echo "📚 Step 2: Creating knowledge base context (file doesn't exist)..."
    -     ./create-knowledge-base.sh
    -     echo ""
    - else
    -     echo "📚 Step 2: Using existing knowledge base context (skipping regeneration)"
    -     echo " File: $CONTEXT_FILE"
    -     echo " Size: $(du -h "$CONTEXT_FILE" | cut -f1)"
    -     echo " (To regenerate, delete the file and run deploy again)"
    -     echo ""
    + # Create knowledge base context
    + CONTEXT_FILE="pheno_knowledge_base_expanded/knowledge-base-context.txt"
    + if [ "$SKIP_KB_REGEN" = "1" ] || [ "$SKIP_KB_REGEN" = "true" ]; then
    +     if [ -f "$CONTEXT_FILE" ]; then
    +         echo "📚 Step 2: Using existing knowledge base context (regeneration skipped via SKIP_KB_REGEN)"
    +         echo " File: $CONTEXT_FILE"
    +         echo " Size: $(du -h "$CONTEXT_FILE" | cut -f1)"
    +         echo " (Unset SKIP_KB_REGEN to force regeneration on deploy)"
    +         echo ""
    +     else
    +         echo "❌ ERROR: SKIP_KB_REGEN is set but knowledge base context file was not found:"
    +         echo " Expected file: $CONTEXT_FILE"
    +         echo " Either create it (run ./create-knowledge-base.sh) or unset SKIP_KB_REGEN and re-run deploy."
    +         exit 1
    +     fi
    + else
    +     echo "📚 Step 2: Creating/regenerating knowledge base context..."
    +     ./create-knowledge-base.sh
    +     echo ""

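
The review also raises a timestamp-based alternative: regenerate only when a source file is newer than the context file. A rough sketch of that check follows, written in JavaScript rather than shell. `newestMtimeMs` and `contextIsStale` are hypothetical helpers, and which directory holds the documentation sources is an assumption the caller must supply.

```javascript
const fs = require('fs');
const path = require('path');

// Return the newest modification time (in ms) among files under dir,
// searching subdirectories recursively. Returns 0 for an empty tree.
function newestMtimeMs(dir) {
  let newest = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      newest = Math.max(newest, newestMtimeMs(fullPath));
    } else if (entry.isFile()) {
      newest = Math.max(newest, fs.statSync(fullPath).mtimeMs);
    }
  }
  return newest;
}

// True when the context file is missing or strictly older than the newest
// source file, meaning it should be regenerated.
function contextIsStale(contextFile, sourceDir) {
  if (!fs.existsSync(contextFile)) {
    return true;
  }
  return newestMtimeMs(sourceDir) > fs.statSync(contextFile).mtimeMs;
}
```

Because the comparison is strict, a context file stored inside the scanned directory does not mark itself as stale. Modification times are not always reliable after a fresh `git clone`, so regenerating by default, as in the suggested change above, is the simpler rule.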
To embed the chatbot into the existing website, two steps are required:
- Run the `deploy.sh` script, which reads the Lambda function URL from the `.env` file.
- Update the Lambda CORS configuration to allow requests from the production frontend URL; it currently allows requests only from the fork URL.

After these changes, the chatbot should be embedded and working as expected.