Embed Chatbot into the Pheno Docs Expanded Website #2

Open
shaykraitman wants to merge 19 commits into PhenoAI:feature/shay-chatbot-integration from shaykraitman:main

Conversation

@shaykraitman
Collaborator

To embed the chatbot into the existing website, two steps are required:
1. Run the deploy.sh script, which reads the Lambda function URL from the .env file.
2. Update the Lambda CORS configuration to allow access from the production frontend URL (it currently allows access only from the fork URL).
After these changes, the chatbot should be embedded and working on the production site.

MariaGorodetski and others added 19 commits February 4, 2026 12:57
…etabolomics dataset

- Fixed PNG image paths in 10 notebooks to include subdirectory prefix
- Darkened category link colors in sidebar for better visibility
- Updated Population and Events dataset descriptions to be more concise
- Added Untargeted Metabolomics to Multi-Omics category and sidebar navigation
- Removed docs/ from .gitignore to enable GitHub Pages serving
- Rebuilt site with all latest changes including:
  - Fixed image paths in notebooks
  - Darkened category link colors
  - Updated dataset descriptions
  - Added Untargeted Metabolomics dataset
- Added 2 new publications: Nature Gluformer paper and NutriMatch
- Updated publication metadata with authors from CrossRef/arXiv APIs
- Synced 45 relevant publications from Notion
- Updated publication images and content
- Re-rendered website with latest publications
- Fixed bug where weblink was replaced with bare DOI
- Added support for bare DOI identifiers in CrossRef API calls
- Now correctly displays authors for Nature Gluformer and NutriMatch papers
- All publication links now use full URLs instead of bare DOIs
- Updated 2 new publications with complete metadata

Copilot AI left a comment

Pull request overview

Embeds a client-side chatbot widget into the generated Pheno Docs Expanded site and updates deployment/docs to use an AWS Lambda backend URL.

Changes:

  • Adds an inline chatbot widget (HTML/CSS/JS) to multiple built docs/*.html pages.
  • Updates deploy.sh to read BACKEND_URL from .env and inject it into the chatbot widget during build.
  • Expands documentation with Lambda setup and updated architecture/deploy instructions.

Reviewed changes

Copilot reviewed 18 out of 208 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| docs/data_format.html | Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets. |
| docs/category_study_core.html | Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets. |
| docs/category_sensors_app_logging.html | Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets. |
| docs/category_medical_records_surveys.html | Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets. |
| docs/category_medical_imaging.html | Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets. |
| docs/about.html | Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets. |
| deploy.sh | Switches deployment config from API key injection to BACKEND_URL injection + skips context regeneration when present. |
| README.md | Updates repo docs for Lambda-backed chatbot architecture, deployment, and testing. |
| LAMBDA_SETUP.md | Adds Lambda backend setup summary and operational instructions. |

Comment on lines +1111 to +1119
function renderMarkdown(content) {
  let html = content;

  // First, convert markdown links [text](url) to HTML <a> tags
  html = html.replace(/\[([^\]]+)\]\(([^)]+)\)/g, '<a href="$2" target="_blank" rel="noopener noreferrer">$1</a>');

  // Convert double line breaks to paragraphs
  // Split by double newlines first
  const paragraphs = html.split(/\n\n+/);

Copilot AI Feb 10, 2026

The chatbot renders backend-provided content via innerHTML without sanitizing/escaping. If the backend (or any upstream model output) returns HTML, this can lead to XSS in the GitHub Pages frontend. Consider either (1) strictly escaping all user/backend text and only allowing a very small safe subset (e.g., links) via DOM construction, or (2) sanitizing html with a proven sanitizer (e.g., DOMPurify) before assigning to innerHTML.
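
A minimal sketch of option (1), under stated assumptions: all text is escaped first, and only markdown links with http(s) URLs are turned back into anchors. The names escapeHtml and renderMarkdownSafe are hypothetical and not part of this PR, and the paragraph and line-break handling is a guess at what the rest of renderMarkdown does. The sketch builds an HTML string from escaped text rather than using DOM construction.

```javascript
// Hypothetical helper: escape the characters that are significant in HTML
// text and in double-quoted attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical replacement for renderMarkdown: escape everything, then
// re-introduce only links whose URL starts with http:// or https://.
function renderMarkdownSafe(content) {
  const escaped = escapeHtml(String(content));

  const withLinks = escaped.replace(
    /\[([^\]]+)\]\(([^)\s]+)\)/g,
    (match, label, url) => {
      // Leave anything that is not a plain web URL as literal text
      // (this rejects javascript: and data: URLs, among others).
      if (!/^https?:\/\//i.test(url)) return match;
      return `<a href="${url}" target="_blank" rel="noopener noreferrer">${label}</a>`;
    }
  );

  // Assumed paragraph handling: blank lines split paragraphs,
  // single newlines become <br>.
  return withLinks
    .split(/\n\n+/)
    .map((p) => `<p>${p.replace(/\n/g, '<br>')}</p>`)
    .join('');
}
```

Because the label and URL are escaped before the anchor is built, the returned string contains no markup other than the generated p, br, and a tags, so assigning it to innerHTML should not execute backend-supplied HTML. A vetted sanitizer such as DOMPurify remains the more robust choice if richer markdown is needed.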

Copilot uses AI. Check for mistakes.
Comment on lines +1162 to +1163
// Render markdown links as HTML
contentDiv.innerHTML = renderMarkdown(content);

Copilot AI Feb 10, 2026

The chatbot renders backend-provided content via innerHTML without sanitizing/escaping. If the backend (or any upstream model output) returns HTML, this can lead to XSS in the GitHub Pages frontend. Consider either (1) strictly escaping all user/backend text and only allowing a very small safe subset (e.g., links) via DOM construction, or (2) sanitizing html with a proven sanitizer (e.g., DOMPurify) before assigning to innerHTML.

// Configuration
const CONFIG = {
BACKEND_URL: 'https://y365pk3iuvzvmudftiamclgqiu0oofmj.lambda-url.eu-west-1.on.aws/api/chat' // Will be injected during build

Copilot AI Feb 10, 2026

The code comment states the backend URL will be injected during build, but the built docs/*.html contains a hard-coded Lambda URL. This defeats deploy.sh's placeholder-based injection approach and risks shipping an environment-specific URL to production. Use a placeholder (e.g., __BACKEND_URL__) in the widget source so deploy.sh can replace it, or ensure Quarto includes the widget source file (not a copied, already-injected version) when rendering.

Suggested change
- BACKEND_URL: 'https://y365pk3iuvzvmudftiamclgqiu0oofmj.lambda-url.eu-west-1.on.aws/api/chat' // Will be injected during build
+ BACKEND_URL: '__BACKEND_URL__' // Will be injected during build

Comment on lines +769 to +775
<!-- Simple Chatbot Widget for GitHub Pages -->
<!-- Add this to your _quarto.yml or include in your pages -->

<style>
/* Chat button */
.pheno-chat-button {
position: fixed;

Copilot AI Feb 10, 2026

The chatbot widget’s full CSS and JS are inlined into each generated docs/*.html page, which increases page size and duplicates code across the site. To reduce download/parse cost and simplify updates, move the widget CSS/JS into shared assets (e.g., styles.css + a single JS file) and include them once globally via Quarto (include-after-body / include-in-header) rather than duplicating per-page.

</div>
<div id="quarto-search" class="" title="Search"></div>
- <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarCollapse" aria-controls="navbarCollapse" aria-expanded="false" aria-label="Toggle navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
+ <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarCollapse" aria-controls="navbarCollapse" role="menu" aria-expanded="false" aria-label="Toggle navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">

Copilot AI Feb 10, 2026

Some ARIA roles added here are not appropriate for the elements’ behavior and can confuse assistive technologies. For example, the navbar toggler is a <button> and should not have role="menu", and the expandable sidebar control implemented as an <a> should not have role="navigation" (it behaves like a button). Prefer removing incorrect role attributes (native semantics are best) or using role="button" only when necessary, while keeping correct aria-expanded/aria-controls.

</button>
<nav class="quarto-page-breadcrumbs" aria-label="breadcrumb"><ol class="breadcrumb"></ol></nav>
- <a class="flex-grow-1" role="button" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
+ <a class="flex-grow-1" role="navigation" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">

Copilot AI Feb 10, 2026

Some ARIA roles added here are not appropriate for the elements’ behavior and can confuse assistive technologies. For example, the navbar toggler is a <button> and should not have role="menu", and the expandable sidebar control implemented as an <a> should not have role="navigation" (it behaves like a button). Prefer removing incorrect role attributes (native semantics are best) or using role="button" only when necessary, while keeping correct aria-expanded/aria-controls.

Comment on lines +1040 to +1045
chatInput.addEventListener('keypress', (e) => {
  if (e.key === 'Enter' && !e.shiftKey) {
    e.preventDefault();
    handleSend();
  }
});

Copilot AI Feb 10, 2026

keypress is deprecated in modern browsers and may behave inconsistently across IME/input methods. Use keydown/keyup for Enter handling instead, and ensure the handler doesn’t interfere with text composition events if needed.
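
One way this could look, as a sketch rather than the widget's actual code: listen on keydown and skip the send while an IME composition is in progress, using the standard KeyboardEvent.isComposing flag. The helper names shouldSend and attachEnterToSend are hypothetical; chatInput and handleSend are the names from the snippet under review.

```javascript
// Hypothetical predicate: true only for a plain Enter press that is not
// part of an in-progress IME composition.
function shouldSend(event) {
  return event.key === 'Enter' && !event.shiftKey && !event.isComposing;
}

// Hypothetical wiring helper: registers a keydown listener (not keypress)
// and calls onSend when shouldSend() accepts the event.
function attachEnterToSend(input, onSend) {
  input.addEventListener('keydown', (e) => {
    if (shouldSend(e)) {
      e.preventDefault(); // keep Enter from inserting a newline
      onSend();
    }
  });
}

// In the widget this would be called as:
//   attachEnterToSend(chatInput, handleSend);
```

Shift+Enter still falls through to the browser's default behavior, so multi-line input keeps working if the input is a textarea.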

Comment on lines +59 to +60
# Replace placeholder with actual backend URL
sed "s|__BACKEND_URL__|$BACKEND_URL|g" "$WIDGET_FILE" > "$WIDGET_TEMP"

Copilot AI Feb 10, 2026

The sed replacement is not safe for arbitrary URLs: characters like & (and some backslash sequences) in $BACKEND_URL can be interpreted by sed and lead to incorrect output or partial replacements. Recommend escaping the replacement string before passing it to sed, or switching to a tool that safely handles literal replacements (e.g., a small Python one-liner doing .replace() on the file contents).
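
As an illustration of the literal-replacement idea, here is a small sketch in JavaScript (swapped in for the Python one-liner mentioned above, so a Node.js step in deploy.sh would be an assumption, not something this PR contains). The function name injectBackendUrl is hypothetical. It uses split/join rather than String.prototype.replace with a string argument, because replace gives special meaning to sequences such as $& in the replacement.

```javascript
// Hypothetical helper: substitute the placeholder with the backend URL,
// treating both strings as literals (no regex or replacement-pattern syntax).
function injectBackendUrl(template, backendUrl, placeholder = '__BACKEND_URL__') {
  if (!template.includes(placeholder)) {
    // Fail loudly instead of silently shipping an un-injected widget.
    throw new Error(`Placeholder ${placeholder} not found in widget source`);
  }
  return template.split(placeholder).join(backendUrl);
}

// Example: characters that would trip up sed pass through unchanged.
const sample = "BACKEND_URL: '__BACKEND_URL__'";
const injected = injectBackendUrl(sample, 'https://example.com/api/chat?a=1&b=2');
console.log(injected);
```

The missing-placeholder check also guards against the case raised in the earlier comment, where the widget source already contains a hard-coded URL and the injection step would otherwise do nothing.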

# Restore placeholder in widget (so we don't commit the URL)
echo "🔒 Step 5: Restoring placeholder in widget..."
- sed "s|$OPENROUTER_API_KEY|__OPENROUTER_API_KEY__|g" "$WIDGET_FILE" > "$WIDGET_TEMP"
+ sed "s|$BACKEND_URL|__BACKEND_URL__|g" "$WIDGET_FILE" > "$WIDGET_TEMP"

Copilot AI Feb 10, 2026

The sed replacement is not safe for arbitrary URLs: characters like & (and some backslash sequences) in $BACKEND_URL can be interpreted by sed and lead to incorrect output or partial replacements. Recommend escaping the replacement string before passing it to sed, or switching to a tool that safely handles literal replacements (e.g., a small Python one-liner doing .replace() on the file contents).

Comment on lines +40 to +51
# Create knowledge base context (only if it doesn't exist or if manually created)
CONTEXT_FILE="pheno_knowledge_base_expanded/knowledge-base-context.txt"
if [ ! -f "$CONTEXT_FILE" ]; then
  echo "📚 Step 2: Creating knowledge base context (file doesn't exist)..."
  ./create-knowledge-base.sh
  echo ""
else
  echo "📚 Step 2: Using existing knowledge base context (skipping regeneration)"
  echo " File: $CONTEXT_FILE"
  echo " Size: $(du -h "$CONTEXT_FILE" | cut -f1)"
  echo " (To regenerate, delete the file and run deploy again)"
  echo ""

Copilot AI Feb 10, 2026

Skipping knowledge-base regeneration when the context file exists can easily ship stale context after documentation updates (the most common case). Consider regenerating by default, or at least regenerating when source files are newer than $CONTEXT_FILE (e.g., via timestamp checks), with an option/flag to skip regeneration when desired.

Suggested change
- # Create knowledge base context (only if it doesn't exist or if manually created)
- CONTEXT_FILE="pheno_knowledge_base_expanded/knowledge-base-context.txt"
- if [ ! -f "$CONTEXT_FILE" ]; then
-   echo "📚 Step 2: Creating knowledge base context (file doesn't exist)..."
-   ./create-knowledge-base.sh
-   echo ""
- else
-   echo "📚 Step 2: Using existing knowledge base context (skipping regeneration)"
-   echo " File: $CONTEXT_FILE"
-   echo " Size: $(du -h "$CONTEXT_FILE" | cut -f1)"
-   echo " (To regenerate, delete the file and run deploy again)"
-   echo ""
+ # Create knowledge base context
+ CONTEXT_FILE="pheno_knowledge_base_expanded/knowledge-base-context.txt"
+ if [ "$SKIP_KB_REGEN" = "1" ] || [ "$SKIP_KB_REGEN" = "true" ]; then
+   if [ -f "$CONTEXT_FILE" ]; then
+     echo "📚 Step 2: Using existing knowledge base context (regeneration skipped via SKIP_KB_REGEN)"
+     echo " File: $CONTEXT_FILE"
+     echo " Size: $(du -h "$CONTEXT_FILE" | cut -f1)"
+     echo " (Unset SKIP_KB_REGEN to force regeneration on deploy)"
+     echo ""
+   else
+     echo "❌ ERROR: SKIP_KB_REGEN is set but knowledge base context file was not found:"
+     echo " Expected file: $CONTEXT_FILE"
+     echo " Either create it (run ./create-knowledge-base.sh) or unset SKIP_KB_REGEN and re-run deploy."
+     exit 1
+   fi
+ else
+   echo "📚 Step 2: Creating/regenerating knowledge base context..."
+   ./create-knowledge-base.sh
+   echo ""
