Embed Chatbot into the Pheno Docs Expanded Website by shaykraitman · Pull Request #2 · PhenoAI/pheno-docs-expanded

shaykraitman · 2026-02-10T09:04:19Z

To embed the chatbot into the existing website, two steps are required:
Run the deploy.sh script, which reads the Lambda function URL from the .env file.
Update the Lambda CORS configuration to allow access from the production frontend URL (it currently allows access only from the fork URL).
After these changes, the chatbot should be embedded and working.

…etabolomics dataset - Fixed PNG image paths in 10 notebooks to include subdirectory prefix - Darkened category link colors in sidebar for better visibility - Updated Population and Events dataset descriptions to be more concise - Added Untargeted Metabolomics to Multi-Omics category and sidebar navigation

- Removed docs/ from .gitignore to enable GitHub Pages serving - Rebuilt site with all latest changes including: - Fixed image paths in notebooks - Darkened category link colors - Updated dataset descriptions - Added Untargeted Metabolomics dataset

- Added 2 new publications: Nature Gluformer paper and NutriMatch - Updated publication metadata with authors from CrossRef/arXiv APIs - Synced 45 relevant publications from Notion - Updated publication images and content - Re-rendered website with latest publications

- Fixed bug where weblink was replaced with bare DOI - Added support for bare DOI identifiers in CrossRef API calls - Now correctly displays authors for Nature Gluformer and NutriMatch papers - All publication links now use full URLs instead of bare DOIs - Updated 2 new publications with complete metadata

Copilot

Pull request overview

Embeds a client-side chatbot widget into the generated Pheno Docs Expanded site and updates deployment/docs to use an AWS Lambda backend URL.

Changes:

Adds an inline chatbot widget (HTML/CSS/JS) to multiple built docs/*.html pages.
Updates deploy.sh to read BACKEND_URL from .env and inject it into the chatbot widget during build.
Expands documentation with Lambda setup and updated architecture/deploy instructions.

Reviewed changes

Copilot reviewed 18 out of 208 changed files in this pull request and generated 11 comments.

File	Description
docs/data_format.html	Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets.
docs/category_study_core.html	Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets.
docs/category_sensors_app_logging.html	Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets.
docs/category_medical_records_surveys.html	Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets.
docs/category_medical_imaging.html	Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets.
docs/about.html	Regenerated Quarto output + inlined chatbot widget and updated Quarto runtime assets.
deploy.sh	Switches deployment config from API key injection to `BACKEND_URL` injection + skips context regeneration when present.
README.md	Updates repo docs for Lambda-backed chatbot architecture, deployment, and testing.
LAMBDA_SETUP.md	Adds Lambda backend setup summary and operational instructions.

Copilot · 2026-02-10T09:08:33Z

docs/data_format.html

+    function renderMarkdown(content) {
+        let html = content;
+
+        // First, convert markdown links [text](url) to HTML <a> tags
+        html = html.replace(/\[([^\]]+)\]\(([^)]+)\)/g, '<a href="$2" target="_blank" rel="noopener noreferrer">$1</a>');
+
+        // Convert double line breaks to paragraphs
+        // Split by double newlines first
+        const paragraphs = html.split(/\n\n+/);


The chatbot renders backend-provided content via innerHTML without sanitizing/escaping. If the backend (or any upstream model output) returns HTML, this can lead to XSS in the GitHub Pages frontend. Consider either (1) strictly escaping all user/backend text and only allowing a very small safe subset (e.g., links) via DOM construction, or (2) sanitizing html with a proven sanitizer (e.g., DOMPurify) before assigning to innerHTML.
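Option (1) could look roughly like the sketch below: escape everything first, then re-introduce only markdown links whose URL starts with http:// or https://. The name `renderMarkdownSafe`, the scheme allow-list, and the paragraph handling are assumptions for illustration; the PR's full `renderMarkdown` is not shown in this hunk, so this is not a drop-in replacement.

```javascript
// Escape the HTML-significant characters so backend text cannot become markup.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// Hypothetical safe variant of renderMarkdown: escape first, then linkify.
function renderMarkdownSafe(content) {
    const escaped = escapeHtml(content);

    // Convert [text](url) to <a> only for http(s) URLs; anything else
    // (e.g. javascript:) is left as literal, already-escaped text.
    const withLinks = escaped.replace(/\[([^\]]+)\]\(([^)\s]+)\)/g, (match, label, url) => {
        if (!/^https?:\/\//i.test(url)) {
            return match;
        }
        return `<a href="${url}" target="_blank" rel="noopener noreferrer">${label}</a>`;
    });

    // Assumed paragraph handling: blank lines split paragraphs, single
    // newlines become <br>.
    return withLinks
        .split(/\n\n+/)
        .map((p) => `<p>${p.replace(/\n/g, '<br>')}</p>`)
        .join('');
}
```

With this in place, the assignment would become `contentDiv.innerHTML = renderMarkdownSafe(content);`. Escaping before linkifying means the only tags that can reach `innerHTML` are the ones this function emits itself. If richer markdown is needed later, passing the output through a sanitizer such as DOMPurify (option 2) is probably the more maintainable route.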

Copilot · 2026-02-10T09:08:34Z

docs/data_format.html

+        // Render markdown links as HTML
+        contentDiv.innerHTML = renderMarkdown(content);


The chatbot renders backend-provided content via innerHTML without sanitizing/escaping. If the backend (or any upstream model output) returns HTML, this can lead to XSS in the GitHub Pages frontend. Consider either (1) strictly escaping all user/backend text and only allowing a very small safe subset (e.g., links) via DOM construction, or (2) sanitizing html with a proven sanitizer (e.g., DOMPurify) before assigning to innerHTML.

Copilot · 2026-02-10T09:08:34Z

docs/data_format.html

+
+    // Configuration
+    const CONFIG = {
+        BACKEND_URL: 'https://y365pk3iuvzvmudftiamclgqiu0oofmj.lambda-url.eu-west-1.on.aws/api/chat'  // Will be injected during build


The code comment states the backend URL will be injected during build, but the built docs/*.html contains a hard-coded Lambda URL. This defeats deploy.sh's placeholder-based injection approach and risks shipping an environment-specific URL to production. Use a placeholder (e.g., __BACKEND_URL__) in the widget source so deploy.sh can replace it, or ensure Quarto includes the widget source file (not a copied, already-injected version) when rendering.

Suggested change

-        BACKEND_URL: 'https://y365pk3iuvzvmudftiamclgqiu0oofmj.lambda-url.eu-west-1.on.aws/api/chat'  // Will be injected during build
+        BACKEND_URL: '__BACKEND_URL__'  // Will be injected during build

Copilot · 2026-02-10T09:08:35Z

docs/about.html

+<!-- Simple Chatbot Widget for GitHub Pages -->
+<!-- Add this to your _quarto.yml or include in your pages -->
+
+<style>
+    /* Chat button */
+    .pheno-chat-button {
+        position: fixed;


The chatbot widget’s full CSS and JS are inlined into each generated docs/*.html page, which increases page size and duplicates code across the site. To reduce download/parse cost and simplify updates, move the widget CSS/JS into shared assets (e.g., styles.css + a single JS file) and include them once globally via Quarto (include-after-body / include-in-header) rather than duplicating per-page.

Copilot · 2026-02-10T09:08:35Z

docs/data_format.html

  </div>
            <div id="quarto-search" class="" title="Search"></div>
-          <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarCollapse" aria-controls="navbarCollapse" aria-expanded="false" aria-label="Toggle navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">
+          <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarCollapse" aria-controls="navbarCollapse" role="menu" aria-expanded="false" aria-label="Toggle navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">


Some ARIA roles added here are not appropriate for the elements’ behavior and can confuse assistive technologies. For example, the navbar toggler is a <button> and should not have role="menu", and the expandable sidebar control implemented as an <a> should not have role="navigation" (it behaves like a button). Prefer removing incorrect role attributes (native semantics are best) or using role="button" only when necessary, while keeping correct aria-expanded/aria-controls.

Copilot · 2026-02-10T09:08:35Z

docs/data_format.html

      </button>
        <nav class="quarto-page-breadcrumbs" aria-label="breadcrumb"><ol class="breadcrumb"></ol></nav>
-        <a class="flex-grow-1" role="button" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">      
+        <a class="flex-grow-1" role="navigation" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }">      


Some ARIA roles added here are not appropriate for the elements’ behavior and can confuse assistive technologies. For example, the navbar toggler is a <button> and should not have role="menu", and the expandable sidebar control implemented as an <a> should not have role="navigation" (it behaves like a button). Prefer removing incorrect role attributes (native semantics are best) or using role="button" only when necessary, while keeping correct aria-expanded/aria-controls.

Copilot · 2026-02-10T09:08:36Z

docs/category_study_core.html

+        chatInput.addEventListener('keypress', (e) => {
+            if (e.key === 'Enter' && !e.shiftKey) {
+                e.preventDefault();
+                handleSend();
+            }
+        });


keypress is deprecated in modern browsers and may behave inconsistently across IME/input methods. Use keydown/keyup for Enter handling instead, and ensure the handler doesn’t interfere with text composition events if needed.
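A minimal sketch of the keydown variant follows. The helper names `isSendKey` and `attachSendOnEnter` are hypothetical; `chatInput` and `handleSend` are the identifiers from the hunk above. The `keyCode === 229` check is an assumed fallback for browsers that do not set `isComposing` during IME composition.

```javascript
// Decide whether a keyboard event should submit the message.
// Enter sends; Shift+Enter inserts a newline; Enter pressed while an IME
// composition is in progress is ignored so it can confirm the candidate.
function isSendKey(e) {
    if (e.isComposing || e.keyCode === 229) {
        return false;
    }
    return e.key === 'Enter' && !e.shiftKey;
}

// Wire the check to an input element using keydown instead of keypress.
function attachSendOnEnter(chatInput, handleSend) {
    chatInput.addEventListener('keydown', (e) => {
        if (isSendKey(e)) {
            e.preventDefault();
            handleSend();
        }
    });
}
```

Keeping the decision in a small pure function makes the Enter/Shift+Enter/IME rules easy to unit-test without a browser.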

Copilot · 2026-02-10T09:08:36Z

deploy.sh

+# Replace placeholder with actual backend URL
+sed "s|__BACKEND_URL__|$BACKEND_URL|g" "$WIDGET_FILE" > "$WIDGET_TEMP"


The sed replacement is not safe for arbitrary URLs: characters like & (and some backslash sequences) in $BACKEND_URL can be interpreted by sed and lead to incorrect output or partial replacements. Recommend escaping the replacement string before passing it to sed, or switching to a tool that safely handles literal replacements (e.g., a small Python one-liner doing .replace() on the file contents).
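The comment suggests a Python one-liner; the same literal-replacement idea is sketched here in JavaScript (Node.js) to match the widget's language. The helper names `replaceLiteral` and `injectBackendUrl` are hypothetical, and running a Node script from deploy.sh is an assumption, not something the PR does.

```javascript
const fs = require('node:fs');

// Replace every occurrence of `placeholder` with `value`, treating both as
// plain text. split/join sidesteps regex metacharacters in the search string
// and `$&`-style patterns in the replacement (JavaScript's analogue of the
// `&` problem in sed).
function replaceLiteral(source, placeholder, value) {
    return source.split(placeholder).join(value);
}

// Hypothetical helper: read the widget source, inject the URL, and write the
// result to a temporary copy, mirroring the WIDGET_FILE -> WIDGET_TEMP step.
function injectBackendUrl(widgetFile, widgetTemp, backendUrl) {
    const source = fs.readFileSync(widgetFile, 'utf8');
    if (!source.includes('__BACKEND_URL__')) {
        throw new Error(`placeholder __BACKEND_URL__ not found in ${widgetFile}`);
    }
    fs.writeFileSync(widgetTemp, replaceLiteral(source, '__BACKEND_URL__', backendUrl));
}
```

Failing loudly when the placeholder is missing would also surface the situation flagged earlier in this review, where the widget source already contains an injected URL.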

Copilot · 2026-02-10T09:08:36Z

deploy.sh

+# Restore placeholder in widget (so we don't commit the URL)
 echo "🔒 Step 5: Restoring placeholder in widget..."
-sed "s|$OPENROUTER_API_KEY|__OPENROUTER_API_KEY__|g" "$WIDGET_FILE" > "$WIDGET_TEMP"
+sed "s|$BACKEND_URL|__BACKEND_URL__|g" "$WIDGET_FILE" > "$WIDGET_TEMP"


The sed replacement is not safe for arbitrary URLs: characters like & (and some backslash sequences) in $BACKEND_URL can be interpreted by sed and lead to incorrect output or partial replacements. Recommend escaping the replacement string before passing it to sed, or switching to a tool that safely handles literal replacements (e.g., a small Python one-liner doing .replace() on the file contents).

Copilot · 2026-02-10T09:08:36Z

deploy.sh

+# Create knowledge base context (only if it doesn't exist or if manually created)
+CONTEXT_FILE="pheno_knowledge_base_expanded/knowledge-base-context.txt"
+if [ ! -f "$CONTEXT_FILE" ]; then
+    echo "📚 Step 2: Creating knowledge base context (file doesn't exist)..."
+    ./create-knowledge-base.sh
+    echo ""
+else
+    echo "📚 Step 2: Using existing knowledge base context (skipping regeneration)"
+    echo "   File: $CONTEXT_FILE"
+    echo "   Size: $(du -h "$CONTEXT_FILE" | cut -f1)"
+    echo "   (To regenerate, delete the file and run deploy again)"
+    echo ""


Skipping knowledge-base regeneration when the context file exists can easily ship stale context after documentation updates (the most common case). Consider regenerating by default, or at least regenerating when source files are newer than $CONTEXT_FILE (e.g., via timestamp checks), with an option/flag to skip regeneration when desired.

Suggested change

-# Create knowledge base context (only if it doesn't exist or if manually created)
-CONTEXT_FILE="pheno_knowledge_base_expanded/knowledge-base-context.txt"
-if [ ! -f "$CONTEXT_FILE" ]; then
-    echo "📚 Step 2: Creating knowledge base context (file doesn't exist)..."
-    ./create-knowledge-base.sh
-    echo ""
-else
-    echo "📚 Step 2: Using existing knowledge base context (skipping regeneration)"
-    echo "   File: $CONTEXT_FILE"
-    echo "   Size: $(du -h "$CONTEXT_FILE" | cut -f1)"
-    echo "   (To regenerate, delete the file and run deploy again)"
-    echo ""
+# Create knowledge base context
+CONTEXT_FILE="pheno_knowledge_base_expanded/knowledge-base-context.txt"
+if [ "$SKIP_KB_REGEN" = "1" ] || [ "$SKIP_KB_REGEN" = "true" ]; then
+    if [ -f "$CONTEXT_FILE" ]; then
+        echo "📚 Step 2: Using existing knowledge base context (regeneration skipped via SKIP_KB_REGEN)"
+        echo "   File: $CONTEXT_FILE"
+        echo "   Size: $(du -h "$CONTEXT_FILE" | cut -f1)"
+        echo "   (Unset SKIP_KB_REGEN to force regeneration on deploy)"
+        echo ""
+    else
+        echo "❌ ERROR: SKIP_KB_REGEN is set but knowledge base context file was not found:"
+        echo "   Expected file: $CONTEXT_FILE"
+        echo "   Either create it (run ./create-knowledge-base.sh) or unset SKIP_KB_REGEN and re-run deploy."
+        exit 1
+    fi
+else
+    echo "📚 Step 2: Creating/regenerating knowledge base context..."
+    ./create-knowledge-base.sh
+    echo ""

MariaGorodetski and others added 19 commits February 4, 2026 12:57

git ignore

84f8da1

Stop tracking docs/ folder (build output)

6dce12d

removed extra link from all notebooks

c747a29

Recreate voice notebook from markdown and fix bullet formatting

96f3bc6

Render all site changes and update gait notebook

6141065

Deploy chatbot to GitHub Pages

4fdd3a3

chatbot integration

320265c

Enable chatbot widget and redeploy

1ad7ff9

Fix duplicate CORS headers and Lambda URL path

3900c09

Update the README.md

8169540

update the knowledge-base

da40f53

Improve chatbot UX: better formatting, shorter responses, localStorage persistence

26efd75

Improve chatbot UX: shorter responses, better formatting, localStorage persistence

5ee7838

Merge feat/chatbot-integration into main - remove lambda-deployment.zip from tracking

9e18c8d

edit .gitignore file so it won't include pheno_knowledge_base_expanded/knowledge-base-context.txt file

2e8943a

shaykraitman requested review from MariaGorodetski and Copilot February 10, 2026 09:05

Copilot AI reviewed Feb 10, 2026

shaykraitman wants to merge 19 commits into PhenoAI:feature/shay-chatbot-integration from shaykraitman:main
