FEAT Add WordDocConverter for Word document generation by ducktv1203 · Pull Request #1365 · Azure/PyRIT

ducktv1203 · 2026-02-10T12:10:21Z

Description:
Adds WordDocConverter, a file converter that transforms text prompts into Word documents (.docx). Issue #424.

Two modes:

Direct generation: the plain-text prompt is written into a new .docx with a configurable font name and size.
Template-based generation: an existing .docx file containing Jinja2 {{ prompt }} placeholders is used as a template; placeholders are replaced with the prompt text while preserving all original formatting, tables, headers, and footers. The original file is never modified.

Files changed:

pyrit/prompt_converter/word_doc_converter.py: converter implementation
pyrit/prompt_converter/__init__.py: export WordDocConverter
pyproject.toml: added python-docx>=1.2.0 dependency
tests/unit/converter/test_word_doc_converter.py: 21 unit tests

Tests and Documentation:
Tests: 21 unit tests covering init/validation, direct generation, template-based generation (body paragraphs, tables, multiple placeholders, no-placeholder passthrough), end-to-end runs producing real .docx output, and identifier correctness. All 21 pass.


ducktv1203 · 2026-02-10T12:12:15Z

@microsoft-github-policy-service agree

Copilot

Pull request overview

Adds a new PyRIT file converter that emits Word documents (.docx) from text prompts, complementing the existing PDF file-converter tooling and documentation.

Changes:

Introduces WordDocConverter with direct .docx generation and template-based placeholder injection.
Exports WordDocConverter from pyrit.prompt_converter.
Adds unit tests and updates converter documentation; updates project dependencies for python-docx.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.


File	Description
`pyrit/prompt_converter/word_doc_converter.py`	Implements `.docx` generation + template injection and serialization.
`pyrit/prompt_converter/__init__.py`	Exposes `WordDocConverter` in the package exports.
`pyproject.toml`	Adds `python-docx` dependency (but also introduces a duplicate `pypdf` entry).
`tests/unit/converter/test_word_doc_converter.py`	Adds unit tests for direct + template modes and identifier behavior.
`doc/code/converters/5_file_converters.py`	Documents Word document conversion usage alongside PDF converters.

Copilot · 2026-02-10T13:12:12Z

pyrit/prompt_converter/word_doc_converter.py

+    async def _serialize_docx(self, docx_bytes: bytes) -> DataTypeSerializer:
+        """
+        Save the generated ``.docx`` bytes through PyRIT's data serializer.
+
+        The serializer picks a unique filename and writes the bytes to the configured storage location (local disk by default).
+
+        Args:
+            docx_bytes: Raw content of the Word document.
+
+        Returns:
+            DataTypeSerializer: Serializer whose ``.value`` contains the output path.
+        """
+        docx_serializer = data_serializer_factory(
+            category="prompt-memory-entries",
+            data_type="binary_path",
+            extension="docx",
+        )
+
+        await docx_serializer.save_data(docx_bytes)
+
+        return docx_serializer


Async private method `_serialize_docx` doesn't follow the project convention that async methods must end with `_async`. Rename it (e.g., `_serialize_docx_async`) and update the call site and related tests/patches accordingly.
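A minimal sketch of the rename, assuming a hypothetical `ConverterSketch` class in place of `WordDocConverter` and a fake return value in place of the real serializer call. It covers the two places the comment mentions: the call site inside `convert_async`, and a test patch that targets the new name. `unittest.mock.AsyncMock` is used so the patched method is still awaitable.

```python
import asyncio
from unittest.mock import AsyncMock, patch


class ConverterSketch:
    """Hypothetical stand-in for WordDocConverter; only the renamed method is shown."""

    async def _serialize_docx_async(self, docx_bytes: bytes) -> str:
        # Placeholder for awaiting the data serializer's save_data(...) call.
        await asyncio.sleep(0)
        return f"memory://output-{len(docx_bytes)}.docx"

    async def convert_async(self, docx_bytes: bytes) -> str:
        # Call site updated to the ``_async``-suffixed name.
        return await self._serialize_docx_async(docx_bytes)


async def patched_convert_demo() -> str:
    converter = ConverterSketch()
    # Tests must patch the new attribute name; AsyncMock returns an awaitable.
    with patch.object(
        ConverterSketch,
        "_serialize_docx_async",
        new=AsyncMock(return_value="mocked.docx"),
    ) as mocked:
        result = await converter.convert_async(b"data")
    mocked.assert_awaited_once_with(b"data")
    return result
```

Any test that still patches the old `_serialize_docx` name would silently patch nothing useful, so renaming the method and its patches together matters.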

Copilot · 2026-02-10T13:12:13Z

pyrit/prompt_converter/word_doc_converter.py

+        # Rewind to read from the start of the stored bytes.
+        self._existing_doc_bytes.seek(0)
+        document = Document(self._existing_doc_bytes)
+


`self._existing_doc_bytes` is a shared `BytesIO` whose cursor is mutated via `seek(0)`. If `convert_async` is called concurrently on the same converter instance, the shared cursor can cause races or corrupted reads. Store immutable bytes instead and create a new `BytesIO` per conversion (or otherwise guard access).

Suggested change

-        # Rewind to read from the start of the stored bytes.
-        self._existing_doc_bytes.seek(0)
-        document = Document(self._existing_doc_bytes)
+        existing_doc_bytes = self._existing_doc_bytes
+        if isinstance(existing_doc_bytes, BytesIO):
+            template_bytes = existing_doc_bytes.getvalue()
+        else:
+            template_bytes = existing_doc_bytes
+        document_stream = BytesIO(template_bytes)
+        document = Document(document_stream)
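The idea behind this suggestion (immutable bytes, a fresh stream per conversion) can be checked in isolation. This is a sketch under assumptions: `TemplateBytesHolder` and `open_stream` are hypothetical names, and it snapshots the template to `bytes` once in the constructor rather than on every call. Several threads read the template at once; because each gets its own `BytesIO`, none of them can move another's cursor.

```python
import threading
from io import BytesIO
from typing import List, Optional, Union


class TemplateBytesHolder:
    """Hypothetical holder for a .docx template, safe to share across threads.

    The template is snapshotted as immutable ``bytes`` once; every conversion
    gets its own ``BytesIO``, so no cursor is shared between callers.
    """

    def __init__(self, template: Union[bytes, BytesIO]) -> None:
        # getvalue() returns the whole buffer regardless of the current position.
        if isinstance(template, BytesIO):
            template = template.getvalue()
        self._template_bytes: bytes = bytes(template)

    def open_stream(self) -> BytesIO:
        # A fresh stream per call, always starting at offset 0.
        return BytesIO(self._template_bytes)


def read_template(holder: TemplateBytesHolder, results: List[Optional[bytes]], index: int) -> None:
    results[index] = holder.open_stream().read()


payload = b"PK\x03\x04 stand-in for real .docx bytes"
holder = TemplateBytesHolder(BytesIO(payload))
results: List[Optional[bytes]] = [None] * 8
threads = [threading.Thread(target=read_template, args=(holder, results, i)) for i in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

In the converter, the stream returned by `open_stream()` would be passed to `Document(...)`, which accepts a file-like object.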

Copilot · 2026-02-10T13:12:13Z

pyrit/prompt_converter/word_doc_converter.py

+            (e.g. ``{{ prompt }}``) while preserving all original formatting. 
+        If no template was provided, a new document is generated where each line of the prompt (split on ``\\n``)
+            becomes a new paragraph with the configured font settings.
+
+        Args:
+            prompt: The text to embed in the Word document.
+            input_type: Must be ``text``.


This docstring claims template-based injection preserves “all original formatting”, but `_render_paragraph` rewrites runs and can collapse mixed formatting when placeholders span runs (it clears subsequent runs and applies the first run's formatting to the whole rendered text). Please either (a) implement run-aware replacement that preserves mixed formatting, or (b) adjust the docstring/behavior description to reflect this limitation.

Suggested change

-            (e.g. ``{{ prompt }}``) while preserving all original formatting.
+            (e.g. ``{{ prompt }}``), using the original document's layout and styling as a base. Mixed formatting within or
+            across placeholder regions may not be preserved exactly.
         If no template was provided, a new document is generated where each line of the prompt (split on ``\\n``)
             becomes a new paragraph with the configured font settings.
 
         Args:
-            prompt: The text to embed in the Word document.
-            input_type: Must be ``text``.
+            prompt (str): The text to embed in the Word document.
+            input_type (PromptDataType): Must be ``text``.
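Option (a), run-aware replacement, could look roughly like the sketch below. It models runs with a hypothetical `Run` dataclass (text plus a style label) rather than real python-docx objects, and it only handles the literal `{{ prompt }}` placeholder. Each match is located in the paragraph's concatenated text and mapped back to the runs it overlaps: the run where the placeholder starts receives the prompt and keeps its own formatting, the covered parts of later runs are trimmed, and text outside the placeholder stays in its original run.

```python
import re
from dataclasses import dataclass
from typing import List

PLACEHOLDER = re.compile(r"\{\{\s*prompt\s*\}\}")


@dataclass
class Run:
    # Hypothetical stand-in for a python-docx run: its text plus a style label.
    text: str
    style: str


def replace_placeholder_in_runs(runs: List[Run], value: str) -> None:
    """Replace every ``{{ prompt }}`` in place, touching only the runs it spans."""
    full_text = "".join(run.text for run in runs)
    # Process matches right-to-left so offsets of earlier matches stay valid.
    for match in reversed(list(PLACEHOLDER.finditer(full_text))):
        start, end = match.span()
        offset = 0
        for run in runs:
            run_start, run_end = offset, offset + len(run.text)
            offset = run_end
            # Overlap between this run and the placeholder span.
            lo, hi = max(start, run_start), min(end, run_end)
            if lo >= hi:
                continue
            before = run.text[: lo - run_start]
            after = run.text[hi - run_start :]
            # Only the run containing the placeholder's first character gets the value.
            inserted = value if lo == start else ""
            run.text = before + inserted + after
```

For example, runs `"Dear "` (plain), `"{{ pro"` (bold), `"mpt }}"` (italic), `"!"` (plain) become `"Dear "`, `"Alice"`, `""`, `"!"` with every style label untouched. Adapting this to python-docx would mean applying the same offsets to `paragraph.runs` and assigning each run's `.text`; that step is not shown here.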

Copilot · 2026-02-10T13:12:14Z

pyrit/prompt_converter/word_doc_converter.py

+            template = Template(full_text)
+            rendered_text = template.render(**template_vars)
+        except Exception as e:
+            logger.warning(f"Failed to render paragraph template: {e}")
+            return


Rendering arbitrary Jinja2 templates from document text via `Template(full_text).render(...)` is unsafe if the template content is not fully trusted (Jinja2 templates can be abused for code execution or data access). Consider using `jinja2.sandbox.SandboxedEnvironment`, restricting to a simple `{{ prompt }}` replacement, or otherwise documenting and enforcing that templates must be trusted.
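The second option, a plain `{{ prompt }}` replacement, could look like this minimal sketch (the helper name `render_prompt_only` is hypothetical). Instead of handing document text to Jinja2, it substitutes only the literal placeholder with a regular expression, so expressions such as `{{ 7*7 }}` are left as text rather than evaluated.

```python
import re

# Matches only the literal placeholder ``{{ prompt }}``, tolerating inner whitespace.
_PROMPT_PLACEHOLDER = re.compile(r"\{\{\s*prompt\s*\}\}")


def render_prompt_only(text: str, prompt: str) -> str:
    """Substitute ``{{ prompt }}`` without evaluating any other template syntax."""
    # A callable replacement keeps backslashes in the prompt literal
    # (a plain replacement string would treat them as escapes or group references).
    return _PROMPT_PLACEHOLDER.sub(lambda _match: prompt, text)
```

The trade-off is that any other Jinja2 features a template might rely on (loops, filters, extra variables) stop working; if those are needed, `SandboxedEnvironment` is the other route the comment suggests.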

Copilot · 2026-02-10T13:12:14Z

pyrit/prompt_converter/word_doc_converter.py

+from io import BytesIO
+from pathlib import Path
+from typing import Optional
+from docx import Document
+from docx.shared import Pt
+from jinja2 import Template
+
+from pyrit.common.logger import logger
+from pyrit.identifiers import ConverterIdentifier
+from pyrit.models import PromptDataType, data_serializer_factory
+from pyrit.models.data_type_serializer import DataTypeSerializer
+from pyrit.prompt_converter.prompt_converter import ConverterResult, PromptConverter


Import grouping is inconsistent with other modules (stdlib vs. third-party vs. local). Add a blank line between standard-library imports (`io`/`pathlib`/`typing`) and third-party imports (`docx`/`jinja2`) to match the repository's import organization pattern.

Copilot · 2026-02-10T13:12:14Z

pyproject.toml

    "pyodbc>=5.1.0",
    "python-dotenv>=1.0.1",
+    "python-docx>=1.2.0",
+    "pypdf>=5.1.0",


`dependencies` lists `pypdf` twice with different minimum versions (`>=5.1.0` and `>=6.6.2`). This is conflicting/ambiguous for resolvers and should be collapsed to a single requirement (likely keeping only the stricter `>=6.6.2` unless there's a specific reason to lower it).

Suggested change

-    "pypdf>=5.1.0",

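A duplicate like this is easy to miss in review. As a hypothetical safeguard (not something the PR or PyRIT includes), a few lines of standard-library Python can flag repeated project names in a dependency list, normalizing names the way PEP 503 does. The sketch takes the list of requirement strings as input; reading them out of `pyproject.toml` (e.g. with `tomllib` on Python 3.11+) is left out.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Leading project name of a PEP 508 requirement string (before extras or specifiers).
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def find_duplicate_requirements(requirements: List[str]) -> Dict[str, List[str]]:
    """Return requirement strings grouped by normalized name, only where a name repeats."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for requirement in requirements:
        match = _NAME.match(requirement)
        if match is None:
            continue
        # PEP 503 normalization: lowercase and collapse runs of "-", "_", "." into "-".
        name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
        groups[name].append(requirement.strip())
    return {name: specs for name, specs in groups.items() if len(specs) > 1}
```

Applied to the entries discussed here (`pyodbc>=5.1.0`, `python-dotenv>=1.0.1`, `python-docx>=1.2.0`, `pypdf>=5.1.0`, `pypdf>=6.6.2`), it reports only `pypdf` with both specifiers.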
Copilot · 2026-02-10T13:12:15Z

doc/code/converters/5_file_converters.py

+# The `WordDocConverter` generates Word documents (.docx) from text using `python-docx`. It supports two modes:
+#
+# 1. **Direct generation**: Convert plain text strings into Word documents. The prompt becomes the document content.
+# 2. **Template-based generation**: Supply an existing `.docx` file containing jinja2 placeholders (e.g., `{{ prompt }}`). The converter replaces placeholders with the prompt text while preserving the original document's formatting, tables, headers, and footers. The original file is never modified — a new file is always generated.


The docs state that template-based generation preserves the original document’s formatting. Given the current implementation can collapse run-level formatting when placeholders span multiple runs, please either update the documentation to mention this limitation or improve the implementation to truly preserve mixed formatting.

Copilot · 2026-02-10T13:12:15Z

doc/code/converters/5_file_converters.py

+# This mode takes an existing `.docx` file that contains jinja2 `{{ prompt }}` placeholders and replaces them with the provided prompt text. This is useful for embedding adversarial content into realistic document templates (e.g., resumes, reports, invoices) while preserving all original formatting.
+
+# %%
+import tempfile


This import of module `tempfile` is redundant, as it was previously imported on line 144.

Suggested change

-import tempfile

Copilot · 2026-02-10T13:12:15Z

pyrit/prompt_converter/word_doc_converter.py

+from pyrit.prompt_converter.prompt_converter import ConverterResult, PromptConverter
+
+
+class WordDocConverter(PromptConverter):


This class does not call `PromptConverter.__init__` during initialization (`WordDocConverter.__init__` may be missing a call to the base class `__init__`).
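The usual fix is a `super().__init__()` call at the top of the subclass constructor. A minimal sketch with hypothetical stand-in classes (`PromptConverterStub`, `WordDocConverterSketch`); whether the real `PromptConverter.__init__` takes arguments is not visible in this excerpt, so the stub takes none, and the font defaults are illustrative only.

```python
class PromptConverterStub:
    """Hypothetical stand-in for PromptConverter; the real base __init__ may take arguments."""

    def __init__(self) -> None:
        self.base_initialized = True


class WordDocConverterSketch(PromptConverterStub):
    """Hypothetical subclass showing the missing base-class call."""

    def __init__(self, *, font_name: str = "Calibri", font_size: int = 12) -> None:
        # Run the base-class initializer first so any state it sets up exists.
        super().__init__()
        # Illustrative defaults only; not taken from the PR.
        self._font_name = font_name
        self._font_size = font_size
```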

ducktv1203 added 5 commits February 7, 2026 18:41

FEAT: Add python-docx library

9027fdd

FEAT: register WordDocConverter in prompt_converter __init__

8daf888

FEAT: Word document converter implementation, including Jinja2 template rendering and integration with PyRIT's data serialization system.

9ffe8c9

FEAT: Add Word document converter unit tests

9983804

Merge branch 'main' into feat/word-document-converter

ab9a5c0

ducktv1203 marked this pull request as draft February 10, 2026 12:13

Add WordDocConverter documentation examples

2e6b7e7

romanlutz requested a review from Copilot February 10, 2026 13:05

Copilot started reviewing on behalf of romanlutz February 10, 2026 13:06

Copilot AI reviewed Feb 10, 2026

ducktv1203 wants to merge 6 commits into Azure:main from ducktv1203:feat/word-document-converter
