A DOCX editor for AI Agents: lightweight CLI toolkit for safe, reproducible .docx extraction and patching with python-docx.
- Extract DOCX into structured JSON (`docx-structure.v2`)
- Optionally emit RAG-friendly chunks (`docx-llm-chunks.v1`)
- Apply block-targeted patches with validation guards
- Support Markdown-aware writeback (inline + block-level)
- Fill arbitrary table cells from JSON (`layout: "cell-map"`) for generic matrix/table use cases
- Python 3.10+
- python-docx
Install:

    py -3 -m pip install python-docx

Upstream project: python-docx
Extract structure:

    py -3 scripts/extract_docx_for_llm.py --in input.docx --out structure.v2.json

Optional RAG output:

    py -3 scripts/extract_docx_for_llm.py --in input.docx --out structure.v2.json --rag-output rag.v1.json

Apply patch:

    py -3 scripts/apply_docx_patch.py --in input.docx --out output.docx --patch patch.json

Fill generic table cells from JSON spec:

    py -3 scripts/fill_docx_table_from_json.py --in input.docx --out output.docx --spec cell-map.json

Example spec (SWOT 2x2): references/examples/table-cell-map-swot.json

Notes:
- `table_index`, `row`, `col` are 1-based.
- Optional per-cell flags: `mode` (`replace|append`) and `clear_first` (`true|false`).
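The indexing and flag notes above can be illustrated with a small in-memory sketch. It does not use the real spec schema (that is defined in SKILL.md and the example file); the `text` key, the default values, and the exact meaning of `clear_first` are assumptions made for illustration, and tables are modelled as plain nested lists rather than python-docx objects.

```python
# Hypothetical illustration only: table_index/row/col/mode/clear_first come
# from the notes above; the "text" key and the defaults are assumptions.

def apply_cell_entry(tables, entry):
    """Apply one cell entry to in-memory tables (lists of rows of strings)."""
    # Spec indices are 1-based, Python lists are 0-based.
    t = entry["table_index"] - 1
    r = entry["row"] - 1
    c = entry["col"] - 1
    if min(t, r, c) < 0:
        raise ValueError("table_index, row and col start at 1")
    table = tables[t]
    mode = entry.get("mode", "replace")            # assumed default
    clear_first = entry.get("clear_first", False)  # assumed default
    # Assumed semantics: clear_first empties the cell before writing.
    current = "" if clear_first else table[r][c]
    if mode == "replace":
        table[r][c] = entry["text"]
    elif mode == "append":
        table[r][c] = current + entry["text"]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return table[r][c]


# One 2x2 table, as in the SWOT example.
swot = [[["", ""], ["", ""]]]
apply_cell_entry(swot, {"table_index": 1, "row": 1, "col": 2, "text": "Weaknesses"})
apply_cell_entry(swot, {"table_index": 1, "row": 1, "col": 2,
                        "text": ": slow", "mode": "append"})
```

After both calls the top-right cell holds the replaced text followed by the appended text, which is the behavior the `mode` flag suggests.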
Preview helpers:

    py -3 scripts/docx_preview.py --help

Currently supported in scripts/apply_docx_patch.py:
- `replace_text`
- `set_paragraph`
- `delete_paragraph`
- `replace_paragraph_range`
- `replace_paragraph_range_markdown`
Full schemas and contracts are documented in SKILL.md and references/python-docx-quickref.md.
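Before handing a patch to the tool, it can help to check that every operation name is one of the supported ones. The sketch below assumes, purely for illustration, that a patch is a list of dicts carrying the operation name under an `op` key; the actual patch layout is the one documented in SKILL.md.

```python
# Operation names are taken from the list above; the patch layout
# (list of dicts with an "op" key) is an assumption for illustration.
SUPPORTED_OPS = frozenset({
    "replace_text",
    "set_paragraph",
    "delete_paragraph",
    "replace_paragraph_range",
    "replace_paragraph_range_markdown",
})


def unsupported_ops(operations):
    """Return the op names in `operations` that are not supported, in order."""
    return [op.get("op") for op in operations if op.get("op") not in SUPPORTED_OPS]


example = [{"op": "replace_text"}, {"op": "insert_table"}, {"op": "set_paragraph"}]
rejected = unsupported_ops(example)
```

Here `insert_table` is a made-up name used only to show a rejected entry.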
- Fails on unexpected match counts (`expected_matches`)
- Refuses risky cross-run replacements
- Supports expectation guards (`expected_*`) to avoid wrong target writes
- Protects headings in range replacements by default
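The first two guards can be sketched on plain strings. This is not the code in scripts/apply_docx_patch.py; it is a minimal illustration that models a paragraph as a list of run texts and uses a hypothetical helper, `guarded_replace`, with a simple count comparison to detect matches that straddle run boundaries.

```python
def guarded_replace(runs, old, new, expected_matches=1):
    """Replace `old` with `new` inside individual runs, or raise ValueError.

    `runs` is a list of strings standing in for the runs of one paragraph.
    """
    if not old:
        raise ValueError("search text must not be empty")
    # Matches fully contained in a single run.
    in_run = sum(run.count(old) for run in runs)
    # Matches in the joined paragraph text, including cross-run ones.
    overall = "".join(runs).count(old)
    # Simple heuristic: a different count means some match spans two runs,
    # and replacing it would disturb run-level formatting.
    if overall != in_run:
        raise ValueError("match crosses a run boundary; refusing to replace")
    if in_run != expected_matches:
        raise ValueError(f"expected {expected_matches} match(es), found {in_run}")
    return [run.replace(old, new) for run in runs]


updated = guarded_replace(["Hello ", "world"], "world", "there")
```

Failing loudly on a count mismatch means a patch written against one version of a document does not silently edit the wrong place in another.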
Run the self-test:

    py -3 scripts/selftest.py

- scripts/ – executable tools
- references/ – quick reference notes and spec snippets:
  - references/ooxml-numbering-notes.md (OOXML numbering model + restart behavior)
  - references/python-docx-list-behavior-notes.md (official python-docx list/style behavior)
- SKILL.md – canonical workflow/spec documentation
For numbering/list bugs, prefer Microsoft Learn OpenXML docs as the trusted external source before applying fixes.
This repository is the source of truth. After changes, sync tracked/non-ignored files into the agent skill folder (e.g. boku-martin/skills/python-docx-editor) so the runtime uses the same version.
Example (PowerShell):

    git -C C:\Users\dagobert-ai\.openclaw\workspace\projects\python-docx-editor `
      ls-files --cached --others --exclude-standard

Copy that file list to the target skill directory.
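The list-then-copy step could also be scripted in Python. The following is a rough sketch, not a tool shipped in this repository: `list_repo_files` and `copy_files` are hypothetical helper names, and the only external command used is the same `git ls-files` call shown above (with `-z` added so paths are NUL-separated).

```python
import shutil
import subprocess
from pathlib import Path


def list_repo_files(repo):
    """Return tracked and untracked-but-not-ignored paths, relative to repo."""
    out = subprocess.run(
        ["git", "-C", str(repo), "ls-files", "-z",
         "--cached", "--others", "--exclude-standard"],
        check=True, capture_output=True,
    ).stdout
    # -z separates paths with NUL bytes, which is safe for unusual names.
    return [p.decode("utf-8") for p in out.split(b"\0") if p]


def copy_files(repo, target, rel_paths):
    """Copy each listed file into target, preserving relative paths."""
    copied = []
    for rel in rel_paths:
        src = Path(repo) / rel
        if not src.is_file():  # e.g. still in the index but deleted on disk
            continue
        dst = Path(target) / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel)
    return copied
```

A sync would then be `copy_files(repo, target, list_repo_files(repo))`. Note that this only copies files; it does not remove files from the target that were deleted in the repository.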
GNU General Public License v3.0 (GPL-3.0). See LICENSE.
Project is focused on practical CLI workflows and LLM-assisted editing pipelines.