A reference guide and Claude Code plugin for long-running AI coding agent harnesses
Andrej Karpathy put it bluntly: "The agents do not listen to my instructions." They bloat abstractions, copy-paste code blocks, and ignore style guidance, no matter how carefully you write your AGENTS.md. If the person who coined "context engineering" can't get agents to follow written rules, the answer isn't better prompting. It's mechanical enforcement: git hooks that block bad code before it lands, linters that catch what instructions can't, and rule files that load only when relevant so the agent's finite attention isn't wasted.
Harness engineering is context engineering applied to coding agents: structuring rule files, planning before building, enforcing quality with automation, and keeping documentation in sync with code so the agent stays aligned across long sessions.
This repo contains:
- A reference guide that maps 20+ best practices from OpenAI, Augment Code, Anthropic, and practitioners like Andrej Karpathy (AI researcher, co-founder of OpenAI), Boris Cherny (creator of Claude Code), and Thariq Shihipar (Claude Code team at Anthropic) to concrete implementation patterns.
- A Claude Code plugin that configures your developer environment for agent-assisted development. Install the plugin and get two skills that do the actual work:
  - `/readiness`: Analyzes any existing codebase and produces a scored readiness report across 8 pillars and 5 maturity levels. Shows you exactly where you stand, what's missing, and what to fix first. Saves reports for delta tracking over time. Works with any language or stack.
  - `/setup`: Configures your project end-to-end through Socratic questioning. It scaffolds `CLAUDE.md` files, installs enforcement scripts (secret scanning, file size limits, test colocation, auto-generated docs, drift detection), wires up git hooks that run those scripts on every commit and push, sets up linter/formatter configs, and creates `.claude/settings.json` with safe permission defaults. The scripts and hooks it installs are the enforcement layer. They catch what instructions and prose cannot. Supports any language or stack; Node/TypeScript is the recommended default for web projects, but the skill adapts to Python, Go, Rust, C/C++, and more.
- Quick Start
- Agentic Planning & Execution
- Mapping to Industry Best Practices
- Readiness Analysis
- What You Get
- How It Works
- Customization
- Design Decisions
Prerequisites: Claude Code with plugin support, Git, and Node.js (v18+) for Node/TypeScript projects.
In Claude Code, run:
```
/plugin marketplace add jrenaldi79/harness-engineering
/plugin install harness-engineering@harness-engineering
```
That's it. Once installed, you don't need to remember any commands. Just ask naturally:
"How ready is my codebase for AI agents?" "Analyze my project" "Set up enforcement in my project"
Claude will recognize what you need and invoke the right skill automatically. You can also use the slash commands directly: /readiness and /setup.
/readiness
Or just ask: "How ready is my project?", "Analyze my codebase", "What should I improve?"
For existing projects, start here. The readiness report scores your project across 8 pillars, assigns a maturity level (1-5), and gives you prioritized recommendations. It saves the report to readiness-report.md so you can track improvement over time.
For new/empty projects, skip to step 3.
/setup
Or just ask: "Set up my project", "Bootstrap this repo", "Add quality enforcement"
The /setup skill walks you through Socratic questions to determine your stack (language, framework, testing approach, deployment target), then scaffolds and configures the project. It works with any language. Node/TypeScript is the recommended default for web projects, but the skill adapts to Python, Go, Rust, C/C++, and more.
What happens during setup:
| Step | What Happens |
|---|---|
| Discovery | Socratic questions determine your stack, language, and project goals |
| Init | Creates project structure and initializes git |
| Dependencies | Installs tooling appropriate for your stack |
| Scripts | Copies enforcement scripts into scripts/ |
| Hooks | Sets up pre-commit and pre-push git hooks |
| Configs | Copies linter, formatter, and environment configs |
| Permissions | Creates .claude/settings.json with allow/deny lists for safe defaults |
| Rules | Installs .claude/rules/ with path-scoped rules (TDD, code quality, testing, TypeScript) |
| CLAUDE.md | Generates tailored CLAUDE.md files for global and project scope |
Works on macOS, Linux, and Windows.
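As a sketch of the Permissions step, the generated `.claude/settings.json` follows Claude Code's allow/deny rule format. The specific entries below are illustrative, not the template's actual defaults:

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run lint:*)",
      "Bash(npm test:*)"
    ],
    "deny": [
      "Read(.env)",
      "Read(.env.*)",
      "Bash(rm -rf:*)"
    ]
  }
}
```

Allow rules let the agent run routine commands without prompting; deny rules block it from reading secrets or running destructive commands.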
```
git add -A && git commit -m "Initial project setup"
```

The pre-commit hook runs automatically. If everything passes, your harness is active.
| Problem | Solution |
|---|---|
| `/plugin install` not recognized | Make sure you're using Claude Code (the CLI), not the web chat. Plugin support requires Claude Code. |
| Skills don't trigger after install | Restart your Claude Code session. Skills load on session start. |
| `/readiness` can't find setup references | Both skills are part of the same plugin. Reinstall with `/plugin marketplace add jrenaldi79/harness-engineering` then `/plugin install harness-engineering@harness-engineering`. |
A well-maintained harness keeps your agent aligned. But the largest improvements come from planning before writing code. Long-running coding agents need structured planning, adversarial review, and test-driven development to produce reliable results.
Agents that start coding without a plan produce more churn than progress. They make architectural decisions in isolation, create inconsistent patterns, and build features that don't fit together. Better planning upfront fixes this.
Do not rely on built-in "plan mode" in any AI coding tool. These are shallow outlines, not engineering plans. Instead, use agentic planning systems that produce structured specs, decompose work into atomic tasks, and include review gates.
Write failing tests first, implement the minimum to pass, then refactor. Every task follows Red-Green-Refactor. The global CLAUDE.md template enforces this.
These agentic development systems plug directly into your harness. They don't just plan. They enforce a structured workflow from ideation through implementation, code review, and quality gates:
BMAD (Breakthrough Method for Agile AI-Driven Development) is a full-lifecycle framework with 9 specialized agent personas, 34+ workflows, and built-in adversarial review. Install it for projects with significant architectural scope.
```
npx bmad-method install
```

| Phase | What Happens | Key Agent |
|---|---|---|
| Analysis | Brainstorming, domain research, product brief | Analyst |
| Planning | PRD creation, UX design | Product Manager |
| Solutioning | Architecture design, epic/story breakdown, readiness check | Architect |
| Implementation | Sprint planning, development, code review, QA | Developer, QA |
The adversarial review matters most here. BMAD reviewers are mandated to find issues. "Looks good" with zero findings triggers re-analysis. This catches architectural flaws before they become expensive. The reviewer operates with fresh context (no access to the original reasoning), which prevents confirmation bias.
Use /bmad-help to see what step comes next. Start fresh chats for each workflow to avoid context window limits.
Superpowers is a Claude Code plugin with 16 composable skills that enforce a structured workflow: brainstorm, plan, implement, review.
```
/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers@superpowers-marketplace
```
Use /superpowers:brainstorming instead of built-in plan mode. It runs a Socratic design session: asking clarifying questions, exploring 2-3 approaches with tradeoffs, and producing a validated design document before any code is written.
Key skills:
| Skill | What It Does |
|---|---|
| `brainstorming` | Socratic design session with structured output |
| `writing-plans` | Decomposes designs into 2-5 minute atomic tasks with exact file paths and code |
| `subagent-driven-development` | Dispatches a fresh subagent per task with two-stage review (spec compliance + code quality) |
| `dispatching-parallel-agents` | Runs multiple independent subagents concurrently |
| `test-driven-development` | Enforces strict Red-Green-Refactor per task |
| `systematic-debugging` | 4-phase root-cause investigation before any fix |
The parallel workflow is the strongest feature. The writing-plans skill decomposes a design into atomic tasks (2-5 minutes each) with explicit file paths, dependencies, and execution order. It groups independent tasks into chunks that can run concurrently, while sequencing tasks that depend on each other's output. The subagent-driven-development skill then dispatches a fresh subagent per task. Each gets clean context with only the spec it needs, implements with TDD, and goes through two-stage review (spec compliance, then code quality) before the next task begins. The main agent acts as controller, providing context and answering subagent questions without polluting its own context window.
Prefer this over agent swarms. Superpowers scopes work into dependency-aware atomic units assigned to individual subagents, each with clean context. Agent teams (swarms) share state and coordinate via messaging, which creates overhead and context pollution. Sequential subagent dispatch with a controller is more reliable: no crosstalk, no merge conflicts, no duplicate work. Each subagent's output is verified before it feeds into the next task.
Claude Sidecar spawns parallel conversations with other LLMs (Gemini, GPT, DeepSeek, Grok, 200+ models via OpenRouter) alongside your Claude Code session. Use it to stress-test architecture plans with multiple models before committing to a direction.
```
npm install -g claude-sidecar
sidecar start --model gemini-pro --agent plan --prompt "Review this architecture for weaknesses"
```

The sidecar model receives your full Claude Code conversation context automatically. Plan mode (read-only) ensures the reviewer can analyze but not modify files. Results fold back into your main session as structured summaries.
This works well for architectural decisions: send the same plan to 2-3 different models simultaneously, get independent critiques, then synthesize the best feedback.
Everything Claude Code is a comprehensive agent harness with 17 specialized agents, 81+ skills, and 43+ slash commands. It includes its own multi-model planning system (/multi-plan routes to specialist models, /multi-execute coordinates parallel implementation).
```
/plugin marketplace add affaan-m/everything-claude-code
/plugin install everything-claude-code@everything-claude-code
```
Its /orchestrate command chains planner, TDD guide, code reviewer, security reviewer, and architect agents. It also has a continuous learning system that extracts patterns from sessions into reusable skills.
| Scenario | Tool | Why |
|---|---|---|
| Large project, complex architecture | BMAD | Full lifecycle with adversarial review at each gate |
| Feature work, day-to-day development | Superpowers | Brainstorm → plan → TDD → subagent execution → code review in one enforced workflow |
| Stress-testing a design decision | Claude Sidecar | Independent review from multiple models |
| Full agent harness with everything built in | Everything Claude Code | Comprehensive: 17 agents, orchestration, continuous learning |
These workflows sit on top of this kit's enforcement layer. The workflows steer development. The harness enforces quality at every commit.
This kit implements the core patterns from leading voices in agent-assisted engineering:
- OpenAI: Harness Engineering (Feb 2026): Built a product with zero manually-written code using Codex. By Ryan Lopopolo.
- Augment Code: Your Agent's Context Is a Junk Drawer (Feb 2026): Research-backed analysis of why more context makes agents worse. By Sylvain Giuliani.
- Boris Cherny: Creator of Claude Code at Anthropic. Tips shared via Threads and interviews.
- Thariq Shihipar (@trq212): Claude Code team at Anthropic. Published lessons on prompt caching, agent design, and spec-driven development.
- Andrej Karpathy: AI researcher, former head of AI at Tesla, co-founder of OpenAI. Coined "context engineering" as the successor to prompt engineering. Later acknowledged that agents "do not listen to my instructions" in AGENTS.md. They bloat abstractions, copy-paste code, and ignore style guidance. The case for automated enforcement over advisory prose.
- Birgitta Böckeler: Principal technologist at Thoughtworks. Context engineering for coding agents on Martin Fowler's site.
- Simon Willison: Creator of Datasette, Django co-creator. Agentic engineering patterns and practical CLAUDE.md guidance.
- Jesse Vincent: Creator of Superpowers plugin for Claude Code. Discovered that compliance beats comprehension in skill design.
- DHH: Creator of Ruby on Rails, co-founder of Basecamp/37signals. Convention over configuration as the foundation for agent-friendly codebases.
- Akshay Kothari: COO of Notion. Shared his Claude Code development setup emphasizing CLAUDE.md as single source of truth, pre-commit hooks, lint-staged, and file size enforcement. Validated our enforcement approach.
- Factory.ai: Using Linters to Direct Agents (Sep 2025) by Alvin Sng. "Agents write the code; linters write the law." Formalized seven lint rule categories. Also published Agent Readiness, a framework for measuring how well a codebase supports autonomous development across eight pillars and five maturity levels, with automated remediation via agent-powered PRs. Inspired our `/readiness` skill.
| Best Practice | Sources | What They Found | This Kit's Implementation |
|---|---|---|---|
| Map, not a manual | OpenAI, Augment, Boris, Willison | OpenAI: "Give Codex a map, not a 1,000-page manual." Augment: ETH Zurich research shows context files reduce task success rates while increasing cost 20%+. Boris: Their CLAUDE.md is ~2.5k tokens. Willison: "As few instructions as possible." | Two-Tier CLAUDE.md system: ~200-300 line global + ~200-500 line project file. Templates enforce conciseness by design. |
| Index over encyclopedia | OpenAI, Augment | OpenAI: AGENTS.md should be ~100 lines as a table of contents. Augment: Vercel compressed 40KB of docs into an 8KB index file and got 100% pass rate on build/lint/test. | Docs Map pattern in project-claude.md links to docs/*.md files. CLAUDE.md stays lean; detail lives in docs. |
| Progressive disclosure | OpenAI, Augment, Thariq | OpenAI: Agents start with a small entry point and are taught where to look next. Augment: "Two buckets": only document what the agent can't derive from code itself. Thariq: Let agents discover tools incrementally rather than pre-loading everything. | Three tiers: Tier 1 (CLAUDE.md, every conversation), Tier 2 (docs/, on demand), Tier 3 (docs/plans/, rarely). |
| Finite attention budget | Augment, Karpathy, Anthropic | Karpathy: "Context engineering is the delicate art of filling the context window with just the right information." Augment: Instruction-following degrades as constraint density increases. Anthropic: "All components compete for the same finite resource." | 200-300 line CLAUDE.md target. Templates use <!-- TIP --> comments to guide what to include vs. omit. |
| Failure-backed rules only | Augment, Willison, DHH | Augment: "Would the agent make a mistake without this? If no, delete it." Willison: Only universally applicable instructions. DHH: Conventions create 20 years of training data, so don't re-explain what agents already know. | Templates include only actionable rules: commands, gotchas, constraints. No generic best practices or restated conventions. |
| Repository as system of record | OpenAI, Augment, Thariq | OpenAI: "What Codex can't see doesn't exist." Augment: Don't restate what's in code. Thariq: "The file system is an elegant way of representing state that your agent could read into context." | Templates encode architecture, commands, and gotchas in CLAUDE.md and docs/. AUTO markers write generated content to files. |
| Linters over instructions | Augment, OpenAI, Böckeler, Vincent, Factory.ai, Karpathy | Augment: "Never send an LLM to do a linter's job." Böckeler: "Agents flounder in unconstrained environments." Vincent: "Hard gates test compliance." Factory.ai: "Agents write the code; linters write the law." Karpathy: Agents bloat abstractions and ignore style guidance in AGENTS.md. | Three-layer enforcement: git hooks block violations mechanically, .claude/rules/ provides path-scoped advisory context, CLAUDE.md sets global principles. Priority: automated checks > rules > prose. |
| Grep-ability | Factory.ai | Named exports over defaults, absolute imports, consistent error types. When every symbol has exactly one name across the codebase, agents can search/replace with confidence during multi-file refactors. Default exports let consumers pick any name, breaking grep-based navigation. | rules/typescript.md advises naming conventions. ESLint template includes import/no-default-export. |
| Automated enforcement | OpenAI, Boris, Thariq, Akshay Kothari | OpenAI: Custom linters and CI validate docs are up to date. Boris: PostToolUse hooks auto-format every file edit. Thariq: Hooks for deterministic verification. Akshay: Notion's Claude Code setup uses pre-commit hooks, lint-staged, file size limits, and secret scanning, the same stack this kit provides. | Pre-commit hooks run 6 checks automatically: lint, secret scan, file size, test colocation, doc generation, drift warning. |
| Auto-generated docs | OpenAI | A "doc-gardening" agent scans for stale documentation and opens fix-up PRs. | generate-docs.js auto-regenerates CLAUDE.md sections from source code on every commit via AUTO markers. |
| Drift detection / self-improvement | OpenAI, Augment, Boris | OpenAI: Documentation "rots instantly." Boris: "Update your CLAUDE.md so you don't make that mistake again." Claude writes rules for itself, compounding institutional knowledge. | validate-docs.js warns when source files change without CLAUDE.md updates. Global template includes self-improvement loop guidance. |
| Enforce invariants, not implementations | OpenAI, Augment, DHH | OpenAI: "Set boundaries, allow autonomy locally." DHH: "Convention over configuration." Agents predict conventional code extremely well. Augment: Don't restate conventions your linter already enforces. | File size limits (300 lines), complexity red flags, and configurable CONFIG objects. Rules are strict; how you meet them is flexible. |
| Verification feedback loops | Boris, Thariq, Karpathy | Boris: "Give Claude a way to verify its work" for a 2-3x quality improvement. Thariq's agent loop: Gather Context → Take Action → Verify Work. Karpathy: "Give it success criteria and watch it go." | Global template enforces TDD (Red-Green-Refactor). Pre-push hook blocks on test failure. Pre-commit runs lint + secret scan. |
| Spec-driven development | Thariq, Boris | Thariq: Have Claude interview you with 40+ questions to build a comprehensive spec before coding. Execute in a separate session. Boris: "Start in Plan mode, iterate until satisfied, then auto-accept." | Referenced in Planning Tools section. BMAD's analysis phase and Superpowers brainstorming implement this pattern. |
| Parallel sessions via worktrees | Boris | "The single biggest productivity unlock, and the top tip from the team." Run 3-5+ Claude sessions simultaneously with separate git worktrees. | Referenced in Planning Tools section. Superpowers using-git-worktrees skill automates this. |
| Design for prompt caching | Thariq | "You fundamentally have to design agents for prompt caching first." Static content first, dynamic last. Never switch models mid-conversation; use subagents instead. | Two-tier CLAUDE.md is inherently cache-friendly: static global + static project files loaded once at conversation start. |
| Codify repetitive workflows | Boris, Thariq | Boris: "Convert anything done more than once daily into a slash command." Check into git for team sharing. Include inline bash preprocessing to pre-compute context. | Bootstrap creates scripts/ directory with 6 enforcement scripts. Templates encourage building project-specific commands and skills. |
| Subagent dispatch over swarms | Boris, Vincent | Boris: Use subagents to keep main context clean by offloading subtasks to preserve focus. Vincent: Decompose plans into dependency-aware atomic units; dispatch one subagent per task with two-stage review. Sequential dispatch with a controller avoids crosstalk and merge conflicts. | Referenced in Planning Tools section. Superpowers subagent-driven-development and dispatching-parallel-agents implement this. |
| Golden principles | OpenAI | Opinionated rules encoded in the repo, with background tasks that scan for deviations and open refactoring PRs. | Global CLAUDE.md template encodes universal standards (TDD, naming, security). Enforcement scripts catch deviations on every commit. |
| Structured architecture | OpenAI | Rigid layered domain architecture with validated dependency directions, enforced by custom linters and structural tests. | Bootstrap creates src/, tests/, scripts/, docs/ structure. Templates guide modular design with file and function size constraints. |
| Smart CI / test caching | OpenAI | "Corrections are cheap, and waiting is expensive." Minimize blocking gates, maximize throughput. | SHA-based test caching in pre-push hook. Tests only re-run when code changes, not on every push. |
| Secret detection | OpenAI | Security as a mechanical constraint, not a discipline problem. | check-secrets.js pattern-matches for API keys, tokens, and private keys. Blocks commits automatically. |
What this kit doesn't cover (advanced patterns from these sources): Chrome DevTools integration for UI testing, local observability stacks (logs/metrics/traces), agent-to-agent code review, git worktree isolation per change, execution plans as first-class CI artifacts, automated context engines that derive patterns from code without configuration, and prompt-based stop hooks for long-running autonomous tasks. These are enterprise-scale patterns that build on top of the foundation this kit provides.
Before running /setup, use /readiness to understand where your codebase stands. The readiness skill evaluates your project across 8 pillars and 37 criteria, producing a scored report with a maturity level from 1 to 5.
/readiness
| Pillar | What It Checks | Scope |
|---|---|---|
| Style & Validation | Linter, formatter, lint-on-commit, no default exports | Repo |
| Testing | Test runner, colocation, coverage, TDD enforcement | App |
| Git Hooks & Enforcement | Pre-commit, pre-push, secret scanning, file size limits, smart caching | Repo |
| Documentation | CLAUDE.md quality: Commands, Architecture, Gotchas, AUTO sections, drift, content quality | App |
| Agent Configuration | Settings, allow/deny lists, path-scoped rules, enforcement hierarchy | Repo |
| Code Quality | File size limits, secret scanning, consistent style | App |
| Dev Environment | .env.example, build commands, dependency health | Repo |
| Agentic Workflow | Planning system installed (BMAD, Superpowers, gStack), plan-before-build, session-start validation | Repo |
| Level | Name | What It Means |
|---|---|---|
| 1 | Bare | Has manifest + git. That's it. |
| 2 | Basic | Linter + formatter + test runner exist and work. |
| 3 | Enforced | Git hooks block bad commits. CLAUDE.md exists with essential sections. Agent settings configured. |
| 4 | Automated | Auto-generated docs, drift detection, path-scoped rules, smart test caching. Agentic workflow installed. |
| 5 | Autonomous | Full harness coverage. TDD enforced. Docs in sync. Plan-before-build + session-start validation. |
The readiness skill is agent-guided, not script-based. The SKILL.md provides a structured evaluation framework. The agent does the actual analysis using its intelligence. This makes it stack-agnostic (works with any language), adaptive (understands nuance like "tests exist but are stubs"), and low-maintenance.
It uses the /setup skill's templates, scripts, and references as a reference library, reading them to understand what good enforcement looks like, then comparing against what's actually in your project. This means it produces specific recommendations ("Add a Commands section to your existing CLAUDE.md") rather than blunt ones ("Run /setup to overwrite everything").
The skill runs 3 parallel subagents to keep the main context clean:
- Style, Testing & Code Quality: reads enforcement script references
- Hooks, Config, Environment & Workflow: reads hook and settings templates
- Documentation: reads CLAUDE.md templates and quality guide
Reports are saved to readiness-report.md with YAML frontmatter for machine-parseable delta tracking. Run /readiness again later to see what improved or regressed.
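The frontmatter is what makes delta tracking mechanical: a later run can diff structured scores instead of re-reading prose. The field names below are illustrative, not the skill's exact schema:

```yaml
---
date: 2026-02-14
overall_level: 3
pillars:
  style_validation: 4
  testing: 2
  git_hooks: 3
---
```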
The skill also provides conversational insights: prose analysis of what's working, what's costing you time, and nuanced observations the checklist can't capture.
For monorepos (workspaces, Nx, Turborepo, Cargo workspace, Go workspace), repo-scoped criteria are evaluated once and app-scoped criteria are evaluated per package. The overall level is gated by the weakest app.
After the report, the skill offers to apply targeted fixes: editing existing files, adding missing sections, creating new files only where nothing exists. If no agentic workflow system is detected, it recommends Superpowers and can help install it.
```
harness-engineering/
├── .claude-plugin/
│   └── plugin.json                  # Plugin manifest
├── skills/readiness/
│   └── SKILL.md                     # Codebase analysis, runs on /readiness
├── skills/setup/
│   ├── SKILL.md                     # Main orchestrator, runs on /setup
│   ├── scripts/
│   │   ├── init-project.js          # Node/TS project scaffolding
│   │   ├── install-enforcement.js   # Copies enforcement tooling into target project
│   │   ├── generate-claude-md.js    # Generates tailored CLAUDE.md files
│   │   ├── lib/
│   │   │   ├── check-secrets.js     # Blocks commits containing API keys or tokens
│   │   │   ├── check-file-sizes.js  # Blocks files over 300 lines
│   │   │   ├── check-test-colocation.js  # Blocks source files without colocated tests
│   │   │   ├── validate-docs.js     # Warns when CLAUDE.md drifts from code
│   │   │   ├── generate-docs.js     # Auto-regenerates CLAUDE.md sections from source
│   │   │   └── generate-docs-helpers.js
│   │   └── hooks/
│   │       ├── pre-commit           # Runs all checks on every commit (<2s)
│   │       └── pre-push             # Runs test suite before push (with smart caching)
│   ├── templates/
│   │   ├── global-claude.md         # Cross-project standards (TDD, quality, conventions)
│   │   ├── project-claude.md        # Per-project guidance (architecture, commands, gotchas)
│   │   ├── rules/                   # Path-scoped rules (auto-loaded by file pattern)
│   │   │   ├── tdd.md               # TDD enforcement (src/**, lib/**)
│   │   │   ├── code-quality.md      # File size limits, complexity (src/**, scripts/**)
│   │   │   ├── testing.md           # Test patterns (tests/**, *.test.*)
│   │   │   └── typescript.md        # Naming, imports (*.ts, *.tsx, *.js)
│   │   ├── eslint-base.js           # Baseline ESLint rules
│   │   ├── lint-staged.config.js    # Auto-fix on staged files
│   │   ├── .prettierrc              # Code formatting
│   │   ├── .gitignore               # Standard gitignore
│   │   ├── .env.example             # Environment variable placeholder
│   │   └── settings.json            # Claude Code permissions (allow/deny lists)
│   └── references/                  # Stack patterns, enforcement docs, CLAUDE.md guide
├── tests/                           # Tests for plugin development
└── README.md                        # You are here
```
Claude Code reads CLAUDE.md files at two levels, plus .claude/rules/ for path-scoped guidance. The global file sets universal standards. The project file provides project-specific context. Rules files load automatically when Claude works on matching file patterns.
| | Global CLAUDE.md | Project CLAUDE.md |
|---|---|---|
| Location | Parent directory (e.g., `~/projects/CLAUDE.md`) | Project root (e.g., `~/projects/my-app/CLAUDE.md`) |
| Purpose | TDD, code quality, naming, security | Architecture, commands, modules, gotchas |
| Size | ~200-300 lines | ~200-500 lines |
| Changes | Rarely | With the code (auto-generated sections update on commit) |
Precedence: Project-specific files override global guidance when there are conflicts.
Global CLAUDE.md (shared across projects): Operating principles, workflow guidelines, security checklists, rule enforcement hierarchy.
Project CLAUDE.md (specific to one codebase): Architecture diagrams, essential commands, directory structure, module index, critical gotchas, docs map.
.claude/rules/ (path-scoped, loaded on demand): TDD enforcement (when touching src/), file size limits and complexity checks (when touching code files), test patterns (when touching tests/), naming and import conventions (when touching .ts/.js files). Rules use globs: YAML frontmatter so they only load when Claude works on matching files, keeping the context window lean.
This three-layer system means CLAUDE.md stays under 200-300 lines (global context every session), while detailed path-specific guidance loads automatically only when relevant.
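A path-scoped rule file is plain markdown with a `globs:` frontmatter key. A hypothetical minimal example (contents illustrative, not the shipped templates):

```markdown
---
globs: ["src/**/*.ts", "src/**/*.tsx"]
---

# TypeScript conventions

- Use named exports only; no default exports.
- Prefer absolute imports from `src/`.
```

Claude loads this file only when editing files that match the globs, so the guidance costs zero context the rest of the time.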
History:
`.claude/rules/` was introduced in Claude Code v2.0.64 (December 2025). Rules use YAML frontmatter with `globs:` to scope activation by file pattern. The feature was inspired by Cursor's `.cursor/rules/` (which shipped earlier in 2025) and uses a nearly identical format. Note: the official docs reference `paths:` as the frontmatter key, but community testing (Issue #17204) found that `globs:` works more reliably.
Not everything belongs in CLAUDE.md. The agent reads the full file on every conversation, so bloat costs tokens and dilutes signal. Use four tiers:
| Tier | What | Where | When Loaded |
|---|---|---|---|
| 1 | Architecture, commands, operating principles, gotchas | `CLAUDE.md` | Every conversation |
| 1.5 | Path-scoped rules (TDD, code quality, test patterns) | `.claude/rules/` | Auto-loaded when touching matching files |
| 2 | Detailed topic documentation | `docs/*.md` | On demand, via Docs Map links |
| 3 | Design documents, plans, decision records | `docs/plans/` | Rarely, when exploring history |
Rule of thumb:
| Keep in CLAUDE.md | Move to .claude/rules/ | Move to docs/ |
|---|---|---|
| Agent needs it for every task | Only needed when working on specific file types | Only needed for specific domains |
| Changes with code structure | Enforces coding standards per path | Stable reference material |
| Under 20 lines per topic | Path-scoped with `globs:` frontmatter | Over 20 lines of detail |
| Commands, operating principles, gotchas | TDD, quality checks, naming, test patterns | Tutorials, explanations, history |
The Docs Map pattern in CLAUDE.md links to topic docs so agents can find detail when they need it:
```
## Docs Map
| Topic | File |
|-------|------|
| Testing strategy and patterns | [docs/testing.md](docs/testing.md) |
| Configuration and env vars | [docs/configuration.md](docs/configuration.md) |
```

Sections of CLAUDE.md can regenerate automatically from your source code, so you don't have to manually keep code and docs in sync.
Add marker pairs to your CLAUDE.md:
```
<!-- AUTO:tree -->
...this content regenerates automatically...
<!-- /AUTO:tree -->
```

The generate-docs.js script scans your source directories and replaces content between markers with fresh data.
| Marker | What It Generates | Source |
|---|---|---|
| `tree` | ASCII directory structure with JSDoc annotations | Walks src/, scripts/, tests/ |
| `modules` | Module table with purpose and key exports | Extracts JSDoc + module.exports from source files |
```
node scripts/generate-docs.js          # Write mode: regenerate + auto-stage
node scripts/generate-docs.js --check  # Check mode: validate only (for CI)
```

Write mode runs automatically in the pre-commit hook. Check mode exits with code 1 if sections are stale, which is useful for CI pipelines.
The script also validates that all markdown cross-links in CLAUDE.md point to files that actually exist.
To add a new auto-generated section:
- Add markers: `<!-- AUTO:yourname -->...<!-- /AUTO:yourname -->`
- Write a builder function that returns the content as a string
- Add the marker name to the `generated` map in `generate-docs.js`
Hooks are the enforcement layer. They run automatically and block commits or pushes that violate quality standards, so enforcement doesn't depend on developer discipline.
**Pre-commit hook:**

| Step | What It Does | Blocks? |
|---|---|---|
| 1. lint-staged | Runs ESLint with auto-fix on staged files | Yes, if unfixable errors |
| 2. Secret scan | Pattern-matches for API keys, tokens, private keys | Yes, if secrets found |
| 3. File size check | Rejects files over 300 lines | Yes, if oversized |
| 4. Test colocation | Verifies source files in `src/` have matching `.test.*` or `.spec.*` files | Yes, if missing tests |
| 5. Doc generation | Regenerates AUTO markers, auto-stages `CLAUDE.md` | No |
| 6. Drift warning | Warns if source files changed without a `CLAUDE.md` update | No |
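The secret scan (step 2) is plain pattern matching. A minimal sketch, with illustrative regexes rather than the plugin's actual `CONFIG.patterns`:

```javascript
// Sketch of pattern-based secret scanning; regexes are examples only,
// not the shipped CONFIG.patterns.
const CONFIG = {
  patterns: [
    /AKIA[0-9A-Z]{16}/,                          // AWS access key ID
    /ghp_[A-Za-z0-9]{36}/,                       // GitHub personal access token
    /-----BEGIN (?:RSA |EC )?PRIVATE KEY-----/,  // PEM private key header
  ],
};

function findSecrets(text) {
  return CONFIG.patterns.filter((pattern) => pattern.test(text));
}
```

The hook would run this over each staged file and abort the commit when `findSecrets` returns anything.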
**Pre-push hook:**

| Step | What It Does | Blocks? |
|---|---|---|
| 1. Test suite | Runs `npm run test:all` (skipped if cached) | Yes, if tests fail |
| 2. Audit | Checks `npm audit` for vulnerabilities | No (warning only) |
The pre-push hook uses SHA-based caching to avoid re-running tests unnecessarily:
- `npm test` passes → the `posttest` script writes the `HEAD` SHA to `.test-passed`
- You commit → the SHA changes, invalidating the cache
- You push → the hook compares SHAs. Match? Skip the tests. Mismatch? Run them.

Each developer has their own local cache (`.test-passed` is gitignored).
The following size limits are enforced mechanically by the pre-commit hook.
| Entity | Max Lines | Why |
|---|---|---|
| Any file | 300 | Forces modular design. Large files are hard for agents and humans to reason about. |
| Any function | 50 | Prevents monolithic functions. Each function should do one thing. |
Stop and refactor when you see:
| Pattern | Action |
|---|---|
| >5 nested if/else | Extract conditions to named functions |
| >3 try/catch in one function | Split error handling into separate concerns |
| >10 imports | Module is doing too much; split it |
| Duplicate logic | Extract to shared utilities |
Every enforcement script has a CONFIG object at the top. Edit patterns, limits, and paths without touching the logic.
| Script | What to Customize |
|---|---|
| `check-secrets.js` | `CONFIG.patterns` (secret regexes), `CONFIG.allowlistPaths` (excluded files) |
| `check-file-sizes.js` | `CONFIG.maxLines` (default: 300), `CONFIG.include`/`CONFIG.exclude` (file globs) |
| `check-test-colocation.js` | `CONFIG.include`/`CONFIG.exclude` (file globs), `CONFIG.testSuffixes` (default: `.test`, `.spec`) |
| `validate-docs.js` | `CONFIG.docFile`, `CONFIG.trackedDirs`, `CONFIG.mappings` |
| `generate-docs.js` | `TREE_DIRS` (directories to scan), `SKIP_DIRS` in helpers (directories to exclude) |
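As an example of the CONFIG-driven style, the test-colocation check can be sketched like this. `hasColocatedTest` is hypothetical, and the shipped script's matching logic may differ:

```javascript
// Sketch of the test-colocation rule driven by CONFIG.testSuffixes.
// Illustrative only; not the shipped check-test-colocation.js.
const CONFIG = { testSuffixes: ['.test', '.spec'] };

function hasColocatedTest(sourceFile, allFiles) {
  const base = sourceFile.replace(/\.[^.]+$/, ''); // strip the extension
  return allFiles.some((file) =>
    CONFIG.testSuffixes.some((suffix) => file.startsWith(`${base}${suffix}.`))
  );
}
```

Because all the knobs sit in `CONFIG`, switching to a `_test` suffix convention is a one-line edit rather than a logic change.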
The templates use `<!-- TIP: ... -->` HTML comments that are invisible when rendered but visible when editing. They guide you through customization without cluttering the final document.
See the scripts in `skills/setup/scripts/lib/` for full details on each enforcement script, and `skills/setup/scripts/hooks/` for the git hook implementations.
- **Cross-platform setup.** The scaffolding scripts are written in Node.js, not bash, so they work on macOS, Linux, and Windows without requiring WSL or Git Bash.
- **Configurable, not hardcoded.** Every enforcement script uses a `CONFIG` object at the top. Customize patterns, limits, and paths without understanding the implementation.
- **HTML comment instructions.** The `<!-- TIP: ... -->` comments in templates are invisible in rendered markdown but visible when editing. Templates serve as both documentation and fill-in-the-blank forms.
- **200-300 line target.** `CLAUDE.md` should be small enough that agents can process the full content without diluting the important parts. Detailed docs go in `docs/` and load on demand.
- **Auto-generation over manual sync.** The `generate-docs.js` script eliminates the most common source of harness drift: developers changing code without updating docs. The pre-commit hook regenerates automatically.
- **SHA-based test caching.** Running the full test suite on every push is wasteful if you just ran the tests. The cache is per-developer, invalidated automatically by new commits, and zero-config.
MIT