
seekcontext/harness0


harness0 — Layer 0 of Agent Reliability

Python 3.11+ · License: MIT

Harness Engine for AI Agents.
A reliability layer that makes any agent stable.

Quick Start · 5-Layer Model · Individual Layers · User Manual · Architecture


Agent = Loop(Model + Harness)

You provide the Model. harness0 provides the Harness — the engineering infrastructure that makes the model work reliably in production.

┌─────────────────────────────────────────┐
│           Your Application              │
├─────────────────────────────────────────┤
│  Orchestration (pick one)               │
│  LangChain / CrewAI / PydanticAI / DIY  │
├─────────────────────────────────────────┤
│  ★ harness0 — reliability layer         │  ← Layer 0
│  Context · Tools · Security             │
│  Feedback · Entropy                     │
├─────────────────────────────────────────┤
│  LLM API                                │
│  OpenAI / Anthropic / DeepSeek / Local  │
└─────────────────────────────────────────┘

Keep using whatever framework you already use. harness0 is complementary — it adds the reliability layer underneath.

Concept origin: Harness Engineering was introduced by OpenAI (Feb 2026), based on building a 1M-line, fully agent-generated codebase. The key insight: the model is the engine; the harness is what makes it driveable. harness0 is the first open-source library built entirely around this discipline.

The Problem

Every agent developer hits the same walls:

| Problem | Root Cause | harness0 Layer |
|---|---|---|
| "Demo works, production fails" | No structured context management | L1 Context Assembly |
| "More tools = less stable" | Tools are an ungoverned bag of functions | L2 Tool Governance |
| "Afraid to let agents run commands" | Security relies on prompt-level trust | L3 Security Guard |
| "Agent fails but doesn't know why" | System errors aren't translated for the model | L4 Feedback Loop |
| "Agent drifts on long tasks" | Context decays: stale rules, bloated history | L5 Entropy Management |

Existing frameworks solve orchestration. harness0 solves reliability.


Quick Start

pip install harness0

import asyncio
import subprocess
from pathlib import Path

from openai import AsyncOpenAI
from harness0 import HarnessEngine, RiskLevel

engine = HarnessEngine.default()

@engine.tool(risk_level=RiskLevel.READ)
async def read_file(path: str) -> str:
    """Read a file and return its contents."""
    # read_text opens and closes the file; a bare open(path).read() leaks the handle
    return Path(path).read_text(encoding="utf-8")

@engine.tool(risk_level=RiskLevel.EXECUTE, requires_approval=True, timeout=30)
async def run_command(command: str) -> str:
    """Execute a shell command."""
    # Run the blocking call in a worker thread so the event loop (and the timeout) stay responsive
    return await asyncio.to_thread(
        subprocess.check_output, command, shell=True, text=True
    )

async def main():
    # AsyncOpenAI reads OPENAI_API_KEY from the environment
    result = await engine.run("Summarise README.md", llm_client=AsyncOpenAI())
    print(result.output)

asyncio.run(main())

v0.0.5 — L1–L5 and HarnessEngine are implemented and functional. Framework integrations are planned. See TODO.md.

With harness.yaml — declarative configuration
llm:
  provider: openai
  model: gpt-4o

context:
  layers:
    - name: base
      source: prompts/base.md
      priority: 0
      disclosure_level: index
    - name: security-guide
      source: docs/security.md
      priority: 10
      disclosure_level: detail
      keywords: ["security", "permission", "auth"]
  total_token_budget: 8000

security:
  blocked_commands: ["rm -rf", "sudo", "> /dev/sda"]
  approval_mode: risky_only

entropy:
  gardener_enabled: true
  gardener_interval_turns: 5
  golden_rules:
    - id: no_duplicate_tools
      description: "No two tools may share the same description"
      severity: error
    - id: no_stale_layers
      description: "All FileSource layers must be fresher than 24h"
      severity: warning

Load the file into an engine:

from harness0 import HarnessEngine

engine = HarnessEngine.from_config("harness.yaml")

The 5-Layer Harness Model

L1: Context Assembly

Prompts are assembly systems, not documents.

Give the agent a map, not a 1,000-page manual. INDEX layers are always injected (base prompt, rules summary). DETAIL layers are keyword-gated — loaded only when the task mentions relevant terms. Per-layer and total token budgets prevent context overflow.

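# Import location assumed for every name except ContextAssembler; check the User Manual
from harness0.context import (ContextAssembler, ContextLayer, FileSource,
                              CallableSource, DisclosureLevel, Freshness)

def get_state() -> str:
    """Hypothetical placeholder: your own provider of per-turn task state."""
    return "open files: 2; last command: pytest"

assembler = ContextAssembler(layers=[
    ContextLayer(name="base", source=FileSource("base.md"),
                 disclosure_level=DisclosureLevel.INDEX),          # always loaded
    ContextLayer(name="security", source=FileSource("security.md"),
                 disclosure_level=DisclosureLevel.DETAIL,          # loaded only for security tasks
                 keywords=["security", "auth"]),
    ContextLayer(name="state", source=CallableSource(get_state),
                 freshness=Freshness.PER_TURN),                    # dynamic per turn
], total_token_budget=8000)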
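The selection rule described above (INDEX layers always eligible, DETAIL layers only on a keyword hit, everything capped by a total token budget) can be sketched in plain Python. This is not harness0's implementation: `Layer`, `count_tokens` and `assemble` are hypothetical names, the keyword test is a simple case-insensitive substring match, and whitespace-separated words stand in for real tokens.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    name: str
    text: str
    priority: int                 # lower value = considered first
    detail: bool = False          # True = DETAIL (keyword-gated), False = INDEX (always eligible)
    keywords: list[str] = field(default_factory=list)


def count_tokens(text: str) -> int:
    # Crude proxy: one token per whitespace-separated word (a real assembler would use a tokenizer)
    return len(text.split())


def assemble(layers: list[Layer], task: str, budget: int) -> list[str]:
    """Return the names of the layers selected for this task, in priority order."""
    task_lower = task.lower()
    selected: list[str] = []
    used = 0
    for layer in sorted(layers, key=lambda item: item.priority):
        if layer.detail and not any(kw.lower() in task_lower for kw in layer.keywords):
            continue  # DETAIL layer whose keywords the task never mentions
        cost = count_tokens(layer.text)
        if used + cost > budget:
            continue  # adding this layer would overflow the total budget
        selected.append(layer.name)
        used += cost
    return selected


layers = [
    Layer("base", "You are a careful coding agent.", priority=0),
    Layer("security", "Never run sudo. Ask before deleting files.", priority=10,
          detail=True, keywords=["security", "auth"]),
]
print(assemble(layers, "Fix the auth bug", budget=100))   # ['base', 'security']
print(assemble(layers, "Rename a variable", budget=100))  # ['base']
```

Skipping an over-budget layer (rather than stopping) lets a smaller, lower-priority layer still fit; an alternative design would trim each layer to a per-layer budget instead.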

L2: Tool Governance

Tools are governed capabilities, not a bag of functions.

Four risk levels (READ, WRITE, EXECUTE, CRITICAL), schema validation, output truncation, and a full audit trail. Every tool call passes through a unified pipeline; every failure emits a structured signal the agent can act on.

@engine.tool(risk_level=RiskLevel.EXECUTE, requires_approval=True, timeout=30)
async def run_command(command: str) -> str: ...

ToolCall → Validate → CommandGuard → Approval → Execute → Truncate → Audit → ToolResult

L3: Security Guard

Security at runtime, not in prompts.

Three lines of defense: CommandGuard (pattern blocklist with fix instructions), ProcessSandbox (configurable resource limits), ApprovalManager (human-in-the-loop with SHA-256 fingerprint cache — approve once per session, not once per call).

result = engine.command_guard.check("sudo rm -rf /tmp")
result.allowed          # False
result.matched_pattern  # "sudo"
result.signal.fix_instructions
# "1. Do NOT retry — matches the security blocklist.
#  2. Reason: `sudo` causes irreversible side effects.
#  3. Safer alternatives: run without sudo, or use targeted delete."

Even if the model "misbehaves," the system has hard boundaries.
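The "approve once per session" behaviour of a fingerprint cache can be sketched as follows. What harness0 actually hashes is not documented here; this sketch assumes the fingerprint covers the tool name plus its arguments, serialised as canonical JSON. `ApprovalCache` and `ask_human` are hypothetical names, and only approvals are cached, so a denied call prompts again.

```python
import hashlib
import json
from typing import Any, Callable


class ApprovalCache:
    """Session-scoped approval cache keyed by a SHA-256 fingerprint of (tool, arguments)."""

    def __init__(self, ask_human: Callable[[str, dict[str, Any]], bool]) -> None:
        self._ask_human = ask_human
        self._approved: set[str] = set()

    @staticmethod
    def fingerprint(tool: str, args: dict[str, Any]) -> str:
        # Canonical JSON (sorted keys) so the same call always hashes to the same digest
        payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def is_approved(self, tool: str, args: dict[str, Any]) -> bool:
        fp = self.fingerprint(tool, args)
        if fp in self._approved:
            return True                    # approved earlier this session: no prompt
        if self._ask_human(tool, args):
            self._approved.add(fp)         # remember the yes for the rest of the session
            return True
        return False                       # a no is not cached, so the next attempt asks again


prompts: list[str] = []

def ask(tool: str, args: dict[str, Any]) -> bool:
    prompts.append(args["command"])   # stand-in for a real human prompt
    return True

cache = ApprovalCache(ask)
for command in ["make test", "make test", "make lint"]:
    cache.is_approved("run_command", {"command": command})
print(prompts)  # ['make test', 'make lint']
```

Because the key is a hash of the exact call, approving `make test` does not silently approve `make lint`.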

L4: Feedback Loop

System events must be translated into model-consumable signals.

The agent should never see a bare PermissionError. It should see what happened, why, and what to do next:

| System Event | Without L4 | With L4 |
|---|---|---|
| Command blocked | PermissionError | "Command blocked: sudo. Step 1: don't retry. Step 2: run without sudo." |
| Output truncated | Silent cutoff | "Output truncated 12K→5K tokens. Narrow your search scope." |
| Subprocess timeout | TimeoutError | "Exceeded 30s timeout. Break into smaller steps or increase timeout." |
| Schema invalid | ValidationError | "Missing required parameter content. Check the tool schema and retry." |

Every signal carries a fix_instructions field — numbered steps the agent can execute immediately. Signals are rendered as XML and auto-injected into the next turn's context via L1:

<harness:signal id="a3f8c1d2" type="constraint" source="security.command_guard">
  <message>Command `sudo apt install` blocked.</message>
  <fix_instructions>1. Do NOT retry.
2. Install without sudo.
3. Or request user approval.</fix_instructions>
</harness:signal>
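Translating an exception into a signal and rendering it in the XML shape shown above could look like the sketch below. `FeedbackSignal`, `translate` and the exception-to-instruction mapping are illustrative assumptions, not harness0's `FeedbackTranslator`; text and attribute values are escaped with the standard library so a message containing `<` or `&` cannot break the markup.

```python
import uuid
from dataclasses import dataclass, field
from xml.sax.saxutils import escape, quoteattr


@dataclass
class FeedbackSignal:
    type: str                    # e.g. "constraint", "timeout", "error"
    source: str                  # component that raised it
    message: str                 # what happened
    fix_instructions: list[str]  # what to do next, one step per entry
    id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])

    def to_xml(self) -> str:
        steps = "\n".join(f"{i}. {step}" for i, step in enumerate(self.fix_instructions, start=1))
        return (
            f"<harness:signal id={quoteattr(self.id)} type={quoteattr(self.type)} "
            f"source={quoteattr(self.source)}>\n"
            f"  <message>{escape(self.message)}</message>\n"
            f"  <fix_instructions>{escape(steps)}</fix_instructions>\n"
            "</harness:signal>"
        )


def translate(exc: Exception, source: str) -> FeedbackSignal:
    """Map a raw exception to a signal the model can act on (illustrative rules only)."""
    if isinstance(exc, TimeoutError):
        return FeedbackSignal("timeout", source, f"Timed out: {exc}",
                              ["Break the task into smaller steps.", "Or increase the timeout."])
    if isinstance(exc, PermissionError):
        return FeedbackSignal("constraint", source, f"Blocked: {exc}",
                              ["Do NOT retry the same call.",
                               "Pick an action that needs fewer privileges."])
    return FeedbackSignal("error", source, f"{type(exc).__name__}: {exc}",
                          ["Read the error message.", "Fix the arguments and retry once."])


print(translate(PermissionError("sudo apt install"), "security.command_guard").to_xml())
```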

L5: Entropy Management

Active quality maintenance, not passive compression.

Agent context decays over time. Other frameworks react only when tokens overflow. harness0 proactively detects and repairs degradation every turn:

| | Passive (other frameworks) | Active (harness0) |
|---|---|---|
| Trigger | Token count near limit | Every turn, proactively |
| Method | LLM summarizes old messages | Detect + classify + targeted fix |
| Stale signal removal | No | Yes |
| Duplicate detection | No | Yes |
| Conflict detection | No | Yes |
| Background GC | No | Yes (EntropyGardener) |
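Two of the per-turn repairs in that table, stale signal removal and duplicate detection, can be sketched as a single gardening pass. The time-to-live in turns and the case/whitespace-insensitive duplicate key are assumptions made for this illustration; `Signal` and `garden` are hypothetical names, and conflict detection is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    id: str
    message: str
    created_turn: int
    ttl_turns: int = 3  # hypothetical lifetime, in turns


def garden(signals: list[Signal], current_turn: int) -> tuple[list[Signal], dict[str, int]]:
    """One proactive pass: drop expired signals, then collapse duplicates (newest copy wins)."""
    report = {"stale": 0, "duplicate": 0}
    fresh = []
    for sig in signals:
        if current_turn - sig.created_turn >= sig.ttl_turns:
            report["stale"] += 1
        else:
            fresh.append(sig)

    kept: list[Signal] = []
    seen: set[str] = set()
    # Newest first, so the most recent copy of a repeated message is the one kept
    for sig in sorted(fresh, key=lambda s: s.created_turn, reverse=True):
        key = " ".join(sig.message.lower().split())  # ignore case and spacing differences
        if key in seen:
            report["duplicate"] += 1
            continue
        seen.add(key)
        kept.append(sig)
    kept.sort(key=lambda s: s.created_turn)  # back to chronological order
    return kept, report


history = [
    Signal("a", "Command blocked: sudo", created_turn=1),
    Signal("b", "command blocked:  sudo", created_turn=4),
    Signal("c", "Output truncated", created_turn=5),
    Signal("d", "Command Blocked: sudo", created_turn=5),
]
kept, report = garden(history, current_turn=6)
print([s.id for s in kept], report)  # ['c', 'd'] {'stale': 1, 'duplicate': 1}
```

The pass returns a report alongside the cleaned list, so each repair could itself be surfaced as a signal rather than happening silently.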

Golden rules are mechanically verifiable invariants declared in YAML. Violations emit FeedbackSignals — the agent can self-repair:

entropy:
  golden_rules:
    - id: no_duplicate_tools
      description: "No two tools may share the same description"
      severity: error
    - id: no_stale_layers
      description: "All FileSource layers must be fresher than 24h"
      severity: warning
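Because golden rules are mechanically verifiable, each one reduces to a plain check function. The sketch below implements the two example rules from the YAML above; the function names, the `Violation` record and the input shapes (tool name to description, layer name to modification time) are hypothetical, and how harness0 wires rules to checks is not shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Violation:
    rule_id: str
    severity: str   # "error" or "warning", as in the YAML above
    detail: str


def no_duplicate_tools(descriptions: dict[str, str]) -> list[Violation]:
    """Rule no_duplicate_tools: no two tools may share a description (case/space-insensitive)."""
    first_owner: dict[str, str] = {}
    violations: list[Violation] = []
    for tool, text in descriptions.items():
        key = " ".join(text.lower().split())
        if key in first_owner:
            violations.append(Violation("no_duplicate_tools", "error",
                                        f"{tool!r} duplicates the description of {first_owner[key]!r}"))
        else:
            first_owner[key] = tool
    return violations


def no_stale_layers(modified: dict[str, datetime], now: datetime,
                    max_age: timedelta = timedelta(hours=24)) -> list[Violation]:
    """Rule no_stale_layers: every file-backed layer must be fresher than max_age."""
    return [Violation("no_stale_layers", "warning", f"{name!r} is {now - mtime} old")
            for name, mtime in modified.items() if now - mtime > max_age]


now = datetime(2026, 3, 1, tzinfo=timezone.utc)
print(no_duplicate_tools({"read_file": "Read a file.", "cat": "read a  file."}))
print(no_stale_layers({"base": now - timedelta(hours=2),
                       "security-guide": now - timedelta(hours=30)}, now))
```

In a real check the timestamps would come from the files themselves, for example `datetime.fromtimestamp(os.path.getmtime(path), timezone.utc)`.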

Cross-Layer Coordination

The 5 layers are not independent pipelines — they form a coordinated feedback loop:

L3 SecurityGuard blocks "rm -rf /"
  → L4 FeedbackTranslator generates signal with fix_instructions
    → L1 ContextAssembler injects signal into next turn's context
      → LLM receives actionable feedback, adjusts behavior
        → L5 EntropyManager garbage-collects stale signals later

Full API reference → User Manual


Use Individual Layers

Every layer is independently importable. No full buy-in required.

from harness0.context import ContextAssembler       # L1 — multi-layer prompt assembly
from harness0.tools import ToolInterceptor           # L2 — governed tool execution
from harness0.security import CommandGuard            # L3 — security enforcement
from harness0.feedback import FeedbackTranslator      # L4 — better error messages for models
from harness0.entropy import EntropyManager           # L5 — context quality maintenance

Use just L3 for security, just L1 for prompt assembly, or all 5 together. Each layer has zero dependencies on the others.

Individual layer usage examples → User Manual §11


How It Compares

Based on publicly available documentation as of March 2026. See competitive-analysis.md for methodology.

Capability LangChain OpenAI SDK MS AGT harness0
Multi-layer context assembly Basic (2-tier) ✅ L1
Progressive disclosure ✅ INDEX/DETAIL
Tool risk classification allow/reject Policy engine ✅ 4-level
Sandbox execution Remote 4 privilege rings ✅ Lightweight
Approval workflows HITL approve/reject Yes ✅ + fingerprint cache
Feedback translation ✅ L4
Entropy detection + GC ✅ L5
Golden rule enforcement EntropyGardener
Declarative config OPA/Rego harness.yaml
Framework agnostic No No Yes

Three capabilities no major framework addresses: multi-layer context assembly, feedback translation, and entropy management.


Framework Integrations [Planned]

harness0 works with your existing framework. Adapters are on the roadmap:

| Framework | Install | Strategy |
|---|---|---|
| LangChain | pip install harness0[langchain] | Middleware hooks |
| OpenAI Agents SDK | pip install harness0[openai] | Input/output/tool guardrails |
| PydanticAI | pip install harness0[pydantic-ai] | Dependency injection |
| CrewAI | pip install harness0[crewai] | @harness_tool decorator |

Integration architecture → Architecture docs


Why "harness0"?

The 0 means Layer 0: the foundational reliability substrate beneath every agent framework, much as the physical transmission medium (informally called Layer 0, below OSI Layer 1) underlies every layer of a network stack. Ground zero of agent reliability.

Three lessons from OpenAI's harness engineering that directly shaped the design:

  1. "Give the agent a map, not a manual" → L1 Progressive Disclosure (INDEX/DETAIL)
  2. "Error messages must contain fix instructions" → L4 fix_instructions on every signal
  3. "Entropy is inevitable — automate the gardening" → L5 EntropyGardener with golden rules

Requirements

  • Python 3.11+
  • Dependencies: pydantic>=2.0 · pyyaml>=6.0 · tiktoken>=0.7 · httpx>=0.27 · aiofiles>=24.0
  • Any openai.AsyncOpenAI-compatible LLM client

Contributing

Contributions welcome. See TODO.md for the full roadmap.

Priority areas: test suite · LLM provider layer · built-in tool plugins · framework adapters · entropy detection strategies

License

MIT


User Manual · Architecture · Competitive Analysis · Vision · Roadmap