Epistemic Question
Core Question: What is the empirically optimal progression of agent autonomy tiers for analytics teams, and how do we know when a team is ready to advance to the next tier?
Why This Matters
Agile Agentic Analytics documents tiered autonomy as a mitigation pattern:
Tiered Autonomy: Escalate to higher autonomy levels only when DoD gates and sandbox boundaries justify it:
- Read-only → low risk, fast exploration
- Workspace-write → medium risk, requires review
- Full auto → high risk, requires strong gates
The Implementation Roadmap outlines a three-phase 18-month transformation, but doesn't prescribe exactly when to escalate autonomy levels within each phase.
The risk: Escalate too fast → quality degradation, security incidents, team trust erosion. Escalate too slow → teams don't realize productivity gains, adoption stalls.
The Progression Puzzle
Hypothetical autonomy tier progression:
| Tier | Agent Permissions | Human Review | Appropriate For | Exit Criteria to Next Tier |
|------|-------------------|--------------|-----------------|----------------------------|
| Tier 0: Assisted Ideation | Read-only, no execution | Every output reviewed before use | Onboarding, unfamiliar domains | ??? |
| Tier 1: Sandboxed Execution | Workspace read/write, no external calls | Spot-check 20% of outputs | Established workflows with strong tests | ??? |
| Tier 2: CI/CD Integration | Can trigger builds, run tests, create PRs | Automated gates + human approval | Mature CI/CD, comprehensive DoD | ??? |
| Tier 3: Autonomous Deployment | Can merge PRs, deploy to staging | Automated gates only, post-hoc audit | High-confidence environments, strong rollback | ??? |
| Tier 4: Production Auto-Deploy | Can deploy to production with policy constraints | Monitored but not pre-approved | Mission-critical systems with exhaustive testing | ??? (or never?) |
The question marks are the epistemic gap: What objective criteria determine readiness to advance?
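The table above can be encoded as a data model whose `exit_criteria` field is deliberately empty, making the epistemic gap explicit in code. This is a minimal sketch; the class name, field names, and tier descriptions are illustrative, not a validated taxonomy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutonomyTier:
    """One row of the hypothetical progression table."""
    level: int
    name: str
    permissions: str
    review_policy: str
    # The open question: empty until empirically validated rules exist.
    exit_criteria: tuple = ()


# Hypothetical encoding of the progression table above.
TIERS = [
    AutonomyTier(0, "Assisted Ideation", "read-only, no execution",
                 "every output reviewed before use"),
    AutonomyTier(1, "Sandboxed Execution", "workspace read/write, no external calls",
                 "spot-check 20% of outputs"),
    AutonomyTier(2, "CI/CD Integration", "trigger builds, run tests, create PRs",
                 "automated gates + human approval"),
    AutonomyTier(3, "Autonomous Deployment", "merge PRs, deploy to staging",
                 "automated gates only, post-hoc audit"),
    AutonomyTier(4, "Production Auto-Deploy", "deploy to production with policy constraints",
                 "monitored but not pre-approved"),
]
```

Keeping `exit_criteria` as a first-class field means that once empirical decision rules exist, they slot into the model without restructuring it.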
Open Questions to Explore
- Readiness signals: What are the leading indicators that a team is ready for increased autonomy? (Error rate? Review turnaround time? Agent thrash rate? Team confidence?)
- Regression triggers: What signals should cause a demotion to lower autonomy? (Security incident? Quality escape? Stakeholder trust loss?)
- Domain variance: Do different analytical domains (exploratory vs. confirmatory, regulated vs. unregulated) require different autonomy progressions?
- Skill mix dependency: Does autonomy progression depend more on senior developer proficiency with agents, or junior developer coverage by strong DoD gates?
- Cost-benefit inflection points: At which tier does the ROI of autonomy peak? (Hypothesis: Tier 2-3 is the sweet spot; Tier 4 rarely justifies the risk in analytics)
- Organizational vs. technical readiness: Can technical readiness (strong tests, CI/CD) outpace organizational readiness (stakeholder trust), or vice versa?
Hypotheses to Test
Hypothesis 1: The "Error Rate Threshold"
- Teams should advance to the next autonomy tier only when error rate drops below X% for Y consecutive sprints
- Testable: Track error rates across teams at different autonomy levels; identify empirical thresholds
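Hypothesis 1 can be stated as an executable rule. In this sketch the threshold (2%) and sprint count (3) are placeholder values standing in for the empirical X and Y above, not recommendations.

```python
def ready_to_advance(error_rates, threshold=0.02, consecutive_sprints=3):
    """Hypothesis 1 as a rule: advance only when the per-sprint error rate
    has stayed below `threshold` for the last `consecutive_sprints` sprints.

    `error_rates` is a chronological list of per-sprint error rates;
    the 2% / 3-sprint defaults are placeholders pending empirical data.
    """
    recent = error_rates[-consecutive_sprints:]
    return (len(recent) == consecutive_sprints
            and all(rate < threshold for rate in recent))
```

For example, `ready_to_advance([0.05, 0.015, 0.01, 0.018])` holds because the last three sprints are all under 2%, while a team with only two sprints of history is never eligible regardless of its numbers.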
Hypothesis 2: The "Trust Calibration Curve"
- Team confidence in agent outputs must exceed a threshold before advancing autonomy, but over-confidence is equally dangerous
- Testable: Survey team confidence vs. actual agent error rate; identify calibration gaps
Hypothesis 3: The "DoD Maturity Prerequisite"
- Strong DoD gates are the prerequisite for autonomy escalation, not a consequence of it
- Testable: Compare DoD comprehensiveness across teams; correlate with successful autonomy tier transitions
Hypothesis 4: The "Diminishing Returns Tier"
- Beyond Tier 2-3 (CI/CD integration), additional autonomy adds marginal value but exponential risk
- Testable: Measure productivity gains per autonomy tier; identify inflection points
Hypothesis 5: The "Regulatory Ceiling"
- In regulated domains (financial services, healthcare), Tier 4 (production auto-deploy) is structurally incompatible with audit requirements
- Testable: Survey regulatory precedent; identify hard constraints
Potential Research Directions
Example Decision Framework (Hypothesis)
Advance from Tier 1 (Sandboxed) to Tier 2 (CI/CD) when, for example: the error rate stays below a threshold for consecutive sprints (Hypothesis 1), DoD gates are comprehensive and automated (Hypothesis 3), and team confidence is calibrated to measured agent reliability (Hypothesis 2).
But this is just a hypothesis. We need empirical validation.
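A hypothetical encoding of such a Tier 1 → Tier 2 decision rule, combining signals from Hypotheses 1-3. Every threshold here is a placeholder awaiting empirical validation, and the metric definitions are assumptions of this sketch.

```python
def advance_tier1_to_tier2(error_rate, dod_gate_coverage, confidence_gap):
    """Hypothetical composite readiness rule.

    - error_rate: fraction of agent outputs with quality escapes last sprint
    - dod_gate_coverage: fraction of DoD checks enforced automatically in CI
    - confidence_gap: |team-rated agent reliability - measured reliability|;
      per Hypothesis 2, both under- and over-confidence block advancement.

    All three thresholds are illustrative placeholders.
    """
    return (error_rate < 0.02             # Hypothesis 1: error rate threshold
            and dod_gate_coverage >= 0.9  # Hypothesis 3: DoD maturity prerequisite
            and confidence_gap < 0.1)     # Hypothesis 2: trust calibration
```

The conjunctive form reflects the framework's claim that readiness is multi-dimensional: a team with excellent error rates but uncalibrated confidence should still not advance.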
Success Criteria for Answering This Question
We will know we've made progress when we can:
- Provide quantitative decision rules for autonomy tier transitions (not just "when it feels right")
- Define tier-specific DoD requirements (what gates are mandatory at each tier?)
- Establish "circuit breaker" rules: automatic demotion triggers when quality/trust degrades
- Create tier-appropriate eval suites (each tier has different acceptable failure modes)
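The "circuit breaker" criterion above can be sketched as an automatic demotion rule. The trigger conditions and the escape limit are assumptions chosen for illustration, not empirically derived values.

```python
def circuit_breaker(tier, security_incident, quality_escapes, escape_limit=2):
    """Hedged sketch of an automatic demotion ('circuit breaker') rule:
    any security incident, or more than `escape_limit` quality escapes in
    a sprint, drops the team one autonomy tier (floored at Tier 0).

    Trigger conditions and `escape_limit` are illustrative placeholders.
    """
    if security_incident or quality_escapes > escape_limit:
        return max(tier - 1, 0)
    return tier
```

Making demotion automatic and one-directional-per-incident mirrors the asymmetry in the document's risk framing: advancing requires sustained evidence, but regression should be immediate.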
Cross-References