Epistemic Question
Core Question: What is the empirically optimal progression of agent autonomy tiers for analytics teams, and how do we know when a team is ready to advance to the next tier?
Why This Matters
Agile Agentic Analytics documents tiered autonomy as a mitigation pattern:
Tiered Autonomy: Escalate to higher autonomy levels only when DoD gates and sandbox boundaries justify it:
- Read-only → low risk, fast exploration
- Workspace-write → medium risk, requires review
- Full auto → high risk, requires strong gates
The Implementation Roadmap outlines a three-phase 18-month transformation, but doesn't prescribe exactly when to escalate autonomy levels within each phase.
The risk: Escalate too fast → quality degradation, security incidents, team trust erosion. Escalate too slow → teams don't realize productivity gains, adoption stalls.
The Progression Puzzle
Hypothetical autonomy tier progression:
| Tier | Agent Permissions | Human Review | Appropriate For | Exit Criteria to Next Tier |
|------|-------------------|--------------|-----------------|----------------------------|
| Tier 0: Assisted Ideation | Read-only, no execution | Every output reviewed before use | Onboarding, unfamiliar domains | ??? |
| Tier 1: Sandboxed Execution | Workspace read/write, no external calls | Spot-check 20% of outputs | Established workflows with strong tests | ??? |
| Tier 2: CI/CD Integration | Can trigger builds, run tests, create PRs | Automated gates + human approval | Mature CI/CD, comprehensive DoD | ??? |
| Tier 3: Autonomous Deployment | Can merge PRs, deploy to staging | Automated gates only, post-hoc audit | High-confidence environments, strong rollback | ??? |
| Tier 4: Production Auto-Deploy | Can deploy to production with policy constraints | Monitored but not pre-approved | Mission-critical systems with exhaustive testing | ??? (or never?) |
The question marks are the epistemic gap: What objective criteria determine readiness to advance?
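The table above can be encoded as a data model whose `exit_criteria` field is deliberately empty, making the epistemic gap explicit in code. This is a minimal sketch; the class name, field names, and tier descriptions are illustrative, not a validated taxonomy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutonomyTier:
    """One row of the hypothetical progression table."""
    level: int
    name: str
    permissions: str
    review_policy: str
    # The open question: empty until empirically validated rules exist.
    exit_criteria: tuple = ()


# Hypothetical encoding of the progression table above.
TIERS = [
    AutonomyTier(0, "Assisted Ideation", "read-only, no execution",
                 "every output reviewed before use"),
    AutonomyTier(1, "Sandboxed Execution", "workspace read/write, no external calls",
                 "spot-check 20% of outputs"),
    AutonomyTier(2, "CI/CD Integration", "trigger builds, run tests, create PRs",
                 "automated gates + human approval"),
    AutonomyTier(3, "Autonomous Deployment", "merge PRs, deploy to staging",
                 "automated gates only, post-hoc audit"),
    AutonomyTier(4, "Production Auto-Deploy", "deploy to production with policy constraints",
                 "monitored but not pre-approved"),
]
```

Keeping `exit_criteria` as a first-class field means that once empirical decision rules exist, they slot into the model without restructuring it.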
Open Questions to Explore
- Readiness signals: What are the leading indicators that a team is ready for increased autonomy? (Error rate? Review turnaround time? Agent thrash rate? Team confidence?)
- Regression triggers: What signals should cause a demotion to lower autonomy? (Security incident? Quality escape? Stakeholder trust loss?)
- Domain variance: Do different analytical domains (exploratory vs. confirmatory, regulated vs. unregulated) require different autonomy progressions?
- Skill mix dependency: Does autonomy progression depend more on senior developer proficiency with agents, or junior developer coverage by strong DoD gates?
- Cost-benefit inflection points: At which tier does the ROI of autonomy peak? (Hypothesis: Tier 2-3 is the sweet spot; Tier 4 rarely justifies the risk in analytics)
- Organizational vs. technical readiness: Can technical readiness (strong tests, CI/CD) outpace organizational readiness (stakeholder trust), or vice versa?
Hypotheses to Test
Hypothesis 1: The "Error Rate Threshold"
- Teams should advance to the next autonomy tier only when error rate drops below X% for Y consecutive sprints
- Testable: Track error rates across teams at different autonomy levels; identify empirical thresholds
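Hypothesis 1 can be stated as an executable rule. In this sketch the threshold (2%) and sprint count (3) are placeholder values standing in for the empirical X and Y above, not recommendations.

```python
def ready_to_advance(error_rates, threshold=0.02, consecutive_sprints=3):
    """Hypothesis 1 as a rule: advance only when the per-sprint error rate
    has stayed below `threshold` for the last `consecutive_sprints` sprints.

    `error_rates` is a chronological list of per-sprint error rates;
    the 2% / 3-sprint defaults are placeholders pending empirical data.
    """
    recent = error_rates[-consecutive_sprints:]
    return (len(recent) == consecutive_sprints
            and all(rate < threshold for rate in recent))
```

For example, `ready_to_advance([0.05, 0.015, 0.01, 0.018])` holds because the last three sprints are all under 2%, while a team with only two sprints of history is never eligible regardless of its numbers.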
Hypothesis 2: The "Trust Calibration Curve"
- Team confidence in agent outputs must exceed a threshold before advancing autonomy, but over-confidence is equally dangerous
- Testable: Survey team confidence vs. actual agent error rate; identify calibration gaps
Hypothesis 3: The "DoD Maturity Prerequisite"
- Strong DoD gates are the prerequisite for autonomy escalation, not a consequence of it
- Testable: Compare DoD comprehensiveness across teams; correlate with successful autonomy tier transitions
Hypothesis 4: The "Diminishing Returns Tier"
- Beyond Tier 2-3 (CI/CD integration), additional autonomy adds marginal value but exponential risk
- Testable: Measure productivity gains per autonomy tier; identify inflection points
Hypothesis 5: The "Regulatory Ceiling"
- In regulated domains (financial services, healthcare), Tier 4 (production auto-deploy) is structurally incompatible with audit requirements
- Testable: Survey regulatory precedent; identify hard constraints
Potential Research Directions
Example Decision Framework (Hypothesis)
Advance from Tier 1 (Sandboxed) to Tier 2 (CI/CD) when, for example: the error rate stays below a threshold for consecutive sprints (Hypothesis 1), DoD gates are comprehensive and automated (Hypothesis 3), and team confidence is calibrated to measured agent reliability (Hypothesis 2).
But this is just a hypothesis. We need empirical validation.
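A hypothetical encoding of such a Tier 1 → Tier 2 decision rule, combining signals from Hypotheses 1-3. Every threshold here is a placeholder awaiting empirical validation, and the metric definitions are assumptions of this sketch.

```python
def advance_tier1_to_tier2(error_rate, dod_gate_coverage, confidence_gap):
    """Hypothetical composite readiness rule.

    - error_rate: fraction of agent outputs with quality escapes last sprint
    - dod_gate_coverage: fraction of DoD checks enforced automatically in CI
    - confidence_gap: |team-rated agent reliability - measured reliability|;
      per Hypothesis 2, both under- and over-confidence block advancement.

    All three thresholds are illustrative placeholders.
    """
    return (error_rate < 0.02             # Hypothesis 1: error rate threshold
            and dod_gate_coverage >= 0.9  # Hypothesis 3: DoD maturity prerequisite
            and confidence_gap < 0.1)     # Hypothesis 2: trust calibration
```

The conjunctive form reflects the framework's claim that readiness is multi-dimensional: a team with excellent error rates but uncalibrated confidence should still not advance.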
Success Criteria for Answering This Question
We will know we've made progress when we can:
- Provide quantitative decision rules for autonomy tier transitions (not just "when it feels right")
- Define tier-specific DoD requirements (what gates are mandatory at each tier?)
- Establish "circuit breaker" rules: automatic demotion triggers when quality/trust degrades
- Create tier-appropriate eval suites (each tier has different acceptable failure modes)
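The "circuit breaker" criterion above can be sketched as an automatic demotion rule. The trigger conditions and the escape limit are assumptions chosen for illustration, not empirically derived values.

```python
def circuit_breaker(tier, security_incident, quality_escapes, escape_limit=2):
    """Hedged sketch of an automatic demotion ('circuit breaker') rule:
    any security incident, or more than `escape_limit` quality escapes in
    a sprint, drops the team one autonomy tier (floored at Tier 0).

    Trigger conditions and `escape_limit` are illustrative placeholders.
    """
    if security_incident or quality_escapes > escape_limit:
        return max(tier - 1, 0)
    return tier
```

Making demotion automatic and one-directional-per-incident mirrors the asymmetry in the document's risk framing: advancing requires sustained evidence, but regression should be immediate.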
Cross-References