Autonomous Shipping Operating Model
This is the lightweight GitHub-native harness for autonomous 24/7 development.
Goal
Ship a high volume of small, reviewed, testable changes every day without giving unattended agents direct production power.
Principles
- optimize for trustworthy throughput, not chaotic output
- keep changes small and file ownership clear
- require review by a different agent than the author
- stage before production for anything that changes contracts, auth, runtime, or infrastructure
- enforce patch and changed-file guardrails so one autonomous pass stays small
Lanes
Lane A: Safe Auto-Merge
- docs
- test repairs
- small CLI and SDK parity fixes
- small frontend proxy or type alignment
Requirements:
- one reviewer agent
- required CI green
- labels
autonomy:safeandsize:xsorsize:s
Lane B: Staging-First Product Work
- backend route changes
- dashboard features
- observability improvements
Requirements:
- one reviewer agent
- required CI green
- staging deploy success
Lane C: Human-Gated Risk
- auth/session changes
- runtime protocol changes
- infrastructure changes
- database migrations
Requirements:
- human approval
- targeted verification evidence
Control Loop
- Scheduled workflow runs every 15 minutes.
- Orchestrator scans open issues labeled
autonomy:ready. - Work is assigned to a single owning agent based on area labels.
- Authoring agent creates a branch and PR.
- Reviewer agent checks scope, ownership, validation, and drift. If reviewer login mapping is configured, PR assignment follows the work order.
- CI gates run.
- Safe PRs auto-merge; risky PRs stop for human approval.
- Stale claimed issues with no open PR are automatically released back into the queue after the configured timeout.
Required Labels
- area labels:
area:api,area:web,area:auth,area:cli-sdk,area:runtime,area:test,area:infra,area:ops,area:docs - risk labels:
risk:low,risk:medium,risk:high - autonomy labels:
autonomy:ready,autonomy:safe,autonomy:blocked,autonomy:claimed - size labels:
size:xs,size:s,size:m,size:l
Recommended Protections
- branch protection on
main - require PRs for all changes
- require at least one approval
- require CI
- enable auto-merge only after labels and checks pass
- block force-pushes to protected branches
Validation Truth Contract
Keep CI aligned with what the repo actually supports today.
scripts/test.shis the authoritative repo-wide validation entrypoint for CI.- Python API truth comes from
pytest tests/api, lint/format checks, and compile checks. - Frontend truth comes from generated API types, lint, build, and the checked-in Playwright smoke suite.
- Playwright is currently a production-smoke lane, not a generic localhost integration lane. If the suite assumes hosted behavior (for example Turnstile bootstrapping), CI should provide the matching test shim instead of failing on expected external-service gaps.
- Infra validation should live in dedicated workflows/targets (
make tf-validate,make monitor-validate, drift workflows) rather than silently piggybacking on unrelated app PRs. - When a validation lane starts failing for config drift rather than product breakage, either fix the fixture/assumption immediately or remove that lane from required CI until it is truthful again.
What Not To Automate Yet
- direct production infra changes
- auth-breaking changes
- broad schema migrations
- destructive data operations
- unreviewed runtime protocol changes
First Four Agents To Activate
mission-control-orchestratorqa-reliability-engineercli-sdk-contract-keepercontrol-plane-steward
These four reduce drift and improve validation enough to safely expand the team.
