Autonomous Shipping Operating Model

This is the lightweight GitHub-native harness for autonomous 24/7 development.

Goal

Ship a high volume of small, reviewed, testable changes every day without giving unattended agents direct production power.

Principles

optimize for trustworthy throughput, not chaotic output
keep changes small and file ownership clear
require review by a different agent than the author
stage before production for anything that changes contracts, auth, runtime, or infrastructure
enforce patch and changed-file guardrails so one autonomous pass stays small

Lanes

Lane A: Safe Auto-Merge

docs
test repairs
small CLI and SDK parity fixes
small frontend proxy or type alignment

Requirements:

one reviewer agent
required CI green
labels autonomy:safe and size:xs or size:s

Lane B: Staging-First Product Work

backend route changes
dashboard features
observability improvements

Requirements:

one reviewer agent
required CI green
staging deploy success

Lane C: Human-Gated Risk

auth/session changes
runtime protocol changes
infrastructure changes
database migrations

Requirements:

human approval
targeted verification evidence

Control Loop

Scheduled workflow runs every 15 minutes.
Orchestrator scans open issues labeled autonomy:ready.
Work is assigned to a single owning agent based on area labels.
Authoring agent creates a branch and PR.
Reviewer agent checks scope, ownership, validation, and drift. If reviewer login mapping is configured, PR assignment follows the work order.
CI gates run.
Safe PRs auto-merge; risky PRs stop for human approval.
Stale claimed issues with no open PR are automatically released back into the queue after the configured timeout.

Required Labels

area labels: area:api, area:web, area:auth, area:cli-sdk, area:runtime, area:test, area:infra, area:ops, area:docs
risk labels: risk:low, risk:medium, risk:high
autonomy labels: autonomy:ready, autonomy:safe, autonomy:blocked, autonomy:claimed
size labels: size:xs, size:s, size:m, size:l

Recommended Protections

branch protection on main
require PRs for all changes
require at least one approval
require CI
enable auto-merge only after labels and checks pass
block force-pushes to protected branches

Validation Truth Contract

Keep CI aligned with what the repo actually supports today.

scripts/test.sh is the authoritative repo-wide validation entrypoint for CI.
Python API truth comes from pytest tests/api, lint/format checks, and compile checks.
Frontend truth comes from generated API types, lint, build, and the checked-in Playwright smoke suite.
Playwright is currently a production-smoke lane, not a generic localhost integration lane. If the suite assumes hosted behavior (for example Turnstile bootstrapping), CI should provide the matching test shim instead of failing on expected external-service gaps.
Infra validation should live in dedicated workflows/targets (make tf-validate, make monitor-validate, drift workflows) rather than silently piggybacking on unrelated app PRs.
When a validation lane starts failing for config drift rather than product breakage, either fix the fixture/assumption immediately or remove that lane from required CI until it is truthful again.

What Not To Automate Yet

direct production infra changes
auth-breaking changes
broad schema migrations
destructive data operations
unreviewed runtime protocol changes

First Four Agents To Activate

mission-control-orchestrator
qa-reliability-engineer
cli-sdk-contract-keeper
control-plane-steward

These four reduce drift and improve validation enough to safely expand the team.

Autonomous Shipping Operating Model

This is the lightweight GitHub-native harness for autonomous 24/7 development.

Goal

Ship a high volume of small, reviewed, testable changes every day without giving unattended agents direct production power.

Principles

optimize for trustworthy throughput, not chaotic output
keep changes small and file ownership clear
require review by a different agent than the author
stage before production for anything that changes contracts, auth, runtime, or infrastructure
enforce patch and changed-file guardrails so one autonomous pass stays small

Lanes

Lane A: Safe Auto-Merge

docs
test repairs
small CLI and SDK parity fixes
small frontend proxy or type alignment

Requirements:

one reviewer agent
required CI green
labels autonomy:safe and size:xs or size:s

Lane B: Staging-First Product Work

backend route changes
dashboard features
observability improvements

Requirements:

one reviewer agent
required CI green
staging deploy success

Lane C: Human-Gated Risk

auth/session changes
runtime protocol changes
infrastructure changes
database migrations

Requirements:

human approval
targeted verification evidence

Control Loop

Scheduled workflow runs every 15 minutes.
Orchestrator scans open issues labeled autonomy:ready.
Work is assigned to a single owning agent based on area labels.
Authoring agent creates a branch and PR.
Reviewer agent checks scope, ownership, validation, and drift. If reviewer login mapping is configured, PR assignment follows the work order.
CI gates run.
Safe PRs auto-merge; risky PRs stop for human approval.
Stale claimed issues with no open PR are automatically released back into the queue after the configured timeout.

Required Labels

area labels: area:api, area:web, area:auth, area:cli-sdk, area:runtime, area:test, area:infra, area:ops, area:docs
risk labels: risk:low, risk:medium, risk:high
autonomy labels: autonomy:ready, autonomy:safe, autonomy:blocked, autonomy:claimed
size labels: size:xs, size:s, size:m, size:l

Recommended Protections

branch protection on main
require PRs for all changes
require at least one approval
require CI
enable auto-merge only after labels and checks pass
block force-pushes to protected branches

Validation Truth Contract

Keep CI aligned with what the repo actually supports today.

scripts/test.sh is the authoritative repo-wide validation entrypoint for CI.
Python API truth comes from pytest tests/api, lint/format checks, and compile checks.
Frontend truth comes from generated API types, lint, build, and the checked-in Playwright smoke suite.
Playwright is currently a production-smoke lane, not a generic localhost integration lane. If the suite assumes hosted behavior (for example Turnstile bootstrapping), CI should provide the matching test shim instead of failing on expected external-service gaps.
Infra validation should live in dedicated workflows/targets (make tf-validate, make monitor-validate, drift workflows) rather than silently piggybacking on unrelated app PRs.
When a validation lane starts failing for config drift rather than product breakage, either fix the fixture/assumption immediately or remove that lane from required CI until it is truthful again.

What Not To Automate Yet

direct production infra changes
auth-breaking changes
broad schema migrations
destructive data operations
unreviewed runtime protocol changes

First Four Agents To Activate

mission-control-orchestrator
qa-reliability-engineer
cli-sdk-contract-keeper
control-plane-steward

These four reduce drift and improve validation enough to safely expand the team.

Autonomous Shipping Operating Model#

Goal#

Principles#

Lanes#

Lane A: Safe Auto-Merge#

Lane B: Staging-First Product Work#

Lane C: Human-Gated Risk#

Control Loop#

Required Labels#

Recommended Protections#

Validation Truth Contract#

What Not To Automate Yet#

First Four Agents To Activate#

Autonomous Shipping Operating Model#

Goal#

Principles#

Lanes#

Lane A: Safe Auto-Merge#

Lane B: Staging-First Product Work#

Lane C: Human-Gated Risk#

Control Loop#

Required Labels#

Recommended Protections#

Validation Truth Contract#

What Not To Automate Yet#

First Four Agents To Activate#

Autonomous Shipping Operating Model

Goal

Principles

Lanes

Lane A: Safe Auto-Merge

Lane B: Staging-First Product Work

Lane C: Human-Gated Risk

Control Loop

Required Labels

Recommended Protections

Validation Truth Contract

What Not To Automate Yet

First Four Agents To Activate

Autonomous Shipping Operating Model

Goal

Principles

Lanes

Lane A: Safe Auto-Merge

Lane B: Staging-First Product Work

Lane C: Human-Gated Risk

Control Loop

Required Labels

Recommended Protections

Validation Truth Contract

What Not To Automate Yet

First Four Agents To Activate