πŸ“– MUTX Docs
GitHubΒ·mutx.dev
Welcome
Manifesto
Whitepaper
Roadmap
Documentation Hub
Autonomous Agent Team
MUTX Infrastructure
Python SDK
Support
Contributing
Security Policy
Licensing
Contributor Covenant Code of Conduct
AGENTS.md
App Dashboard
Changelog Status
Claim to Reality Gap Matrix
Governance
Migration Runbook
Monitoring
Mutation Testing
OTel
Overview
Quickstart
Surface Matrix
Technical Whitepaper
Webhook Governance
  1. Docsβ€Ί
  2. Welcome

MUTX Infrastructure#

Production-readiness baseline for Terraform, Ansible, Docker Compose, and monitoring.

Layout#

  • terraform/ – DigitalOcean VPC + droplet + firewall provisioning
  • ansible/ – host hardening and container deployment playbooks
  • helm/ – Kubernetes Helm chart for MUTX deployment
  • monitoring/ – Prometheus + Grafana config

Terraform (DigitalOcean)#

cd infrastructure
make tf-fmt
make tf-validate
make ansible-inventory

# Staging
make tf-plan-staging

# Production
make tf-plan-production

# Apply from terraform/ after reviewing plan
cd terraform
terraform apply -var-file=environments/production/terraform.tfvars

Health Checks#

The API exposes two health check endpoints for Kubernetes/production use:

  • /health - Liveness probe, returns 200 if the service is alive
  • /ready - Readiness probe, returns 503 if not ready to accept traffic

Both endpoints check database connectivity status.

Notes#

  • Backend is environment-scoped via terraform/environments/<env>/backend.hcl.
  • Keep admin_cidr scoped to VPN/home-office CIDR. Avoid 0.0.0.0/0 in production.
  • Customer IDs must be unique.
  • Customer VPC CIDRs are validated at plan time.

Ansible Provisioning#

cd infrastructure
make ansible-provision
make ansible-deploy-agent

Install Ansible collections and lint before applying:

cd infrastructure
make ansible-deps
make ansible-lint

Required environment variables before running playbooks:

  • POSTGRES_PASSWORD
  • REDIS_PASSWORD
  • AGENT_API_KEY
  • AGENT_SECRET_KEY
  • ADMIN_CIDR (recommended, defaults to 0.0.0.0/0)
  • PRIVATE_CIDR (optional, defaults to 10.0.0.0/8)
  • TAILSCALE_AUTH_KEY (optional)

Monitoring Stack#

cp .env.monitoring.example .env.monitoring
# set strong credentials/passwords
cd infrastructure
make monitor-up

The monitoring compose file binds UI and exporter ports to localhost by default. Stop it with cd infrastructure && make monitor-down. Prometheus now loads alerting rules from infrastructure/monitoring/prometheus/alerts.yml and scrapes the API via host.docker.internal:8000 (works for local Docker Desktop + Linux host-gateway mapping). Grafana provisioning and dashboard mounts are resolved relative to infrastructure/docker/docker-compose.monitoring.yml, so the stack reads from ../monitoring/grafana/... when launched from infrastructure/.

CI Drift Detection#

A scheduled GitHub Actions workflow now checks Terraform drift daily for both staging and production using terraform plan -detailed-exitcode:

  • Workflow: .github/workflows/infrastructure-drift.yml
  • Trigger: daily cron + manual dispatch
  • Behavior: uploads plan artifacts and fails when drift is detected

Required GitHub secrets:

  • DO_TOKEN
  • TF_STATE_ACCESS_KEY_ID
  • TF_STATE_SECRET_ACCESS_KEY

Extra Validation Targets#

From infrastructure/:

# Drift-style exit codes (0=no changes, 2=changes)
make tf-plan-staging-detailed
make tf-plan-production-detailed

# Validate Prometheus config + alert rules
make monitor-validate

These infrastructure checks are authoritative for infra changes, but they are not the same thing as the app CI lane in .github/workflows/ci.yml. Keep infra-specific validation explicit so application PRs do not fail on hidden cross-surface assumptions.

Next Hardening Items#

  • Replace static inventory usage with Terraform-generated inventory by default in Ansible playbook wrappers.
  • Wire alert delivery (Alertmanager/notification channel) for critical rules.
  • Default local Alertmanager routing now drops alerts unless they are explicitly labeled notify="webhook", preventing noisy failed webhook retries in dev setups without a receiver.
  • Add automated backup/restore verification for PostgreSQL and Redis volumes.
PreviousRuntime Protocol EngineerNextPython SDK

Last updated via GitBook sync β€” source at GitHub

On this page

LayoutTerraform (DigitalOcean)Health ChecksNotesAnsible ProvisioningMonitoring StackCI Drift DetectionExtra Validation TargetsNext Hardening Items