Self-healing Kubernetes infrastructure that actually stays up: automated lifecycle management, distributed tracing, and VPN-only security that just works
Production-grade platform featuring intelligent lifecycle automation that self-heals DNS and VPN
nodes before they break, comprehensive observability with distributed tracing (Prometheus, Grafana,
Loki, Thanos, Tempo), and category-based namespace isolation. Deploy AI workloads, run production
services, sleep through the night. All infrastructure hermetically sealed behind Headscale VPN mesh,
deployed with a single terraform apply.
Key Features:
- Category-based namespace isolation (core, gitops, inference, infra, monitoring)
- Headscale VPN mesh networking (100.64.0.0/10)
- FreeIPA identity management (LDAP/Kerberos) for all services
- Self-healing DNS with automatic cleanup and init container updates
- Automated TLS certificates (Let's Encrypt via cert-manager)
- Comprehensive observability (Prometheus, Grafana, Loki, Thanos, Tempo)
- Distributed tracing with OpenTelemetry integration (Open-WebUI traces to Tempo)
- Fixed and correlated dashboards (Kubernetes, Headscale, Open-WebUI metrics)
- Failure-resilient deployment (no circular dependencies)
- Services: Headscale, FreeIPA, Grafana, Prometheus, Tempo, Loki, Thanos, Open-WebUI, Ollama, Redmine, GitLab, ArgoCD, NetBox
┌────────────┐     ┌────────────┐     ┌───────────────────────┐
│ VPN Client │────▶│ Tailscale  │────▶│  Kubernetes Cluster   │
│ 100.64.0.0 │     │  VPN Mesh  │     │ (Category Namespaces) │
└────────────┘     └────────────┘     └────┬──────────────────┘
                                           │
       ┌─────────────────┬─────────────────┤
       │                 │                 │
┌──────▼─────┐    ┌──────▼─────┐    ┌──────▼─────┐
│    core    │    │   gitops   │    │ inference  │
├────────────┤    ├────────────┤    ├────────────┤
│ Headscale  │    │ Redmine    │    │ Open-WebUI │
│ FreeIPA    │    │ (GitLab)   │    │ Ollama     │
└──────┬─────┘    └────────────┘    └────────────┘
       │
┌──────▼─────┐    ┌────────────┐
│ monitoring │    │   infra    │
├────────────┤    ├────────────┤
│ Prometheus │    │ (NetBox)   │
│ Grafana    │    │ (Vault)    │
│ Loki       │    └────────────┘
│ Thanos     │
└────────────┘
All services: nginx-tls (443) ───▶ Application (8080) + Tailscale (VPN)
All services use the 3-Container StatefulSet Pattern (nginx-tls + application + Tailscale sidecar).
Full Architecture Documentation →
Prerequisites: Kubernetes cluster, Terraform, kubectl, Cloudflare account with API token
Complete Prerequisites Guide →
# Clone and configure
git clone https://github.com/vegcom/charon.git
cd charon
# Set up credentials (quoting 'EOF' keeps any "$" in passwords literal)
cat > .env << 'EOF'
CLOUDFLARE_API_TOKEN="your-cloudflare-api-token"
REDMINE_DB_HOST="postgres.example.com"
REDMINE_DB_PORT="5432"
REDMINE_DB_USER="redmine_user"
REDMINE_DB_NAME="redmine_production"
REDMINE_DB_PASSWORD="secure-password-here"
EOF
# Configure Terraform
cd terraform
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your settings
# Deploy everything
source ../.env
export TF_VAR_cloudflare_api_token="$CLOUDFLARE_API_TOKEN"
terraform init
terraform apply
What deploys automatically:
- cert-manager + Let's Encrypt
- Headscale VPN with user/key management
- FreeIPA identity management
- All services with TLS certificates and LDAP auth
- DNS records (self-healing with init container updates)
- Prometheus, Grafana, Loki, Thanos, Tempo observability stack
- Distributed tracing with OpenTelemetry integration
- VPN mesh network
- Custom Docker images built in-cluster (NetBox plugins, lifecycle automation)
Detailed Quick Start Guide →
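Before running terraform apply, it can help to confirm that .env defines every variable the Quick Start uses. The following is a minimal Python sketch, not part of the repository: parse_env, missing_vars and terraform_env are hypothetical helpers, and the parser only handles simple KEY="value" lines like the ones written by the heredoc above.

```python
import shlex

# Variable names taken from the .env example in the Quick Start.
REQUIRED_VARS = (
    "CLOUDFLARE_API_TOKEN",
    "REDMINE_DB_HOST",
    "REDMINE_DB_PORT",
    "REDMINE_DB_USER",
    "REDMINE_DB_NAME",
    "REDMINE_DB_PASSWORD",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex removes surrounding quotes the way a POSIX shell would.
        parts = shlex.split(value)
        values[key.strip()] = parts[0] if parts else ""
    return values


def missing_vars(values: dict[str, str]) -> list[str]:
    """Return required variables that are absent or empty."""
    return [name for name in REQUIRED_VARS if not values.get(name)]


def terraform_env(values: dict[str, str]) -> dict[str, str]:
    """Mirror the manual `export TF_VAR_cloudflare_api_token=...` step."""
    return {"TF_VAR_cloudflare_api_token": values["CLOUDFLARE_API_TOKEN"]}


if __name__ == "__main__":
    sample = '''
CLOUDFLARE_API_TOKEN="your-cloudflare-api-token"
REDMINE_DB_HOST="postgres.example.com"
REDMINE_DB_PORT="5432"
'''
    # Prints ['REDMINE_DB_USER', 'REDMINE_DB_NAME', 'REDMINE_DB_PASSWORD']
    print(missing_vars(parse_env(sample)))
```

Running such a check first turns a half-filled .env into one clear error instead of a failure partway through the apply.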
# Generate pre-auth key
kubectl exec -n core headscale-0 -- headscale preauthkeys create \
--user default --reusable --expiration 90d
# Connect your device
tailscale up --login-server https://vpn.example.com --authkey <key>
# Access services (VPN required)
open https://grafana.example.com # Monitoring dashboards
open https://ipa.example.com # Identity management
open https://redmine.example.com # Project management
open https://ai.example.com # AI chat interface
Services organized by function with strict RBAC boundaries:
- core - Infrastructure everyone depends on (Headscale VPN, FreeIPA identity)
- gitops - Development tools (Redmine, GitLab)
- inference - AI/LLM workloads (Open-WebUI, Ollama)
- infra - Operations tooling (NetBox, Vault)
- monitoring - Observability stack (Prometheus, Grafana, Loki, Thanos, AlertManager)
Benefits:
- Clear security boundaries
- Independent resource quotas
- Simplified RBAC management
- Logical service grouping
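To make the category layout concrete, here is a hedged Python sketch that models the service-to-namespace mapping and builds a Namespace plus a ResourceQuota object per category. The actual resources are defined in Terraform; the category label key and every quota value below are hypothetical placeholders, not the project's settings.

```python
import json

# Category -> services, as listed above (lowercased for use as names).
CATEGORIES = {
    "core": ["headscale", "freeipa"],
    "gitops": ["redmine", "gitlab"],
    "inference": ["open-webui", "ollama"],
    "infra": ["netbox", "vault"],
    "monitoring": ["prometheus", "grafana", "loki", "thanos", "alertmanager"],
}

# Hypothetical quotas: inference gets more room for model serving.
QUOTAS = {"inference": {"requests.cpu": "8", "requests.memory": "32Gi"}}
DEFAULT_QUOTA = {"requests.cpu": "2", "requests.memory": "4Gi"}


def namespace_for(service: str) -> str:
    """Return the category namespace a service belongs to."""
    for category, services in CATEGORIES.items():
        if service in services:
            return category
    raise KeyError(f"unknown service: {service}")


def namespace_manifests(category: str) -> list[dict]:
    """Build a Namespace and its independent ResourceQuota for one category."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        # "category" label key is an assumption for illustration.
        "metadata": {"name": category, "labels": {"category": category}},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{category}-quota", "namespace": category},
        "spec": {"hard": dict(QUOTAS.get(category, DEFAULT_QUOTA))},
    }
    return [namespace, quota]


if __name__ == "__main__":
    print(namespace_for("ollama"))  # inference
    print(json.dumps(namespace_manifests("core"), indent=2))
```

Because each category owns its own ResourceQuota, a runaway inference workload exhausts only the inference budget, not the one core services rely on.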
Centralized LDAP/Kerberos authentication for all services:
- Single sign-on across all applications
- User and group management
- LDAPS (port 636) for secure authentication
- Automated LDAP configuration via scripts
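Scripts that configure LDAP for a service need bind and search DNs. A small Python sketch of building them safely follows, assuming the FreeIPA layout used in the Redmine LDAP example further down (uid=…,cn=users,cn=accounts,dc=example,dc=org, with dc=example,dc=org as a placeholder suffix). The helper names are hypothetical; the escaping follows RFC 4514, section 2.4.

```python
def escape_dn_value(value: str) -> str:
    """Escape an attribute value for use inside a DN (RFC 4514, section 2.4)."""
    out = []
    for i, ch in enumerate(value):
        if ch in '"+,;<>\\':
            out.append("\\" + ch)
        elif ch == "\0":
            out.append("\\00")
        elif (i == 0 and ch in " #") or (i == len(value) - 1 and ch == " "):
            # Leading space/# and trailing space must be escaped too.
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


def freeipa_user_dn(uid: str, suffix: str = "dc=example,dc=org") -> str:
    """DN of a FreeIPA user entry in the cn=users,cn=accounts container."""
    return f"uid={escape_dn_value(uid)},cn=users,cn=accounts,{suffix}"


def ldaps_url(host: str = "freeipa.core.svc.cluster.local", port: int = 636) -> str:
    """LDAPS endpoint of the in-cluster FreeIPA service."""
    return f"ldaps://{host}:{port}"


if __name__ == "__main__":
    print(freeipa_user_dn("admin"))
    print(ldaps_url())
```

Escaping matters once usernames come from user input: an unescaped comma would silently change which entry the DN points at.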
Full monitoring, logging, distributed tracing, and long-term storage:
- Prometheus - Current metrics collection (35+ targets across 10 jobs)
- Grafana - Dashboards and visualization with Tempo correlations
- Loki - Log aggregation (short-term, emptyDir)
- Thanos - Long-term metrics storage (2x 50Gi retain PVCs)
- Tempo - Distributed tracing with OpenTelemetry integration
- Promtail - Log collection from all pods
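A figure like "35+ targets across 10 jobs" can be checked against Prometheus' HTTP API, whose /api/v1/targets endpoint returns data.activeTargets with each target's labels and health. The Python sketch below summarizes such a payload per job; the in-cluster address you pass to targets_url is an assumption for your deployment, and the sample payload is made up.

```python
from collections import Counter
from urllib.parse import urljoin


def targets_url(base: str) -> str:
    """URL of the Prometheus /api/v1/targets endpoint under `base`."""
    return urljoin(base.rstrip("/") + "/", "api/v1/targets")


def summarize_targets(payload: dict) -> dict[str, dict[str, int]]:
    """Count active targets per job, split by health ("up", "down", "unknown")."""
    summary: dict[str, Counter] = {}
    for target in payload["data"]["activeTargets"]:
        job = target["labels"].get("job", "<none>")
        summary.setdefault(job, Counter())[target["health"]] += 1
    return {job: dict(counts) for job, counts in summary.items()}


if __name__ == "__main__":
    # Made-up sample shaped like a /api/v1/targets response.
    sample = {
        "status": "success",
        "data": {
            "activeTargets": [
                {"labels": {"job": "kubelet"}, "health": "up"},
                {"labels": {"job": "kubelet"}, "health": "down"},
                {"labels": {"job": "tempo"}, "health": "up"},
            ]
        },
    }
    summary = summarize_targets(sample)
    total = sum(sum(c.values()) for c in summary.values())
    print(f"{total} targets across {len(summary)} jobs")  # 3 targets across 2 jobs
```

Against a live cluster (over the VPN), the payload would come from json.load(urllib.request.urlopen(targets_url("http://<prometheus-host>:9090"))).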
Automated DNS management with multiple update strategies:
- Fallback IPs (node IPs) prevent deployment failures
- Init container updates on pod startup
- Async updates when pods connect to VPN
- Automatic cleanup of stale records
- Per-service RBAC with strict permissions
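The fallback-then-update behaviour above can be modelled as a small reconcile step: publish node IPs until the pod has a VPN address, then switch, and delete whatever is no longer desired. This Python sketch is illustrative only. The function names are hypothetical, it assumes plain A records, and it does not call Cloudflare or reflect how the project's init containers are actually written.

```python
import ipaddress
from typing import Optional

# VPN mesh range from the feature list.
TAILNET = ipaddress.ip_network("100.64.0.0/10")


def choose_record_ips(tailscale_ip: Optional[str], node_ips: list[str]) -> list[str]:
    """Prefer the pod's VPN address; otherwise fall back to node IPs."""
    if tailscale_ip and ipaddress.ip_address(tailscale_ip) in TAILNET:
        return [tailscale_ip]
    return sorted(node_ips)


def plan_changes(existing: dict[str, set[str]], desired: dict[str, set[str]]):
    """Diff existing A records against desired state.

    Returns (to_create, to_delete) as sets of (name, ip) pairs; records that
    exist but are no longer desired are the stale ones to clean up.
    """
    have = {(name, ip) for name, ips in existing.items() for ip in ips}
    want = {(name, ip) for name, ips in desired.items() for ip in ips}
    return want - have, have - want


if __name__ == "__main__":
    # Pod just joined the VPN: the fallback node IP becomes stale.
    desired = {"grafana.example.com": set(choose_record_ips("100.64.0.7", ["10.0.0.2"]))}
    create, delete = plan_changes({"grafana.example.com": {"10.0.0.2"}}, desired)
    print(create, delete)
```

Diffing full sets, rather than only adding records, is what makes the cleanup automatic: a record that no pod claims any more always ends up in to_delete.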
Architecture designed to handle failures gracefully:
- DNS records are created with fallback IPs first
- Services deploy independently
- The rest of the system keeps working even when individual pods can't start
- No circular dependency failures
- Self-healing when pods recover
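One way to keep "no circular dependencies" checkable is to describe deployment dependencies as a graph and topologically sort it. The sketch below uses Python's standard-library graphlib; the dependency map is a hypothetical example, not the repository's actual Terraform graph (Terraform builds and cycle-checks its own graph during plan).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical map: each service is deployed after the ones it lists.
DEPENDENCIES = {
    "cert-manager": set(),
    "dns-records": set(),  # created with fallback IPs, so no pod dependency
    "headscale": {"cert-manager", "dns-records"},
    "freeipa": {"cert-manager", "dns-records"},
    "grafana": {"freeipa", "headscale"},
    "redmine": {"freeipa", "headscale"},
}


def deploy_order(deps: dict[str, set[str]]) -> list[str]:
    """Return a valid deployment order; raises CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())


def has_cycle(deps: dict[str, set[str]]) -> bool:
    """True if the dependency graph contains a cycle."""
    try:
        deploy_order(deps)
    except CycleError:
        return True
    return False


if __name__ == "__main__":
    print(deploy_order(DEPENDENCIES))
    print(has_cycle({"a": {"b"}, "b": {"a"}}))  # True
```

Note that dns-records has no dependencies at all; that is the property that lets DNS exist before any pod is healthy.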
Standardized StatefulSet architecture for all services:
- nginx-tls - HTTPS termination and reverse proxy
- Application - Main service (localhost only)
- Tailscale - VPN sidecar
Benefits: Security isolation, stability, reliability, consistency
StatefulSet Pattern Details →
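The three-container pattern can be sketched as a pod template plus an nginx server block. This Python version is an assumption-laden illustration, not the project's Terraform: the container images, the Secret names and the nginx config layout are placeholders. It relies on two facts: containers in one pod share a network namespace, so nginx can proxy to 127.0.0.1, and cert-manager stores certificates under tls.crt and tls.key. Mounting the nginx config (for example from a ConfigMap) is not shown.

```python
def nginx_proxy_conf(app_port: int = 8080) -> str:
    """nginx server block: terminate TLS on 443, proxy to the app on localhost."""
    return (
        "server {\n"
        "    listen 443 ssl;\n"
        "    ssl_certificate     /etc/nginx/tls/tls.crt;\n"
        "    ssl_certificate_key /etc/nginx/tls/tls.key;\n"
        "    location / {\n"
        f"        proxy_pass http://127.0.0.1:{app_port};\n"
        "    }\n"
        "}\n"
    )


def three_container_pod_spec(name: str, app_image: str) -> dict:
    """Pod template spec: nginx-tls, the application, and a Tailscale sidecar."""
    return {
        "containers": [
            {
                "name": "nginx-tls",
                "image": "nginx:stable",  # assumed image
                "ports": [{"name": "https", "containerPort": 443}],
                "volumeMounts": [
                    {"name": "tls", "mountPath": "/etc/nginx/tls", "readOnly": True}
                ],
            },
            {
                # The app is expected to bind 127.0.0.1:8080 so only containers
                # in this pod can reach it; containerPort entries would not
                # restrict access, so none are declared here.
                "name": name,
                "image": app_image,
            },
            {
                "name": "tailscale",
                "image": "tailscale/tailscale:stable",  # assumed image/tag
                "env": [
                    {
                        "name": "TS_AUTHKEY",
                        # Hypothetical Secret holding a Headscale pre-auth key.
                        "valueFrom": {
                            "secretKeyRef": {"name": f"{name}-ts-auth", "key": "authkey"}
                        },
                    }
                ],
            },
        ],
        # Certificate Secret issued by cert-manager (name is an assumption).
        "volumes": [{"name": "tls", "secret": {"secretName": f"{name}-tls"}}],
    }
```

Keeping the application on localhost means TLS and network exposure are handled identically for every service, whatever the application itself supports.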
Complete Documentation Index →
# View VPN devices
kubectl exec -n core headscale-0 -- headscale nodes list
# Backup Redmine database
REDMINE_DB_HOST=... REDMINE_DB_PORT=... python3 scripts/redmine/backup_restore_db.py backup
# Restore database
REDMINE_DB_HOST=... python3 scripts/redmine/backup_restore_db.py restore --file backup.sql
# Configure LDAP for Redmine
python3 scripts/redmine/configure_ldap.py \
--ldap-host freeipa.core.svc.cluster.local \
--ldap-port 636 \
--bind-dn "uid=admin,cn=users,cn=accounts,dc=example,dc=org" \
--bind-password "password" \
--base-dn "cn=users,cn=accounts,dc=example,dc=org"
Pods not starting? Check storage class and PVCs
kubectl get pvc -A
kubectl describe pod <pod-name> -n <namespace>
DNS not resolving? Verify Cloudflare credentials
dig vpn.example.com
Certificates failing? Check cert-manager and Cloudflare token permissions
kubectl get certificate -A
kubectl logs -n cert-manager -l app=cert-manager
VPN issues? Verify that Headscale is running and its external ingress is accessible
kubectl logs -n core headscale-0
curl -I https://vpn.example.com/health
LDAP auth not working? Check FreeIPA connectivity and credentials
kubectl exec -n core freeipa-0 -- ldapsearch -x -b "cn=users,cn=accounts,dc=example,dc=org"
Full Troubleshooting Guide →
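For the DNS and certificate checks above, the Cloudflare API token can be verified directly against Cloudflare's token verification endpoint (GET /client/v4/user/tokens/verify). The Python sketch below builds that request and interprets the JSON reply. The helper names are hypothetical, and the check only confirms that the token is valid and active. It does not show which zone permissions the token carries.

```python
import json
import urllib.request

VERIFY_URL = "https://api.cloudflare.com/client/v4/user/tokens/verify"


def build_verify_request(token: str) -> urllib.request.Request:
    """GET request to Cloudflare's token verification endpoint."""
    return urllib.request.Request(
        VERIFY_URL,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def token_is_active(response_body: str) -> bool:
    """True if the verify response reports success and an active token."""
    data = json.loads(response_body)
    result = data.get("result") or {}  # "result" may be null on errors
    return bool(data.get("success")) and result.get("status") == "active"
```

With a real token, urllib.request.urlopen(build_verify_request(os.environ["CLOUDFLARE_API_TOKEN"])) performs the call. An error status is raised as urllib.error.HTTPError rather than returned, so wrap the call accordingly.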
- Category-based namespace isolation with strict RBAC
- FreeIPA LDAP/Kerberos for all service authentication
- API tokens managed via Terraform (stored in .env file, never committed)
- Automated TLS certificates (Let's Encrypt)
- VPN-only service access (100.64.0.0/10 range)
- Per-service RBAC with minimal permissions
- Volume encryption at rest
- Cross-namespace RBAC documented and scoped
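VPN-only access boils down to one membership test: is the client address inside 100.64.0.0/10? The short Python sketch below shows that check with the standard ipaddress module, plus the equivalent nginx allow/deny directives. Whether the project enforces the range in nginx, in network policy, or only by exposing services on the tailnet is not stated here, so treat nginx_acl as a hypothetical illustration.

```python
import ipaddress

# VPN mesh range from the security list above.
VPN_RANGE = ipaddress.ip_network("100.64.0.0/10")


def is_vpn_client(addr: str) -> bool:
    """True if addr is an IPv4 address inside the mesh range."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False
    # An IPv6 address is never "in" an IPv4 network; this returns False.
    return ip in VPN_RANGE


def nginx_acl() -> str:
    """nginx access rules that admit only mesh clients (hypothetical use)."""
    return f"allow {VPN_RANGE};\ndeny all;\n"
```

The /10 covers 100.64.0.0 through 100.127.255.255, so an address such as 100.128.0.1 is outside the mesh even though it looks similar.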
Contributions welcome! See Contributing Guide for:
- Development environment setup
- Code standards (Terraform, Python)
- Testing procedures
- Pull request process
- Branch naming conventions (feat/, fix/, docs/, etc.)
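A local pre-push check for the branch convention could look like the Python sketch below. Only feat/, fix/ and docs/ come from the guide; the "etc." means more prefixes exist, so the extra prefixes and the lowercase-kebab slug rule here are assumptions to adjust against CONTRIBUTING.md.

```python
import re

# From the guide; the list is explicitly incomplete ("etc.").
KNOWN_PREFIXES = ("feat", "fix", "docs")
# Assumed extras, not the project's official list.
ASSUMED_EXTRA_PREFIXES = ("chore", "refactor", "test")

# Assumed shape: <prefix>/<lowercase slug with - or . separators>.
_BRANCH_RE = re.compile(r"^(?P<prefix>[a-z]+)/(?P<slug>[a-z0-9]+(?:[-.][a-z0-9]+)*)$")


def check_branch(name: str, prefixes=KNOWN_PREFIXES + ASSUMED_EXTRA_PREFIXES) -> bool:
    """True if `name` looks like <prefix>/<slug> with an allowed prefix."""
    match = _BRANCH_RE.match(name)
    return bool(match) and match.group("prefix") in prefixes


if __name__ == "__main__":
    for branch in ("feat/tempo-tracing", "feature/x", "main"):
        print(branch, check_branch(branch))
```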
Key rule: All infrastructure changes go through Terraform and follow the Dependency Patterns
CONTRIBUTING.md | Code Standards
cd terraform
terraform destroy # Removes everything except external databases
Warning: Deletes all services and data! External databases (like Redmine's Akamai PostgreSQL) are not affected.
Maintained by @vegcom
MIT License
Quick Links: Documentation | Quick Start | Troubleshooting | Contributing