WIP ci: replace staging workflows with LXC-local testing#886

Open
hpk42 wants to merge 32 commits into hpk/lxcdeploy from hpk/lxc-ci

hpk42 commented Mar 7, 2026

Replace the two staging-server CI workflows and their helper files with a single lxc-test job in ci.yaml.
The new CI can run concurrently on multiple PRs without the runs canceling each other, is faster, and does not need a remote server.

The CI job caches lxc container images for the lifetime of the PR. Each PR gets its own set of images.
The PR also contains some internal output refactorings and runs two "cmdeploy run" invocations in parallel, one per local lxc container. The same run that happens in the now relatively small ci.yml can also be done locally with "cmdeploy lxc-test".

Note that this PR currently does not exercise the ACME relay setup. I think it's best to run a full ACME setup (without any copying around of files) from a daily job on the main branch. The lxc-local testing covers almost everything else.

hpk42 added 12 commits March 8, 2026 23:40
Add cmdeploy "lxc-test" command to run cmdeploy against local containers,
with supplementary lxc-start, lxc-stop and lxc-status subcommands.
See doc/source/lxc.rst for full documentation including prerequisites,
DNS setup, TLS handling, DNS-free testing, and known limitations.

Apart from adding lxc-specific docs, tests, and implementation files in the cmdeploy/lxc directory,
this PR adds the --ssh-config option to cmdeploy run/dns/status/test commands and pyinfra invocations,
and also to sshexec (Execnet) handling.  This means the host needs no DNS entries for a relay
and can route all resolution through ssh-config.  This is used by the "lxc-test" command, which performs
a completely local setup -- again, see docs for more details.
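
To illustrate what routing resolution through ssh-config can look like, here is a minimal sketch. The helper name `render_ssh_config`, the host name, and the IP address are hypothetical and not taken from this PR. Only the standard OpenSSH `Host`, `HostName`, and `User` keywords are assumed.

```python
def render_ssh_config(hosts):
    """Render OpenSSH client config stanzas from a {name: ip} mapping.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    lines = []
    for name, ip in sorted(hosts.items()):
        lines.append(f"Host {name}")
        # HostName makes ssh connect to the IP, so no DNS lookup is needed.
        lines.append(f"    HostName {ip}")
        lines.append("    User root")
        lines.append("")
    return "\n".join(lines)


# Example values are made up for the sketch.
config_text = render_ssh_config({"relay.example.test": "10.0.0.12"})
print(config_text)
```

Written to a file, such a stanza could be passed through the --ssh-config option described above, or tried out directly with `ssh -F <file> relay.example.test`.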

While working on DNS/SSH things I also unified all zone-file handling
to use actual BIND format as it is easy enough to parse back.
…roy-all

After restarting pdns/pdns-recursor, wait up to 10 s for the recursor
to actually answer a query before proceeding.  Likewise, after
configure_dns(), poll from inside the relay container until the
configured DNS IP responds.

On --destroy-all, unset the incusbr0 dns.mode and raw.dnsmasq network
options so the next lxc-start starts from a clean bridge state.

DNSConfigurationError (caught in main()) is raised on timeout so the
CLI prints a clean error instead of failing later with a cryptic message.
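
The wait-then-fail behaviour described above can be sketched roughly as follows. This is an illustration under assumptions: `wait_until` and its parameters are hypothetical names, and `DNSConfigurationError` here is a local stand-in for the exception the commit mentions. The real probe would query the recursor; the sketch uses a fake one.

```python
import time


class DNSConfigurationError(Exception):
    """Local stand-in for the exception named in the commit message."""


def wait_until(probe, timeout=10.0, interval=0.5, what="DNS"):
    """Call probe() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return
        if time.monotonic() >= deadline:
            raise DNSConfigurationError(
                f"{what} did not respond within {timeout:g} s"
            )
        time.sleep(interval)


# Fake probe that only succeeds on the third attempt.
attempts = []


def fake_probe():
    attempts.append(1)
    return len(attempts) >= 3


wait_until(fake_probe, timeout=2.0, interval=0.01)
print(f"probe succeeded after {len(attempts)} attempts")
```

Raising a dedicated exception type lets a top-level handler turn the timeout into a single readable error line, which matches the intent stated in the commit message.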
… output

Move the Out output-printer class to cmdeploy/util.py so it is shared
across CLI modules.  All print/shell calls in lxc/cli.py, lxc/incus.py,
and dns.py now route through Out instead of bare print().

Key additions:
- Out.section() / Out.section_line(): coloured section headers scaled
  to the current terminal width (or $_CMDEPLOY_WIDTH for sub-processes).
- Out.shell(): merges stdout/stderr, prefixes each output line, and
  prints a red error line with the exit code on failure.
- Out.new_prefixed_out(): indented sub-printer that shares section_timings.
- 'cmdeploy -v / -vv' exposes the verbosity levels.
- Tests for Out added to test_util.py.
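
For readers unfamiliar with this kind of printer, a simplified sketch of the ideas listed above follows. It is not the class from cmdeploy/util.py: the method bodies, the `argv` parameter, and the `=` padding are assumptions, colours are reduced to one ANSI red sequence, and section_timings is left out.

```python
import io
import os
import shutil
import subprocess
import sys


class Out:
    """Hypothetical, simplified output printer (not the PR's implementation)."""

    def __init__(self, prefix="", stream=None):
        self.prefix = prefix
        self.stream = sys.stdout if stream is None else stream

    def width(self):
        # Sub-processes may have no terminal, so honour $_CMDEPLOY_WIDTH first.
        env = os.environ.get("_CMDEPLOY_WIDTH", "")
        if env.isdigit():
            return int(env)
        return shutil.get_terminal_size((80, 24)).columns

    def print(self, text=""):
        self.stream.write(f"{self.prefix}{text}\n")

    def section(self, title):
        # Pad the title with "=" up to the available width.
        self.print(f" {title} ".center(self.width() - len(self.prefix), "="))

    def shell(self, argv):
        # Merge stderr into stdout so output lines keep their order.
        proc = subprocess.run(
            argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        for line in proc.stdout.splitlines():
            self.print(line)
        if proc.returncode != 0:
            self.print(
                f"\033[31mcommand failed with exit code {proc.returncode}\033[0m"
            )
        return proc.returncode

    def new_prefixed_out(self, extra="  "):
        # The sub-printer writes to the same stream with a longer prefix.
        return Out(self.prefix + extra, self.stream)


# Demo: fix the width, print a section, run a failing command indented.
os.environ["_CMDEPLOY_WIDTH"] = "40"
buf = io.StringIO()
out = Out(stream=buf)
out.section("deploy")
sub = out.new_prefixed_out()
rc = sub.shell([sys.executable, "-c", "import sys; print('hello'); sys.exit(3)"])
print(buf.getvalue(), end="")
```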

Move the policy-rc.d install/remove boilerplate into a shared context
manager in basedeploy.py so both UnboundDeployer and DovecotDeployer use
the same abstraction, and the DNS container's _install_powerdns() inline
shell uses the same pattern.

DovecotDeployer now wraps its three package installs in
blocked_service_startup() to prevent Dovecot from auto-starting on
initial install — avoiding bind conflicts on IPv4-only systems.
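
As background, Debian's invoke-rc.d consults /usr/sbin/policy-rc.d, and an exit status of 101 from that script forbids the requested action, which is what keeps freshly installed services from starting. A local-filesystem sketch of such a context manager is shown below. It only borrows the name blocked_service_startup from the commit message; the real one in basedeploy.py presumably operates on the remote host, and the `path` parameter is an assumption added so the sketch can run against a temporary directory.

```python
import contextlib
import os
import tempfile

# A policy script that denies every service action (exit status 101).
POLICY_SCRIPT = "#!/bin/sh\nexit 101\n"


@contextlib.contextmanager
def blocked_service_startup(path="/usr/sbin/policy-rc.d"):
    """Install a deny-all policy-rc.d for the duration of the block.

    Illustrative local version; not the PR's implementation.
    """
    with open(path, "w") as f:
        f.write(POLICY_SCRIPT)
    os.chmod(path, 0o755)
    try:
        yield
    finally:
        # Always remove the script, even if a package install fails.
        os.remove(path)


# Demo against a temporary path instead of the real system location.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "policy-rc.d")
    with blocked_service_startup(demo_path):
        present_inside = os.path.exists(demo_path)
        # Package installs would go here.
    present_after = os.path.exists(demo_path)

print(present_inside, present_after)
```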

Replace the two-function find_relay_image / _find_relay_image pair
with Incus.find_image(aliases), which returns the first alias
that exists in the local image store, or None.

Container.launch() passes [RELAY_IMAGE_ALIAS, BASE_IMAGE_ALIAS]
and ensure_base_image() passes [BASE_IMAGE_ALIAS].
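
The first-match lookup described above can be sketched as a plain function. This is an assumption-laden illustration: the real Incus.find_image(aliases) is a method that queries the local image store, whereas here the set of existing aliases is passed in explicitly, and the alias values are made up.

```python
# Hypothetical alias values; the PR defines the real constants.
RELAY_IMAGE_ALIAS = "relay-image"
BASE_IMAGE_ALIAS = "base-image"


def find_image(aliases, existing_aliases):
    """Return the first alias present in existing_aliases, or None."""
    for alias in aliases:
        if alias in existing_aliases:
            return alias
    return None


# Only the base image is cached, so the lookup falls through to it.
store = {BASE_IMAGE_ALIAS}
launch_choice = find_image([RELAY_IMAGE_ALIAS, BASE_IMAGE_ALIAS], store)
missing_choice = find_image([RELAY_IMAGE_ALIAS], store)
print(launch_choice, missing_choice)
```

Ordering the list from most to least specific lets one function express both the "prefer the cached relay image" and the "base image only" cases.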

hpk42 commented Mar 9, 2026

> How does this PR relate to the https://github.com/chatmail/hetzner-relay effort? I don't think it replaces it, but you remove the yaml workflows where I wanted to integrate the hetzner pool.

not sure -- have not looked much into the hetzner-relay effort. Let's discuss later in the week.

hpk42 force-pushed the hpk/lxc-ci branch 3 times, most recently from 24e6806 to 72bdf0f, March 9, 2026 22:41
hpk42 force-pushed the hpk/lxc-ci branch 8 times, most recently from 8ee6aa3 to d7009ae, March 10, 2026 18:08
hpk42 added 6 commits March 10, 2026 21:14
Capture stderr separately in Incus.run() instead of merging
it into stdout, which corrupted JSON parsing in run_json().
Add --quiet flag to reduce incus noise.
Suppress stderr in Remote subprocess to prevent pytest-xdist
communication issues.
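
The stderr problem described above is easy to reproduce: if warnings are merged into stdout, the combined text is no longer valid JSON. A minimal sketch with the standard library follows. The standalone `run_json` function and the noisy child command are assumptions for illustration and do not call incus.

```python
import json
import subprocess
import sys


def run_json(argv):
    """Run a command and parse its stdout as JSON, keeping stderr apart."""
    proc = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout), proc.stderr


# Child process that writes a warning to stderr and JSON to stdout.
noisy = [
    sys.executable,
    "-c",
    "import sys, json; sys.stderr.write('warning: noise\\n'); "
    "print(json.dumps({'status': 'Running'}))",
]
data, err = run_json(noisy)
print(data, err.strip())
```

With `stderr=subprocess.STDOUT` instead of separate capture, the warning line could land in the same stream as the JSON document and make `json.loads` fail.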

ChatmailDeployer now receives the full config object
so it can access mailboxes_dir and create this directory
during deployment, preventing mail delivery failures.

lxc-start --run deployed the relay but never loaded DNS zones
or restarted filtermail-incoming, so DKIM verification and
outgoing mail failed.

Fix silent failure on an invalid configuration file.

Remove the staging VPS-based test-and-deploy workflows and
replace them with LXC-local CI. Refactor image caching
to use per-container aliases instead of a single relay image.

Ensure postfix/dovecot/opendkim restart after unbound restarts.

2 participants