Change-safe billing infrastructure for multi-tenant SaaS systems.
GrantLedger is a multi-tenant SaaS billing platform built around correctness, explicit contracts, and operational resilience.
It is designed for billing workflows where idempotency, replay handling, asynchronous orchestration, and boundary clarity are non-negotiable.
Current state on main: the platform baseline is complete through ARCH-033, including runtime security and environment contracts, API and worker metrics, a self-hosted deployment stack with smoke validation, a guided demo scenario, and a supply-chain security baseline with container scanning and SBOM generation.
- schema-first contracts at the boundary;
- idempotent write flows and replay-safe webhook handling;
- executable API and worker runtimes with health, readiness, and metrics;
- self-hosted deployment validation with Prometheus and Grafana support;
- guided demo coverage for end-to-end local validation;
- CI and security automation with image scanning, SBOM generation, and CodeQL.
- Node.js >=22 <23
- npm >=10 <11
- Docker (for Postgres validation)
npm ci
npm run typecheck
npm run build
npm run test

npm run quality:gate

cp deploy/self-hosted/.env.example deploy/self-hosted/.env
docker compose -f deploy/self-hosted/docker-compose.yml --env-file deploy/self-hosted/.env up -d --build

npm run demo:selfhost

npm run db:up
DATABASE_URL='postgresql://grantledger_app:grantledger_app@localhost:5432/grantledger_rls' npm run db:migrate
DATABASE_URL='postgresql://grantledger_app:grantledger_app@localhost:5432/grantledger_rls' npm run test:pg

If your local grantledger-postgres volume already exists, recreate it once so that the initialisation scripts in db/init are applied cleanly.
Billing rarely stays simple for long.
What begins as a handful of rules usually grows into retries, replay handling, partial failures, timezone-sensitive periods, and business logic spread across handlers, jobs, and external integrations.
GrantLedger exists to model that reality directly. The goal is not to showcase architecture for its own sake. The goal is to provide a practical foundation for SaaS billing flows that remain:
- reliable under retries and concurrency;
- explicit at the boundaries;
- auditable when something goes wrong;
- maintainable as the product and team grow.
GrantLedger aims to give product and engineering teams confidence that billing behaviour is:
- consistent;
- auditable;
- resilient under retries and concurrency;
- understandable under operational stress;
- evolvable without losing architectural coherence.
- Tenant-aware request context resolution with explicit authentication and access failure semantics.
- Checkout orchestration through an application-level payment provider contract.
- Subscription state-machine commands with idempotent mutation orchestration.
- Webhook normalisation and deduplication with canonical event publishing contracts.
- Schema-first invoice API contracts with Zod-inferred types to reduce contract drift.
- Unified datetime policy (Luxon-based) across invoice orchestration paths.
- Boundary-level payload normalisation to reduce duplication and preserve API consistency.
- Replay controls and observer-based operational hooks for async invoice lifecycle monitoring.
- Asynchronous invoice generation flow across API, application, worker, and durable Postgres infrastructure:
  - enqueue with Idempotency-Key
  - poll status by jobId
  - process work with retry scheduling and terminal dead-letter behaviour
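The webhook normalisation and deduplication capability listed above can be illustrated with a minimal in-memory sketch. It is not GrantLedger's implementation: the `ProviderEvent` and `CanonicalEvent` shapes, the `WebhookDeduplicator` class, and the choice of `provider:eventId` as the dedupe key are assumptions made for illustration. The repository persists webhook state in Postgres (packages/infra-postgres), whereas this sketch keeps it in a `Set`.

```typescript
// Hypothetical raw delivery shape from a payment provider (not a real provider SDK type).
interface ProviderEvent {
  provider: string;
  id: string; // provider-assigned event id, assumed stable across redeliveries
  type: string;
  createdAtEpochSeconds: number;
  data: Record<string, unknown>;
}

// Hypothetical canonical event shape published to downstream consumers.
interface CanonicalEvent {
  dedupeKey: string;
  provider: string;
  eventType: string;
  occurredAt: string; // ISO-8601, UTC
  payload: Record<string, unknown>;
}

type IngestResult =
  | { status: "published"; event: CanonicalEvent }
  | { status: "duplicate"; dedupeKey: string };

// Map a provider-specific payload onto the canonical shape.
function normaliseEvent(raw: ProviderEvent): CanonicalEvent {
  return {
    dedupeKey: `${raw.provider}:${raw.id}`,
    provider: raw.provider,
    eventType: raw.type.toLowerCase(),
    occurredAt: new Date(raw.createdAtEpochSeconds * 1000).toISOString(),
    payload: raw.data,
  };
}

class WebhookDeduplicator {
  private readonly seenKeys = new Set<string>();
  private readonly published: CanonicalEvent[] = [];

  ingest(raw: ProviderEvent): IngestResult {
    const event = normaliseEvent(raw);
    // A redelivered or replayed webhook carries the same provider event id,
    // so it maps to the same dedupe key and is not published twice.
    if (this.seenKeys.has(event.dedupeKey)) {
      return { status: "duplicate", dedupeKey: event.dedupeKey };
    }
    this.seenKeys.add(event.dedupeKey);
    this.published.push(event);
    return { status: "published", event };
  }

  publishedEvents(): readonly CanonicalEvent[] {
    return this.published;
  }
}

const dedup = new WebhookDeduplicator();
const delivery: ProviderEvent = {
  provider: "stripe",
  id: "evt_123",
  type: "Invoice.Paid",
  createdAtEpochSeconds: 0,
  data: {},
};
console.log(dedup.ingest(delivery).status); // published
console.log(dedup.ingest(delivery).status); // duplicate (replayed delivery)
```

In a durable store, the check and the insert must happen atomically (for example, through a unique constraint on the dedupe key); otherwise two concurrent deliveries of the same event could both pass the check.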
- Standard API error envelope: { message, code, details?, traceId? }
- Application-layer idempotency state model: processing | completed | failed
- Conflict safety:
  - same key + different payload -> 409
  - same key while processing -> 409
- Async invoice contract:
  - enqueue -> 202 Accepted with jobId
  - status -> queued | processing | completed | failed
  - transient processing failures can return the job to queued with retry context
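The idempotency state model and conflict rules above can be sketched as a small in-memory executor. This is an illustration under stated assumptions, not the code in packages/application: the `IdempotencyExecutor` class, its `begin`/`complete`/`fail` methods, the error codes, and the use of a key-order-independent JSON string as the payload fingerprint are all hypothetical. Conflicts carry the standard error envelope and a 409 status.

```typescript
type IdempotencyState = "processing" | "completed" | "failed";

interface IdempotencyRecord<T> {
  state: IdempotencyState;
  fingerprint: string;
  response?: T;
}

// Standard API error envelope: { message, code, details?, traceId? }
interface ApiErrorEnvelope {
  message: string;
  code: string;
  details?: unknown;
  traceId?: string;
}

class IdempotencyConflictError extends Error {
  readonly status = 409;
  readonly envelope: ApiErrorEnvelope;

  constructor(envelope: ApiErrorEnvelope) {
    super(envelope.message);
    this.envelope = envelope;
  }
}

// Key-order-independent serialisation, so { a, b } and { b, a } match.
// (A production system would typically hash this string.)
function fingerprint(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(fingerprint).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([key, v]) => `${JSON.stringify(key)}:${fingerprint(v)}`);
    return `{${entries.join(",")}}`;
  }
  return String(JSON.stringify(value));
}

type BeginResult<T> = { kind: "execute" } | { kind: "replay"; response: T };

class IdempotencyExecutor<T> {
  private readonly records = new Map<string, IdempotencyRecord<T>>();

  // Call before running the operation; throws a 409 conflict when the key cannot be used.
  begin(key: string, payload: unknown): BeginResult<T> {
    const fp = fingerprint(payload);
    const existing = this.records.get(key);
    if (existing !== undefined) {
      if (existing.fingerprint !== fp) {
        throw conflict("IDEMPOTENCY_PAYLOAD_MISMATCH", "Idempotency key reused with a different payload");
      }
      if (existing.state === "processing") {
        throw conflict("IDEMPOTENCY_IN_PROGRESS", "A request with this idempotency key is still processing");
      }
      if (existing.state === "completed") {
        return { kind: "replay", response: existing.response as T };
      }
      // "failed": this sketch lets the caller retry with the same key and payload.
    }
    this.records.set(key, { state: "processing", fingerprint: fp });
    return { kind: "execute" };
  }

  complete(key: string, response: T): void {
    const record = this.records.get(key);
    if (record !== undefined) this.records.set(key, { ...record, state: "completed", response });
  }

  fail(key: string): void {
    const record = this.records.get(key);
    if (record !== undefined) this.records.set(key, { ...record, state: "failed" });
  }
}

function conflict(code: string, message: string): IdempotencyConflictError {
  return new IdempotencyConflictError({ message, code });
}
```

A caller runs `begin`, performs the operation only when it gets `execute`, and then records `complete` or `fail`. Replaying a completed key returns the stored response instead of repeating side effects. Whether a `failed` key may be retried is a policy decision; this sketch allows it.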
- packages/domain: entities, invariants, state transitions, deterministic calculations
  - no transport, framework, or provider concerns
- packages/application: use-case orchestration, ports/interfaces, idempotency, retry, replay, and audit flow
  - no HTTP-specific mapping
- apps/api: boundary validation, header/context resolution, runtime composition, and transport mapping
- apps/worker: worker loop orchestration and operational execution of asynchronous flows
- packages/contracts: canonical types and Zod schemas for boundary contracts
- packages/shared: reusable cross-cutting helpers such as time handling, i18n, observability helpers, and idempotency utilities
- packages/infra-postgres: durable repositories, job stores, webhook persistence, and tenant-scoped infrastructure wiring
The API layer now follows a clearer separation of concerns:
- handlers/: transport-facing behaviour only
- bootstrap/: runtime assembly and environment-specific dependency wiring
- http/: transport primitives and shared HTTP mapping helpers
This keeps HTTP handlers focused on request/response concerns while moving infrastructure selection into dedicated bootstrap modules.
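The bootstrap split can be sketched as a composition function that reads the persistence driver and hands a handler a port rather than a concrete store. All names below (`InvoiceJobStore`, `InMemoryInvoiceJobStore`, `createInvoiceJobStore`, `makeEnqueueHandler`) are hypothetical, and the Postgres branch is deliberately left as a stub instead of guessing at the real adapter.

```typescript
type JobStatus = "queued" | "processing" | "completed" | "failed";

// Hypothetical application-level port; handlers depend on this, never on a concrete store.
interface InvoiceJobStore {
  enqueue(tenantId: string, payload: unknown): string;
  status(tenantId: string, jobId: string): JobStatus | undefined;
}

// In-memory adapter: deterministic and fast, suitable for local runs and tests.
class InMemoryInvoiceJobStore implements InvoiceJobStore {
  private readonly jobs = new Map<string, { tenantId: string; status: JobStatus }>();
  private nextId = 1;

  enqueue(tenantId: string, _payload: unknown): string {
    const jobId = `job_${this.nextId++}`;
    this.jobs.set(jobId, { tenantId, status: "queued" });
    return jobId;
  }

  status(tenantId: string, jobId: string): JobStatus | undefined {
    const job = this.jobs.get(jobId);
    // Tenant-scoped lookup: another tenant's job id resolves to "not found".
    return job !== undefined && job.tenantId === tenantId ? job.status : undefined;
  }
}

interface RuntimeEnv {
  PERSISTENCE_DRIVER?: string;
  DATABASE_URL?: string;
}

// Bootstrap: the single place that decides which adapter backs the port.
function createInvoiceJobStore(env: RuntimeEnv): InvoiceJobStore {
  const driver = env.PERSISTENCE_DRIVER ?? "memory";
  if (driver === "memory") return new InMemoryInvoiceJobStore();
  if (driver === "postgres") {
    if (!env.DATABASE_URL) {
      throw new Error("DATABASE_URL is required when PERSISTENCE_DRIVER=postgres");
    }
    // A real bootstrap would build a Postgres-backed adapter here; omitted in this sketch.
    throw new Error("Postgres adapter is not part of this sketch");
  }
  throw new Error(`Unsupported PERSISTENCE_DRIVER: ${driver}`);
}

// Handler: request/response mapping only, with dependencies injected.
function makeEnqueueHandler(store: InvoiceJobStore) {
  return (tenantId: string, body: unknown) => ({
    status: 202,
    body: { jobId: store.enqueue(tenantId, body) },
  });
}
```

Because the handler only sees `InvoiceJobStore`, changing `PERSISTENCE_DRIVER` alters the wiring in one place without touching transport code.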
apps/* -> application -> domain
The contracts and shared packages, together with the infrastructure adapters, remain foundational packages consumed by the higher layers.
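The dependency direction can be illustrated with a small sketch: a pure domain calculation, and an application use case that reaches persistence and time only through ports. The invoice shape, the `InvoiceRepository` and `Clock` ports, and amounts expressed in integer minor units are assumptions for illustration, not GrantLedger's actual types.

```typescript
// ---- domain: pure and deterministic, no transport or provider concerns ----
interface LineItem {
  description: string;
  unitAmountMinor: number; // integer minor units (e.g. cents) to avoid float rounding
  quantity: number;
}

interface Invoice {
  id: string;
  tenantId: string;
  lines: readonly LineItem[];
  totalMinor: number;
  issuedAt: string;
}

function calculateTotalMinor(lines: readonly LineItem[]): number {
  return lines.reduce((sum, line) => {
    // Invariant: amounts and quantities are integers, quantities are non-negative.
    if (!Number.isInteger(line.unitAmountMinor) || !Number.isInteger(line.quantity) || line.quantity < 0) {
      throw new Error(`Invalid line item: ${line.description}`);
    }
    return sum + line.unitAmountMinor * line.quantity;
  }, 0);
}

// ---- application: orchestration through ports, no HTTP mapping ----
interface InvoiceRepository {
  save(invoice: Invoice): void;
}

interface Clock {
  nowIso(): string;
}

interface GenerateInvoiceDeps {
  repository: InvoiceRepository;
  clock: Clock;
  newId: () => string;
}

function generateInvoice(deps: GenerateInvoiceDeps, tenantId: string, lines: readonly LineItem[]): Invoice {
  const invoice: Invoice = {
    id: deps.newId(),
    tenantId,
    lines,
    totalMinor: calculateTotalMinor(lines),
    issuedAt: deps.clock.nowIso(),
  };
  deps.repository.save(invoice);
  return invoice;
}
```

Tests can supply a fixed clock and a fake repository, so both layers stay deterministic without HTTP or a database.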
apps/
  api/
    src/
      bootstrap/
      handlers/
      http/
      infrastructure/
  worker/
  admin/
packages/
  application/
  contracts/
  domain/
  infra-postgres/
  shared/
docs/
  adr/
  architecture/
  governance/

flowchart LR
  C["Client"] -->|POST enqueue + Idempotency-Key| API["API handler"]
  API --> APP1["application.enqueueInvoiceGeneration"]
  APP1 --> JOB["JobStore (queued)"]
  C -->|GET status jobId| API
  W["Worker runInvoiceWorkerOnce"] --> APP2["application.processNextInvoiceGenerationJob"]
  APP2 --> JOB
  APP2 --> INV["InvoiceRepository"]
  APP2 --> AUD["AuditLogger / Observer hooks"]
  JOB -->|completed / failed| API
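The worker side of the diagram can be sketched as a single processing cycle with retry scheduling and dead-lettering. This is a simplified in-memory illustration, not `processNextInvoiceGenerationJob` itself: the `InvoiceJob` shape, the `RetryPolicy` fields, exponential backoff, and the outcome labels are assumptions (the labels loosely mirror the idle, processed, retry, and dead-letter behaviours the worker tests cover). Leasing and concurrent claiming are left out.

```typescript
type JobStatus = "queued" | "processing" | "completed" | "failed";

interface InvoiceJob {
  id: string;
  status: JobStatus;
  attempts: number;
  nextAttemptAtMs: number;
  lastError?: string;
}

interface RetryPolicy {
  maxAttempts: number;
  baseDelayMs: number;
}

type CycleOutcome = "idle" | "processed" | "retry_scheduled" | "dead_lettered";

// One worker cycle: pick a due queued job, run it, and record the outcome.
function processNextJob(
  jobs: InvoiceJob[],
  nowMs: number,
  policy: RetryPolicy,
  run: (job: InvoiceJob) => void,
): CycleOutcome {
  const job = jobs.find((candidate) => candidate.status === "queued" && candidate.nextAttemptAtMs <= nowMs);
  if (job === undefined) return "idle";

  job.status = "processing";
  job.attempts += 1;
  try {
    run(job);
    job.status = "completed";
    return "processed";
  } catch (err) {
    job.lastError = err instanceof Error ? err.message : String(err);
    if (job.attempts >= policy.maxAttempts) {
      // Terminal: out of attempts, so the job is dead-lettered as "failed".
      job.status = "failed";
      return "dead_lettered";
    }
    // Transient failure: return to "queued" with exponential backoff.
    job.status = "queued";
    job.nextAttemptAtMs = nowMs + policy.baseDelayMs * 2 ** (job.attempts - 1);
    return "retry_scheduled";
  }
}
```

With `maxAttempts: 3` and `baseDelayMs: 1000`, a job that keeps failing becomes due again 1 s after its first failure and 2 s after its second, and is marked `failed` on the third.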
- @grantledger/domain: business rules and invariants
- @grantledger/application: use cases such as subscription, invoice, idempotency, payment-webhook, auth-context, and checkout
- @grantledger/contracts: shared contracts and Zod schemas across domain, application, and API boundaries
- @grantledger/shared: time policy, i18n baseline, observability helpers, payload hashing, and standard error helpers
- @grantledger/infra-postgres: durable persistence and Postgres-specific runtime wiring
- @grantledger/api / @grantledger/worker: transport-facing orchestration adapters built as testable functions
- Node.js 22.x
- TypeScript (strict, project references, exactOptionalPropertyTypes)
- npm workspaces
- Zod for schema-first boundary validation
- Luxon for timezone-safe datetime handling
- Vitest for test execution
- ESLint for static analysis
- GitHub Actions for CI and security automation
Testing is intentionally split by feedback speed and risk profile.
- npm run test: fast default validation across application, API, and worker behaviour
- npm run test:coverage: coverage-oriented run of the same fast suite
- npm run test:pg: dedicated Postgres integration validation for durable persistence paths in packages/infra-postgres
This split is deliberate:
- local iteration stays fast;
- durable persistence behaviour is still validated explicitly;
- CI can enforce both fast feedback and infrastructure realism without forcing every local run through Postgres.
Current test scope prioritises business-critical behaviour:
- packages/application/src/**/*.test.ts
  - idempotency core
  - subscription idempotency
  - webhook deduplication
  - invoice enqueue/process idempotency and retry lifecycle
- apps/api/src/**/*.test.ts: integration-style handler tests for auth, checkout, subscription, invoice, webhook, and error mapping
- apps/worker/src/**/*.test.ts: worker loop behaviour such as idle, processed, retry scheduling, dead-letter handling, and observer-failure resilience
- packages/infra-postgres/src/**/*.integration.test.ts: durable persistence, tenant isolation, invoice jobs, idempotency state, and webhook storage
npm run typecheck
npm run build
npm run test

PERSISTENCE_DRIVER=memory npm run api:dev

The API host exposes:
- GET /healthz
- GET /readyz
- POST /v1/auth/subscriptions
- POST /v1/checkout/sessions
- POST /v1/invoices/generation
- POST /v1/invoices/generation/status
- POST /v1/webhooks/provider
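The endpoint list can be read as a route table keyed by path and method. The sketch below is not apps/api/src/server.ts: the handlers are placeholders and the error codes are hypothetical. It does show two HTTP details worth keeping in any such host: unknown paths map to 404 in the standard error envelope, and a known path with an unsupported method returns 405 with an `Allow` header, which RFC 9110 requires.

```typescript
type Method = "GET" | "POST";

interface HttpResponse {
  status: number;
  headers: Record<string, string>;
  body: unknown;
}

type Handler = (body: unknown) => HttpResponse;

function isMethod(value: string): value is Method {
  return value === "GET" || value === "POST";
}

// Placeholder handler factory (hypothetical); real handlers live under apps/api/src/handlers/.
function stub(name: string, status = 200): Handler {
  return (body) => ({ status, headers: {}, body: { handler: name, received: body ?? null } });
}

// Standard error envelope: { message, code }
function errorResponse(status: number, code: string, message: string, headers: Record<string, string> = {}): HttpResponse {
  return { status, headers, body: { message, code } };
}

const routes = new Map<string, Partial<Record<Method, Handler>>>([
  ["/healthz", { GET: stub("healthz") }],
  ["/readyz", { GET: stub("readyz") }],
  ["/v1/auth/subscriptions", { POST: stub("auth.subscriptions") }],
  ["/v1/checkout/sessions", { POST: stub("checkout.sessions") }],
  ["/v1/invoices/generation", { POST: stub("invoices.enqueue", 202) }], // 202 Accepted per the async contract
  ["/v1/invoices/generation/status", { POST: stub("invoices.status") }],
  ["/v1/webhooks/provider", { POST: stub("webhooks.provider") }],
]);

function dispatch(method: string, path: string, body?: unknown): HttpResponse {
  const entry = routes.get(path);
  if (entry === undefined) {
    return errorResponse(404, "ROUTE_NOT_FOUND", `No route for ${path}`);
  }
  const handler = isMethod(method) ? entry[method] : undefined;
  if (handler === undefined) {
    // RFC 9110 requires a 405 response to list the supported methods in Allow.
    const allow = Object.keys(entry).join(", ");
    return errorResponse(405, "METHOD_NOT_ALLOWED", `${method} is not allowed on ${path}`, { Allow: allow });
  }
  return handler(body);
}
```

Using a `Map` and an explicit method guard avoids accidentally resolving inherited object properties (such as `toString`) as routes or handlers.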
PERSISTENCE_DRIVER=memory npm run worker:dev

npm run quality:gate

DATABASE_URL='postgresql://grantledger_app:grantledger_app@localhost:5432/grantledger_rls' npm run test:pg

Use the delivery bootstrap orchestrator (scripts/delivery-bootstrap.sh) when opening or updating delivery PRs that need issue and project metadata kept in sync.
DATABASE_URL='postgresql://grantledger_app:grantledger_app@localhost:5432/grantledger_rls' \
bash ./scripts/delivery-bootstrap.sh \
  --issue-number <ISSUE_NUMBER> \
  --issue-body /tmp/issue.md \
  --pr-title "<PR_TITLE>" \
  --pr-body /tmp/pr.md \
  --branch <BRANCH_NAME>

Add --skip-gates only when the relevant checks were already executed on the same branch and commit.
Use the closeout orchestrator after checks pass to synchronise PR, issue, and project completion.
bash ./scripts/delivery-closeout.sh --pr <PR_NUMBER>

Use --issue <ISSUE_NUMBER> when the PR body does not contain Closes #N.
GrantLedger now includes a minimal executable runtime baseline for both primary applications:
- apps/api/src/server.ts
  - a thin Node HTTP host around the existing API handlers
  - includes healthz, readyz, and metrics
  - reuses the same Postgres pool for request handling and readiness checks when PERSISTENCE_DRIVER=postgres
- apps/worker/src/main.ts
  - a long-running worker process around runInvoiceWorkerOnce
  - supports configurable polling through WORKER_POLL_INTERVAL_MS
  - exposes Prometheus-style metrics on a dedicated HTTP endpoint
API runtime configuration:
- .env.example: provides a baseline local/runtime configuration template
- API_HOST: optional, defaults to 0.0.0.0
- API_PORT: optional, defaults to 3000
- API_JSON_BODY_LIMIT_BYTES: optional, defaults to 1048576 (1 MiB)
- PERSISTENCE_DRIVER: optional; memory by default; postgres requires DATABASE_URL
- DATABASE_URL: required when PERSISTENCE_DRIVER=postgres
- STRIPE_WEBHOOK_SECRET: required only when processing Stripe webhooks through the runtime host
- GRANTLEDGER_VERSION: optional; included in structured logs and metrics labels when provided
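The API variables above can be validated once at startup so that a misconfigured process fails fast. This sketch parses them by hand to stay dependency-free, although the repository favours Zod for boundary validation; `loadApiConfig` and its error messages are hypothetical. STRIPE_WEBHOOK_SECRET is not modelled here because it is only needed when Stripe webhooks are processed.

```typescript
type Env = Record<string, string | undefined>;

type PersistenceConfig = { driver: "memory" } | { driver: "postgres"; databaseUrl: string };

interface ApiConfig {
  host: string;
  port: number;
  jsonBodyLimitBytes: number;
  persistence: PersistenceConfig;
  version?: string;
}

function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function readPersistence(env: Env): PersistenceConfig {
  const driver = env["PERSISTENCE_DRIVER"] ?? "memory";
  if (driver === "memory") return { driver: "memory" };
  if (driver === "postgres") {
    const databaseUrl = env["DATABASE_URL"];
    if (!databaseUrl) throw new Error("DATABASE_URL is required when PERSISTENCE_DRIVER=postgres");
    return { driver: "postgres", databaseUrl };
  }
  throw new Error(`Unsupported PERSISTENCE_DRIVER: ${driver}`);
}

// Fail fast at startup rather than on the first request.
function loadApiConfig(env: Env): ApiConfig {
  const config: ApiConfig = {
    host: env["API_HOST"] || "0.0.0.0",
    port: readPositiveInt(env, "API_PORT", 3000),
    jsonBodyLimitBytes: readPositiveInt(env, "API_JSON_BODY_LIMIT_BYTES", 1_048_576),
    persistence: readPersistence(env),
  };
  const version = env["GRANTLEDGER_VERSION"];
  if (version) config.version = version;
  return config;
}
```

In a Node host this would typically be called as `loadApiConfig(process.env)` before the server starts listening.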
Worker runtime configuration:
- .env.example: provides a baseline local/runtime configuration template
- PERSISTENCE_DRIVER: optional; memory by default; postgres requires DATABASE_URL
- DATABASE_URL: required when PERSISTENCE_DRIVER=postgres
- WORKER_TENANT_ID: required when PERSISTENCE_DRIVER=postgres
- WORKER_ID: optional stable worker identifier override
- JOB_LEASE_SECONDS: optional, defaults to 30
- JOB_HEARTBEAT_SECONDS: optional, defaults to 10
- WORKER_POLL_INTERVAL_MS: optional, defaults to 1000
- WORKER_METRICS_HOST: optional, defaults to 0.0.0.0
- WORKER_METRICS_PORT: optional, defaults to 9464
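JOB_LEASE_SECONDS and JOB_HEARTBEAT_SECONDS point to a lease-based claim model: a worker holds a job for a bounded time and extends the lease while it is still working, so a job held by a crashed worker becomes claimable again once the lease expires. The sketch below illustrates that general technique under assumptions; `LeaseTable` and its methods are hypothetical, and the Postgres job store may claim jobs differently.

```typescript
interface Lease {
  workerId: string;
  expiresAtMs: number;
}

// Hypothetical in-memory lease table; a durable store would typically do this atomically in the database.
class LeaseTable {
  private readonly leases = new Map<string, Lease>();
  private readonly leaseMs: number;

  constructor(leaseSeconds: number) {
    this.leaseMs = leaseSeconds * 1000;
  }

  // A claim succeeds if the job is unleased, already ours, or its lease has expired.
  tryClaim(jobId: string, workerId: string, nowMs: number): boolean {
    const current = this.leases.get(jobId);
    if (current !== undefined && current.workerId !== workerId && current.expiresAtMs > nowMs) {
      return false;
    }
    this.leases.set(jobId, { workerId, expiresAtMs: nowMs + this.leaseMs });
    return true;
  }

  // A heartbeat extends the lease only while this worker still owns it.
  heartbeat(jobId: string, workerId: string, nowMs: number): boolean {
    const current = this.leases.get(jobId);
    if (current === undefined || current.workerId !== workerId || current.expiresAtMs <= nowMs) {
      return false;
    }
    current.expiresAtMs = nowMs + this.leaseMs;
    return true;
  }

  release(jobId: string, workerId: string): void {
    if (this.leases.get(jobId)?.workerId === workerId) this.leases.delete(jobId);
  }
}
```

With the defaults (30 s lease, 10 s heartbeat), a healthy worker renews well before expiry, so a single delayed heartbeat does not cost it the job.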
Build and run the applications directly:
npm run build
PERSISTENCE_DRIVER=memory npm run api:start
PERSISTENCE_DRIVER=memory npm run worker:start

Or build container images:

docker build -f apps/api/Dockerfile -t grantledger-api .
docker build -f apps/worker/Dockerfile -t grantledger-worker .

Both runtime images now execute as a non-root user by default.
GrantLedger now exposes a minimal operational observability surface suitable for local validation and review:
- API metrics:
  - GET /metrics
  - request count, request duration, error count, and health/readiness state
- Worker metrics:
  - dedicated HTTP metrics host and port
  - cycle count, cycle duration, failure count, queue depth, retry/dead-letter gauges, and terminal failure rate
- Structured logs:
  - stable service, runtime, environment, and optional version metadata
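To illustrate what a scrape of these endpoints returns, the sketch below renders counters and gauges in the Prometheus text exposition format. The metric names and labels are hypothetical rather than the names GrantLedger emits, and a production service would normally rely on a metrics client library instead of hand-rendering.

```typescript
type Labels = Record<string, string>;

interface Sample {
  labels: Labels;
  value: number;
}

interface MetricFamily {
  name: string;
  help: string;
  type: "counter" | "gauge";
  samples: Sample[];
}

// In label values, backslash, double quote, and line feed must be escaped.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// In HELP text, only backslash and line feed are escaped.
function escapeHelp(text: string): string {
  return text.replace(/\\/g, "\\\\").replace(/\n/g, "\\n");
}

function renderLabels(labels: Labels): string {
  const pairs = Object.entries(labels).map(([key, value]) => `${key}="${escapeLabelValue(value)}"`);
  return pairs.length > 0 ? `{${pairs.join(",")}}` : "";
}

function renderMetrics(families: readonly MetricFamily[]): string {
  const lines: string[] = [];
  for (const family of families) {
    lines.push(`# HELP ${family.name} ${escapeHelp(family.help)}`);
    lines.push(`# TYPE ${family.name} ${family.type}`);
    for (const sample of family.samples) {
      lines.push(`${family.name}${renderLabels(sample.labels)} ${sample.value}`);
    }
  }
  // The text format expects every line, including the last, to end with a line feed.
  return lines.join("\n") + "\n";
}

const exposition = renderMetrics([
  {
    name: "grantledger_example_requests_total", // hypothetical metric name
    help: "HTTP requests handled by the API.",
    type: "counter",
    samples: [{ labels: { service: "api", route: "/healthz" }, value: 3 }],
  },
]);
console.log(exposition);
// # HELP grantledger_example_requests_total HTTP requests handled by the API.
// # TYPE grantledger_example_requests_total counter
// grantledger_example_requests_total{service="api",route="/healthz"} 3
```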
Observability assets now live under deploy/observability:
- deploy/observability/prometheus.local.yml: local Prometheus scrape configuration for API and worker metrics
- deploy/observability/prometheus.self-hosted.yml: self-hosted scrape configuration for the compose stack
- deploy/observability/grafana/dashboards/grantledger-runtime.json: starter Grafana dashboard covering API and worker runtime signals
- deploy/observability/grafana/provisioning/: Grafana datasource and dashboard provisioning for the self-hosted stack
Start the services locally:
PERSISTENCE_DRIVER=memory npm run api:start
PERSISTENCE_DRIVER=memory npm run worker:start

Then scrape:
- API metrics at http://localhost:3000/metrics
- Worker metrics at http://localhost:9464/metrics
Use the Prometheus config in deploy/observability/prometheus.local.yml, and import the Grafana dashboard from deploy/observability/grafana/dashboards/grantledger-runtime.json.
GrantLedger now includes a production-like self-hosted stack under deploy/self-hosted.
Included services:
- postgres
- migrate
- api
- worker
- prometheus
- grafana
cp deploy/self-hosted/.env.example deploy/self-hosted/.env
npm run selfhost:smoke

The smoke path will:
- build the API and worker images
- boot the full self-hosted stack
- wait for database, API, worker, Prometheus, and Grafana readiness
- verify key metrics are exposed
- fail non-zero if the stack is unhealthy
- The root docker-compose.yml remains the local development baseline for Postgres only.
- The self-hosted stack is intentionally isolated under deploy/self-hosted so development and production-like validation stay separate.
- The self-hosted stack uses isolated defaults (API_PORT=3000, API_HOST_PORT=13000, POSTGRES_PORT=15432, WORKER_METRICS_PORT=19464, PROMETHEUS_PORT=19090, GRAFANA_PORT=13001) to avoid clashing with local development services.
- CI now builds both Docker runtime images, while the full self-hosted smoke path remains a documented manual validation step.
Architecture changes follow an issue-driven stream (ARCH-*) with mandatory documentation updates.
- Tracker: docs/architecture/ARCH-TRACKER.md
- Guardrails: docs/architecture/ARCH-GUARDRAILS.md
- Roadmap: docs/architecture/IMPROVEMENT-ROADMAP.md
- Health check: docs/governance/architecture-health-check.md
- Security operations: docs/governance/security-operations.md
- Contribution guideline: CONTRIBUTING.md
- PR checklist: .github/pull_request_template.md
- ADR-005: Domain vs Application boundary
- ADR-006: Schema-first validation with Zod
- ADR-007: Timezone-safe datetime policy (Luxon)
- ADR-008: Standard error model and centralised API mapping
- ADR-009: Generic idempotency executor
- ADR-010: i18n foundation (en-US baseline)
- ADR-011: Idempotency state machine and concurrency behaviour
- ADR-012: Classes vs functions guideline
- ADR-013: Async idempotent invoice rollout
- ADR-014: Durable invoice async infrastructure strategy
- ADR-015: Invoice async operational readiness
- ADR-016: Schema-first contracts, unified time policy, and boundary deduplication polish
- Deterministic in-memory adapters are still used in selected paths for simplicity and fast local feedback.
- Durable Postgres-backed behaviour is already modelled and validated for the infrastructure paths that matter most.
- The next architectural move should be driven by a concrete structural risk, not change for change's sake.
- Repository: gabedalmolin/grantledger-platform
- Project board: GitHub Project #6
Special thanks to Marcos Pont for his support, technical guidance, and consistent feedback throughout this project. His mentorship, engineering judgement, and practical perspective were instrumental in challenging assumptions, sharpening architectural decisions, and raising the overall technical standard of this implementation.
- HTTP Semantics (RFC 9110): https://www.rfc-editor.org/rfc/rfc9110
- Zod documentation: https://zod.dev
- Luxon documentation: https://moment.github.io/luxon
