
gabedalmolin/grantledger-platform


GrantLedger Platform

GrantLedger - Operational trust at scale

Change-safe billing infrastructure for multi-tenant SaaS systems.

GrantLedger is a multi-tenant SaaS billing platform built around correctness, explicit contracts, and operational resilience.

It is designed for billing workflows where idempotency, replay handling, asynchronous orchestration, and boundary clarity are non-negotiable.

Platform Baseline

Current state on main: the platform baseline is complete through ARCH-033, including runtime security and environment contracts, API and worker metrics, a self-hosted deployment stack with smoke validation, a guided demo scenario, and a supply-chain security baseline with container scanning and SBOM generation. In summary, the baseline provides:

  • schema-first contracts at the boundary;
  • idempotent write flows and replay-safe webhook handling;
  • executable API and worker runtimes with health, readiness, and metrics;
  • self-hosted deployment validation with Prometheus and Grafana support;
  • guided demo coverage for end-to-end local validation;
  • CI and security automation with image scanning, SBOM generation, and CodeQL.

Quick Start

Prerequisites

  • Node.js >=22 <23
  • npm >=10 <11
  • Docker (for Postgres validation)

Install dependencies

npm ci

Fast local confidence loop

npm run typecheck
npm run build
npm run test

Full project gate

npm run quality:gate

Guided self-hosted demo

cp deploy/self-hosted/.env.example deploy/self-hosted/.env
docker compose -f deploy/self-hosted/docker-compose.yml --env-file deploy/self-hosted/.env up -d --build
npm run demo:selfhost

Durable Postgres validation

npm run db:up
DATABASE_URL='postgresql://grantledger_app:grantledger_app@localhost:5432/grantledger_rls' npm run db:migrate
DATABASE_URL='postgresql://grantledger_app:grantledger_app@localhost:5432/grantledger_rls' npm run test:pg

If your local grantledger-postgres volume already exists, recreate it once so that initialisation scripts from db/init are applied cleanly.

Why GrantLedger Exists

Billing rarely stays simple for long.

What begins as a handful of rules usually grows into retries, replay handling, partial failures, timezone-sensitive periods, and business logic spread across handlers, jobs, and external integrations.

GrantLedger exists to model that reality directly. The goal is not to showcase architecture for its own sake. The goal is to provide a practical foundation for SaaS billing flows that remain:

  • reliable under retries and concurrency;
  • explicit at the boundaries;
  • auditable when something goes wrong;
  • maintainable as the product and team grow.

Design Goals

GrantLedger aims to give product and engineering teams confidence that billing behaviour is:

  • consistent;
  • auditable;
  • resilient under retries and concurrency;
  • understandable under operational stress;
  • evolvable without losing architectural coherence.

What Is Implemented Today

Core capabilities

  • Tenant-aware request context resolution with explicit authentication and access failure semantics.
  • Checkout orchestration through an application-level payment provider contract.
  • Subscription state-machine commands with idempotent mutation orchestration.
  • Webhook normalisation and deduplication with canonical event publishing contracts.
  • Schema-first invoice API contracts with Zod-inferred types to reduce contract drift.
  • Unified datetime policy (Luxon-based) across invoice orchestration paths.
  • Boundary-level payload normalisation to reduce duplication and preserve API consistency.
  • Replay controls and observer-based operational hooks for async invoice lifecycle monitoring.
  • Asynchronous invoice generation flow across API, application, worker, and durable Postgres infrastructure:
    • enqueue with Idempotency-Key
    • poll status by jobId
    • process work with retry scheduling and terminal dead-letter behaviour
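The enqueue-and-poll contract above can be sketched as a small client-side helper. The endpoint path and the Idempotency-Key header follow this README, but the helper name, the tenant header, and the types are illustrative, not part of the codebase:

```typescript
// Hypothetical sketch of the async invoice enqueue contract.
// The path and Idempotency-Key header mirror the README; the
// x-tenant-id header name and all type names are assumptions.
interface EnqueueRequest {
  method: "POST";
  path: string;
  headers: Record<string, string>;
  body: string;
}

function buildEnqueueRequest(
  tenantId: string,
  idempotencyKey: string,
  payload: unknown
): EnqueueRequest {
  return {
    method: "POST",
    path: "/v1/invoices/generation",
    headers: {
      "content-type": "application/json",
      "x-tenant-id": tenantId, // assumed header name
      "idempotency-key": idempotencyKey,
    },
    body: JSON.stringify(payload),
  };
}
```

A 202 Accepted response then carries the jobId used for subsequent status polling.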

Behavioural guarantees

  • Standard API error envelope: { message, code, details?, traceId? }
  • Application-layer idempotency state model: processing | completed | failed
  • Conflict safety:
    • same key + different payload -> 409
    • same key while processing -> 409
  • Async invoice contract:
    • enqueue -> 202 Accepted with jobId
    • status -> queued | processing | completed | failed
    • transient processing failures can return the job to queued with retry context
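The conflict-safety rules above can be expressed as a small decision function. This is an illustrative sketch, not the actual application-layer executor; the record shape and function name are hypothetical:

```typescript
// Hypothetical sketch of the conflict-safety rules: same key with a
// different payload, or the same key while still processing, is a 409.
type IdempotencyState = "processing" | "completed" | "failed";

interface IdempotencyRecord {
  payloadHash: string;
  state: IdempotencyState;
}

type Decision =
  | { kind: "start" }                // no record yet: begin processing
  | { kind: "replay" }               // terminal state with the same payload
  | { kind: "conflict"; status: 409 };

function decide(
  record: IdempotencyRecord | undefined,
  payloadHash: string
): Decision {
  if (!record) return { kind: "start" };
  if (record.payloadHash !== payloadHash) {
    return { kind: "conflict", status: 409 }; // same key + different payload
  }
  if (record.state === "processing") {
    return { kind: "conflict", status: 409 }; // same key while processing
  }
  return { kind: "replay" };
}
```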

Architecture

Layer responsibilities

  • packages/domain
    • entities, invariants, state transitions, deterministic calculations
    • no transport, framework, or provider concerns
  • packages/application
    • use-case orchestration, ports/interfaces, idempotency, retry, replay, and audit flow
    • no HTTP-specific mapping
  • apps/api
    • boundary validation, header/context resolution, runtime composition, and transport mapping
  • apps/worker
    • worker loop orchestration and operational execution of asynchronous flows
  • packages/contracts
    • canonical types and Zod schemas for boundary contracts
  • packages/shared
    • reusable cross-cutting helpers such as time handling, i18n, observability helpers, and idempotency utilities
  • packages/infra-postgres
    • durable repositories, job stores, webhook persistence, and tenant-scoped infrastructure wiring
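The ports-and-adapters split described above can be sketched as follows: the application layer depends on a port interface, and both packages/infra-postgres and in-memory adapters implement it. All names here are illustrative; the factory-function style follows the project's classes-vs-functions guideline (ADR-012):

```typescript
// Hypothetical sketch of an application-layer port and an in-memory
// adapter. The real interfaces and field names may differ.
type JobStatus = "queued" | "processing" | "completed" | "failed";

interface InvoiceJob {
  jobId: string;
  tenantId: string;
  status: JobStatus;
  attempts: number;
}

interface JobStore {
  enqueue(job: InvoiceJob): void;
  claimNext(tenantId: string): InvoiceJob | undefined;
}

function createInMemoryJobStore(): JobStore {
  const jobs: InvoiceJob[] = [];
  return {
    enqueue(job) {
      jobs.push(job);
    },
    claimNext(tenantId) {
      // Claim the oldest queued job for this tenant and mark it processing.
      const job = jobs.find(
        (j) => j.tenantId === tenantId && j.status === "queued"
      );
      if (job) job.status = "processing";
      return job;
    },
  };
}
```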

API runtime composition

The API layer now follows a clearer separation of concerns:

  • handlers/
    • transport-facing behaviour only
  • bootstrap/
    • runtime assembly and environment-specific dependency wiring
  • http/
    • transport primitives and shared HTTP mapping helpers

This keeps HTTP handlers focused on request/response concerns while moving infrastructure selection into dedicated bootstrap modules.

Dependency direction

apps/* -> application -> domain

contracts, shared, and infrastructure adapters remain foundational packages consumed by the higher layers.

Repository layout

apps/
  api/
    src/
      bootstrap/
      handlers/
      http/
      infrastructure/
  worker/
  admin/

packages/
  application/
  contracts/
  domain/
  infra-postgres/
  shared/

docs/
  adr/
  architecture/
  governance/

Async Invoice Flow

flowchart LR
  C["Client"] -->|POST enqueue + Idempotency-Key| API["API handler"]
  API --> APP1["application.enqueueInvoiceGeneration"]
  APP1 --> JOB["JobStore (queued)"]
  C -->|GET status jobId| API
  W["Worker runInvoiceWorkerOnce"] --> APP2["application.processNextInvoiceGenerationJob"]
  APP2 --> JOB
  APP2 --> INV["InvoiceRepository"]
  APP2 --> AUD["AuditLogger / Observer hooks"]
  JOB -->|completed / failed| API

Monorepo Packages

  • @grantledger/domain
    • business rules and invariants
  • @grantledger/application
    • use cases such as subscription, invoice, idempotency, payment-webhook, auth-context, and checkout
  • @grantledger/contracts
    • shared contracts and Zod schemas across domain, application, and API boundaries
  • @grantledger/shared
    • time policy, i18n baseline, observability helpers, payload hashing, and standard error helpers
  • @grantledger/infra-postgres
    • durable persistence and Postgres-specific runtime wiring
  • @grantledger/api / @grantledger/worker
    • transport-facing orchestration adapters built as testable functions

Tech Stack

  • Node.js 22.x
  • TypeScript (strict, project references, exactOptionalPropertyTypes)
  • npm workspaces
  • Zod for schema-first boundary validation
  • Luxon for timezone-safe datetime handling
  • Vitest for test execution
  • ESLint for static analysis
  • GitHub Actions for CI and security automation

Testing Strategy

Testing is intentionally split by feedback speed and risk profile.

Default suite

  • npm run test
    • fast default validation across application, API, and worker behaviour
  • npm run test:coverage
    • default coverage-oriented run for the same fast suite

Durable infrastructure suite

  • npm run test:pg
    • dedicated Postgres integration validation for durable persistence paths in packages/infra-postgres

Why the split exists

This split is deliberate:

  • local iteration stays fast;
  • durable persistence behaviour is still validated explicitly;
  • CI can enforce both fast feedback and infrastructure realism without forcing every local run through Postgres.

Current test scope prioritises business-critical behaviour:

  • packages/application/src/**/*.test.ts
    • idempotency core
    • subscription idempotency
    • webhook deduplication
    • invoice enqueue/process idempotency and retry lifecycle
  • apps/api/src/**/*.test.ts
    • integration-style handler tests for auth, checkout, subscription, invoice, webhook, and error mapping
  • apps/worker/src/**/*.test.ts
    • worker loop behaviour such as idle, processed, retry scheduling, dead-letter handling, and observer-failure resilience
  • packages/infra-postgres/src/**/*.integration.test.ts
    • durable persistence, tenant isolation, invoice jobs, idempotency state, and webhook storage

Common Developer Workflows

Fast validation before a small change

npm run typecheck
npm run build
npm run test

Run the API locally

PERSISTENCE_DRIVER=memory npm run api:dev

The API host exposes:

  • GET /healthz
  • GET /readyz
  • POST /v1/auth/subscriptions
  • POST /v1/checkout/sessions
  • POST /v1/invoices/generation
  • POST /v1/invoices/generation/status
  • POST /v1/webhooks/provider

Run the worker locally

PERSISTENCE_DRIVER=memory npm run worker:dev

Full validation before opening or updating a PR

npm run quality:gate
DATABASE_URL='postgresql://grantledger_app:grantledger_app@localhost:5432/grantledger_rls' npm run test:pg

Delivery bootstrap

Use the orchestrator when opening or updating delivery PRs with issue/project metadata sync.

DATABASE_URL='postgresql://grantledger_app:grantledger_app@localhost:5432/grantledger_rls' \
bash ./scripts/delivery-bootstrap.sh \
  --issue-number <ISSUE_NUMBER> \
  --issue-body /tmp/issue.md \
  --pr-title "<PR_TITLE>" \
  --pr-body /tmp/pr.md \
  --branch <BRANCH_NAME>

Add --skip-gates only when the relevant checks were already executed on the same branch and commit.

Delivery closeout

Use the closeout orchestrator after checks pass to synchronise PR, issue, and project completion.

bash ./scripts/delivery-closeout.sh --pr <PR_NUMBER>

Use --issue <ISSUE_NUMBER> when the PR body does not contain Closes #N.

Runtime Baseline

GrantLedger now includes a minimal executable runtime baseline for both primary applications:

  • apps/api/src/server.ts
    • a thin Node HTTP host around the existing API handlers
    • includes healthz, readyz, and metrics
    • reuses the same Postgres pool for request handling and readiness checks when PERSISTENCE_DRIVER=postgres
  • apps/worker/src/main.ts
    • a long-running worker process around runInvoiceWorkerOnce
    • supports configurable polling through WORKER_POLL_INTERVAL_MS
    • exposes Prometheus-style metrics on a dedicated HTTP endpoint
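The polling behaviour can be sketched with a small helper. This assumes (it is not confirmed by the README) that a cycle which processed a job polls again immediately, while an idle cycle waits the configured interval:

```typescript
// Hypothetical polling helper: the "poll immediately after work" rule
// is an assumption for illustration, not documented behaviour.
type CycleResult = "processed" | "idle";

function nextPollDelayMs(result: CycleResult, pollIntervalMs: number): number {
  return result === "processed" ? 0 : pollIntervalMs;
}

// In main.ts terms (illustrative):
// const delay = nextPollDelayMs(result, Number(process.env.WORKER_POLL_INTERVAL_MS ?? 1000));
```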

Environment contract

API

  • .env.example
    • provides a baseline local/runtime configuration template
  • API_HOST
    • optional
    • defaults to 0.0.0.0
  • API_PORT
    • optional
    • defaults to 3000
  • API_JSON_BODY_LIMIT_BYTES
    • optional
    • defaults to 1048576 (1 MiB)
  • PERSISTENCE_DRIVER
    • optional
    • memory by default
    • postgres requires DATABASE_URL
  • DATABASE_URL
    • required when PERSISTENCE_DRIVER=postgres
  • STRIPE_WEBHOOK_SECRET
    • required only when processing Stripe webhooks through the runtime host
  • GRANTLEDGER_VERSION
    • optional
    • included in structured logs and metrics labels when provided
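The contract above lends itself to validation at startup. The real project validates with Zod; this dependency-free sketch mirrors the same rules and defaults, with an illustrative config shape:

```typescript
// Dependency-free sketch of the API environment contract above.
// The actual project uses Zod; the ApiConfig shape is hypothetical.
interface ApiConfig {
  host: string;
  port: number;
  persistenceDriver: "memory" | "postgres";
  databaseUrl?: string;
}

function loadApiConfig(env: Record<string, string | undefined>): ApiConfig {
  const driver = env.PERSISTENCE_DRIVER ?? "memory";
  if (driver !== "memory" && driver !== "postgres") {
    throw new Error(`unsupported PERSISTENCE_DRIVER: ${driver}`);
  }
  if (driver === "postgres" && !env.DATABASE_URL) {
    throw new Error("DATABASE_URL is required when PERSISTENCE_DRIVER=postgres");
  }
  return {
    host: env.API_HOST ?? "0.0.0.0",
    port: Number(env.API_PORT ?? 3000),
    persistenceDriver: driver,
    databaseUrl: env.DATABASE_URL,
  };
}
```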

Worker

  • .env.example
    • provides a baseline local/runtime configuration template
  • PERSISTENCE_DRIVER
    • optional
    • memory by default
    • postgres requires DATABASE_URL
  • DATABASE_URL
    • required when PERSISTENCE_DRIVER=postgres
  • WORKER_TENANT_ID
    • required when PERSISTENCE_DRIVER=postgres
  • WORKER_ID
    • optional stable worker identifier override
  • JOB_LEASE_SECONDS
    • optional
    • defaults to 30
  • JOB_HEARTBEAT_SECONDS
    • optional
    • defaults to 10
  • WORKER_POLL_INTERVAL_MS
    • optional
    • defaults to 1000
  • WORKER_METRICS_HOST
    • optional
    • defaults to 0.0.0.0
  • WORKER_METRICS_PORT
    • optional
    • defaults to 9464

Production-like local validation

Build and run the applications directly:

npm run build
PERSISTENCE_DRIVER=memory npm run api:start
PERSISTENCE_DRIVER=memory npm run worker:start

Or build container images:

docker build -f apps/api/Dockerfile -t grantledger-api .
docker build -f apps/worker/Dockerfile -t grantledger-worker .

Both runtime images now execute as a non-root user by default.

Observability Baseline

GrantLedger now exposes a minimal operational observability surface suitable for local validation and review:

  • API metrics:
    • GET /metrics
    • request count, request duration, error count, and health/readiness state
  • Worker metrics:
    • dedicated HTTP metrics host and port
    • cycle count, cycle duration, failure count, queue depth, retry/dead-letter gauges, and terminal failure rate
  • Structured logs:
    • stable service, runtime, environment, and optional version metadata
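The metrics surface follows the Prometheus text exposition format. As an illustration, a counter series can be rendered like this; the metric name in the example is hypothetical, not one of the names the API actually exports:

```typescript
// Minimal sketch of Prometheus text exposition for a counter.
// Metric and label names here are illustrative only.
function renderCounter(
  name: string,
  help: string,
  value: number,
  labels: Record<string, string> = {}
): string {
  const labelStr = Object.entries(labels)
    .map(([k, v]) => `${k}="${v}"`)
    .join(",");
  const series = labelStr ? `${name}{${labelStr}}` : name;
  return [
    `# HELP ${name} ${help}`,
    `# TYPE ${name} counter`,
    `${series} ${value}`,
  ].join("\n");
}
```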

Observability assets now live under deploy/observability:

  • deploy/observability/prometheus.local.yml
    • local Prometheus scrape configuration for API and worker metrics
  • deploy/observability/prometheus.self-hosted.yml
    • self-hosted scrape configuration for the compose stack
  • deploy/observability/grafana/dashboards/grantledger-runtime.json
    • starter Grafana dashboard covering API and worker runtime signals
  • deploy/observability/grafana/provisioning/
    • Grafana datasource and dashboard provisioning for the self-hosted stack

Local observability path

Start the services locally:

PERSISTENCE_DRIVER=memory npm run api:start
PERSISTENCE_DRIVER=memory npm run worker:start

Then scrape:

  • API metrics at http://localhost:3000/metrics
  • Worker metrics at http://localhost:9464/metrics

Use the Prometheus config in deploy/observability/prometheus.local.yml, and import the Grafana dashboard from deploy/observability/grafana/dashboards/grantledger-runtime.json.

Self-Hosted Deployment Baseline

GrantLedger now includes a production-like self-hosted stack under deploy/self-hosted.

Included services:

  • postgres
  • migrate
  • api
  • worker
  • prometheus
  • grafana

Quick start

cp deploy/self-hosted/.env.example deploy/self-hosted/.env
npm run selfhost:smoke

The smoke path will:

  • build the API and worker images
  • boot the full self-hosted stack
  • wait for database, API, worker, Prometheus, and Grafana readiness
  • verify key metrics are exposed
  • fail non-zero if the stack is unhealthy

Important deployment notes

  • The root docker-compose.yml remains the local development baseline for Postgres only.
  • The self-hosted stack is intentionally isolated under deploy/self-hosted so development and production-like validation stay separate.
  • The self-hosted stack uses isolated defaults (API_PORT=3000, API_HOST_PORT=13000, POSTGRES_PORT=15432, WORKER_METRICS_PORT=19464, PROMETHEUS_PORT=19090, GRAFANA_PORT=13001) to avoid clashing with local development services.
  • CI now builds both Docker runtime images, while the full self-hosted smoke path remains a documented manual validation step.

Governance and Architecture Discipline

Architecture changes follow an issue-driven stream (ARCH-*) with mandatory documentation updates.

  • Tracker: docs/architecture/ARCH-TRACKER.md
  • Guardrails: docs/architecture/ARCH-GUARDRAILS.md
  • Roadmap: docs/architecture/IMPROVEMENT-ROADMAP.md
  • Health check: docs/governance/architecture-health-check.md
  • Security operations: docs/governance/security-operations.md
  • Contribution guideline: CONTRIBUTING.md
  • PR checklist: .github/pull_request_template.md

Accepted ADRs

  • ADR-005 Domain vs Application boundary
  • ADR-006 Schema-first validation with Zod
  • ADR-007 Timezone-safe datetime policy (Luxon)
  • ADR-008 Standard error model and centralised API mapping
  • ADR-009 Generic idempotency executor
  • ADR-010 i18n foundation (en-US baseline)
  • ADR-011 Idempotency state machine and concurrency behaviour
  • ADR-012 Classes vs functions guideline
  • ADR-013 Async idempotent invoice rollout
  • ADR-014 Durable invoice async infrastructure strategy
  • ADR-015 Invoice async operational readiness
  • ADR-016 Schema-first contracts, unified time policy, and boundary deduplication polish

Current Trade-offs and Next Steps

  • Deterministic in-memory adapters are still used in selected paths for simplicity and fast local feedback.
  • Durable Postgres-backed behaviour is already modelled and validated for the infrastructure paths that matter most.
  • The next architectural move should be driven by a concrete structural risk, not change for change's sake.


Acknowledgements

Special thanks to Marcos Pont, for his support, technical guidance, and consistent feedback throughout this project. His mentorship, engineering judgement, and practical perspective were instrumental in challenging assumptions, sharpening architectural decisions, and raising the overall technical standard of this implementation.

