
gabedalmolin/grantledger-platform


GrantLedger Platform

GrantLedger - Operational trust at scale

Change-safe billing infrastructure for multi-tenant SaaS systems.

GrantLedger is a multi-tenant SaaS billing platform built around correctness, explicit contracts, and operational resilience.

It is designed for billing workflows where idempotency, replay handling, asynchronous orchestration, and boundary clarity are non-negotiable.

Platform Baseline

Current state on main: the platform baseline is complete through ARCH-033, including runtime security and environment contracts, API and worker metrics, a self-hosted deployment stack with smoke validation, a guided demo scenario, and a supply-chain security baseline with container scanning and SBOM generation. In summary, the baseline provides:

  • schema-first contracts at the boundary;
  • idempotent write flows and replay-safe webhook handling;
  • executable API and worker runtimes with health, readiness, and metrics;
  • self-hosted deployment validation with Prometheus and Grafana support;
  • guided demo coverage for end-to-end local validation;
  • CI and security automation with image scanning, SBOM generation, and CodeQL.

Quick Start

Prerequisites

  • Node.js >=22 <23
  • npm >=10 <11
  • Docker (for Postgres validation)

Install dependencies

npm ci

Fast local confidence loop

npm run typecheck
npm run build
npm run test

Full project gate

npm run quality:gate

Guided self-hosted demo

cp deploy/self-hosted/.env.example deploy/self-hosted/.env
docker compose -f deploy/self-hosted/docker-compose.yml --env-file deploy/self-hosted/.env up -d --build
npm run demo:selfhost

Durable Postgres validation

npm run db:up
DATABASE_URL='postgresql://grantledger_app:grantledger_app@localhost:5432/grantledger_rls' npm run db:migrate
DATABASE_URL='postgresql://grantledger_app:grantledger_app@localhost:5432/grantledger_rls' npm run test:pg

If your local grantledger-postgres volume already exists, recreate it once so that initialisation scripts from db/init are applied cleanly.

Why GrantLedger Exists

Billing rarely stays simple for long.

What begins as a handful of rules usually grows into retries, replay handling, partial failures, timezone-sensitive periods, and business logic spread across handlers, jobs, and external integrations.

GrantLedger exists to model that reality directly. The goal is not to showcase architecture for its own sake. The goal is to provide a practical foundation for SaaS billing flows that remain:

  • reliable under retries and concurrency;
  • explicit at the boundaries;
  • auditable when something goes wrong;
  • maintainable as the product and team grow.

Design Goals

GrantLedger aims to give product and engineering teams confidence that billing behaviour is:

  • consistent;
  • auditable;
  • resilient under retries and concurrency;
  • understandable under operational stress;
  • evolvable without losing architectural coherence.

What Is Implemented Today

Core capabilities

  • Tenant-aware request context resolution with explicit authentication and access failure semantics.
  • Checkout orchestration through an application-level payment provider contract.
  • Subscription state-machine commands with idempotent mutation orchestration.
  • Webhook normalisation and deduplication with canonical event publishing contracts.
  • Schema-first invoice API contracts with Zod-inferred types to reduce contract drift.
  • Unified datetime policy (Luxon-based) across invoice orchestration paths.
  • Boundary-level payload normalisation to reduce duplication and preserve API consistency.
  • Replay controls and observer-based operational hooks for async invoice lifecycle monitoring.
  • Asynchronous invoice generation flow across API, application, worker, and durable Postgres infrastructure:
    • enqueue with Idempotency-Key
    • poll status by jobId
    • process work with retry scheduling and terminal dead-letter behaviour
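The enqueue-and-poll contract above can be sketched as a small client-side helper. The endpoint path and the Idempotency-Key header follow this README, but the helper name, the tenant header, and the types are illustrative, not part of the codebase:

```typescript
// Hypothetical sketch of the async invoice enqueue contract.
// The path and Idempotency-Key header mirror the README; the
// x-tenant-id header name and all type names are assumptions.
interface EnqueueRequest {
  method: "POST";
  path: string;
  headers: Record<string, string>;
  body: string;
}

function buildEnqueueRequest(
  tenantId: string,
  idempotencyKey: string,
  payload: unknown
): EnqueueRequest {
  return {
    method: "POST",
    path: "/v1/invoices/generation",
    headers: {
      "content-type": "application/json",
      "x-tenant-id": tenantId, // assumed header name
      "idempotency-key": idempotencyKey,
    },
    body: JSON.stringify(payload),
  };
}
```

A 202 Accepted response then carries the jobId used for subsequent status polling.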

Behavioural guarantees

  • Standard API error envelope: { message, code, details?, traceId? }
  • Application-layer idempotency state model: processing | completed | failed
  • Conflict safety:
    • same key + different payload -> 409
    • same key while processing -> 409
  • Async invoice contract:
    • enqueue -> 202 Accepted with jobId
    • status -> queued | processing | completed | failed
    • transient processing failures can return the job to queued with retry context
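The conflict-safety rules above can be expressed as a small decision function. This is an illustrative sketch, not the actual application-layer executor; the record shape and function name are hypothetical:

```typescript
// Hypothetical sketch of the conflict-safety rules: same key with a
// different payload, or the same key while still processing, is a 409.
type IdempotencyState = "processing" | "completed" | "failed";

interface IdempotencyRecord {
  payloadHash: string;
  state: IdempotencyState;
}

type Decision =
  | { kind: "start" }                // no record yet: begin processing
  | { kind: "replay" }               // terminal state with the same payload
  | { kind: "conflict"; status: 409 };

function decide(
  record: IdempotencyRecord | undefined,
  payloadHash: string
): Decision {
  if (!record) return { kind: "start" };
  if (record.payloadHash !== payloadHash) {
    return { kind: "conflict", status: 409 }; // same key + different payload
  }
  if (record.state === "processing") {
    return { kind: "conflict", status: 409 }; // same key while processing
  }
  return { kind: "replay" };
}
```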

Architecture

Layer responsibilities

  • packages/domain
    • entities, invariants, state transitions, deterministic calculations
    • no transport, framework, or provider concerns
  • packages/application
    • use-case orchestration, ports/interfaces, idempotency, retry, replay, and audit flow
    • no HTTP-specific mapping
  • apps/api
    • boundary validation, header/context resolution, runtime composition, and transport mapping
  • apps/worker
    • worker loop orchestration and operational execution of asynchronous flows
  • packages/contracts
    • canonical types and Zod schemas for boundary contracts
  • packages/shared
    • reusable cross-cutting helpers such as time handling, i18n, observability helpers, and idempotency utilities
  • packages/infra-postgres
    • durable repositories, job stores, webhook persistence, and tenant-scoped infrastructure wiring
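The ports-and-adapters split described above can be sketched as follows: the application layer depends on a port interface, and both packages/infra-postgres and in-memory adapters implement it. All names here are illustrative; the factory-function style follows the project's classes-vs-functions guideline (ADR-012):

```typescript
// Hypothetical sketch of an application-layer port and an in-memory
// adapter. The real interfaces and field names may differ.
type JobStatus = "queued" | "processing" | "completed" | "failed";

interface InvoiceJob {
  jobId: string;
  tenantId: string;
  status: JobStatus;
  attempts: number;
}

interface JobStore {
  enqueue(job: InvoiceJob): void;
  claimNext(tenantId: string): InvoiceJob | undefined;
}

function createInMemoryJobStore(): JobStore {
  const jobs: InvoiceJob[] = [];
  return {
    enqueue(job) {
      jobs.push(job);
    },
    claimNext(tenantId) {
      // Claim the oldest queued job for this tenant and mark it processing.
      const job = jobs.find(
        (j) => j.tenantId === tenantId && j.status === "queued"
      );
      if (job) job.status = "processing";
      return job;
    },
  };
}
```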

API runtime composition

The API layer now follows a clearer separation of concerns:

  • handlers/
    • transport-facing behaviour only
  • bootstrap/
    • runtime assembly and environment-specific dependency wiring
  • http/
    • transport primitives and shared HTTP mapping helpers

This keeps HTTP handlers focused on request/response concerns while moving infrastructure selection into dedicated bootstrap modules.

Dependency direction

apps/* -> application -> domain

contracts, shared, and infrastructure adapters remain foundational packages consumed by the higher layers.

Repository layout

apps/
  api/
    src/
      bootstrap/
      handlers/
      http/
      infrastructure/
  worker/
  admin/

packages/
  application/
  contracts/
  domain/
  infra-postgres/
  shared/

docs/
  adr/
  architecture/
  governance/

Async Invoice Flow

flowchart LR
  C["Client"] -->|POST enqueue + Idempotency-Key| API["API handler"]
  API --> APP1["application.enqueueInvoiceGeneration"]
  APP1 --> JOB["JobStore (queued)"]
  C -->|GET status jobId| API
  W["Worker runInvoiceWorkerOnce"] --> APP2["application.processNextInvoiceGenerationJob"]
  APP2 --> JOB
  APP2 --> INV["InvoiceRepository"]
  APP2 --> AUD["AuditLogger / Observer hooks"]
  JOB -->|completed / failed| API

Monorepo Packages

  • @grantledger/domain
    • business rules and invariants
  • @grantledger/application
    • use cases such as subscription, invoice, idempotency, payment-webhook, auth-context, and checkout
  • @grantledger/contracts
    • shared contracts and Zod schemas across domain, application, and API boundaries
  • @grantledger/shared
    • time policy, i18n baseline, observability helpers, payload hashing, and standard error helpers
  • @grantledger/infra-postgres
    • durable persistence and Postgres-specific runtime wiring
  • @grantledger/api / @grantledger/worker
    • transport-facing orchestration adapters built as testable functions

Tech Stack

  • Node.js 22.x
  • TypeScript (strict, project references, exactOptionalPropertyTypes)
  • npm workspaces
  • Zod for schema-first boundary validation
  • Luxon for timezone-safe datetime handling
  • Vitest for test execution
  • ESLint for static analysis
  • GitHub Actions for CI and security automation

Testing Strategy

Testing is intentionally split by feedback speed and risk profile.

Default suite

  • npm run test
    • fast default validation across application, API, and worker behaviour
  • npm run test:coverage
    • default coverage-oriented run for the same fast suite

Durable infrastructure suite

  • npm run test:pg
    • dedicated Postgres integration validation for durable persistence paths in packages/infra-postgres

Why the split exists

This split is deliberate:

  • local iteration stays fast;
  • durable persistence behaviour is still validated explicitly;
  • CI can enforce both fast feedback and infrastructure realism without forcing every local run through Postgres.

Current test scope prioritises business-critical behaviour:

  • packages/application/src/**/*.test.ts
    • idempotency core
    • subscription idempotency
    • webhook deduplication
    • invoice enqueue/process idempotency and retry lifecycle
  • apps/api/src/**/*.test.ts
    • integration-style handler tests for auth, checkout, subscription, invoice, webhook, and error mapping
  • apps/worker/src/**/*.test.ts
    • worker loop behaviour such as idle, processed, retry scheduling, dead-letter handling, and observer-failure resilience
  • packages/infra-postgres/src/**/*.integration.test.ts
    • durable persistence, tenant isolation, invoice jobs, idempotency state, and webhook storage

Common Developer Workflows

Fast validation before a small change

npm run typecheck
npm run build
npm run test

Run the API locally

PERSISTENCE_DRIVER=memory npm run api:dev

The API host exposes:

  • GET /healthz
  • GET /readyz
  • POST /v1/auth/subscriptions
  • POST /v1/checkout/sessions
  • POST /v1/invoices/generation
  • POST /v1/invoices/generation/status
  • POST /v1/webhooks/provider

Run the worker locally

PERSISTENCE_DRIVER=memory npm run worker:dev

Full validation before opening or updating a PR

npm run quality:gate
DATABASE_URL='postgresql://grantledger_app:grantledger_app@localhost:5432/grantledger_rls' npm run test:pg

Delivery bootstrap

Use the orchestrator when opening or updating delivery PRs with issue/project metadata sync.

DATABASE_URL='postgresql://grantledger_app:grantledger_app@localhost:5432/grantledger_rls' \
bash ./scripts/delivery-bootstrap.sh \
  --issue-number <ISSUE_NUMBER> \
  --issue-body /tmp/issue.md \
  --pr-title "<PR_TITLE>" \
  --pr-body /tmp/pr.md \
  --branch <BRANCH_NAME>

Add --skip-gates only when the relevant checks were already executed on the same branch and commit.

Delivery closeout

Use the closeout orchestrator after checks pass to synchronise PR, issue, and project completion.

bash ./scripts/delivery-closeout.sh --pr <PR_NUMBER>

Use --issue <ISSUE_NUMBER> when the PR body does not contain Closes #N.

Runtime Baseline

GrantLedger now includes a minimal executable runtime baseline for both primary applications:

  • apps/api/src/server.ts
    • a thin Node HTTP host around the existing API handlers
    • includes healthz, readyz, and metrics
    • reuses the same Postgres pool for request handling and readiness checks when PERSISTENCE_DRIVER=postgres
  • apps/worker/src/main.ts
    • a long-running worker process around runInvoiceWorkerOnce
    • supports configurable polling through WORKER_POLL_INTERVAL_MS
    • exposes Prometheus-style metrics on a dedicated HTTP endpoint
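The polling behaviour can be sketched with a small helper. This assumes (it is not confirmed by the README) that a cycle which processed a job polls again immediately, while an idle cycle waits the configured interval:

```typescript
// Hypothetical polling helper: the "poll immediately after work" rule
// is an assumption for illustration, not documented behaviour.
type CycleResult = "processed" | "idle";

function nextPollDelayMs(result: CycleResult, pollIntervalMs: number): number {
  return result === "processed" ? 0 : pollIntervalMs;
}

// In main.ts terms (illustrative):
// const delay = nextPollDelayMs(result, Number(process.env.WORKER_POLL_INTERVAL_MS ?? 1000));
```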

Environment contract

API

  • .env.example
    • provides a baseline local/runtime configuration template
  • API_HOST
    • optional
    • defaults to 0.0.0.0
  • API_PORT
    • optional
    • defaults to 3000
  • API_JSON_BODY_LIMIT_BYTES
    • optional
    • defaults to 1048576 (1 MiB)
  • PERSISTENCE_DRIVER
    • optional
    • memory by default
    • postgres requires DATABASE_URL
  • DATABASE_URL
    • required when PERSISTENCE_DRIVER=postgres
  • STRIPE_WEBHOOK_SECRET
    • required only when processing Stripe webhooks through the runtime host
  • GRANTLEDGER_VERSION
    • optional
    • included in structured logs and metrics labels when provided
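The contract above lends itself to validation at startup. The real project validates with Zod; this dependency-free sketch mirrors the same rules and defaults, with an illustrative config shape:

```typescript
// Dependency-free sketch of the API environment contract above.
// The actual project uses Zod; the ApiConfig shape is hypothetical.
interface ApiConfig {
  host: string;
  port: number;
  persistenceDriver: "memory" | "postgres";
  databaseUrl?: string;
}

function loadApiConfig(env: Record<string, string | undefined>): ApiConfig {
  const driver = env.PERSISTENCE_DRIVER ?? "memory";
  if (driver !== "memory" && driver !== "postgres") {
    throw new Error(`unsupported PERSISTENCE_DRIVER: ${driver}`);
  }
  if (driver === "postgres" && !env.DATABASE_URL) {
    throw new Error("DATABASE_URL is required when PERSISTENCE_DRIVER=postgres");
  }
  return {
    host: env.API_HOST ?? "0.0.0.0",
    port: Number(env.API_PORT ?? 3000),
    persistenceDriver: driver,
    databaseUrl: env.DATABASE_URL,
  };
}
```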

Worker

  • .env.example
    • provides a baseline local/runtime configuration template
  • PERSISTENCE_DRIVER
    • optional
    • memory by default
    • postgres requires DATABASE_URL
  • DATABASE_URL
    • required when PERSISTENCE_DRIVER=postgres
  • WORKER_TENANT_ID
    • required when PERSISTENCE_DRIVER=postgres
  • WORKER_ID
    • optional stable worker identifier override
  • JOB_LEASE_SECONDS
    • optional
    • defaults to 30
  • JOB_HEARTBEAT_SECONDS
    • optional
    • defaults to 10
  • WORKER_POLL_INTERVAL_MS
    • optional
    • defaults to 1000
  • WORKER_METRICS_HOST
    • optional
    • defaults to 0.0.0.0
  • WORKER_METRICS_PORT
    • optional
    • defaults to 9464

Production-like local validation

Build and run the applications directly:

npm run build
PERSISTENCE_DRIVER=memory npm run api:start
PERSISTENCE_DRIVER=memory npm run worker:start

Or build container images:

docker build -f apps/api/Dockerfile -t grantledger-api .
docker build -f apps/worker/Dockerfile -t grantledger-worker .

Both runtime images now execute as a non-root user by default.

Observability Baseline

GrantLedger now exposes a minimal operational observability surface suitable for local validation and review:

  • API metrics:
    • GET /metrics
    • request count, request duration, error count, and health/readiness state
  • Worker metrics:
    • dedicated HTTP metrics host and port
    • cycle count, cycle duration, failure count, queue depth, retry/dead-letter gauges, and terminal failure rate
  • Structured logs:
    • stable service, runtime, environment, and optional version metadata
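The metrics surface follows the Prometheus text exposition format. As an illustration, a counter series can be rendered like this; the metric name in the example is hypothetical, not one of the names the API actually exports:

```typescript
// Minimal sketch of Prometheus text exposition for a counter.
// Metric and label names here are illustrative only.
function renderCounter(
  name: string,
  help: string,
  value: number,
  labels: Record<string, string> = {}
): string {
  const labelStr = Object.entries(labels)
    .map(([k, v]) => `${k}="${v}"`)
    .join(",");
  const series = labelStr ? `${name}{${labelStr}}` : name;
  return [
    `# HELP ${name} ${help}`,
    `# TYPE ${name} counter`,
    `${series} ${value}`,
  ].join("\n");
}
```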

Observability assets now live under deploy/observability:

  • deploy/observability/prometheus.local.yml
    • local Prometheus scrape configuration for API and worker metrics
  • deploy/observability/prometheus.self-hosted.yml
    • self-hosted scrape configuration for the compose stack
  • deploy/observability/grafana/dashboards/grantledger-runtime.json
    • starter Grafana dashboard covering API and worker runtime signals
  • deploy/observability/grafana/provisioning/
    • Grafana datasource and dashboard provisioning for the self-hosted stack

Local observability path

Start the services locally:

PERSISTENCE_DRIVER=memory npm run api:start
PERSISTENCE_DRIVER=memory npm run worker:start

Then scrape:

  • API metrics at http://localhost:3000/metrics
  • Worker metrics at http://localhost:9464/metrics

Use the Prometheus config in deploy/observability/prometheus.local.yml, and import the Grafana dashboard from deploy/observability/grafana/dashboards/grantledger-runtime.json.

Self-Hosted Deployment Baseline

GrantLedger now includes a production-like self-hosted stack under deploy/self-hosted.

Included services:

  • postgres
  • migrate
  • api
  • worker
  • prometheus
  • grafana

Quick start

cp deploy/self-hosted/.env.example deploy/self-hosted/.env
npm run selfhost:smoke

The smoke path will:

  • build the API and worker images
  • boot the full self-hosted stack
  • wait for database, API, worker, Prometheus, and Grafana readiness
  • verify key metrics are exposed
  • fail non-zero if the stack is unhealthy

Important deployment notes

  • The root docker-compose.yml remains the local development baseline for Postgres only.
  • The self-hosted stack is intentionally isolated under deploy/self-hosted so development and production-like validation stay separate.
  • The self-hosted stack uses isolated defaults (API_PORT=3000, API_HOST_PORT=13000, POSTGRES_PORT=15432, WORKER_METRICS_PORT=19464, PROMETHEUS_PORT=19090, GRAFANA_PORT=13001) to avoid clashing with local development services.
  • CI now builds both Docker runtime images, while the full self-hosted smoke path remains a documented manual validation step.

Governance and Architecture Discipline

Architecture changes follow an issue-driven stream (ARCH-*) with mandatory documentation updates.

  • Tracker: docs/architecture/ARCH-TRACKER.md
  • Guardrails: docs/architecture/ARCH-GUARDRAILS.md
  • Roadmap: docs/architecture/IMPROVEMENT-ROADMAP.md
  • Health check: docs/governance/architecture-health-check.md
  • Security operations: docs/governance/security-operations.md
  • Contribution guideline: CONTRIBUTING.md
  • PR checklist: .github/pull_request_template.md

Accepted ADRs

  • ADR-005 Domain vs Application boundary
  • ADR-006 Schema-first validation with Zod
  • ADR-007 Timezone-safe datetime policy (Luxon)
  • ADR-008 Standard error model and centralised API mapping
  • ADR-009 Generic idempotency executor
  • ADR-010 i18n foundation (en-US baseline)
  • ADR-011 Idempotency state machine and concurrency behaviour
  • ADR-012 Classes vs functions guideline
  • ADR-013 Async idempotent invoice rollout
  • ADR-014 Durable invoice async infrastructure strategy
  • ADR-015 Invoice async operational readiness
  • ADR-016 Schema-first contracts, unified time policy, and boundary deduplication polish

Current Trade-offs and Next Steps

  • Deterministic in-memory adapters are still used in selected paths for simplicity and fast local feedback.
  • Durable Postgres-backed behaviour is already modelled and validated for the infrastructure paths that matter most.
  • The next architectural move should be driven by a concrete structural risk, not change for change's sake.


Acknowledgements

Special thanks to Marcos Pont, for his support, technical guidance, and consistent feedback throughout this project. His mentorship, engineering judgement, and practical perspective were instrumental in challenging assumptions, sharpening architectural decisions, and raising the overall technical standard of this implementation.

