From eba033bec840f15fa40575bfd2c299fe48e5e40f Mon Sep 17 00:00:00 2001 From: MarcYin Date: Thu, 2 Apr 2026 22:16:54 +0100 Subject: [PATCH] docs: update documentation for codebase revision Add geo/resample.py to package tree and architecture docs, expand glossary with solver terms (DCT, multi-grid, RAA, RTModelBackend, band_weight_power), document cost function and multi-grid strategy in theory.md, annotate solver config fields, and add RTModelBackend protocol and resampling utilities to the developer guide. Co-Authored-By: Claude Opus 4.6 --- docs/architecture/data-flow.md | 4 ++-- docs/architecture/package-map.md | 2 +- docs/code_structure.md | 8 +++++++- docs/concepts/glossary.md | 20 ++++++++++++++++++++ docs/developer/extending-siac.md | 26 ++++++++++++++++++++++++++ docs/reference/configuration.md | 22 +++++++++++----------- docs/science/paper-companion.md | 1 + docs/science/priors-and-mapping.md | 23 +++++++++++++++++++++++ docs/science/theory.md | 23 +++++++++++++++++++++++ 9 files changed, 114 insertions(+), 15 deletions(-) diff --git a/docs/architecture/data-flow.md b/docs/architecture/data-flow.md index 290f28b..2e90c27 100644 --- a/docs/architecture/data-flow.md +++ b/docs/architecture/data-flow.md @@ -78,9 +78,9 @@ This is why `process_s2(...)` exists separately from `process_scene(...)`. | M1 preprocessing | native sensor resolution | `ObservationBundle` | Captures TOA bands, geometry, cloud mask, and sensor metadata. | | M2 atmospheric prior | solver resolution | `AtmosphericState` | Prior data are aligned to the retrieval grid, not necessarily the output grid. | | M3 surface prior | mixed; provider dependent | `SurfacePrior` | Surface priors may depend on spectral mapping, BRDF routes, and optional atmospheric priors. | -| M4 grid assembly | solver grid | `SolverInputBundle` | Normalizes observation and prior data into the solver's working space. | +| M4 grid assembly | solver grid | `SolverInputBundle` | Normalizes observation and prior data into the solver's working space. Resampling is handled by the canonical functions in `geo.resample`. | | M5 solver | solver grid | `SolvedAtmosphere` | Produces solved atmospheric variables and uncertainty estimates. | -| M6 correction | output grid | `CorrectionResult` | Uses solved atmosphere to compute BOA outputs and auxiliary layers. | +| M6 correction | output grid | `CorrectionResult` | Uses solved atmosphere to compute BOA outputs and auxiliary layers. Atmospheric fields are resampled to the correction grid via `resample_field_for_correction` from `geo.resample`. | ## Persistence Flow diff --git a/docs/architecture/package-map.md b/docs/architecture/package-map.md index cf5f226..3d98fb6 100644 --- a/docs/architecture/package-map.md +++ b/docs/architecture/package-map.md @@ -32,7 +32,7 @@ flowchart TD | `runtime` | Xarray-backed payload types and validators | Config loading, remote access | Provides the typed contracts passed between workflow stages. | | `domain` | Pure types and protocols such as AOI and sensor definitions | Xarray runtime payloads, auth, config, I/O | Prefer this layer for reusable concepts with minimal side effects. | | `storage` | Raster/product writers, STAC helpers, readers | Retrieval logic and request parsing | Output and artifact persistence lives here. | -| `geo` | Geometry helpers and reprojection utilities | Product search or config resolution | Shared geospatial utilities used by multiple layers. | +| `geo` | Geometry helpers, reprojection utilities, and canonical grid-resampling functions (`resample.py`) | Product search or config resolution | `geo.resample` provides `resample_field_to_template`, `resample_mask_to_template`, `resample_coefficients_to_template`, and gap-fill helpers used by `algorithms.grid`, `algorithms.correction`, and `workflows.pipeline`. | | `catalog` | Bundled sensor catalog and static lookup data | Runtime mutation and remote fetching | Static data only. | | `src/siac_rs` | Native implementations for kernels, PSF, emulator, optimization, Whittaker smoothing | High-level orchestration and public API | Optional acceleration boundary behind Python interfaces. | diff --git a/docs/code_structure.md b/docs/code_structure.md index 15d9e16..83acf9b 100644 --- a/docs/code_structure.md +++ b/docs/code_structure.md @@ -81,7 +81,10 @@ siac/ │ ├── scene.py # Generic scene execution and output dispatch │ └── sentinel2.py # Sentinel-2 processing workflow only │ -├── geo/ # Geometry and reprojection utilities +├── geo/ # Geometry, reprojection, and resampling utilities +│ ├── __init__.py +│ └── resample.py # Canonical grid-resampling functions used across pipeline stages +│ └── storage/ # Raster/product read-write helpers ``` @@ -97,6 +100,9 @@ siac/ - `adapters/` owns external-system integration concerns. - `algorithms/` owns retrieval math, surface priors, correction, cloud masking, RT backends, and grid prep. +- `geo/` owns shared geospatial utilities including the canonical resampling + functions (`resample.py`) used by the grid assembler, solver, and corrector + to move fields between spatial grids. - `app/` resolves configuration into concrete runtime components. - `app/` also owns request coercion for Sentinel-2 search/resolve paths. - `workflows/` executes end-to-end processing plans. diff --git a/docs/concepts/glossary.md b/docs/concepts/glossary.md index ab23f18..9167eb6 100644 --- a/docs/concepts/glossary.md +++ b/docs/concepts/glossary.md @@ -54,3 +54,23 @@ **CorrectionResult** : The final typed result containing BOA outputs, auxiliary fields, and diagnostics. + +## Solver and algorithm terms + +**Multi-grid solver** +: The default solver strategy in SIAC. Retrieval starts on a coarse grid and progressively refines to the target aerosol resolution. Each level is solved with L-BFGS-B optimization. See `algorithms.solver.multigrid`. + +**DCT regularization** +: Discrete Cosine Transform–based smoothness penalty applied to the atmospheric state during retrieval. Encourages spatially smooth AOT and TCWV fields while preserving large-scale gradients. See the smoothness term in `algorithms.solver.cost`. + +**Band weight power (alpha)** +: Exponent controlling wavelength-dependent weighting in the observation cost. Configured as `algorithms.solver.alpha` (default `-1.6`). Higher absolute values give more weight to shorter wavelengths. Propagated through `MultiGridConfig.band_weight_power`. + +**RAA** +: Relative Azimuth Angle, computed as `|VAA − SAA| mod 2π`. Used by RT models and BRDF kernels to parameterize the scattering geometry. + +**RTModelBackend** +: The structural protocol (`domain.protocols.RTModelBackend`) that all radiative transfer backends must satisfy. Declares `simulate_toa`, `compute_coefficients`, and `supported_parameters`. Backends include the emulator, LUT, and Py6S adapters. + +**SolverInputBundle** +: The typed payload consumed by the solver, containing resampled TOA, geometry, cloud mask, priors, RT model, and band metadata assembled onto the solver grid. diff --git a/docs/developer/extending-siac.md b/docs/developer/extending-siac.md index 9f1f58a..19e2522 100644 --- a/docs/developer/extending-siac.md +++ b/docs/developer/extending-siac.md @@ -56,6 +56,32 @@ Use the existing registries as the standard pattern: If the new feature is configurable, also extend the schema in `python/siac/config/schema.py` so it can be selected through `SIACConfig`. +## RTModelBackend Protocol + +Radiative transfer backends must satisfy the `RTModelBackend` structural +protocol defined in `python/siac/domain/protocols.py`. The protocol declares: + +- `simulate_toa(geometry, atmo_state, surface, band)` — forward model +- `compute_coefficients(geometry, atmo_state, bands)` — linearized RT coefficients +- `supported_parameters` — parameter names the backend can vary + +Existing backends (`emulator`, `lut`, `py6s`) all implement this protocol. +When adding a new RT backend, ensure it satisfies `RTModelBackend` and passes +an `isinstance` check at solver and corrector initialization. + +## Resampling Utilities + +All grid resampling between pipeline stages uses the canonical functions in +`python/siac/geo/resample.py`: + +- `resample_field_to_template(field, template)` — bilinear resampling with NaN gap-fill +- `resample_mask_to_template(mask, template)` — conservative boolean mask resampling with dilation +- `resample_coefficients_to_template(coeffs, template)` — resamples all `RTCoefficients` fields +- `resample_field_for_correction(field, template)` — resample + guarantee finiteness for correction stage + +If you add a new pipeline stage that operates on a different spatial grid, +use these functions rather than implementing ad-hoc resampling. + ## Runtime Contract Expectations New components must fit the payload contracts already used by the workflow: diff --git a/docs/reference/configuration.md b/docs/reference/configuration.md index f07a9ef..f81fd87 100644 --- a/docs/reference/configuration.md +++ b/docs/reference/configuration.md @@ -154,17 +154,17 @@ Supported backends: Main fields: -- `aot_gamma` -- `tcwv_gamma` -- `alpha` -- `max_iterations` -- `gtol` -- `ftol` -- `aerosol_resolution` -- `use_multigrid` -- `min_grid_size` -- `bounds.aot` -- `bounds.tcwv` +- `aot_gamma` — regularization strength for AOT prior term (default `0.05`) +- `tcwv_gamma` — regularization strength for TCWV prior term (default `0.05`) +- `alpha` — band weight power exponent for the observation cost (default `-1.6`). Controls wavelength-dependent weighting: bands are weighted proportionally to $\lambda^{\alpha}$, so negative values give more weight to shorter (aerosol-sensitive) wavelengths. Propagated internally as `MultiGridConfig.band_weight_power` and `CostFunctionConfig.band_weight_power`. +- `max_iterations` — maximum L-BFGS-B iterations per grid level +- `gtol` — gradient tolerance for L-BFGS-B convergence +- `ftol` — function tolerance for L-BFGS-B convergence +- `aerosol_resolution` — target spatial resolution (metres) for the aerosol retrieval grid +- `use_multigrid` — enable the coarse-to-fine multi-grid solver strategy (default `true`) +- `min_grid_size` — minimum grid dimension (pixels) for multi-grid levels +- `bounds.aot` — `[min, max]` bounds for AOT during optimization +- `bounds.tcwv` — `[min, max]` bounds for TCWV during optimization ### `algorithms.cloud_mask` diff --git a/docs/science/paper-companion.md b/docs/science/paper-companion.md index 3364a32..9a234b7 100644 --- a/docs/science/paper-companion.md +++ b/docs/science/paper-companion.md @@ -23,6 +23,7 @@ Reference paper: | spectral mapping | `python/siac/algorithms/surface/spectral_mapping.py` | | spatial mapping / ePSF | `python/siac/algorithms/surface/swir_refine.py`, `src/siac_rs/src/psf.rs` | | radiative transfer approximation | `python/siac/adapters/rt.py`, `python/siac/algorithms/rt/`, `src/siac_rs/src/emulator.rs` | +| grid resampling between stages | `python/siac/geo/resample.py` | | implementation details | `python/siac/app/`, `python/siac/workflows/`, `python/siac/runtime/` | | uncertainty handling | `python/siac/runtime/models.py`, solver/correction paths | diff --git a/docs/science/priors-and-mapping.md b/docs/science/priors-and-mapping.md index 755f718..0c2cddb 100644 --- a/docs/science/priors-and-mapping.md +++ b/docs/science/priors-and-mapping.md @@ -77,6 +77,29 @@ Main implementation area: - `python/siac/algorithms/cloud/mask.py` +## Grid resampling between stages + +Moving data between spatial grids (native sensor resolution → solver grid → +correction grid) is a recurring concern across the pipeline. All resampling is +handled by the canonical functions in `python/siac/geo/resample.py`: + +- `resample_field_to_template` — bilinear interpolation for continuous fields + (atmospheric state, geometry angles), with coordinate-based interpolation + when possible and `scipy.ndimage.zoom` as fallback. NaN gap-fill is applied + by default. +- `resample_mask_to_template` — conservative resampling for boolean masks + (cloud, shadow), applying a maximum-filter dilation before interpolation to + avoid losing masked pixels when downsampling. +- `resample_coefficients_to_template` — resamples all fields of an + `RTCoefficients` bundle, handling both 2-D and 3-D (with `param` dimension) + arrays. +- `resample_field_for_correction` — combines resampling with a finiteness + guarantee (nearest-neighbour fill then source-mean fallback) for the + correction stage. + +These functions are used by the grid assembler (M4), the solver (M5), and the +atmospheric corrector (M6). + ## M1-M6 stage mapping | Stage | Prior or mapping role | diff --git a/docs/science/theory.md b/docs/science/theory.md index 0c5693f..84889c9 100644 --- a/docs/science/theory.md +++ b/docs/science/theory.md @@ -59,6 +59,29 @@ In the current repository this logic appears across: - `python/siac/algorithms/solver/multigrid.py` - `python/siac/algorithms/correction/atmospheric.py` +### Cost function detail + +The cost function in `algorithms/solver/cost.py` combines the three terms into a single scalar minimized by L-BFGS-B: + +$$J = J_\text{obs} + J_\text{prior} + J_\text{smooth}$$ + +**Observation term** — For each solver band, the forward model predicts TOA reflectance from the current atmospheric state and surface prior via `RTModelBackend.simulate_toa`. The residual is weighted by band-dependent weights proportional to $\lambda^{\alpha}$ where $\alpha$ is the `band_weight_power` parameter (default −1.6, configured as `algorithms.solver.alpha`). This gives shorter wavelengths — more sensitive to aerosol — higher influence. + +**Prior term** — Penalizes departure of AOT and TCWV from their prior values, weighted by prior uncertainty. Regularization strengths `aot_gamma` and `tcwv_gamma` scale the prior cost. + +**Smoothness term** — A DCT-based (Discrete Cosine Transform) spatial regularization. Rather than constructing and inverting a full Laplacian matrix, the implementation builds a sparse Laplacian with boundary-aware diagonal, multiplies in DCT space, and penalizes high-frequency structure in the AOT and TCWV fields. This encourages smooth atmospheric fields without requiring expensive matrix operations. + +### Multi-grid solver strategy + +The default solver (`algorithms/solver/multigrid.py`) uses a coarse-to-fine multi-grid approach: + +1. Start at a coarse grid (e.g. 4× the target aerosol resolution). +2. Solve for AOT and TCWV using L-BFGS-B at this resolution. +3. Interpolate the solution to the next finer grid level. +4. Repeat until the target `aerosol_resolution` is reached. + +Each level uses the previous level's solution as the initial guess, improving convergence speed and avoiding local minima. The `MultiGridConfig` controls bounds, regularization strengths, and the `band_weight_power`. + ## Why coarse retrieval comes before full-resolution correction The paper emphasizes retrieving atmospheric parameters at a coarser resolution than the native scene bands. The same architectural idea is visible in the code: