
Tiered synthetic data generation #36

@jc-macdonald


Tiered synthetic data generation

Implement the tiered data generation framework from Theory_1_Code.ipynb for generating synthetic observational data at different information levels.

Tier structure

  • Tier 1 (scalar signal): Aggregate time series (e.g., total biomass, total cases). Lowest information content.
  • Tier 2 (binned snapshot): Histogram or binned distribution at selected time points.
  • Tier 3 (time-resolved structure): Full distribution snapshots at multiple time points.
  • Tier 4a/4b (full observability): Complete state trajectory with/without noise.
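One way to make the tier structure concrete is shown below. It is a sketch under an assumed notation that does not come from Theory_1_Code.ipynb. Let $m_j(t_i) \ge 0$ be the mass or count in grid cell $j$ (centered at $x_j$, $j = 1,\dots,G$) at time $t_i$ ($i = 1,\dots,T$). Let the bins $B_1,\dots,B_K$ partition the grid, let $\mathcal{T}_2$ and $\mathcal{T}_3$ be the selected time sets, and let $\varepsilon$ denote draws from the chosen noise model:

```math
\begin{aligned}
\text{Tier 1:}\quad & y^{(1)}_i = \sum_{j=1}^{G} m_j(t_i) + \varepsilon_i, && i = 1,\dots,T\\
\text{Tier 2:}\quad & y^{(2)}_{i,k} = \sum_{j:\,x_j \in B_k} m_j(t_i) + \varepsilon_{i,k}, && t_i \in \mathcal{T}_2\\
\text{Tier 3:}\quad & y^{(3)}_{i,j} = m_j(t_i) + \varepsilon_{i,j}, && t_i \in \mathcal{T}_3\\
\text{Tier 4:}\quad & y^{(4)}_{i,j} = m_j(t_i) + \varepsilon_{i,j}, && i = 1,\dots,T,\ \ \varepsilon \equiv 0 \text{ in the noise-free variant}
\end{aligned}
```

With noise switched off, $\sum_{k} y^{(2)}_{i,k} = y^{(1)}_i$ for every $t_i \in \mathcal{T}_2$. This is the aggregation property listed under Tests.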

Implementation

  • construct_scalar_signal(trajectory, grid, noise_model) → Tier 1 data
  • construct_binned_snapshot(trajectory, grid, bins, times, noise_model) → Tier 2 data
  • construct_time_resolved_structure(trajectory, grid, times, noise_model) → Tier 3 data
  • construct_full_observability(trajectory, grid, noise_model) → Tier 4 data
  • generate_tiered_data(trajectory, grid, tier_config) → orchestrator returning all tiers
  • Noise models: Gaussian additive, Poisson (count data), negative binomial, dropout/missingness
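The functions listed above could be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the code in Theory_1_Code.ipynb:

- `trajectory` is assumed to be a `(n_times, n_grid)` array whose entry `[t, j]` is the mass or count in grid cell `j`.
- `times` is assumed to hold integer row indices.
- Each noise model is assumed to be a callable that wraps its own `numpy.random.Generator`.

The helper names `identity_noise`, `gaussian_additive`, `poisson_counts`, `negative_binomial` and `dropout` are hypothetical, and so are the `tier_config` keys.

```python
import numpy as np

# --- Noise models (hypothetical factories): each returns values -> noisy values. ---
# Assumption: values are non-negative expected counts wherever Poisson/NB is used.

def identity_noise(values):
    """No noise; returns a float copy."""
    return np.array(values, dtype=float)


def gaussian_additive(sigma, rng):
    def apply(values):
        values = np.asarray(values, dtype=float)
        return values + rng.normal(0.0, sigma, size=values.shape)
    return apply


def poisson_counts(rng):
    def apply(values):
        return np.asarray(rng.poisson(np.asarray(values, dtype=float)), dtype=float)
    return apply


def negative_binomial(r, rng):
    # NumPy's (n, p) form with n=r, p=r/(r+mu) has mean mu and variance mu + mu**2/r.
    def apply(values):
        mu = np.asarray(values, dtype=float)
        return np.asarray(rng.negative_binomial(r, r / (r + mu)), dtype=float)
    return apply


def dropout(p_missing, rng):
    # Each entry is independently replaced by NaN with probability p_missing.
    def apply(values):
        values = np.asarray(values, dtype=float)
        return np.where(rng.random(values.shape) < p_missing, np.nan, values)
    return apply


# --- Tier constructors (assumption: trajectory[t, j] = mass in grid cell j at time t). ---

def construct_scalar_signal(trajectory, grid, noise_model=identity_noise):
    # Tier 1: total over the grid at every time -> shape (n_times,).
    # grid is unused under the mass-per-cell assumption.
    return noise_model(np.asarray(trajectory, dtype=float).sum(axis=1))


def construct_binned_snapshot(trajectory, grid, bins, times, noise_model=identity_noise):
    # Tier 2: cell masses histogrammed by grid coordinate -> shape (len(times), n_bins).
    trajectory = np.asarray(trajectory, dtype=float)
    binned = np.stack([np.histogram(grid, bins=bins, weights=trajectory[t])[0] for t in times])
    return noise_model(binned)


def construct_time_resolved_structure(trajectory, grid, times, noise_model=identity_noise):
    # Tier 3: full grid resolution at the selected times -> shape (len(times), n_grid).
    return noise_model(np.asarray(trajectory, dtype=float)[np.asarray(times)])


def construct_full_observability(trajectory, grid, noise_model=identity_noise):
    # Tier 4: the whole trajectory, optionally corrupted by noise.
    return noise_model(trajectory)


def generate_tiered_data(trajectory, grid, tier_config):
    # Hypothetical config keys: "bins", "snapshot_times", "structure_times", and an
    # optional "noise" dict mapping tier names to noise callables.
    noise = tier_config.get("noise", {})
    return {
        "tier1": construct_scalar_signal(
            trajectory, grid, noise.get("tier1", identity_noise)),
        "tier2": construct_binned_snapshot(
            trajectory, grid, tier_config["bins"], tier_config["snapshot_times"],
            noise.get("tier2", identity_noise)),
        "tier3": construct_time_resolved_structure(
            trajectory, grid, tier_config["structure_times"],
            noise.get("tier3", identity_noise)),
        "tier4_noise_free": construct_full_observability(trajectory, grid),
        "tier4_noisy": construct_full_observability(
            trajectory, grid, noise.get("tier4", identity_noise)),
    }
```

Passing noise models as callables lets each tier use a different model and seed, for example Poisson noise on Tier 1 and dropout on Tier 3. The `tier4_noise_free` and `tier4_noisy` keys stand in for the 4a/4b pair.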

Source code reference

  • Theory_1_Code.ipynb: construct_scalar_signal, construct_binned_snapshot, construct_time_resolved_structure, generate_tiered_data
  • Chemostat_Simulator.ipynb: simulate_biomass_curves (OD600), simulate_flow_cytometry_histograms, simulate_cfu_counts, simulate_microscopy_bins

Tests

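  • Tier 1 recoverable from Tier 2+ by aggregation (at the time points each tier covers)
  • Noise models produce the expected variance (e.g., σ² for Gaussian additive noise, μ for Poisson, μ + μ²/r for negative binomial with mean μ and dispersion r)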
  • Tier hierarchy: Tier k+1 data strictly more informative than Tier k
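The three tests could be written as pytest-style functions along the following lines. The sketch is self-contained, so `tier1` and `tier2` are hypothetical noise-free stand-ins for `construct_scalar_signal` and `construct_binned_snapshot`. Both assume that `trajectory[t, j]` is the mass in grid cell `j`.

"Strictly more informative" is operationalized here, as an assumption, in one direction only: two different Tier 3 snapshots can share identical Tier 1 values, so Tier 1 cannot be inverted. The sample sizes and tolerances are illustrative.

```python
import numpy as np


# Hypothetical noise-free stand-ins for the Tier 1 / Tier 2 constructors.
def tier1(trajectory):
    return trajectory.sum(axis=1)


def tier2(trajectory, grid, bins, times):
    return np.stack([np.histogram(grid, bins=bins, weights=trajectory[t])[0] for t in times])


def test_tier1_recoverable_from_tier2():
    rng = np.random.default_rng(1)
    grid = np.linspace(0.0, 10.0, 50)
    traj = rng.gamma(2.0, 1.0, size=(20, 50))
    times = [0, 5, 19]
    bins = np.linspace(grid[0], grid[-1], 8)  # edges span the whole grid
    # Summing the bins at each snapshot time recovers Tier 1 at those times.
    assert np.allclose(tier2(traj, grid, bins, times).sum(axis=1), tier1(traj)[times])


def test_noise_variances():
    rng = np.random.default_rng(2)
    n, mu, sigma, r = 200_000, 6.0, 0.5, 3.0
    gauss = mu + rng.normal(0.0, sigma, n)
    pois = rng.poisson(mu, n)
    nb = rng.negative_binomial(r, r / (r + mu), n)  # mean mu, variance mu + mu**2 / r
    assert np.isclose(gauss.var(), sigma**2, rtol=0.05)
    assert np.isclose(pois.var(), mu, rtol=0.05)
    assert np.isclose(nb.mean(), mu, rtol=0.02)
    assert np.isclose(nb.var(), mu + mu**2 / r, rtol=0.05)


def test_tier1_not_invertible():
    # Different full snapshots (Tier 3/4) that collapse to the same Tier 1 value.
    a = np.array([[1.0, 2.0, 3.0]])
    b = np.array([[3.0, 2.0, 1.0]])
    assert np.allclose(tier1(a), tier1(b))
    assert not np.allclose(a, b)
```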
