A protocol for evaluating robustness to H&E staining variation in computational pathology models

This repository implements a structured protocol for evaluating the robustness of computational pathology (CPath) models to hematoxylin and eosin (H&E) staining variation, as described in our paper "A protocol for evaluating robustness to H&E staining variation in computational pathology models" (arXiv:2603.12886).

Figure 1: Overview of the evaluation protocol (Steps 1-3).

It enables:

  • Definition of realistic reference staining conditions
  • Extraction of slide-level staining properties
  • Controlled simulation of staining variation during model application
  • Quantification of robustness via performance variability

The protocol is demonstrated using MSI classification in colorectal cancer, but is designed to be reusable for other tasks and datasets.


Overview: How This Project Is Structured

The project is split into this code repository and an accompanying Hugging Face repository providing the precomputed results. You can reproduce everything or selectively reuse the precomputed artifacts:

| Work Package | Code (This Repo) | Precomputed Results (Hugging Face) |
| --- | --- | --- |
| PLISM stain characterization | stain_vector_concentration_extraction/compute_stats.py, stain_vector_concentration_extraction/unmix_tiles.py | plism-wsi_stain_references |
| SurGen stain characterization | stain_vector_concentration_extraction/unmix_wsi_v1.py | surgen_stain_properties |
| Sample ABMIL training hyperparameters | controlled_staining_simulations/simulation_settings.ipynb | MSI_classification_models/fixed_splits_n=300, MSI_classification_models/fixed_simulation_hps_n=300.csv |
| ABMIL training (n=300 models) | controlled_staining_simulations/train_abmil.py | MSI_classification_models/trained_models |
| Feature extraction under simulated reference staining conditions | controlled_staining_simulations/extract_features.py | Not provided; follow the steps in this repository. |
| Model application on extracted features | controlled_staining_simulations/apply_simulated_models.py, controlled_staining_simulations/apply_public_models.py | exp_results |
| Result evaluation | controlled_staining_simulations/evaluate_results.ipynb | See paper for results. |

The layout of this repository mirrors the protocol steps 1-3 (see Figure above).

stain_vector_concentration_extraction

Implements Step 1 (staining reference selection) and Step 2 (test set staining characterization).

Estimates both H&E stain vectors and stain intensities via Macenko-based unmixing on the PLISM and SurGen datasets.

Use this if you want to:

  • Build your own stain reference library
  • Extract stain statistics from a new dataset
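The core of Macenko-based unmixing is: convert the tile to optical density, find the plane of the two dominant stain directions, and take robust extreme angles within that plane as the stain vectors. A simplified numpy sketch (illustrative only, not the repository's implementation; `beta` and `alpha` are common default thresholds, not necessarily the values used in `config.py`):

```python
import numpy as np

def macenko_stain_vectors(rgb, beta=0.15, alpha=1.0):
    """Estimate H&E stain vectors from an RGB tile (uint8, H x W x 3)
    via Macenko's SVD/percentile method (simplified sketch)."""
    # RGB -> optical density; drop near-transparent pixels
    od = -np.log((rgb.reshape(-1, 3).astype(np.float64) + 1.0) / 256.0)
    od = od[np.all(od > beta, axis=1)]
    # plane spanned by the two principal directions of the OD cloud
    _, eigvecs = np.linalg.eigh(np.cov(od.T))
    plane = eigvecs[:, 1:]                      # two largest eigenvalues
    proj = od @ plane
    phi = np.arctan2(proj[:, 1], proj[:, 0])
    # centre the angles so percentiles are not broken by the +/-pi wrap
    mean = np.angle(np.exp(1j * phi).mean())
    phi = np.angle(np.exp(1j * (phi - mean)))
    lo, hi = np.percentile(phi, [alpha, 100.0 - alpha]) + mean
    v1 = plane @ np.array([np.cos(lo), np.sin(lo)])
    v2 = plane @ np.array([np.cos(hi), np.sin(hi)])
    # convention: hematoxylin has the larger red-channel component
    h, e = (v1, v2) if v1[0] > v2[0] else (v2, v1)
    return h / np.linalg.norm(h), e / np.linalg.norm(e)
```

Stain concentrations then follow from a least-squares fit of the OD image against the estimated stain matrix.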

controlled_staining_simulations

Implements Step 3 (applying CPath models under simulated reference staining conditions), as well as the ABMIL-based training of the MSI classification models.

Provides code for:

  • ABMIL-based training of MSI models
  • Feature extraction under simulated staining conditions
  • Application of public and self-trained models to features extracted under simulated staining conditions
  • Evaluation notebooks

Use this if you want to:

  • Replicate our experiments (ABMIL-based training, feature extraction and model application)
  • Run controlled staining simulations on your own models
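The aggregators trained here are ABMIL models, whose core operation is attention-weighted pooling of tile embeddings (Ilse et al., 2018). A minimal numpy sketch of that pooling step (illustrative only; the repository trains PyTorch models, and `V`, `w` stand in for learned parameters):

```python
import numpy as np

def abmil_pool(feats, V, w):
    """Ungated ABMIL attention pooling: score each tile embedding,
    softmax over tiles, return the attention-weighted slide embedding."""
    scores = np.tanh(feats @ V.T) @ w       # (n_tiles,)
    a = np.exp(scores - scores.max())       # numerically stable softmax
    a /= a.sum()                            # attention weights, sum to 1
    return a @ feats, a                     # (feat_dim,), (n_tiles,)
```

A slide-level classifier head is then applied to the pooled embedding.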

models

Model wrappers and checkpoint storage. Downloaded model weights must be placed here manually.

Shared utilities:

  • Tile handling
  • GPU monitoring
  • Model loading
  • Training helpers

Metadata Files

Used for dataset splits and evaluation in the manuscript.

  • SURGEN.csv
  • tcga_coadread.csv

Quick Start

1. Setup

  • Python ≥ 3.8
  • PyTorch (GPU recommended)

Install core dependencies:

pip install torch torchvision numpy scikit-image openslide-python opencv-python-headless timm huggingface-hub einops pillow scipy shapely pandas matplotlib seaborn scikit-learn

Move to the code directory:

cd /YOUR/PATH/staining-robustness-evaluation

2. Typical Workflows

A. Apply the protocol to your own dataset

  1. Select reference staining conditions

  2. Characterize test set staining properties

Configure and run unmix_wsi_v1.py, then check the unmixing logs to verify that suitable tiles were selected. If the selected tiles are problematic, adjust the threshold parameters in config.py:

python -m stain_vector_concentration_extraction.unmix_wsi_v1

For detailed steps check: stain_vector_concentration_extraction

  3. Apply CPath models under simulated reference staining conditions

First, extract tile features with your chosen foundation-model encoder; the simulated staining condition is applied to each tile before feature extraction:

python -m controlled_staining_simulations.extract_features --help
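Conceptually, the per-tile transform amounts to unmixing the tile into stain concentrations, rescaling them, and recomposing the image with the reference stain matrix. A hedged numpy sketch of this idea (the exact transform in extract_features.py may differ; `restain_tile` and `conc_scale` are illustrative names):

```python
import numpy as np

def restain_tile(rgb, src_stains, ref_stains, conc_scale=(1.0, 1.0)):
    """Re-render a uint8 RGB tile under a simulated staining condition:
    unmix with the tile's own 2x3 stain matrix, rescale the two stain
    concentrations, recompose with the reference 2x3 stain matrix."""
    od = -np.log((rgb.reshape(-1, 3).astype(np.float64) + 1.0) / 256.0)
    # least-squares concentrations: solve src_stains.T @ conc = od.T
    conc, *_ = np.linalg.lstsq(src_stains.T, od.T, rcond=None)  # (2, N)
    conc = np.maximum(conc, 0) * np.asarray(conc_scale)[:, None]
    od_new = (ref_stains.T @ conc).T
    out = np.clip(256.0 * np.exp(-od_new) - 1.0, 0, 255)
    return out.reshape(rgb.shape).astype(np.uint8)
```

With `src_stains == ref_stains` and unit scaling this is (up to quantization) the identity; swapping in reference stain vectors or rescaling concentrations simulates a different staining condition.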

Second, apply an aggregator model (public or self-trained) to the extracted features:

python -m controlled_staining_simulations.apply_public_models --help

For detailed steps check controlled_staining_simulations.

  4. Evaluate robustness

You can build on our Jupyter notebook for result evaluation, or implement your own evaluation logic: controlled_staining_simulations/evaluate_results.ipynb

B. Reproduce experiments from the paper

  1. Download pretrained models (see section below) and place checkpoints under models/
  2. Run feature extraction
  3. Run model application scripts
  4. Use the evaluation notebook to reproduce AUC and robustness metrics

For detailed steps check controlled_staining_simulations.


Models: Download and Placement

Create subfolders under models/ and place checkpoints there.

Example structure:

models/NIEHEUS2023/export-0.pth
models/WAGNER2023/MSI_high_CRC_model.pth
models/CTRANSPATH/ctranspath.pth
...

Foundation models and public MSI models: see the References section below for the corresponding papers and repositories.

Data and pretrained models: pretrained models, stain reference libraries, and extracted stain statistics are available in the accompanying Hugging Face repository.


Evaluation Metrics

  • Performance: AUC under the reference staining condition
  • Robustness: Min–max AUC range across all simulated staining conditions

Bootstrapped confidence intervals (n=1000) are computed in the evaluation notebook.
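Under these definitions, the min-max AUC range and its bootstrapped confidence interval can be sketched as follows (a simplified illustration, not the notebook's code; the rank-based `auc` assumes untied scores):

```python
import numpy as np

def auc(y, s):
    """Rank-based AUC (Mann-Whitney U), assuming no tied scores."""
    order = np.argsort(s)
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(s) + 1)
    n_pos = (y == 1).sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_range_ci(y, scores_per_condition, n_boot=1000, seed=0):
    """95% CI for the min-max AUC range across staining conditions,
    bootstrapping slides (sketch of the robustness metric)."""
    rng = np.random.default_rng(seed)
    ranges = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        if y[idx].min() == y[idx].max():   # need both classes in a resample
            continue
        aucs = [auc(y[idx], s[idx]) for s in scores_per_condition]
        ranges.append(max(aucs) - min(aucs))
    return np.percentile(ranges, [2.5, 97.5])
```

A robust model produces similar AUCs under all simulated conditions, so its min-max range (and the CI around it) stays close to zero.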


Reproducibility and Resources

Resources available on Hugging Face:

  • Stain reference library (PLISM)
  • Slide-level stain vectors (SurGen)
  • Trained ABMIL models
  • Hyperparameter configurations

👉 All of the above are hosted in the accompanying Hugging Face repository.


Citation

If you use this repository, please cite: A protocol for evaluating robustness to H&E staining variation in computational pathology models

@misc{schönpflug2026protocolevaluatingrobustnesshe,
      title={A protocol for evaluating robustness to H&E staining variation in computational pathology models}, 
      author={Lydia A. Schönpflug and Nikki van den Berg and Sonali Andani and Nanda Horeweg and Jurriaan Barkey Wolf and Tjalling Bosse and Viktor H. Koelzer and Maxime W. Lafarge},
      year={2026},
      eprint={2603.12886},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.12886}, 
}

References

This repository utilizes and builds on:

Datasets:

  • PLISM dataset: Ochi, M., Komura, D., Onoyama, T. et al. Registered multi-device/staining histology image dataset for domain-agnostic machine learning models. Sci Data 11, 330 (2024). Link
  • SurGen dataset: Myles C., Um, I.H., Marshall, C. et al. 1020 H&E-stained whole-slide images with survival and genetic markers. GigaScience, Volume 14 (2025). Link
  • TCGA COADREAD: WSIs: GDC Portal, MSI Status from CBioportal: TCGA COADREAD Pan-cancer Atlas (2018), TCGA COADREAD Nature (2012). The results shown here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.

Foundation models:

  • UNI2-h: Chen, R.J., Ding, T., Lu, M.Y., Williamson, D.F.K., et al. Towards a general-purpose foundation model for computational pathology. Nat Med (2024). Paper, HuggingFace
  • HOptimus1: HuggingFace
  • Virchow2: Zimmermann, E., Vorontsov, E., Viret et al. Virchow2: Scaling self-supervised mixed magnification models in pathology (2024). Paper, HuggingFace
  • CTransPath: Wang, X., Yang, S., Zhang, Y. et al. Transformer-based unsupervised contrastive learning for histopathological image classification. Medical Image Analysis, 81, 102559 (2022). Paper, GitHub
  • RetCCL: Wang, X., Du, Y., Yang, S. et al. RetCCL: Clustering-guided contrastive learning for whole-slide image retrieval. Medical Image Analysis, 83, 102645 (2023). Paper, GitHub

Public MSI models:

  • Niehues, J. M., Quirke, P., West, N. P., et al. Generalizable biomarker prediction from cancer pathology slides with self-supervised deep learning: A retrospective multi-centric study. Cell Reports Medicine, 4(4) (2023). Paper, HuggingFace, Original Repo
  • Wagner, S. J., Reisenbüchler, D., West, N. P. et al. Transformer-based biomarker prediction from colorectal cancer histology: A large-scale multicentric study. Cancer cell, 41(9), 1650-1661 (2023). Paper, HuggingFace, Original Repo
