This repository implements a structured protocol for evaluating the robustness of computational pathology (CPath) models to hematoxylin and eosin (H&E) staining variation, as described in our paper Link to Paper.
It enables:
- Definition of realistic reference staining conditions
- Extraction of slide-level staining properties
- Controlled simulation of staining variation during model application
- Quantification of robustness via performance variability
The protocol is demonstrated using MSI classification in colorectal cancer, but is designed to be reusable for other tasks and datasets.
The project is split into this code repository and an accompanying HuggingFace repository providing the precomputed results. You can reproduce everything or selectively reuse precomputed artifacts:
| Work Package | Code (This Repo) | Precomputed Results (HuggingFace) |
|---|---|---|
| PLISM stain characterization | stain_vector_concentration_extraction/compute_stats.py, stain_vector_concentration_extraction/unmix_tiles.py | plism-wsi_stain_references |
| SurGen stain characterization | stain_vector_concentration_extraction/unmix_wsi_v1.py | surgen_stain_properties |
| Sample ABMIL train hyperparameters | controlled_staining_simulations/simulation_settings.ipynb | MSI_classification_models/fixed_splits_n=300, MSI_classification_models/fixed_simulation_hps_n=300.csv |
| ABMIL training (n=300 models) | controlled_staining_simulations/train_abmil.py | MSI_classification_models/trained_models |
| Extract features under simulated reference staining conditions | controlled_staining_simulations/extract_features.py | Not provided, follow steps in GitHub. |
| Apply models on extracted features | controlled_staining_simulations/apply_simulated_models.py, controlled_staining_simulations/apply_public_models.py | exp_results |
| Evaluate results | controlled_staining_simulations/evaluate_results.ipynb | See paper for results. |
The layout of this repository mirrors the protocol steps 1-3 (see Figure above).
Implements Step 1 (staining reference selection) and Step 2 (test set staining characterization).
Estimates H&E stain vectors and intensities via Macenko-based unmixing at the:
- Tile level for the PLISM dataset (More information on PLISM)
- Slide level for SurGen or any other WSI-based dataset
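The core of Macenko-style unmixing can be sketched in a few lines. This is a minimal NumPy illustration, not the repository's implementation; the thresholds `beta` and `alpha` here are illustrative defaults (the repo's tunable thresholds live in config.py):

```python
import numpy as np

def macenko_unmix(tile, beta=0.15, alpha=1.0, io=255.0):
    """Estimate H&E stain vectors and per-pixel concentrations from an RGB tile.

    Sketch of the Macenko method: convert to optical density, drop
    near-transparent pixels, find the dominant 2D OD plane via the top
    eigenvectors, take the extreme projection angles as stain vectors,
    then solve for concentrations by least squares.
    """
    od = -np.log((tile.reshape(-1, 3).astype(np.float64) + 1.0) / io)
    od = od[np.all(od > beta, axis=1)]              # keep tissue pixels only
    _, eigvecs = np.linalg.eigh(np.cov(od.T))       # eigenvalues ascending
    plane = eigvecs[:, 1:3]                         # two largest-variance directions
    proj = od @ plane
    angles = np.arctan2(proj[:, 1], proj[:, 0])
    lo, hi = np.percentile(angles, alpha), np.percentile(angles, 100 - alpha)
    v1 = plane @ np.array([np.cos(lo), np.sin(lo)])
    v2 = plane @ np.array([np.cos(hi), np.sin(hi)])
    he = np.stack([v1, v2], axis=1)                 # 3x2 stain matrix
    he /= np.linalg.norm(he, axis=0)
    conc, *_ = np.linalg.lstsq(he, od.T, rcond=None)  # (2, n_pixels) concentrations
    return he, conc
```

The slide-level variant aggregates such estimates over sampled tiles rather than running on a single tile.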
Use this if you want to:
- Build your own stain reference library
- Extract stain statistics from a new dataset
Implements Step 3 (applying CPath models under simulated reference conditions), as well as the ABMIL-based training of the MSI classification models.
Provides code for:
- ABMIL-based training of MSI models
- Feature extraction under simulated staining conditions
- Application of public and self-trained models to features from simulated staining conditions
- Evaluation notebooks
Use this if you want to:
- Replicate our experiments (ABMIL-based training, feature extraction and model application)
- Run controlled staining simulations on your own models
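For orientation, attention-based MIL pooling (the mechanism behind ABMIL) reduces a bag of tile features to one slide-level embedding via learned attention weights. Below is a minimal NumPy sketch of that pooling step only; the repo's train_abmil.py is the actual PyTorch implementation, and `V` and `w` here stand in for learned parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def abmil_pool(feats, V, w):
    """Attention-based MIL pooling: score each tile feature, normalize the
    scores with a softmax, and return the attention-weighted bag embedding."""
    scores = np.tanh(feats @ V) @ w      # (n_tiles,) unnormalized attention
    attn = softmax(scores)               # weights sum to 1 over the bag
    return attn @ feats, attn            # (feat_dim,) embedding, (n_tiles,) weights
```

A slide-level classifier head is then applied to the pooled embedding.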
Model wrappers and checkpoint storage.
Downloaded model weights must be placed here manually.
Shared utilities:
- Tile handling
- GPU monitoring
- Model loading
- Training helpers
Used for dataset splits and evaluation in the manuscript.
SURGEN.csv, tcga_coadread.csv
- Python ≥ 3.8
- PyTorch (GPU recommended)
Install core dependencies:
```bash
pip install torch torchvision numpy scikit-image openslide-python opencv-python-headless timm huggingface-hub einops pillow scipy shapely pandas matplotlib seaborn scikit-learn
```

Move to the code directory:

```bash
cd /YOUR/PATH/staining-robustness-evaluation-
```
- Select reference staining conditions
  - Option 1: Select references from our PLISM-based H&E stain vector and intensity library (Hugging Face Repository)
  - Option 2: Create your own references from your own WSIs (script: unmix_wsi_v1.py) or tiles (script: unmix_tiles.py)
- Characterize test set staining properties
  Configure and run unmix_wsi_v1.py and check the unmixing logs to verify that suitable tiles are selected. If the selected tiles are problematic, adjust the threshold parameters in config.py:

  ```bash
  python -m stain_vector_concentration_extraction.unmix_wsi_v1
  ```

  For detailed steps, see stain_vector_concentration_extraction.
- Apply CPath models under simulated reference staining conditions
  First, extract the features (using your selected foundation-model encoder) under the simulated staining condition, which is applied to each tile before feature extraction:

  ```bash
  python -m controlled_staining_simulations.extract_features --help
  ```

  Second, apply your trained aggregator model to the extracted features:

  ```bash
  python -m controlled_staining_simulations.apply_public_models --help
  ```

  For detailed steps, see controlled_staining_simulations.
- Evaluate robustness
  Build on our Jupyter notebook for result evaluation (controlled_staining_simulations/evaluate_results.ipynb), or implement your own evaluation logic.
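Conceptually, applying a simulated reference condition amounts to unmixing each tile with its source stain matrix, rescaling the concentrations toward the reference intensities, and recomposing the image with the reference stain vectors. The following NumPy sketch illustrates this idea under those assumptions; function and argument names are illustrative, and the repo's extract_features.py performs the actual transformation inside its feature-extraction pipeline:

```python
import numpy as np

def apply_reference_stain(tile, src_he, ref_he, src_max, ref_max, io=255.0):
    """Map an RGB tile from its source staining to a reference condition.

    src_he / ref_he: 3x2 source / reference stain matrices.
    src_max / ref_max: per-stain intensity scales (length-2 arrays).
    """
    od = -np.log((tile.reshape(-1, 3).astype(np.float64) + 1.0) / io)
    conc, *_ = np.linalg.lstsq(src_he, od.T, rcond=None)     # (2, n_pixels)
    conc *= (np.asarray(ref_max) / np.asarray(src_max))[:, None]  # rescale intensities
    od_new = (ref_he @ conc).T                               # recompose optical density
    out = io * np.exp(-od_new) - 1.0                         # back to RGB space
    return np.clip(np.rint(out), 0, 255).astype(np.uint8).reshape(tile.shape)
```

With identical source and reference parameters this is (up to rounding) an identity mapping, which makes a convenient sanity check.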
- Download pretrained models (see section below) and place checkpoints under models/
- Run feature extraction
- Run model application scripts
- Use the evaluation notebook to reproduce AUC and robustness metrics
For detailed steps check controlled_staining_simulations.
Create subfolders under models/ and place checkpoints there.
Example structure:
models/NIEHEUS2023/export-0.pth
models/WAGNER2023/MSI_high_CRC_model.pth
models/CTRANSPATH/ctranspath.pth
...
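A small helper (purely illustrative, not part of the repo) can verify that the expected checkpoints are in place before running the pipelines; extend the list for the models you actually use:

```python
from pathlib import Path

# Checkpoint paths from the example structure above (assumed layout)
EXPECTED = [
    "models/NIEHEUS2023/export-0.pth",
    "models/WAGNER2023/MSI_high_CRC_model.pth",
    "models/CTRANSPATH/ctranspath.pth",
]

def missing_checkpoints(root="."):
    """Return the expected checkpoint paths that do not exist under root."""
    return [p for p in EXPECTED if not (Path(root) / p).exists()]
```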
Foundation models:
- UNI2-h: HuggingFace
- HOptimus1: HuggingFace
- Virchow2: HuggingFace
- CTransPath: GitHub
- RetCCL: GitHub
Public MSI models:
- NIEHEUS2023: HuggingFace, Original Repo, Paper
- WAGNER2023: HuggingFace, Original Repo, Paper
Data and pretrained models: pretrained models, stain reference libraries, and extracted stain statistics are available in the accompanying Hugging Face Repository.
- Performance: AUC under the reference staining condition
- Robustness: Min–max AUC range across all simulated staining conditions
Bootstrapped confidence intervals (n=1000) are computed in the evaluation notebook.
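These metrics can be sketched as follows. This is a self-contained NumPy version for orientation only; the repo's evaluation notebook is the authoritative implementation, and the rank-based AUC below ignores score ties:

```python
import numpy as np

def auc(labels, scores):
    """Rank-based AUC (equivalent to the Mann-Whitney U statistic, no tie handling)."""
    order = np.argsort(scores)
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC; resamples until n_boot valid draws."""
    rng = np.random.default_rng(seed)
    n, stats = len(labels), []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, n)
        if labels[idx].min() != labels[idx].max():   # need both classes present
            stats.append(auc(labels[idx], scores[idx]))
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
```

The robustness metric is then simply the difference between the maximum and minimum AUC over all simulated staining conditions.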
Resources available on Hugging Face:
- Stain reference library (PLISM)
- Slide-level stain vectors (SurGen)
- Trained ABMIL models
- Hyperparameter configurations
👉 Hugging Face: Hugging Face Repository
If you use this repository, please cite: A protocol for evaluating robustness to H&E staining variation in computational pathology models
```bibtex
@misc{schönpflug2026protocolevaluatingrobustnesshe,
  title={A protocol for evaluating robustness to H&E staining variation in computational pathology models},
  author={Lydia A. Schönpflug and Nikki van den Berg and Sonali Andani and Nanda Horeweg and Jurriaan Barkey Wolf and Tjalling Bosse and Viktor H. Koelzer and Maxime W. Lafarge},
  year={2026},
  eprint={2603.12886},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.12886},
}
```
This repository utilizes and builds on:
Datasets:
- PLISM dataset: Ochi, M., Komura, D., Onoyama, T. et al. Registered multi-device/staining histology image dataset for domain-agnostic machine learning models. Sci Data 11, 330 (2024). Link
- SurGen dataset: Myles C., Um, I.H., Marshall, C. et al. 1020 H&E-stained whole-slide images with survival and genetic markers. GigaScience, Volume 14 (2025). Link
- TCGA COADREAD: WSIs: GDC Portal, MSI Status from CBioportal: TCGA COADREAD Pan-cancer Atlas (2018), TCGA COADREAD Nature (2012). The results shown here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.
Foundation models:
- UNI2-h: Chen, R.J., Ding, T., Lu, M.Y., Williamson, D.F.K., et al. Towards a general-purpose foundation model for computational pathology. Nat Med (2024). Paper, HuggingFace
- HOptimus1: HuggingFace
- Virchow2: Zimmermann, E., Vorontsov, E., Viret et al. Virchow2: Scaling self-supervised mixed magnification models in pathology (2024). Paper, HuggingFace
- CTransPath: Wang, X., Yang, S., Zhang et al. Transformer-based unsupervised contrastive learning for histopathological image classification. Medical image analysis, 81, p.102559 (2022). Paper, GitHub
- RetCCL: Wang, X., Du, Y., Yang, S. et al. RetCCL: Clustering-guided contrastive learning for whole-slide image retrieval. Medical image analysis, 83, 102645 (2023). Paper, GitHub
Public MSI models:
- Niehues, J. M., Quirke, P., West, N. P., et al. Generalizable biomarker prediction from cancer pathology slides with self-supervised deep learning: A retrospective multi-centric study. Cell reports medicine, 4(4) (2023). Paper, HuggingFace, Original Repo
- Wagner, S. J., Reisenbüchler, D., West, N. P. et al. Transformer-based biomarker prediction from colorectal cancer histology: A large-scale multicentric study. Cancer cell, 41(9), 1650-1661 (2023). Paper, HuggingFace, Original Repo