Standardize the SipakMED and Herlev datasets #4

@Meet2304

Objective

Standardize the SipakMED and Herlev datasets to create a consistent and clean input pipeline for model training and fine-tuning.

Task Details

  • Design a preprocessing pipeline to bring both datasets into a unified format (e.g., resolution, color space, labeling scheme).
  • Implement basic image processing steps such as resizing, normalization, and noise reduction.
  • Apply consistent data augmentation strategies (rotation, flipping, color jitter, etc.).
  • Investigate and apply stain normalization or correction techniques to reduce variation across samples.
  • Ensure compatibility with downstream models and training scripts.
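As a starting point for discussion, the steps listed above could be sketched roughly as follows. This is a minimal, NumPy-only illustration and not a decided design: the 224x224 target size, the 3x3 mean filter used for noise reduction, and all function names (`resize_nearest`, `box_blur3`, `match_channel_stats`, `standardize`, `augment`) are assumptions made for the example. The stain step only matches per-channel mean and standard deviation in RGB, which is a simplification in the spirit of Reinhard-style color transfer; a real pipeline would more likely use OpenCV, Pillow, or a dedicated stain normalization method.

```python
import numpy as np


def resize_nearest(img, size):
    """Nearest-neighbour resize of an H x W x C image to size=(height, width)."""
    out_h, out_w = size
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols]


def box_blur3(img):
    """3x3 mean filter with edge padding (a very basic noise reduction step)."""
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    acc = np.zeros(img.shape, dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0


def match_channel_stats(img, target_mean, target_std, eps=1e-6):
    """Shift and scale each RGB channel to a reference mean and std.

    Simplified stand-in for stain normalization (assumption, not a
    chosen method): statistics are matched in RGB, not in LAB.
    """
    mean = img.mean(axis=(0, 1))
    std = img.std(axis=(0, 1))
    out = (img - mean) / (std + eps) * target_std + target_mean
    return np.clip(out, 0.0, 1.0)


def standardize(img_uint8, size=(224, 224), target_mean=None, target_std=None):
    """Resize, scale to [0, 1], denoise, and optionally match colour statistics."""
    img = resize_nearest(img_uint8, size).astype(np.float32) / 255.0
    img = box_blur3(img)
    if target_mean is not None and target_std is not None:
        img = match_channel_stats(
            img,
            np.asarray(target_mean, dtype=np.float32),
            np.asarray(target_std, dtype=np.float32),
        )
    return img.astype(np.float32)


def augment(img, rng):
    """Random horizontal/vertical flip plus a random multiple of 90 degrees.

    Assumes a square image so the output shape is unchanged.
    """
    if rng.random() < 0.5:
        img = img[:, ::-1]
    if rng.random() < 0.5:
        img = img[::-1]
    k = int(rng.integers(0, 4))
    return np.ascontiguousarray(np.rot90(img, k))
```

Applying the same `standardize` call, with the same reference statistics, to images from both datasets is what would make their outputs comparable. Color jitter and arbitrary-angle rotation are left out of the sketch to keep it short.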

Deliverables

  • Standardized dataset (or a script to generate it)
  • Documentation describing the preprocessing steps and rationale
  • Visual samples of pre- and post-processing for both datasets
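For the visual samples, one possible approach is to place each original image next to its processed version in a single panel. The sketch below uses only NumPy and assumes RGB images that are either `uint8` or floats in [0, 1]; the helper names `to_uint8` and `side_by_side` are hypothetical, and writing the panel to disk (for example with Pillow or Matplotlib) is left out.

```python
import numpy as np


def to_uint8(img):
    """Convert a float image in [0, 1] to uint8; pass uint8 images through."""
    if img.dtype == np.uint8:
        return img
    return (np.clip(img, 0.0, 1.0) * 255.0).round().astype(np.uint8)


def side_by_side(before, after, gap=4):
    """Build a 'before | after' panel with a white spacer between the images.

    The shorter image is padded with white rows at the bottom so both
    halves share the same height.
    """
    before, after = to_uint8(before), to_uint8(after)
    height = max(before.shape[0], after.shape[0])

    def pad_to_height(img):
        extra = height - img.shape[0]
        return np.pad(img, ((0, extra), (0, 0), (0, 0)), constant_values=255)

    spacer = np.full((height, gap, 3), 255, dtype=np.uint8)
    return np.hstack([pad_to_height(before), spacer, pad_to_height(after)])
```

Generating one such panel per dataset, for a handful of images, would cover the pre- and post-processing comparison for both SipakMED and Herlev.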
