Releases: cachevector/hashprep

v0.1.0b1 - Beta Release

09 Feb 15:22
69798e2

Pre-release

HashPrep v0.1.0b1 - Beta Release

This release marks HashPrep's graduation from alpha to beta status.

What's New

HashPrep is now feature-complete and ready for broader community testing. Core features are stable and the API is mature enough for real-world ML workflows.

Highlights

  • 82 passing tests with comprehensive coverage across all features
  • Stable APIs for both CLI and library usage
  • Complete documentation with installation and usage guides
  • Multiple report formats (HTML, PDF, Markdown, JSON)
  • Production-ready code generation (fix scripts and sklearn pipelines)

Installation

pip install hashprep

Key Features

  • Intelligent dataset profiling with ML-specific checks
  • Automated data quality issue detection
  • Context-aware preprocessing suggestions
  • Rich report generation with modern themes
  • Reproducible pipeline code generation

Documentation

See the README for complete usage instructions.

What Beta Means

  • Core features are stable and tested
  • APIs should remain stable (breaking changes will trigger a major version bump)
  • Ready for community testing and feedback
  • Minor bugs and edge cases may still exist

We encourage users to test HashPrep in their ML workflows and report any issues on GitHub.

v0.1.0a1

02 Oct 19:31

Pre-release

Improved correlation checks and reduced false positives in missing patterns

Improvements

  • Refined correlation checks in calculate_correlations
    • Fixed type inference errors by iterating over analyzer.column_types instead of analyzer.df
    • Updated mixed-variable thresholds to {'warning': 0.5, 'critical': 0.8} for consistency with Cramér’s V
    • Ensured seamless integration with run_checks
  • Reduced over-flagging in missing patterns detection
    • Introduced effect size thresholds:
      • Categorical: Cramér’s V > 0.1
      • Numeric: Cohen’s d > 0.2
    • Tightened p-value threshold to 0.01
    • Increased minimum samples per group to 10
    • Replaced ANOVA (f_oneway) with the Mann-Whitney U test for better handling of skewed distributions
    • Added pattern grouping to summarize correlations per missing column (top 3 shown for conciseness)
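The numeric branch of the missing-pattern check described above can be sketched as follows. This is a minimal illustration, not HashPrep's actual code: the function names (cohens_d, mann_whitney_p, missingness_is_related) are hypothetical, and the Mann-Whitney U p-value is hand-rolled with a normal approximation so the block needs only the standard library. In practice scipy.stats.mannwhitneyu would be the usual choice. Only the three constants come from the release notes.

```python
import math
from statistics import mean, variance

# Values taken from the release notes; everything else is an assumption.
P_VALUE_MAX = 0.01     # tightened p-value threshold
MIN_GROUP_SIZE = 10    # minimum samples per group
COHENS_D_MIN = 0.2     # effect size threshold for numeric columns


def cohens_d(a, b):
    """Absolute Cohen's d using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled == 0:
        return 0.0
    return abs(mean(a) - mean(b)) / math.sqrt(pooled)


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p-value (normal approximation, tie-corrected)."""
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = len(combined)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # average 1-based rank of the tie run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    na, nb = len(a), len(b)
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u_a = rank_sum_a - na * (na + 1) / 2
    mu = na * nb / 2
    sigma_sq = na * nb / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if sigma_sq <= 0:
        return 1.0
    # Continuity correction; clamp at zero so p never exceeds 1.
    z = max((abs(u_a - mu) - 0.5) / math.sqrt(sigma_sq), 0.0)
    return math.erfc(z / math.sqrt(2))


def missingness_is_related(values_when_missing, values_when_present):
    """Flag a pattern only if it is both significant and non-trivial in size."""
    if min(len(values_when_missing), len(values_when_present)) < MIN_GROUP_SIZE:
        return False
    p = mann_whitney_p(values_when_missing, values_when_present)
    d = cohens_d(values_when_missing, values_when_present)
    return p < P_VALUE_MAX and d > COHENS_D_MIN
```

Requiring both a small p-value and a minimum effect size is what cuts down over-flagging: on large datasets a tiny, practically irrelevant shift can still be statistically significant, and the effect size gate filters those cases out.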

Fixes

  • Corrected correlation dictionary iteration to use analyzer.column_types
  • Prevented spurious warnings by filtering weak associations
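As an illustration of how weak associations can be filtered, the mixed-variable thresholds listed under Improvements can be read as a simple severity mapping. The sketch below is an assumption about how such thresholds might be applied: classify_association is a hypothetical helper, not part of HashPrep's API, and whether the real comparison is inclusive (>=) or strict (>) is not stated in the release notes.

```python
# Thresholds quoted from the release notes.
THRESHOLDS = {"warning": 0.5, "critical": 0.8}


def classify_association(strength, thresholds=THRESHOLDS):
    """Map an association strength in [0, 1] to a severity label, or None.

    Inclusive comparison is an assumption made for this sketch.
    """
    if strength >= thresholds["critical"]:
        return "critical"
    if strength >= thresholds["warning"]:
        return "warning"
    return None  # weak association: no warning emitted


# Keep only the pairs strong enough to report (hypothetical column names).
pairs = {("age", "income"): 0.85, ("age", "city"): 0.55, ("id", "city"): 0.12}
flagged = {k: classify_association(v) for k, v in pairs.items()
           if classify_association(v) is not None}
```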

v0.1.0a0

27 Sep 19:24
91b2ae2

Pre-release

First alpha release of HashPrep