Releases: cachevector/hashprep
v0.1.0b1 - Beta Release
HashPrep v0.1.0b1 - Beta Release
This release marks HashPrep's graduation from alpha to beta status.
What's New
HashPrep is now feature-complete and ready for broader community testing. Core features are stable and the API is mature enough for real-world ML workflows.
Highlights
- 82 passing tests with comprehensive coverage across all features
- Stable APIs for both CLI and library usage
- Complete documentation with installation and usage guides
- Multiple report formats (HTML, PDF, Markdown, JSON)
- Production-ready code generation (fix scripts and sklearn pipelines)
Installation
pip install hashprep
Key Features
- Intelligent dataset profiling with ML-specific checks
- Automated data quality issue detection
- Context-aware preprocessing suggestions
- Rich report generation with modern themes
- Reproducible pipeline code generation
Documentation
See the README for complete usage instructions.
What Beta Means
- Core features are stable and tested
- APIs should remain stable (breaking changes will trigger a major version bump)
- Ready for community testing and feedback
- Minor bugs and edge cases may still exist
We encourage users to test HashPrep in their ML workflows and report any issues on GitHub.
v0.1.0a1
Improved correlation checks and reduced false positives in missing-pattern detection
Improvements
- Refined correlation checks in calculate_correlations
  - Fixed type inference errors by iterating over analyzer.column_types instead of analyzer.df
  - Updated mixed-variable thresholds to {'warning': 0.5, 'critical': 0.8} for consistency with Cramér's V
  - Ensured seamless integration with run_checks
- Reduced over-flagging in missing patterns detection
  - Introduced effect size thresholds:
    - Categorical: Cramér's V > 0.1
    - Numeric: Cohen's d > 0.2
  - Tightened p-value threshold to 0.01
  - Increased minimum samples per group to 10
  - Replaced ANOVA (f_oneway) with Mann-Whitney U test for better handling of skewed distributions
  - Added pattern grouping to summarize correlations per missing column (top 3 shown for conciseness)
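The numeric side of the missing-pattern check described above can be sketched roughly as follows. This is an illustration only, not HashPrep's actual implementation: the function names `cohens_d` and `missingness_depends_on` are hypothetical, the pooled-standard-deviation form of Cohen's d is an assumption, and only the three cutoffs (p < 0.01, at least 10 samples per group, d > 0.2) come from the notes. It uses `scipy.stats.mannwhitneyu`, the rank-based test that replaced `f_oneway`.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Cutoffs taken from the release notes; everything else here is illustrative.
P_THRESHOLD = 0.01
MIN_GROUP_SIZE = 10
COHENS_D_MIN = 0.2


def cohens_d(a, b):
    """Absolute standardized mean difference, using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (
        len(a) + len(b) - 2
    )
    if pooled_var == 0:
        return 0.0
    return abs(a.mean() - b.mean()) / np.sqrt(pooled_var)


def missingness_depends_on(values, is_missing):
    """Hypothetical helper: does a numeric column differ between rows where
    another column is missing and rows where it is present?"""
    values = np.asarray(values, dtype=float)
    is_missing = np.asarray(is_missing, dtype=bool)
    keep = ~np.isnan(values)  # ignore rows where the numeric column itself is NaN
    missing_group = values[keep & is_missing]
    present_group = values[keep & ~is_missing]

    # Skip groups that are too small to test reliably.
    if min(len(missing_group), len(present_group)) < MIN_GROUP_SIZE:
        return False

    _, p_value = mannwhitneyu(missing_group, present_group, alternative="two-sided")
    # Flag only when the difference is both significant and non-trivial in size.
    return bool(
        p_value < P_THRESHOLD
        and cohens_d(missing_group, present_group) > COHENS_D_MIN
    )
```

Requiring an effect size on top of the p-value is what limits over-flagging: on large datasets a tiny, practically irrelevant shift can still produce a very small p-value.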
Fixes
- Corrected correlation dictionary iteration (analyzer.column_types)
- Prevented spurious warnings by filtering weak associations
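For categorical columns, the notes name Cramér's V as the effect size, with 0.1 as the missing-pattern cutoff and {'warning': 0.5, 'critical': 0.8} as severity thresholds. A minimal standard-library sketch of that statistic and a threshold lookup is shown here. The names `cramers_v` and `severity` are hypothetical and not part of HashPrep's API, the uncorrected form of V is an assumption, and whether HashPrep compares with `>` or `>=` is not stated in the notes.

```python
from collections import Counter
from math import sqrt

# Severity thresholds from the release notes.
THRESHOLDS = {"warning": 0.5, "critical": 0.8}


def cramers_v(xs, ys):
    """Cramér's V for two equal-length sequences of labels (no bias correction)."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    k = min(len(row_totals), len(col_totals))
    if n == 0 or k < 2:
        return 0.0  # V is undefined when a variable has a single level

    # Pearson chi-square statistic over the full contingency table.
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


def severity(v):
    """Map an association strength to a severity label (assumes >= comparison)."""
    if v >= THRESHOLDS["critical"]:
        return "critical"
    if v >= THRESHOLDS["warning"]:
        return "warning"
    return None  # below the warning threshold: not reported
```

Two identical label columns give V = 1 and map to "critical", while two independent columns give V = 0 and are not reported.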
v0.1.0a0
First alpha release of HashPrep