evalution
Here are 10 public repositories matching this topic...
Home of the SanityHarness Leaderboard website.
Updated Feb 28, 2026 - HTML
The official repository for the FactualBench dataset, introduced in the paper "Exploring the Generalizability of Factual Hallucination Mitigation via Enhancing Precise Knowledge Utilization".
Updated Dec 30, 2025 - Python
A Flask-Celery integration template for distributed task processing with progress tracking, using MMMLU evaluation as an example.
Updated Nov 25, 2024 - Python
A Python toolkit for the robust evaluation and analysis of seizure detection models.
Updated Jun 24, 2025 - Python
Multi-class classification of news articles using NLP techniques, TF-IDF, and Naive Bayes.
Updated Jul 10, 2025 - Jupyter Notebook
An R package for multiplicity-adjusted bootstrap tilting confidence limits for prediction performance after model selection.
Updated Mar 9, 2026 - R
Automated fish species classification using deep learning (CNN + transfer learning). Includes model training, evaluation metrics, and a Streamlit app for real-time predictions.
Updated Nov 25, 2025 - Jupyter Notebook