A font classification system that identifies 394 font variants across 32 families from rendered text images, using LoRA fine-tuning of DINOv2.
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

```bash
git clone --filter=blob:none --depth 1 https://github.com/google/fonts.git
```

```bash
python dataset_generator.py \
  --font_dir <path to google fonts> \
  --out_dir <output folder> \
  --img_size 224 \
  --font_size 1024 \
  --padding 128
```

Uses all CPU cores by default (`--workers N` to override). Generates ~575 training images and 40 test images per font variant with randomized colors, alignment, line wrapping, and Gaussian noise.
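For intuition, a single augmented sample might be rendered roughly like this (a simplified Pillow/NumPy sketch; `render_sample` and its defaults are illustrative, not the generator's actual code, and it uses PIL's default bitmap font rather than a Google Fonts TTF):

```python
import numpy as np
from PIL import Image, ImageDraw

def render_sample(text, img_size=224, noise_sigma=8.0, seed=0):
    """Render text on a square canvas with a random background color,
    a contrasting foreground, and additive Gaussian noise. A simplified
    sketch; the real generator also randomizes the font, alignment,
    and line wrapping."""
    rng = np.random.default_rng(seed)
    bg = tuple(int(c) for c in rng.integers(0, 256, size=3))
    fg = tuple(255 - c for c in bg)  # keep text legible against the background
    img = Image.new("RGB", (img_size, img_size), bg)
    ImageDraw.Draw(img).text((10, 10), text, fill=fg)  # default bitmap font
    arr = np.asarray(img, dtype=np.float32)
    arr += rng.normal(0.0, noise_sigma, arr.shape)  # additive Gaussian noise
    return Image.fromarray(np.clip(arr, 0.0, 255.0).astype(np.uint8))
```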
```bash
python dataset_cleaner.py <dataset folder>
```

Prints any corrupted image paths for manual inspection.
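A minimal version of such a corruption check (a sketch of the idea, assuming PNG images; not necessarily how `dataset_cleaner.py` is implemented):

```python
from pathlib import Path
from PIL import Image

def find_corrupted(dataset_dir):
    """Yield image paths that fail PIL's integrity check.
    Assumes PNG files; adjust the glob for other formats."""
    for path in Path(dataset_dir).rglob("*.png"):
        try:
            with Image.open(path) as img:
                img.verify()  # raises on truncated or corrupted files
        except Exception:
            yield path
```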
```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli upload-large-folder <user>/<repo> <dataset folder> --repo-type=dataset
```

For large datasets (200k+ files), tar the train/test folders first to avoid API rate limits:

```bash
tar cf train.tar -C <dataset folder> train/
tar cf test.tar -C <dataset folder> test/
HF_HUB_DISABLE_XET=1 huggingface-cli upload <user>/<repo> train.tar train.tar --repo-type=dataset
HF_HUB_DISABLE_XET=1 huggingface-cli upload <user>/<repo> test.tar test.tar --repo-type=dataset
```

LoRA (default, recommended):
```bash
python train_model.py \
  --data_dir <dataset folder> \
  --output_dir <output folder> \
  --batch_size 32 \
  --epochs 100 \
  --learning_rate 1e-4 \
  --lora_rank 8 \
  --lora_alpha 16 \
  --lora_dropout 0.1
```

Baseline comparisons:
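For intuition, each LoRA adapter leaves a pretrained linear layer frozen and adds a trainable low-rank update scaled by `alpha / rank`, which is what the `--lora_*` flags control. A plain-PyTorch sketch (illustrative only; the training script presumably uses an adapter library rather than this hand-rolled layer):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update:
    y = W x + (alpha / rank) * B A x, with A, B the only trainable weights."""
    def __init__(self, base: nn.Linear, rank=8, alpha=16, dropout=0.1):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # backbone weights stay frozen
        self.A = nn.Linear(base.in_features, rank, bias=False)
        self.B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.B.weight)        # zero-init: starts as a no-op update
        self.scale = alpha / rank
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        return self.base(x) + self.scale * self.B(self.A(self.drop(x)))
```

With rank 8 on a 768-dimensional layer, the adapter adds only 2 × 8 × 768 weights per wrapped layer, which is why LoRA trains a tiny fraction of the 87.2M backbone parameters.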
```bash
# Full fine-tuning (all 87.2M params)
python train_model.py --full_finetune --data_dir <data> --output_dir <out> --epochs 100

# Linear probe (classifier head only, 606K params)
python train_model.py --linear_probe --data_dir <data> --output_dir <out> --epochs 20

# CNN baseline (ResNet-50)
python train_model.py --resnet_baseline --data_dir <data> --output_dir <out> --epochs 100
```

To resume training from a checkpoint:

```bash
python train_model.py \
  --checkpoint <output folder>/checkpoint-2752 \
  --data_dir <dataset folder> \
  --output_dir <output folder> \
  --epochs 100
```

To push a trained checkpoint to the HuggingFace Hub (`--epochs 0` skips training):

```bash
python train_model.py \
  --epochs 0 \
  --data_dir <dataset folder> \
  --checkpoint <output folder>/checkpoint-2752 \
  --huggingface_model_name <user>/<repo>
```

To run inference on a single image:

```bash
python serve_model.py <model name or path> <image path>
```

For GPU training on cloud instances (Vast.ai, RunPod, etc.):
```bash
# Run all baselines (linear probe, ResNet-50, LoRA r=4/8/16, full FT)
bash cloud_train.sh --hf_dataset dchen0/font_crops_v5 --mode all

# Or individually
bash cloud_train.sh --hf_dataset dchen0/font_crops_v5 --mode lora
bash cloud_train.sh --hf_dataset dchen0/font_crops_v5 --mode full
bash cloud_train.sh --hf_dataset dchen0/font_crops_v5 --mode resnet
```

```bash
python confusion_matrix.py \
  --data_dir <dataset folder> \
  --model <HuggingFace model name or local path>
```

The model's label set must match the dataset's class folders. The script will check label overlap and abort if there's a mismatch.
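That overlap check can be as simple as a set comparison (a sketch of the described behavior; `check_label_overlap` is a hypothetical helper, not the script's actual function):

```python
def check_label_overlap(model_labels, dataset_classes):
    """Abort if the model's label set and the dataset's class folders
    do not match exactly (order-insensitive)."""
    missing = set(dataset_classes) - set(model_labels)
    extra = set(model_labels) - set(dataset_classes)
    if missing or extra:
        raise SystemExit(
            f"Label mismatch: {len(missing)} dataset classes absent from the "
            f"model, {len(extra)} model labels absent from the dataset."
        )
```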
Produces:

- `figures/confusion_matrix.pdf` — Row-normalized heatmap grouped by font family
- `figures/top_confused_pairs.pdf` — Bar chart of most frequent misclassifications
- `figures/per_family_accuracy.pdf` — Per-family accuracy breakdown
- `figures/tsne_embeddings.pdf` — t-SNE of [CLS] embeddings
- `figures/font_dendrogram.pdf` — UPGMA clustering of font families
- `figures/metrics.tex` — LaTeX macros for the paper
- `confusion_matrix.json` — Raw counts
- `bad_images.json` — All misclassified images
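Row normalization divides each row of the raw counts by its total, so cell (i, j) becomes the fraction of class i's test images predicted as class j. A NumPy sketch of that step (illustrative; not the script's code):

```python
import numpy as np

def row_normalize(counts):
    """Convert raw confusion counts into per-true-class rates.
    Rows with zero total (classes with no test images) stay all-zero."""
    counts = np.asarray(counts, dtype=float)
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)
```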
```bash
# Full build (evaluation + LaTeX)
bash build_paper.sh --data_dir <dataset folder> --model <model>

# LaTeX only (skip evaluation)
bash build_paper.sh --skip-matrix
```

`handler.py` implements the preprocessing pipeline (pad-to-square + resize + normalize) used at both training and inference time. It's bundled with the model on HuggingFace for Inference Endpoints.