Fast and lightweight pixel class counting for NumPy arrays, tensors, and GeoTIFF rasters.

DPIRD-DMA/ClassCounter

ClassCounter

Fast class counting for NumPy arrays, PyTorch tensors, and GeoTIFFs.

Installation

pip install classcounter

Optional backends:

pip install classcounter[torch]   # PyTorch support
pip install classcounter[geo]     # GeoTIFF support via rasterio
pip install classcounter[numba]   # Numba-accelerated backend
pip install classcounter[all]     # Everything

Requires Python 3.10+.

Usage

import numpy as np
from classcounter import count_classes

arr = np.random.default_rng(0).integers(0, 5, size=(100, 100), dtype=np.int32)

count_classes(arr)
# {0: 2017, 1: 1960, 2: 2050, 3: 1932, 4: 2041}

# Map class IDs to names
count_classes(arr, names={0: "water", 1: "forest", 2: "urban", 3: "crop", 4: "bare"})
# {'water': 2017, 'forest': 1960, 'urban': 2050, 'crop': 1932, 'bare': 2041}

# Get percentages instead of counts
count_classes(arr, percent=True)
# {0: 20.17, 1: 19.6, 2: 20.5, 3: 19.32, 4: 20.41}

Input arrays can be any shape; they are flattened internally. Negative integers and floats are supported via an np.unique fallback path.
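
To make the flattening and the fallback concrete, here is a minimal NumPy-only sketch. It is not ClassCounter's implementation: the function name count_sketch is hypothetical, and the choice of np.bincount as the fast path for non-negative integers is an assumption made for illustration.

```python
import numpy as np


def count_sketch(arr):
    """Illustrative only: count occurrences of each value in an array of any shape."""
    flat = np.asarray(arr).ravel()  # any shape -> 1-D
    if flat.size == 0:
        return {}
    if np.issubdtype(flat.dtype, np.integer) and flat.min() >= 0:
        # Assumed fast path: non-negative integers index straight into a histogram.
        hist = np.bincount(flat.astype(np.intp, copy=False))
        return {int(v): int(hist[v]) for v in np.nonzero(hist)[0]}
    # Fallback: np.unique sorts the values, so negatives and floats work too.
    values, counts = np.unique(flat, return_counts=True)
    return {v.item(): int(c) for v, c in zip(values, counts)}


print(count_sketch(np.array([[0, 1, 1], [2, 2, 2]])))  # {0: 1, 1: 2, 2: 3}
print(count_sketch(np.array([-1, -1, 3])))             # {-1: 2, 3: 1}
```

The histogram path avoids sorting, which is why a sort-based fallback is typically reserved for inputs that cannot be used as array indices.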

GeoTIFF files

count_classes("land_cover.tif")

Requires the geo extra.

Saving results to GeoTIFF metadata

Write class counts back into the raster's GDAL metadata tags:

# One-liner: count and save in one step
count_classes("land_cover.tif", save_metadata=True)
# Writes tags: CLASS_COUNT_0=2017, CLASS_COUNT_1=1960, ...

# With percentages — automatically uses CLASS_PERCENT_ prefix
count_classes("land_cover.tif", percent=True, save_metadata=True)
# Writes tags: CLASS_PERCENT_0=20.17, CLASS_PERCENT_1=19.6, ...

# Custom prefix
count_classes("land_cover.tif", save_metadata=True, metadata_prefix="LAND_")
# Writes tags: LAND_0=2017, LAND_1=1960, ...

Or use the standalone function for more control:

from classcounter import count_classes, save_counts_to_raster

counts = count_classes("land_cover.tif", names={0: "water", 1: "forest"})
save_counts_to_raster("land_cover.tif", counts)
# Writes tags: CLASS_COUNT_water=2017, CLASS_COUNT_forest=1960

Stale tags from previous runs are automatically cleared before writing.

PyTorch tensors

import torch

tensor = torch.randint(0, 5, (100, 100))
count_classes(tensor)

if torch.cuda.is_available():
    tensor = tensor.to("cuda")  # GPU: counting happens on-device
    count_classes(tensor)

Backend selection

The backend is chosen automatically based on the input type:

  • NumPy arrays → Numba (if installed), otherwise NumPy
  • PyTorch tensors → PyTorch (runs on-device, including CUDA)
  • File paths → loaded via rasterio, then counted with Numba/NumPy
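
The selection rules above can be sketched as a small dispatch function. This is an illustration under stated assumptions, not the library's code: pick_backend and _array_backend are hypothetical names, and the returned strings are labels invented for this example.

```python
from importlib.util import find_spec
from pathlib import Path

import numpy as np


def _array_backend():
    # Prefer Numba when it is importable, otherwise fall back to NumPy.
    return "numba" if find_spec("numba") is not None else "numpy"


def pick_backend(data):
    """Hypothetical helper: name the backend the listed rules would select."""
    if isinstance(data, (str, Path)):
        # File paths are read with rasterio first, then counted as arrays.
        return "rasterio+" + _array_backend()
    if type(data).__module__.split(".")[0] == "torch":
        # Detect tensors without importing torch, which is an optional extra.
        return "torch"
    if isinstance(data, np.ndarray):
        return _array_backend()
    raise TypeError(f"unsupported input type: {type(data).__name__}")


print(pick_backend(np.zeros(4, dtype=np.int32)))
print(pick_backend("land_cover.tif"))
```

Checking the type's module name rather than importing torch keeps the sketch usable when only the base install is present.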

API

count_classes(data, names=None, percent=False, save_metadata=False, metadata_prefix=None)

| Parameter | Type | Description |
|---|---|---|
| data | ndarray, Tensor, str, or Path | Input array, tensor, or path to a raster file |
| names | dict[int, str] or None | Optional mapping of class IDs to human-readable names |
| percent | bool | Return percentages (0–100) instead of raw counts. Default False |
| save_metadata | bool | Write results as GDAL tags in the source GeoTIFF. Only valid when data is a file path. Default False |
| metadata_prefix | str or None | Custom tag prefix. Defaults to CLASS_COUNT_ or CLASS_PERCENT_ (when percent=True) |

Returns: dict[int | str, int] mapping class values (or names) to counts, or dict[int | str, float] when percent=True.

When names is provided, classes present in the data but missing from the mapping use their integer key (with a warning). Classes in the mapping but absent from the data receive a count of 0.
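
The name-mapping rules and the percentage output can be illustrated with plain dictionaries. This is a sketch of the documented behavior, not the library's code: apply_names and to_percent are hypothetical helpers, the output key order is arbitrary, and rounding to two decimals is an assumption inferred from the usage examples.

```python
import warnings


def apply_names(counts, names):
    """Hypothetical helper mirroring the documented behavior of `names`."""
    result = {}
    for class_id, n in counts.items():
        if class_id in names:
            result[names[class_id]] = n
        else:
            # Present in the data but unnamed: keep the integer key and warn.
            warnings.warn(f"class {class_id} has no name; keeping its integer key")
            result[class_id] = n
    for label in names.values():
        # Named classes that never occur still appear, with a count of 0.
        result.setdefault(label, 0)
    return result


def to_percent(counts, ndigits=2):
    """Convert raw counts to percentages of the total (0-100)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: round(100 * v / total, ndigits) for k, v in counts.items()}


raw = {0: 2017, 1: 1960, 2: 2050, 3: 1932, 4: 2041}
print(to_percent(raw))  # {0: 20.17, 1: 19.6, 2: 20.5, 3: 19.32, 4: 20.41}
```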

save_counts_to_raster(path, counts, *, prefix=None)

| Parameter | Type | Description |
|---|---|---|
| path | str or Path | Path to an existing GeoTIFF file |
| counts | dict | Dict of class counts as returned by count_classes |
| prefix | str or None | Tag name prefix. Defaults to CLASS_COUNT_ |

Writes each entry as a GDAL metadata tag (e.g. CLASS_COUNT_0=1234). Existing tags matching the prefix are cleared before writing.
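
The clear-then-write behavior can be pictured as a pure dictionary transformation. The sketch below does not touch a raster and is not the library's code: build_tags is a hypothetical helper that shows which tags would remain after a write, assuming values are stored as strings, as GDAL metadata values are.

```python
def build_tags(existing_tags, counts, prefix="CLASS_COUNT_"):
    """Hypothetical helper: compute the tag dict that results from a write."""
    # Drop stale tags from earlier runs that share the prefix.
    kept = {k: v for k, v in existing_tags.items() if not k.startswith(prefix)}
    # Add one tag per class; values are stringified for storage.
    kept.update({f"{prefix}{key}": str(value) for key, value in counts.items()})
    return kept


old = {"AREA_OR_POINT": "Area", "CLASS_COUNT_9": "7"}
print(build_tags(old, {0: 1234, "forest": 10}))
# {'AREA_OR_POINT': 'Area', 'CLASS_COUNT_0': '1234', 'CLASS_COUNT_forest': '10'}
```

Unrelated tags such as AREA_OR_POINT survive, while the stale CLASS_COUNT_9 entry is removed because it matches the prefix.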

Performance

Benchmarks on a Ryzen 9 5950X with an RTX 4090, using 100M-element arrays:

| Backend | Time (ms) |
|---|---|
| NumPy | 178 |
| Numba | 17 |
| PyTorch CPU | 38 |
| PyTorch GPU | 2 |

[Figure: Backend comparison]

Run the included benchmark notebook to compare backends on your hardware.
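
For a quick timing check without the notebook, a best-of-N loop around time.perf_counter is enough. The sketch below is not the included benchmark and does not exercise ClassCounter's backends; it only compares two plain NumPy counting strategies on a smaller array, and the helper name time_ms is hypothetical.

```python
import time

import numpy as np


def time_ms(fn, *args, repeats=3):
    """Best-of-N wall-clock time in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, (time.perf_counter() - start) * 1000)
    return best


rng = np.random.default_rng(0)
# 1M elements keeps the run short; the table above used 100M.
data = rng.integers(0, 5, size=1_000_000, dtype=np.int32)

unique_ms = time_ms(lambda a: np.unique(a, return_counts=True), data)
bincount_ms = time_ms(np.bincount, data)
print(f"np.unique:   {unique_ms:.1f} ms")
print(f"np.bincount: {bincount_ms:.1f} ms")
```

Taking the minimum over several repeats reduces noise from caches and other processes; absolute numbers will differ across machines.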

See Examples.ipynb for a walkthrough of all features including name mapping, percentages, PyTorch tensors, and GPU acceleration.

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-feature)
  3. Run tests: uv run pytest
  4. Submit a pull request

License

MIT
