feat: add UBERON brain anatomy matching for all species by bendichter · Pull Request #1825 · dandi/dandi-cli

bendichter · 2026-03-29T15:37:23Z

Summary

Adds UBERON ontology-based brain region matching, extending anatomy extraction to all species (not just mouse)
For mice: tries Allen CCF first, falls back to UBERON per-token
For other species: tries UBERON directly
Synonym scope is a settable parameter (frozenset[str]), defaulting to EXACT only
Bundles ~2,400 nervous system descendant terms from UBERON as a compact JSON (~479KB)
Generation script parses the OBO file directly with no library dependencies
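
The lookup order in the summary can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in brain_areas.py: the two toy tables, the function name `match_location`, the comma-based token splitting, and the `"mouse"` species string are all hypothetical stand-ins.

```python
from __future__ import annotations

# Hypothetical toy lookup tables; the values are placeholders, not real identifiers.
ALLEN_CCF = {"ca1": "allen:CA1", "visp": "allen:VISp"}
UBERON = {"cerebellum": "uberon:cerebellum", "brain": "uberon:brain"}


def match_location(location: str, species: str) -> list[str]:
    """Match each comma-separated token, falling back to UBERON per token."""
    matches: list[str] = []
    for token in (t.strip().lower() for t in location.split(",")):
        if not token:
            continue
        hit = None
        if species == "mouse":
            # Mice: try the Allen CCF table first.
            hit = ALLEN_CCF.get(token)
        if hit is None:
            # Other species, or a mouse token that Allen did not match.
            hit = UBERON.get(token)
        if hit is not None:
            matches.append(hit)
    return matches
```

With these toy tables, a mouse location of `"CA1, cerebellum"` resolves the first token through Allen and the second through UBERON, while any other species only consults UBERON.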

New files

dandi/data/generate_uberon_structures.py — downloads and parses uberon.obo, extracts nervous system descendants
dandi/data/uberon_brain_structures.json — bundled lookup data (2,408 structures, 9,717 synonyms)
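
For readers unfamiliar with the OBO format, extracting descendants without an ontology library might look something like the sketch below. It is an assumption-laden illustration, not the generator itself: the sample text and `X:` identifiers are invented, only `is_a` edges are followed (the actual script may also follow other relationships), and the helper names `parse_terms` and `descendants` are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict, deque

# Invented sample in OBO stanza syntax; the identifiers are not real UBERON IDs.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: X:1
name: nervous system

[Term]
id: X:2
name: brain
is_a: X:1 ! nervous system

[Term]
id: X:3
name: cerebellum
is_a: X:2 ! brain

[Term]
id: X:9
name: heart
"""


def parse_terms(text: str) -> dict[str, dict]:
    """Collect id, name, and is_a parents for every [Term] stanza."""
    terms: dict[str, dict] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept.
            current = {"name": None, "is_a": []} if line == "[Term]" else None
        elif current is None or not line:
            continue
        elif line.startswith("id: "):
            terms[line[4:]] = current
        elif line.startswith("name: "):
            current["name"] = line[6:]
        elif line.startswith("is_a: "):
            # Drop the trailing "! label" comment.
            current["is_a"].append(line[6:].split("!")[0].strip())
    return terms


def descendants(terms: dict[str, dict], root: str) -> set[str]:
    """Breadth-first walk over inverted is_a edges, excluding the root."""
    children = defaultdict(list)
    for term_id, term in terms.items():
        for parent in term["is_a"]:
            children[parent].append(term_id)
    seen: set[str] = set()
    queue = deque([root])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```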

Modified files

dandi/metadata/brain_areas.py — UBERON loading, lookup, and matching functions; locations_to_mouse_anatomy() with Allen→UBERON fallback
dandi/metadata/util.py — _extract_brain_anatomy() now handles all species
dandi/tests/test_brain_areas.py — tests for UBERON matching, synonym scope control, and Allen/UBERON fallback
.pre-commit-config.yaml, pyproject.toml — codespell exclusions for new JSON

Test plan

All 56 brain area tests pass locally
All 179 metadata tests pass (including existing brain anatomy integration tests)
Pre-commit passes (black, isort, codespell, flake8)
CI passes on all platforms

🤖 Generated with Claude Code

For mice, location tokens are first matched against Allen CCF, then fall back to UBERON. For all other species, UBERON is tried directly. Synonym scope (EXACT, RELATED, NARROW, BROAD) is a settable parameter, defaulting to EXACT only.

- Add generate_uberon_structures.py to parse the UBERON OBO file and produce a bundled JSON of ~2,400 nervous-system descendants
- Add UBERON lookup/matching functions to brain_areas.py
- Update _extract_brain_anatomy in util.py to handle non-mouse species
- Add comprehensive tests for UBERON matching and Allen/UBERON fallback

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov · 2026-03-29T15:42:36Z

Codecov Report

❌ Patch coverage is 93.43629% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.73%. Comparing base (3c7ba2d) to head (69a328e).
⚠️ Report is 11 commits behind head on add-brain-area-anatomy.

Files with missing lines	Patch %	Lines
dandi/metadata/brain_areas.py	84.61%	12 Missing ⚠️
dandi/cli/cmd_service_scripts.py	60.00%	2 Missing ⚠️
dandi/metadata/util.py	66.66%	2 Missing ⚠️
dandi/data/generate_uberon_structures.py	98.90%	1 Missing ⚠️

Additional details and impacted files

@@                    Coverage Diff                     @@
##           add-brain-area-anatomy    #1825      +/-   ##
==========================================================
+ Coverage                   75.34%   75.73%   +0.39%     
==========================================================
  Files                          87       89       +2     
  Lines                       12259    12514     +255     
==========================================================
+ Hits                         9237     9478     +241     
- Misses                       3022     3036      +14

Flag	Coverage Δ
unittests	`75.73% <93.43%> (+0.39%)`	⬆️


Instead of passing a flat set of scopes, use a max_synonym_scope parameter (default "EXACT"). Matching tries tiers in precision order: EXACT > NARROW > BROAD > RELATED, up to the specified maximum. Term names are always tried first before any synonym tier.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Thread the scope parameter through so callers can control how permissive UBERON synonym matching is. Defaults to EXACT.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
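
The tiered behaviour described in these two commits could be sketched as follows. This is only an illustration of the ordering rule: the function name `lookup`, the shape of the `names` and `synonyms` tables, and the toy entries are assumptions, not the structures used in brain_areas.py.

```python
from __future__ import annotations

# Precision order stated in the commit message.
SCOPE_ORDER = ("EXACT", "NARROW", "BROAD", "RELATED")


def lookup(
    token: str,
    names: dict[str, str],
    synonyms: dict[str, dict[str, str]],
    max_synonym_scope: str = "EXACT",
) -> str | None:
    """Try term names first, then synonym tiers up to max_synonym_scope."""
    key = token.strip().lower()
    if key in names:
        return names[key]
    last = SCOPE_ORDER.index(max_synonym_scope)
    for scope in SCOPE_ORDER[: last + 1]:
        if (hit := synonyms.get(scope, {}).get(key)) is not None:
            return hit
    return None


# Hypothetical toy tables keyed by lower-cased label.
NAMES = {"hippocampal formation": "T1"}
SYNONYMS = {
    "EXACT": {"hippocampus": "T1"},
    "RELATED": {"seahorse": "T1"},
}
```

With the default scope, `lookup("seahorse", NAMES, SYNONYMS)` finds nothing because only the EXACT tier is consulted; passing `max_synonym_scope="RELATED"` widens the search through every tier in order.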

yarikoptic

most likely my requests could be handled by an agent quite sensibly

dandi/data/generate_uberon_structures.py

yarikoptic · 2026-03-30T14:16:13Z

dandi/data/generate_uberon_structures.py

+            m = re.match(r'synonym:\s+"(.+?)"\s+(EXACT|RELATED|NARROW|BROAD)', line)
+            if m:


I tend to instruct my AIs to do walrus for such... I guess we need to adjust DEVELOPMENT.md and/or .lad for that to be auto-picked up

Suggested change

-            m = re.match(r'synonym:\s+"(.+?)"\s+(EXACT|RELATED|NARROW|BROAD)', line)
-            if m:
+            if (m := re.match(r'synonym:\s+"(.+?)"\s+(EXACT|RELATED|NARROW|BROAD)', line)):
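
As a standalone illustration of the suggested pattern, the snippet below applies the same regular expression with an assignment expression (Python 3.8+). The helper name `parse_synonym` and the sample lines are hypothetical; only the regex is taken from the diff above.

```python
from __future__ import annotations

import re

# Regex copied from the reviewed line; precompiled here for reuse.
SYNONYM_RE = re.compile(r'synonym:\s+"(.+?)"\s+(EXACT|RELATED|NARROW|BROAD)')


def parse_synonym(line: str) -> tuple[str, str] | None:
    """Return (synonym text, scope) for an OBO synonym line, else None."""
    if m := SYNONYM_RE.match(line):
        return m.group(1), m.group(2)
    return None
```

Binding and testing the match in one expression keeps `m` scoped to the place it is used and removes the separate assignment line.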

yarikoptic · 2026-03-30T14:16:43Z

dandi/data/generate_uberon_structures.py

+
+def main() -> None:  # pragma: no cover
+    url = "http://purl.obolibrary.org/obo/uberon.obo"
+    print(f"Downloading {url} ...")


especially when moved into service command - use logging to gain logging control/archival etc
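
A rough sketch of what this request amounts to: route the message through a module-level logger so the caller decides the level and destination. The logger name and the function `announce_download` are hypothetical; the PR's own code uses a logger called `lgr`, whose setup is not shown here.

```python
import io
import logging

# Hypothetical logger name, chosen only for this sketch.
lgr = logging.getLogger("uberon_generator_sketch")


def announce_download(url: str) -> None:
    # Lazy %-formatting: the string is only built if the record is emitted.
    lgr.info("Downloading %s ...", url)


# The caller, not the generator, chooses where messages go and at what level.
buffer = io.StringIO()
lgr.addHandler(logging.StreamHandler(buffer))
lgr.setLevel(logging.INFO)

announce_download("http://purl.obolibrary.org/obo/uberon.obo")
captured = buffer.getvalue()
```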

dandi/data/uberon_brain_structures.json

pyproject.toml

.pre-commit-config.yaml

dandi/metadata/brain_areas.py

- Move UBERON generator to service-scripts CLI subcommand
- Use logging instead of print in generator
- Use walrus operators in generator and brain_areas.py
- Pretty-print UBERON JSON (indent=1) for better diffs
- Use glob patterns for codespell/pre-commit excludes
- Exclude *_structures.json from large-file check

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

bendichter · 2026-03-30T15:42:30Z

Addressed all review feedback in e40872c:

Service-scripts CLI: UBERON generator is now a dandi service-scripts generate-uberon-structures subcommand (Allen CCF can follow in a separate commit on the parent PR).
Logging: Replaced all print() calls with lgr.info().
Walrus operators: Applied in both the generator (if m := re.match(...)) and brain_areas.py (if (s := _lookup_in_dicts(...)) is not None:).
Pretty-printed JSON: uberon_brain_structures.json now uses indent=1 for human readability and better diffs. Excluded *_structures.json from the large-file check since the indented file is ~700KB.
Glob patterns for codespell/pre-commit: Both pyproject.toml (*_structures.json) and .pre-commit-config.yaml (dandi/data/.*_structures\.json) now use globs so future structure files are automatically covered.

Not yet addressed: weekly CI smoke test for the generator — happy to add a pytest marker for that if you'd like it in this PR.
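
The two exclusion patterns quoted above use different syntaxes: the pyproject.toml entry is a shell-style glob, while the .pre-commit-config.yaml entry is a Python regular expression. The sketch below approximates both with the standard library to show that they cover the same files. The file list is invented, and `fnmatch` only approximates how codespell applies its skip patterns.

```python
import fnmatch
import re

CODESPELL_SKIP_GLOB = "*_structures.json"
PRE_COMMIT_EXCLUDE_RE = r"dandi/data/.*_structures\.json"

# Invented paths; the second one stands for a hypothetical future structure file.
paths = [
    "dandi/data/uberon_brain_structures.json",
    "dandi/data/future_atlas_structures.json",
    "dandi/data/other.json",
]

glob_hits = [p for p in paths if fnmatch.fnmatch(p, CODESPELL_SKIP_GLOB)]
regex_hits = [p for p in paths if re.search(PRE_COMMIT_EXCLUDE_RE, p)]
```

Both lists contain the two `*_structures.json` paths and leave `other.json` out, which is the "future files are automatically covered" property the comment describes.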

yarikoptic · 2026-03-30T17:45:36Z

weekly CI smoke test for the generator — happy to add a pytest marker for that if you'd like it in this PR.

develop the test itself here, and mark with a new (to be added) data_regeneration marker. We will look into adding a dedicated CI (and moving out from the main loop) later for it. also mark with needing network.

Adds a regression test that re-downloads the UBERON OBO file and verifies the generated output matches the committed JSON. Marked with data_regeneration and skipif_no_network for scheduled CI runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

bendichter · 2026-03-31T02:51:34Z

Added in 69a328e:

Regression test in dandi/tests/test_data_regeneration.py: re-downloads the UBERON OBO file, runs the generator to a temp path, and asserts the output matches the committed JSON byte-for-byte. This catches both code regressions and upstream UBERON changes.
Marked with @pytest.mark.data_regeneration (new marker, registered in pytest_plugin.py) and @mark.skipif_no_network.
Run with pytest -m data_regeneration for a scheduled CI job.
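
The core of such a regression check can be sketched without pytest as a byte-for-byte comparison against the committed file. The helper name `regenerated_matches` and its `generate` callback are hypothetical; in the actual test, the generator is the real download-and-parse step and the function carries the `data_regeneration` and network markers.

```python
import tempfile
from pathlib import Path
from typing import Callable


def regenerated_matches(generate: Callable[[Path], object], committed: Path) -> bool:
    """Run `generate` into a temp path and compare bytes with `committed`."""
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / committed.name
        generate(out)
        return out.read_bytes() == committed.read_bytes()
```

Comparing raw bytes means that formatting changes (for example a different `indent`) fail the check just as content changes do, which matches the stated goal of catching both code regressions and upstream changes.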

satra · 2026-04-02T14:45:32Z

the homba owl file is here: https://github.com/brain-bican/harmonized_ontology_of_mammalian_brain_anatomy_ontology/releases/tag/v2026-04-02

bendichter · 2026-04-02T14:50:56Z

thanks for the pointer, @satra. I'm looking at the URLs of the individual terms inside this owl file, and the ones I tried did not resolve to a term page, e.g. https://purl.brain-bican.org/ontology/HOMBA_12230 and https://purl.brain-bican.org/ontology/HOMBA_12233. Is that intentional?

satra · 2026-04-02T14:53:31Z

url and uri are two different things. an owl doesn't have to resolve to a page. i believe they will fix that though soonish (in a couple of weeks).

yarikoptic · 2026-04-06T13:41:53Z

@bendichter @satra -- please clarify what exactly is yet to be done (by @bendichter and his agents), or should I re-review this?

bendichter · 2026-04-06T15:13:47Z

See new design doc: dandi/dandi-archive#2768, an attempt to understand and plan how we can use all these ontologies together

bendichter added the minor (Increment the minor version when merged) label Mar 29, 2026

fix: resolve mypy error for optional name field in test

8e8262a

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

bendichter added and removed the enhancement (New feature or request) and minor (Increment the minor version when merged) labels Mar 29, 2026

bendichter and others added 2 commits March 29, 2026 12:44

feat: add max_synonym_scope parameter to _extract_brain_anatomy

2dc05ec

Thread the scope parameter through so callers can control how permissive UBERON synonym matching is. Defaults to EXACT. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

yarikoptic requested changes Mar 30, 2026

