feat: add UBERON brain anatomy matching for all species#1825
bendichter wants to merge 6 commits into add-brain-area-anatomy from
For mice, location tokens are first matched against Allen CCF, then fall back to UBERON. For all other species, UBERON is tried directly. Synonym scope (EXACT, RELATED, NARROW, BROAD) is a settable parameter, defaulting to EXACT only.

- Add generate_uberon_structures.py to parse the UBERON OBO file and produce a bundled JSON of ~2,400 nervous-system descendants
- Add UBERON lookup/matching functions to brain_areas.py
- Update _extract_brain_anatomy in util.py to handle non-mouse species
- Add comprehensive tests for UBERON matching and Allen/UBERON fallback

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
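A minimal sketch of the species-dependent fallback described in this commit message. The lookup tables, the identifier strings, and the function name `match_location` are hypothetical placeholders, not the code in brain_areas.py.

```python
from typing import Optional

# Hypothetical miniature lookup tables with placeholder identifiers.
# The real PR bundles much larger JSON files with real ontology IDs.
ALLEN_CCF = {"ca1": "allen:CA1", "visp": "allen:VISp"}
UBERON = {"cerebellum": "uberon:cerebellum", "brain": "uberon:brain"}

# Assumed spellings of the mouse species label.
MOUSE_SPECIES = {"mus musculus", "mouse"}


def match_location(token: str, species: str) -> Optional[str]:
    """Return an identifier for a location token, or None if nothing matches."""
    key = token.strip().lower()
    if species.strip().lower() in MOUSE_SPECIES:
        # Mice: Allen CCF is consulted first.
        if (allen_id := ALLEN_CCF.get(key)) is not None:
            return allen_id
    # Mice without an Allen match, and all other species, use UBERON.
    return UBERON.get(key)
```

With these toy tables, a mouse "CA1" token resolves through Allen CCF, while the same token for any other species is looked up only in UBERON.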
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## add-brain-area-anatomy #1825 +/- ##
==========================================================
+ Coverage 75.34% 75.73% +0.39%
==========================================================
Files 87 89 +2
Lines 12259 12514 +255
==========================================================
+ Hits 9237 9478 +241
- Misses 3022 3036 +14
Instead of passing a flat set of scopes, use a max_synonym_scope parameter (default "EXACT"). Matching tries tiers in precision order: EXACT > NARROW > BROAD > RELATED, up to the specified maximum. Term names are always tried first before any synonym tier.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Thread the scope parameter through so callers can control how permissive UBERON synonym matching is. Defaults to EXACT.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
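The tiered matching described in these two commits could look roughly like the sketch below. The record layout (`name`, `synonyms` keyed by scope) and the function name `match_uberon` are assumptions made for illustration; only the tier order and the `max_synonym_scope` parameter name come from the commit message.

```python
from typing import Dict, List, Optional

# Precision order stated in the commit message.
SCOPE_ORDER = ("EXACT", "NARROW", "BROAD", "RELATED")


def match_uberon(
    token: str,
    structures: List[Dict],
    max_synonym_scope: str = "EXACT",
) -> Optional[str]:
    """Match a token against term names first, then synonym tiers in order."""
    if max_synonym_scope not in SCOPE_ORDER:
        raise ValueError(f"unknown synonym scope: {max_synonym_scope!r}")
    key = token.strip().lower()

    # Term names are always tried before any synonym tier.
    for s in structures:
        if s["name"].lower() == key:
            return s["id"]

    # Walk the tiers from most to least precise, stopping at the maximum.
    allowed = SCOPE_ORDER[: SCOPE_ORDER.index(max_synonym_scope) + 1]
    for scope in allowed:
        for s in structures:
            synonyms = s.get("synonyms", {}).get(scope, [])
            if key in (syn.lower() for syn in synonyms):
                return s["id"]
    return None
```

Because the tiers are scanned in order, a token that is a NARROW synonym of one term and a RELATED synonym of another resolves to the NARROW match whenever both tiers are allowed.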
yarikoptic left a comment
most likely my requests could be handled by an agent quite sensibly
    m = re.match(r'synonym:\s+"(.+?)"\s+(EXACT|RELATED|NARROW|BROAD)', line)
    if m:
I tend to instruct my AIs to do walrus for such... I guess we need to adjust DEVELOPMENT.md and/or .lad for that to be auto-picked up
    - m = re.match(r'synonym:\s+"(.+?)"\s+(EXACT|RELATED|NARROW|BROAD)', line)
    - if m:
    + if (m := re.match(r'synonym:\s+"(.+?)"\s+(EXACT|RELATED|NARROW|BROAD)', line)):
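For reference, a self-contained sketch of the walrus form applied to the regex quoted in this thread. The helper name `parse_synonym` and the sample lines are illustrative only; the assignment expression needs Python 3.8 or later.

```python
import re
from typing import Optional, Tuple

# Same pattern as in the reviewed code, compiled once for reuse.
SYNONYM_RE = re.compile(r'synonym:\s+"(.+?)"\s+(EXACT|RELATED|NARROW|BROAD)')


def parse_synonym(line: str) -> Optional[Tuple[str, str]]:
    """Return (synonym text, scope) for an OBO synonym line, else None."""
    # The assignment expression binds and tests the match in one step.
    if m := SYNONYM_RE.match(line):
        return m.group(1), m.group(2)
    return None
```

The parentheses around the assignment in the suggested change are optional in an `if` condition; both spellings behave the same.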
    def main() -> None:  # pragma: no cover
        url = "http://purl.obolibrary.org/obo/uberon.obo"
        print(f"Downloading {url} ...")
especially once this is moved into a service command: use logging instead of print to gain control over log levels, archival, etc.
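A sketch of what the requested change might look like for the download message. The logger name `uberon_generator` and the helper `announce_download` are hypothetical; the project may use its own logger setup.

```python
import logging

# Hypothetical logger name; a real module would likely use its package logger.
lgr = logging.getLogger("uberon_generator")


def announce_download(url: str) -> None:
    """Log the download instead of printing it.

    Lazy %-style arguments let handlers and log levels decide whether the
    message is formatted and emitted at all.
    """
    lgr.info("Downloading %s ...", url)
```

Callers (or the service-scripts entry point) can then route these messages to a file or silence them by configuring handlers and levels, which print does not allow.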
- Move UBERON generator to service-scripts CLI subcommand
- Use logging instead of print in generator
- Use walrus operators in generator and brain_areas.py
- Pretty-print UBERON JSON (indent=1) for better diffs
- Use glob patterns for codespell/pre-commit excludes
- Exclude *_structures.json from large-file check

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addressed all review feedback in e40872c:
Not yet addressed: weekly CI smoke test for the generator — happy to add a pytest marker for that if you'd like it in this PR.

develop the test itself here, and mark with a new (to be added)
Adds a regression test that re-downloads the UBERON OBO file and verifies the generated output matches the committed JSON. Marked with data_regeneration and skipif_no_network for scheduled CI runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
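The comparison at the heart of such a regression test could be sketched as follows. The helper `diff_structure_ids` and the assumption that each structure is a dict with an `id` key are illustrative; the committed test compares the full generated output, and the download step and pytest markers are left out here.

```python
from typing import Dict, List, Tuple


def diff_structure_ids(
    committed: List[Dict], regenerated: List[Dict]
) -> Tuple[List[str], List[str]]:
    """Return (ids only in the regenerated data, ids only in the committed data)."""
    committed_ids = {s["id"] for s in committed}
    regenerated_ids = {s["id"] for s in regenerated}
    added = sorted(regenerated_ids - committed_ids)
    removed = sorted(committed_ids - regenerated_ids)
    return added, removed
```

Reporting which identifiers appeared or disappeared upstream gives a more readable failure than a bare equality check between two large JSON documents.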
Added in 69a328e:
the homba owl file is here: https://github.com/brain-bican/harmonized_ontology_of_mammalian_brain_anatomy_ontology/releases/tag/v2026-04-02

thanks for the pointer, @satra. I'm looking at the URLs of the individual terms inside this owl file, and the ones I tried did not resolve to a term page, e.g. https://purl.brain-bican.org/ontology/HOMBA_12230 and https://purl.brain-bican.org/ontology/HOMBA_12233. Is that intentional?

url and uri are two different things. an owl doesn't have to resolve to a page. i believe they will fix that though soonish (in a couple of weeks).

@bendichter @satra -- please clarify what exactly is yet to be done (by @bendichter and his agents), or should I re-review this?

See new design doc: dandi/dandi-archive#2768, an attempt to understand and plan how we can use all these ontologies together |
Summary

frozenset[str]), defaulting to EXACT only

New files

- dandi/data/generate_uberon_structures.py: downloads and parses uberon.obo, extracts nervous system descendants
- dandi/data/uberon_brain_structures.json: bundled lookup data (2,408 structures, 9,717 synonyms)

Modified files

- dandi/metadata/brain_areas.py: UBERON loading, lookup, and matching functions; locations_to_mouse_anatomy() with Allen→UBERON fallback
- dandi/metadata/util.py: _extract_brain_anatomy() now handles all species
- dandi/tests/test_brain_areas.py: tests for UBERON matching, synonym scope control, and Allen/UBERON fallback
- .pre-commit-config.yaml, pyproject.toml: codespell exclusions for new JSON

Test plan

🤖 Generated with Claude Code
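
The descendant extraction that the generator listed under New files performs can be illustrated with a small sketch. It assumes descendants are found by following `is_a` links only; the actual script may also follow other relations such as part_of, and the function names here are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


def parse_parents(obo_text: str) -> Dict[str, List[str]]:
    """Map each [Term] id to the ids named on its is_a lines."""
    parents: Dict[str, List[str]] = {}
    current = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id: "):
            current = line[len("id: "):].strip()
            parents.setdefault(current, [])
        elif in_term and current and line.startswith("is_a: "):
            # "is_a: X:1 ! label" -> keep only the identifier token.
            parents[current].append(line[len("is_a: "):].split()[0])
    return parents


def descendants(parents: Dict[str, List[str]], root: str) -> Set[str]:
    """Breadth-first walk over the inverted is_a graph, excluding the root."""
    children = defaultdict(list)
    for child, parent_ids in parents.items():
        for parent in parent_ids:
            children[parent].append(child)
    seen: Set[str] = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

Calling `descendants(parse_parents(text), root_id)` with the identifier of the chosen root term yields the set of terms to bundle; which root the generator uses is not stated in this conversation.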