SKOS (Simple Knowledge Organization System) is a W3C standard for representing controlled vocabularies as concepts, labels, and concept hierarchies.
We use SKOS to keep MIDAS terms in a shared, machine-readable graph (output/vocabulary.ttl) before reducing and converting it into an LLM-focused lexicon (output/lexicon.json).
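To make the SKOS terms concrete, here is a minimal sketch of how concepts with a preferred label, an alternative label, and a broader link can be written out as Turtle. It uses only the Python standard library. The `ex:` namespace, the `Concept` class, the `to_turtle` helper, and the two sample concepts are all hypothetical; they are not taken from the MIDAS vocabulary and do not describe how utils/create_skos.py is implemented.

```python
from dataclasses import dataclass, field

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
EX_PREFIX = "@prefix ex: <http://example.org/vocab/> ."  # hypothetical namespace


@dataclass
class Concept:
    """One SKOS concept: an identifier, labels, and links to broader concepts."""
    ident: str
    pref_label: str
    alt_labels: list = field(default_factory=list)
    broader: list = field(default_factory=list)


def to_turtle(concepts):
    """Serialize concepts as Turtle text.

    Assumes labels contain no double quotes or backslashes (no escaping is done).
    """
    lines = [SKOS_PREFIX, EX_PREFIX, ""]
    for c in concepts:
        props = [f'skos:prefLabel "{c.pref_label}"@en']
        props += [f'skos:altLabel "{a}"@en' for a in c.alt_labels]
        props += [f"skos:broader ex:{b}" for b in c.broader]
        lines.append(f"ex:{c.ident} a skos:Concept ;")
        lines.append("    " + " ;\n    ".join(props) + " .")
        lines.append("")
    return "\n".join(lines)


# Hypothetical sample data, for illustration only.
concepts = [
    Concept("disease", "disease"),
    Concept("influenza", "influenza", alt_labels=["flu"], broader=["disease"]),
]
turtle = to_turtle(concepts)
print(turtle)
```

In a real graph the inverse skos:narrower link from the broader concept would usually be stated or inferred as well; the sketch leaves it out to stay short.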
data/midas-data.owl + data/vocabulary.json
-> utils/create_skos.py
-> output/vocabulary.ttl
-> utils/reduce_skos.py
-> output/vocabulary-reduced.ttl
-> utils/build_lexicon.py
-> output/consolidated_lexicon.json
-> utils/clean_lexicon.py
-> utils/update_lexicon.py
-> output/lexicon.json
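The flow above can also be written down as data and checked for ordering, so that no stage reads a file that an earlier stage has not produced. This is only a sketch: the `STAGES` table and the `check_order` helper are hypothetical, and it assumes that utils/clean_lexicon.py writes output/lexicon.json and that utils/update_lexicon.py then updates that same file in place.

```python
# Each entry: (script, files it reads, files it writes), following the flow above.
STAGES = [
    ("utils/create_skos.py",
     ["data/midas-data.owl", "data/vocabulary.json"],
     ["output/vocabulary.ttl"]),
    ("utils/reduce_skos.py",
     ["output/vocabulary.ttl"],
     ["output/vocabulary-reduced.ttl"]),
    ("utils/build_lexicon.py",
     ["output/vocabulary-reduced.ttl"],
     ["output/consolidated_lexicon.json"]),
    ("utils/clean_lexicon.py",
     ["output/consolidated_lexicon.json"],
     ["output/lexicon.json"]),          # assumption: written here
    ("utils/update_lexicon.py",
     ["output/lexicon.json"],
     ["output/lexicon.json"]),          # assumption: updated in place
]

INITIAL_FILES = ["data/midas-data.owl", "data/vocabulary.json"]


def check_order(stages, initial):
    """Return (script, missing_input) pairs for inputs no earlier stage produced."""
    available = set(initial)
    problems = []
    for script, inputs, outputs in stages:
        for path in inputs:
            if path not in available:
                problems.append((script, path))
        available.update(outputs)
    return problems


problems = check_order(STAGES, INITIAL_FILES)
print(problems)  # [] when every stage's inputs are available in order
```

Reordering the table (for example, running the reduce step before the create step) makes `check_order` report the missing input, which is a cheap way to catch a broken stage order before running anything.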
utils/create_vocab.py: downloads data/midas-data.owl and data/vocabulary.json.
utils/create_skos.py: builds SKOS concepts/schemes (labels + broader/narrower links) -> output/vocabulary.ttl.
utils/reduce_skos.py: keeps SKOS schemes/concepts needed for extraction -> output/vocabulary-reduced.ttl.
utils/build_lexicon.py: converts reduced SKOS graph to structured JSON -> output/consolidated_lexicon.json.
utils/clean_lexicon.py: cleans/compacts lexicon -> output/lexicon.json.
utils/update_lexicon.py: injects required constrained-vocabulary additions and validates gold-standard conformance.
python main.py