POC: oot support #218

Draft
briaguya0 wants to merge 65 commits into HarbourMasters:main from briaguya0:oot-assets-recovered

@briaguya0
Contributor

what this does

generates an o2r that matches what https://github.com/briaguya0/Shipwright/tree/fix-skinvtxcnt-ub (just dev from when i started with HarbourMasters/ZAPDTR#37 included) generates when using a PAL GC (sha1: 0227D7C0074F2D0AC935631990DA8EC5914597B4) rom. since zip files aren't generated deterministically the comparison was done file-by-file within the extracted archive.

relevant files

  • zapd_to_torch.py
    • does what it says on the tin: takes xml from ZAPDTR/OTRExporter land and adapts it to torch yml
  • test_assets.py
    • used to verify the generated o2r against the reference o2r, lots of options

why this is POC/draft

  • the soh dir shouldn't be in here
  • i still need to actually review all the torch changes
    • some shared factory changes almost definitely break existing ports
    • there are some hacks/bugs ported from zapd for binary matching
  • need to decide how much data we want to have in yml files
  • it only supports the 1 rom, i want to get it working with/verify against all supported roms

things to look into

  • external files in yml use pal_gc in the path. maybe this is because config.yml is in pal_gc's parent dir?
  • what are the differences between how zapd handles this and how torch does?
    • we read from o2r in zapd_to_torch, meaning we're getting extra info from there
    • DMA stuff too

briaguya0 and others added 30 commits March 29, 2026 04:10
Recovered from filesystem after data loss. This squashes ~58 commits
originally made between 2026-03-23 and 2026-03-28. The full original
reflog is preserved in docs/recovered-git-history.md.

New OoT-specific factories:
- OoTSceneFactory (OOT:SCENE, OOT:ROOM) — scene command parsing and binary export
- OoTSkeletonFactory — skeleton, limb, and skin vertex support
- OoTAnimationFactory — normal, curve, legacy, and player animations
- OoTCollisionFactory — collision mesh with camera data and waterboxes
- OoTArrayFactory — Shipwright-compatible VTX and Vec3s arrays

Modified upstream:
- DisplayListFactory — OoT cross-segment DList handling, VTX consolidation,
  virtual segment 0x80, G_BRANCH_Z discovery, ZAPD compatibility fixes
- Companion — OoT factory registration, BUILD_OOT cmake option
- ResourceType — OoT type codes (OSKL, OSLB, OANM, OROM, OCOL, OPTH, OTXT)

Tooling (soh/):
- zapd_to_torch.py — converts ZAPDTR/OTRExporter XML to Torch YAML
- test_assets.sh, check.sh, verify.sh, manifest.sh, lib.sh — test harness
- list_assets.py — asset manifest query tool

Status at time of loss: 20,432 assets passing, 0 failures.
14,355 scene assets in progress (scene/room factory implemented,
iterating on binary format correctness). OoTTextFactory was not
recovered and needs recreation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- identify_roms.sh: identifies OoT ROMs by SHA1, renames to
  standardized format, handles duplicates
- extract_dma.py: extracts DMA tables from all 17 ROM versions
  using Shipwright filelists, outputs JSON keyed by filename
- Pre-computed DMA tables for all 17 versions (14 unique)
- Manifests directory with gitignore for generated hash files
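
As a rough illustration of the SHA1-based identification described above, here is a minimal Python sketch. It is not the contents of identify_roms.sh (which is a shell script). The only hash in the table is the PAL GC one quoted in the PR description; the `pal_gc` label and the function names are assumptions made for the example.

```python
import hashlib

# Only this one hash comes from the PR description; the "pal_gc" label
# is a hypothetical naming choice for this sketch.
KNOWN_ROMS = {
    "0227d7c0074f2d0ac935631990da8ec5914597b4": "pal_gc",
}


def sha1_of_bytes(data):
    """Return the lowercase hex SHA1 digest of a byte string."""
    return hashlib.sha1(data).hexdigest()


def identify_rom(data):
    """Return the version label for a ROM image, or None if unknown."""
    return KNOWN_ROMS.get(sha1_of_bytes(data))
```

Keying the table by lowercase hex digest means a hash printed in uppercase needs a `.lower()` before it is added.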

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
config.yml moved from soh/ to soh/assets/yml/ where Torch expects it.
Generated per-version YAML dirs are gitignored via local .gitignore
rather than the top-level one.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add lib/libyaz0/ with decode support following libmio0/libyay0 pattern
- Wire YAZ0 into Decompressor::Decode and AutoDecode
- Add missing PendingVtx struct in DeferredVtx namespace
- Add missing IS_VIRTUAL_SEGMENT macro in BaseFactory.h
- Add libyaz0 to CMake C_FILES glob

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TranslateAddr now recognizes high segments (>= 0x80) when they
  exist in the segment map, not just standard segments (0x01-0x1F)
- ASSET_PTR extracts segment offset for virtual segments too,
  preventing raw 0x80XXXXXX addresses from being used as buffer offsets
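
The idea behind both bullets can be sketched in Python. This is a simplified model, not Torch's TranslateAddr or ASSET_PTR: it assumes the usual N64 layout where the top byte of a segmented address is the segment number and the low 24 bits are the offset, and it models the segment map as a plain dict from segment number to ROM base (a hypothetical structure for the example).

```python
def split_segmented(addr):
    """Split a 32-bit segmented address into (segment, offset)."""
    return (addr >> 24) & 0xFF, addr & 0x00FFFFFF


def translate_addr(addr, segment_map):
    """Resolve a segmented address to a ROM offset.

    Any segment present in the map is accepted, including high
    (virtual) segments such as 0x80. Returns None if the segment is
    unknown, so a raw 0x80XXXXXX value is never used as an offset.
    """
    segment, offset = split_segmented(addr)
    base = segment_map.get(segment)
    if base is None:
        return None
    return base + offset
```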

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OTRExporter writes 0-byte files for LimbTable entries. BlobFactory
crashed when trying to Write() a null buffer. Guard the write with
an empty check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Empty blobs (e.g. LimbTable) now write 0 bytes to match OTRExporter
  reference output instead of writing a header with size 0
- test_assets.sh auto-logs to soh/logs/ with timestamp
- New compare_asset.sh tool for hex-diffing individual assets between
  reference and generated O2R

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
*.o2r are generated archive files. torch.hash.yml is a Torch
build cache tracking which YAMLs have been processed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Hash all extracted files in a single sha256sum call instead of
  one process per file
- Redirect torch output to a log file instead of piping through grep
- Collapse duplicate jq reduce into one pass with inline fail count

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrites the asset test script in Python to avoid per-file process
spawning. YAML collection, O2R extraction, and hashing are all done
in-process. Hashes assets directly from the zip without extracting
to disk.

107s → 1.6s for 17,516 object assets.
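
A minimal sketch of the in-process approach, assuming both O2Rs are ordinary zip archives: each member is hashed straight out of the archive and the two hash maps are compared by name, so member order and zip metadata do not matter. The function names and the result keys are made up for this example and are not taken from test_assets.py.

```python
import hashlib
import zipfile


def hash_archive(archive):
    """Map each member name to the SHA-256 of its uncompressed data.

    `archive` may be a path or a binary file-like object.
    """
    hashes = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            hashes[info.filename] = hashlib.sha256(zf.read(info)).hexdigest()
    return hashes


def compare_archives(reference, generated):
    """Classify every reference member as passed, failed, or not generated."""
    ref = hash_archive(reference)
    gen = hash_archive(generated)
    return {
        "passed": sorted(n for n in ref if gen.get(n) == ref[n]),
        "failed": sorted(n for n in ref if n in gen and gen[n] != ref[n]),
        "not_generated": sorted(n for n in ref if n not in gen),
    }
```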

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add BUILD_OOT option (default ON) following pattern of other games,
  defines OOT_SUPPORT so OoT factories are registered
- Stub OoTTextFactory so it compiles (real impl is task HarbourMasters#5)
- Expose DeferredVtx::BeginDefer in DisplayListFactory.h so
  OoTSceneFactory can call it

Enables 16,952 additional assets: 12,377 → 29,329 passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Enable GFX auto-discovery for auto-discovered limbs (previously
  disabled, causing 573 limbs to have empty DList paths)
- Fix LOD limb DList suffix: use "FarDL" instead of "DL2" to match
  OTRExporter/ZAPDTR naming convention
- Fix Curve limb DList suffixes: "CurveDL"/"Curve2DL" to match ZAPDTR
- Resolve LOD far DList before near, so shared-address limbs use
  the Far name for both fields (matches OTRExporter behavior)
- Rewrite compare_asset.sh as compare_asset.py (takes two O2Rs,
  no torch run needed)
- test_assets.py now saves generated.o2r to soh/o2r/ by default

Objects: 17,322 passed, 1 failed (MTX), 193 not generated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OTRExporter writes a 0-byte file for each skeleton's limb array
(e.g. gKeeseSkeletonLimbs). Add this to the skeleton factory's
parse to match.

Objects: 17,515 passed, 1 failed (MTX), 0 not generated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OTRExporter/ZAPDTR reads the N64 Mtx as 16 sequential int32 BE
values and writes them back as-is. Our exporter was writing
individual uint16 int-part values, which produced byte-swapped
output within each 32-bit word.

Now reads and stores the raw int32 values in the parser and writes
them in the binary exporter, matching the reference format.
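
The difference between the two write paths can be shown with Python's struct module. This sketch assumes the output file is little-endian (other commits in this PR describe the O2R as LE); the function names are hypothetical and the code is an illustration of the byte-order effect, not the Torch exporter.

```python
import struct


def read_mtx_words(rom, offset=0):
    """Read a 64-byte N64 Mtx as 16 big-endian int32 values."""
    return struct.unpack_from(">16i", rom, offset)


def write_words_le(words):
    """Write the 16 int32 values little-endian (assumed output order)."""
    return struct.pack("<16i", *words)


def write_halves_le(rom, offset=0):
    """Old-style path: treat the same bytes as 32 separate uint16 values.

    Each 16-bit half is byte-swapped on its own, so the two halves of
    every 32-bit word end up in the opposite order from write_words_le.
    """
    halves = struct.unpack_from(">32H", rom, offset)
    return struct.pack("<32H", *halves)
```

For a ROM word with bytes `00 01 02 03`, the int32 path produces `03 02 01 00` while the uint16 path produces `01 00 03 02`.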

Objects: 17,516 passed, 0 failed. Code: 11 passed, 0 failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When multiple segments map to the same physical ROM address (common
for overlays which alias segments 8-13 to their code data), the
virtual address patcher was returning a segment 0x0D address instead
of segment 0x80. This caused texture lookups to fail because textures
are registered under segment 0x80 offsets in the YAML.

Now explicitly prefers segment 0x80 when it maps to the same physical
address, matching how YAML offsets are declared.
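
A small Python sketch of the preference rule, under assumptions: segments are modeled as a dict from segment number to a `(base, size)` pair, and when segment 0x80 does not cover the address the lowest-numbered matching segment is returned. That fallback is a choice made for this example, not something the commit states.

```python
def to_segmented(phys, segments, preferred=0x80):
    """Convert a physical ROM address to a segmented address.

    `segments` maps segment number -> (base, size). When several
    segments alias the same physical range, `preferred` wins.
    """
    matches = {
        seg: phys - base
        for seg, (base, size) in segments.items()
        if base <= phys < base + size
    }
    if not matches:
        return None
    # Fallback to the lowest segment number is an assumption of this sketch.
    seg = preferred if preferred in matches else min(matches)
    return (seg << 24) | matches[seg]
```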

Overlays: 325 passed, 0 failed (was 101 failures).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scene/room DLists are auto-discovered by the scene factory with
room-prefixed names matching OTRExporter output. Pre-declared DList
entries from ZAPDTR XMLs used different naming (gXxxDL_ vs
xxx_room_0DL_) causing mismatches.

Scenes: 10,729 passed, 0 failed (was 27 failures).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Room mesh DLists are auto-discovered by the scene factory with
correct room-prefixed names. Pre-declared DLists from ZAPDTR XMLs
(both room-named and scene-named) conflict with auto-discovery.

18 scene-level DLists declared in room files (e.g. gKinsutaDL_0030B0)
are now missing — these need to be handled by the scene factory or
a separate mechanism. Tracked as part of scene work.

31,156 passed, 1 failed (version), 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scene/room alternate headers (SetAlternateHeaders command) are now
recursively processed as sub-assets. Processing is deferred until
after the primary header's commands (especially SetMesh) complete,
so primary DLists are registered first and alternate headers reuse
their names for shared ROM addresses.

DeferredVtx state is saved/restored around each alternate header to
prevent VTX consolidation corruption.

Exposes SaveAndClearPending/RestorePending and PendingVtx struct in
DisplayListFactory.h for use by scene factory.

31,436 passed (+280), 128 scene failures (Sets/Cutscenes), 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Alternate headers pass parent's baseName for sub-asset naming
  (DLists, backgrounds, cutscenes, pathways) so names match
  OTRExporter which doesn't prefix with Set_
- Fix cutscene suffix: "CutsceneData" instead of "Cs" to match
  OTRExporter's GetSegmentedPtrName convention

31,501 passed (+345 from session start), 108 failed, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Cutscenes use entryName (with Set_ prefix) matching OTRExporter
- Pathways use baseName (parent name) matching OTRExporter
- Fix cutscene suffix: CutsceneData instead of Cs

31,583 passed, 109 failed (84 Set command data, 24 cutscenes, 1 version).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use getNeighborSize to limit pathway entry scanning instead of a
hard 256 maximum. This helps some alternate headers with tight
boundaries, though pathway count inference remains imperfect
without XML metadata.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OTRExporter creates empty placeholder files for actor list data
(e.g. Bmori1_room_0ActorEntry_000054). Add these as companion
files in the scene factory.

32,151 passed (+568), 109 failed, 0 regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OoT alternate headers reference the same DLists as primary headers
under Set_-prefixed names. OTRExporter creates both files with
identical content.

- Add RegisterAssetAlias to Companion for creating duplicate O2R
  entries with the same binary data under different names
- Scene factory uses entryName for DList symbols and
  ResolveGfxWithAlias to register aliases when an existing DList
  is found at the same offset
- Alias files are written during the export phase using the
  already-serialized binary data (zero re-parsing overhead)
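
The alias mechanism can be pictured with a toy Python model. `AssetStore` and its methods are hypothetical stand-ins, not Torch's Companion API; the point is only that an alias records a name mapping and reuses the bytes that were already serialized for the original entry.

```python
class AssetStore:
    """Toy stand-in for an archive writer; not Torch's Companion class."""

    def __init__(self):
        self._data = {}     # asset name -> serialized bytes
        self._aliases = {}  # alias name -> original asset name

    def add_asset(self, name, data):
        self._data[name] = data

    def register_alias(self, original, alias):
        # Only the mapping is stored; nothing is parsed or serialized again.
        self._aliases[alias] = original

    def export(self):
        """Return every file to write, with aliases sharing the original bytes."""
        out = dict(self._data)
        for alias, original in self._aliases.items():
            out[alias] = self._data[original]
        return out
```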

34,539 passed (+2,388), 109 failed, 738 not generated.
Session total: 12,377 → 34,539 (34.9% → 97.6%).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace naive 0xFFFFFFFF scan with a command-aware parser that
correctly determines cutscene boundaries by parsing the command
structure (ID + entry count + entry size per type).

Handles camera splines (terminated by continueFlag), scene
transitions (0x2D), destinations (0x3E8), and standard commands.

Cutscene sizes are now correct, but content still differs from
reference because OTRExporter re-serializes with different byte
ordering (ROM is BE, O2R is LE with CMD_HH packing). Full
re-serialization is the next step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document the BE→LE field re-packing needed for each command type.
Raw copy doesn't work because OTRExporter uses CMD_HH/CMD_BBH/CMD_HBB
macros to pack fields into uint32 words differently than ROM layout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace raw cutscene copy with proper BE→LE re-serialization using
CMD_HH/CMD_BBH/CMD_HBB field packing to match OTRExporter output.
Handles camera splines, actor cues, misc/lighting/BGM, textbox,
rumble, settime, transition, and destination commands.
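
To show why a raw copy cannot match, here is a hedged Python sketch of re-packing one word that holds two 16-bit fields. It assumes a CMD_HH-style packing that places the first field in the low 16 bits before the word is written little-endian. That bit layout is an assumption for illustration; the macro definitions are not shown in this PR and may order the fields differently.

```python
import struct


def cmd_hh(a, b):
    """Pack two 16-bit fields into one 32-bit word.

    ASSUMPTION: `a` goes in the low half and `b` in the high half.
    """
    return (a & 0xFFFF) | ((b & 0xFFFF) << 16)


def reserialize_hh(rom, offset=0):
    """Read two big-endian uint16 fields and emit one little-endian word."""
    a, b = struct.unpack_from(">HH", rom, offset)
    return struct.pack("<I", cmd_hh(a, b))
```

Under that assumption, ROM bytes `12 34 56 78` become `34 12 78 56`: each field is byte-swapped individually, which matches neither a raw copy nor a whole-word swap.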

33 additional cutscenes now match. 76 failures remain (likely
a subtle issue with uint16/uint32 field reading in some entries).

34,572 passed (97.7%), 76 failed, 738 not generated.
Session total: 12,377 → 34,572 (34.9% → 97.7%).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actor cue entries have rotY/rotZ as the 3rd word packed with CMD_HH,
not a raw uint32. Differentiate actor cues from misc/lighting/BGM
commands to apply correct packing.

34,602 passed (97.8%), 46 failed (44 cutscene, 1 pathway, 1 version).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
34,602/35,386 (97.8%) passing. Remaining: 44 cutscene format
issues, 598 audio (no factory), 135 scene sub-assets, 4 text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
briaguya0 and others added 27 commits March 29, 2026 21:32
Extract cutscene re-serialization into reusable SerializeCutscene
function. Register OOT:CUTSCENE factory for YAML-declared cutscenes
(gXxxCs assets from ZAPDTR XML).

34,698 passed (+50), 0 failed, 688 not generated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace 289 lines of inline cutscene serialization with a call to
the reusable SerializeCutscene function. No behavior change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Register OOT:PATH factory for YAML-declared path assets (gXxxPath
from ZAPDTR XML). Reads num_paths pathway entries from ROM and
serializes with the same format as scene companion pathway files.
No doubling for standalone paths (doubling only occurs in
SetPathways command handler).

34,726 passed (+28), 0 failed, 660 not generated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both the SetPathways handler and the standalone OOT:PATH factory
now call the shared SerializePathways function.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Read JPEG screen buffer data (320x240x2 = 153600 bytes) from ROM
at the source address in SetMesh type 1 entries. Write as Background
companion files matching OTRExporter format (IGBO header + size + data).

Handles both single background (format 1) and multiple backgrounds
(format 2).

34,761 passed (+35), 0 failed, 625 not generated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both mesh type 1 format 1 (single) and format 2 (multiple)
background handlers now call the shared helper function.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
g-prefixed scene DLists are at different ROM offsets from room mesh
DLists, but including them in YAML still causes 838 failures.
They interfere with gAddrMap lookups during scene factory processing.
Need a companion-file approach instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
YAML approach causes 838 regressions because GFX factory processing
during YAML parse corrupts DeferredVtx state for subsequent scene
factory. Document root cause and alternative approaches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keep g-prefixed DLists in room YAMLs but sort OOT:ROOM/OOT:SCENE
entries before GFX entries within each file. This ensures the scene
factory processes rooms first with clean VTX state, preventing
auto-discovery conflicts from pre-registered VTX addresses.

34,779 passed (+18), 0 failed, 607 not generated.
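
The ordering step amounts to a stable sort keyed on the entry type. In this Python sketch the entries are modeled as `(name, fields)` pairs with a `type` field. That shape is an assumption for the example and may not match how zapd_to_torch.py stores entries.

```python
SCENE_TYPES = {"OOT:ROOM", "OOT:SCENE"}


def order_entries(entries):
    """Put OOT:ROOM/OOT:SCENE entries first, keeping relative order otherwise.

    sorted() is stable, so entries within each group stay in their
    original sequence.
    """
    return sorted(
        entries,
        key=lambda item: 0 if item[1].get("type") in SCENE_TYPES else 1,
    )
```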

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Stop skipping room-prefixed DLists from room XML files. Some
(like spot00_room_0DL_012B20, spot16_room_0DL_00AA48) are child
DLists not discovered by SetMesh and need to be in the YAML.

The YAML entry ordering (OOT:ROOM before GFX) prevents VTX
auto-discovery conflicts. AddAsset deduplicates mesh DLists
that were already auto-discovered by the scene factory.

34,783 passed (+4), 0 failed, 603 not generated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive plan verified against actual OTRExporter and ZAPDTR
source. Documents multi-segment ROM structure, binary formats for
samples/fonts/sequences, and implementation approach.

Key finding: audio data spans 4 separate DMA entries (code,
Audiobank, Audiotable, Audioseq), not a single segment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Step 0: YAML setup, Step 1: main entry, Step 2: load segments,
Step 3: sequences (+110), Step 4: samples (+449), Step 5: fonts (+38).
Each step independently verifiable before proceeding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract OoT audio table offsets, sequence names, and sample names
from Shipwright XML. Auto-add Audiobank/Audioseq/Audiotable
segments to the audio YAML. Enhanced _format_asset to handle
nested list/dict structures for audio sample banks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Register OOT:AUDIO factory that creates the main audio/audio
entry (64-byte OAUD header with version 2). Fix audio YAML path
to avoid double nesting (audio.yml not audio/audio.yml).

34,784 passed (+1), 0 failed, 602 not generated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parse audio table headers from decompressed code segment. Extract
110 sequences from Audioseq ROM data with metadata (font indices,
medium, cachePolicy). Write as OSEQ companion files.

34,893 passed (+109), 0 failed, 493 not generated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ZAPDTR treats sequence entries with size=0 as aliases: ptr field
is an index to another sequence entry whose data should be used.
Sequence 087_File_Select aliases sequence 40.
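
A minimal Python sketch of the alias rule, assuming each table entry is a `(ptr, size)` pair and that aliases are only one level deep (the commit does not say whether chains occur). The function name is hypothetical.

```python
def resolve_sequence(entries, index):
    """Return the (ptr, size) that actually holds the data for a sequence.

    An entry with size == 0 is treated as an alias whose ptr field is
    the index of another entry. Only one level of aliasing is followed.
    """
    ptr, size = entries[index]
    if size == 0:
        ptr, size = entries[ptr]
    return ptr, size
```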

34,894 passed, 0 failed. All 110 sequences passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents the safe BinaryReader-based approach for parsing
Audiobank structures, the exact pointer chains for drums/
instruments/SFX, and the OSMP output format. Corrects the
raw pointer arithmetic approach that caused segfaults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parse font structures from Audiobank to discover all unique samples.
Use LUS::BinaryReader with BE endianness and bounds checking for
all ROM data reads. Extract sample data from Audiotable with
loop metadata and ADPCM book data.

35,321 passed (+437), 3 failed (sample size discrepancies),
62 not generated (38 fonts + misc).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents cross-bank sample naming collision for Tom Drum,
Drum Sidestick, and Windchimes. Identifies root cause and
proposes fix options.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use absolute Audiotable offset (not relative) as the sample name
lookup key, matching ZAPDTR's ZAudio.cpp:174. Only bank 1 (base=0)
resolves named paths; other banks get fallback names like
sample_5_00420C20. Fixes 3 data mismatches and 8 missing samples.
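
The lookup can be sketched in Python as follows. The key is the bank base plus the sample's relative offset. The fallback format string is inferred from the single example name above (`sample_5_00420C20`), and treating the hex part as the absolute offset is an assumption of this sketch, as are the function and parameter names.

```python
def sample_name(bank_index, bank_base, relative_offset, names):
    """Look up a sample name by absolute Audiotable offset.

    `names` maps absolute offset -> named path. Samples without a
    named entry get a generated fallback name.
    """
    absolute = bank_base + relative_offset
    name = names.get(absolute)
    if name is not None:
        return name
    # ASSUMPTION: fallback is "sample_<bank>_<absolute offset, 8 hex digits>".
    return "sample_%d_%08X" % (bank_index, absolute)
```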

Result: 449/449 samples pass (was 441 with wrong data for 3).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents the OSFT binary format, parsing details from ZAPDTR, and
implementation approach for the remaining 38 audio font assets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parse drums, instruments, and SFX from Audiobank with envelope data
and sample references. Replicate ZAPDTR stack residue behavior for
invalid instrument entries. Add font name extraction to YAML generator.

596/598 audio assets pass. 2 fonts differ by 29 bytes total in
dead data (uninitialized fields in invalid instruments from ZAPDTR UB).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ZAPDTR reuses the same stack slot for DrumEntry and InstrumentEntry.
Invalid instruments before any valid one inherit the last drum's field
values: drum.pan→inst.loaded, drum.loaded→inst.normalRangeLo, with
padding/offset bytes mapping to zero.

598/598 audio assets now pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parse message tables from compressed code segment and text data from
uncompressed message_data_static segments. Handles PAL languages
(ger/fra) with separate lang_offset pointer tables.

4/4 text assets pass. 35,385/35,386 total (only portVersion remaining).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents binary format (7 bytes: endianness flag + 3x uint16 BE),
generation in OTRExporter, runtime consumption in SoH, and root cause
of why Torch doesn't generate it (missing -u/--version CLI flag).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Big Endian flag byte to ParseVersionString matching OTRExporter
format (7 bytes: endianness + 3x uint16 BE). Pass -u 9.2.0 to torch
in test_assets.py.
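
The 7-byte layout described above (one endianness byte followed by three big-endian uint16 values) can be reproduced with Python's struct module. The numeric value used for the "big endian" flag is an assumption here, since the commit does not state it, and the function name is hypothetical.

```python
import struct

# ASSUMPTION: the flag value meaning "big endian" is 1; not stated in the PR.
BIG_ENDIAN_FLAG = 1


def encode_port_version(version, endian_flag=BIG_ENDIAN_FLAG):
    """Encode a "major.minor.patch" string as flag + 3x uint16 BE (7 bytes)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return struct.pack(">BHHH", endian_flag, major, minor, patch)
```

With the `>` prefix struct uses standard sizes and no padding, so the result is exactly 1 + 2 + 2 + 2 = 7 bytes.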

35,386/35,386 assets pass (100%).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Explains why YAML generation requires a reference O2R for VTX backfill:
VTX assets aren't in XML, and DeferredVtx auto-naming doesn't match
ZAPDTR conventions that SoH expects at runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@briaguya0 briaguya0 changed the title from "POC: oot assets" to "POC: oot support" Mar 30, 2026
@garrettjoecox

Smart approach to match the o2r 1:1, at least as far as quickly iterating to get to parity. Is it faster?

As an aside, a lot of the parts I was unhappy with in terms of the flow between the rom and the o2r have more to do with the formatting, things being unnecessarily different than how they are stored in the rom, and this just inherits those issues, right?

@briaguya0
Contributor Author

> is it faster?

not yet. it was faster during some parts of iteration, but complexity added to processing later slowed it down. i plan to dig into perf improvements

> As an aside, a lot of the parts I was unhappy with in terms of the flow between the rom and the o2r have more to do with the formatting, things being unnecessarily different than how they are stored in the rom, and this just inherits those issues, right?

yeah, the goal of this effort is a drop-in replacement for zapdtr/otrexporter. i want a testable way to say "ok, torch does what we used to use zapdtr/otrexporter to do." once that's working i'm all for making improvements to the file structure etc, but i don't want to block a tooling switch on that.
