Static overlay pipeline
Most of Legaia's gameplay code lives in RAM overlays paged into the 0x801C0000+ window per game mode (title / field / battle / menu / world-map / cutscene / minigames). The dynamic capture workflow reverses them from emulator save states; this is the static complement - extract each overlay straight from PROT.DAT and disassemble it at its load base, with identity attached from the first byte. It complements the dynamic captures; it does not replace them.
Why static extraction works
PSX overlays are normally clean copies of a fixed-VA-linked blob: the loader DMAs the bytes into the overlay window, runs FlushCache, and jumps in - there is no per-load relocation. Legaia's overlay code ships as MIPS-code entries inside PROT.DAT (the mips_overlay / overlay_ptr_table detectors flag the small ones; the big scene overlays are raw too, just data-section-first). So the on-disc entry is the loaded code, modulo the runtime-written .bss.
This is proved two ways:
- Static reproducibility. The as-loaded bytes extracted from any copy of the disc hash to a committed sha256 (asset overlay verify). No Sony bytes are committed - only the hash.
- Runtime byte-match (disc + save-state gated). The on-disc bytes are byte-identical to the resident RAM image over the entire .text+.rodata region. For the battle overlay (PROT 0898) the first 0x28800 of 0x29800 bytes match - 100 % of code+rodata - with only the trailing .bss diverging. For the menu overlay (PROT 0899) the clean prefix is 0x15e8c bytes across six menu-open save states.
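Both checks reduce to a few lines of byte handling. The sketch below is illustrative only: the function names fingerprint and clean_prefix_len are hypothetical, not the asset crate's API, and it assumes the disc entry and the RAM image are already in memory as byte strings aligned at the overlay's load base.

```python
import hashlib


def fingerprint(as_loaded: bytes) -> str:
    """sha256 hex digest of the as-loaded bytes (the reproducibility anchor)."""
    return hashlib.sha256(as_loaded).hexdigest()


def clean_prefix_len(disc: bytes, ram: bytes) -> int:
    """Length of the leading run where the disc bytes equal the RAM image.

    Assumes both buffers start at the overlay's load base. The first
    mismatch is taken as the start of the runtime-written tail (.bss).
    """
    n = min(len(disc), len(ram))
    for i in range(n):
        if disc[i] != ram[i]:
            return i
    return n
```

On the battle overlay this kind of comparison is what yields the 0x28800-byte clean prefix quoted above; the hash is computed over the disc bytes alone, so it needs no save state.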
What it buys (and the limit)
- Solves the VA-aliasing identity problem structurally. Many overlays link to the same VA range - 0x801DD864 is a battle-action function in one overlay and a muscle-dome function in another - which is why the repo disambiguates with overlay_<label>_<addr> naming. Statically, an overlay is “PROT entry N at base X”: identity from the source entry, not a guessed label.
- Reproducible from the user's disc, with no curated save state - including overlays nobody ever captured.
- It does not unblock runtime-value captures (gp[0x754]==3, watchpoint results, ctx[+0x274] bytes). Those still need live probes. This is a workflow + coverage + identity win; the dynamic captures stay authoritative for runtime values.
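One way to picture "identity from the source entry" is a symbol table keyed by (PROT index, VA) rather than by VA alone. The sketch below is a hypothetical illustration, not the repo's data model; its two entries are the 0x801CF650 collision between the field (0897) and menu (0899) overlays documented on this page.

```python
# Hypothetical symbol table: the key is (PROT index, VA), never VA alone.
SYMBOLS = {
    (897, 0x801CF650): '"Give" string',
    (899, 0x801CF650): "equip aggregator",
}


def aliases(va: int) -> list[int]:
    """Every PROT entry that defines something at this VA."""
    return sorted(prot for (prot, addr) in SYMBOLS if addr == va)


def describe(prot_index: int, va: int) -> str:
    """Resolve a VA inside one specific overlay."""
    return SYMBOLS.get((prot_index, va), "unknown")
```

A bare-VA lookup would have to pick one of the two meanings; the pair key keeps both and makes the collision queryable through aliases.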
Base recovery
The load base is recovered statically from the overlay's own internal jal call graph. For the true base B, every internal call target T maps to file offset T - B, which begins a function prologue (addiu sp, sp, -X). Tallying B = T - prologue_offset over every (distinct-call-target, prologue-offset) pair, the true base wins by a landslide (the field overlay recovers 0x801CE818 with 60 corroborating call targets; battle with 44).
This is decisive enough to catch and correct mislabelled overlays. The historical “PROT 0896 = options/menu overlay” label is wrong: 0896 (bat_back_dat) is not an options/menu overlay at all, while the real menu overlay is PROT 0899 at 0x801CE818 - found by byte-searching the corpus for FUN_801CF650's instruction signature, RAM-verified across six menu-open saves. PROT 0899 and the field overlay (0897) are VA-alias siblings in slot A: both load at 0x801CE818 at different times, so 0x801CF650 is a "Give" string in 0897 but the equip aggregator in 0899. That is the exact aliasing this pipeline exists to disambiguate.
PROT 0896 is the pipeline's cautionary tale: its whole-file recovery returns a convincing 60-vote base, but the votes come from the field overlay's bytes carried in 0896's over-read tail (whose self-consistency fixes the result by construction); head-only recovery yields no landslide, and a live mode-24 entry capture refuted the old “mode-24 OTHER overlay” reading - the SCUS-resident OTHER INIT streams each minigame's own overlay directly into slot A. Moral: when an entry's footprint over-reads a known overlay, subtract the aliased region before trusting a recovered base.
The static loader census closes the question from the other side: a full-image scan of both overlay loaders' jal sites finds no call site that can produce param 0 or 1, so extraction entries 0895/0896 are unreachable from any static loader call - extraction 0895's content is the boot init.pak bundle and 0896 remains an unidentified blob (see boot · game-mode state machine and the open-threads 0896 row).
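The jal-vote recovery described in this section can be sketched in a few lines. This is a simplified illustration, not the pipeline's implementation: recover_base is a hypothetical name, the blob is assumed to be little-endian MIPS words, and every word with the jal opcode is treated as a call even if it is really data (the vote is what is expected to drown out such noise).

```python
import struct
from collections import Counter


def recover_base(blob: bytes, region: int = 0x80000000, top: int = 3):
    """Vote for the load base B = T - prologue_offset.

    T ranges over distinct jal targets; prologue_offset ranges over file
    offsets holding 'addiu sp, sp, -X'. `region` supplies the top four
    address bits that a jal does not encode (KSEG0 assumed).
    """
    n = len(blob) // 4
    words = struct.unpack_from(f"<{n}I", blob)

    # jal: opcode 3 in bits 31..26, word-index target in bits 25..0.
    targets = {region | ((w & 0x03FFFFFF) << 2) for w in words if w >> 26 == 3}

    # addiu sp, sp, imm with a negative imm: 0x27BD in the high half,
    # sign bit of the 16-bit immediate set.
    prologues = [i * 4 for i, w in enumerate(words)
                 if (w & 0xFFFF8000) == 0x27BD8000]

    votes = Counter()
    for t in targets:
        for off in prologues:
            votes[t - off] += 1
    return votes.most_common(top)
```

The runner-up counts matter as much as the winner: a true base should lead by a wide margin. For an entry whose footprint over-reads a known overlay, running the same vote on the head alone (the blob sliced before the aliased tail) is the check that exposed 0896.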
Slot A vs slot B
The overlay loaders manage two independently swappable slots.
- Slot A (~0x801CE818) holds the big scene overlays - field (0897), battle (0898), menu (0899), the STR/MDEC cutscene overlay (0970), and the minigame overlays (fishing 0972, slot machine 0975, baka fighter 0976, dance 0980 - the mode-24 door-warp sub-id slots, see script-VM § 0x3E WARP). All VA-alias siblings (same base, resident at different times). The field/battle/menu/cutscene rows recover from jal alone; the minigame rows are cross-checked instead by a documented minigame function landing on a prologue at the base (anchor_va), because their footprints over-read each other (one minigame's code is duplicated across consecutive entries at base + N×0x800, so jal-recovery can latch a phantom base; the canonical entry is the one recovering 0x801CE818, which is also the entry the warp actually streams - the historical “slot machine = 0973 with a 0x4000 over-read prefix at 0x801CA818” row was that phantom, the same image matched inside 0973's over-read tail). The “world-map”, “save”, and “shop” UIs are not separate entries: the overworld controller lives in field 0897, and the save + shop sessions live in menu 0899 (each function's signature byte-matches only that entry via find-sig).
- Slot B (link base 0x801F69D8) holds the player-summon / effect / minigame-data blobs from the 0900..0969 PROT cluster. These timeshare one buffer, so a save state catches an inseparable mix of two overlays - there is no clean whole-overlay RAM prefix, and most have too sparse a call graph to recover. Their base is cross-checked the slot-B way: a high fraction of the overlay's internal absolute self-pointers (lui 0x801f/0x8020 ; addiu) resolve in-file only at the committed base. This is precisely where static extraction earns its keep: the disc entry disassembles cleanly at the link base even though the runtime buffer is unusable.
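The slot-B self-pointer cross-check can be approximated as below. This is a rough sketch under stated assumptions: self_pointer_hits is a hypothetical name, only a lui immediately followed by an addiu that reads the lui's destination register is counted (real code may separate the two instructions), and the high halves are restricted to the 0x801f / 0x8020 values named above.

```python
import struct


def self_pointer_hits(blob: bytes, base: int, highs=(0x801F, 0x8020)):
    """Count lui/addiu address pairs and how many land inside the blob.

    Returns (hits, total): `total` is the number of adjacent
    'lui rX, hi ; addiu rY, rX, lo' pairs with hi in `highs`; `hits` is
    how many of the formed addresses fall in [base, base + len(blob)).
    """
    n = len(blob) // 4
    words = struct.unpack_from(f"<{n}I", blob)
    hits = total = 0
    for a, b in zip(words, words[1:]):
        if a >> 26 != 0x0F or b >> 26 != 0x09:      # lui, then addiu
            continue
        if (a >> 16) & 0x1F != (b >> 21) & 0x1F:    # addiu must read lui's rt
            continue
        hi = a & 0xFFFF
        if hi not in highs:
            continue
        lo = b & 0xFFFF
        if lo & 0x8000:                             # addiu sign-extends its imm
            lo -= 0x10000
        addr = (hi << 16) + lo
        total += 1
        if base <= addr < base + len(blob):
            hits += 1
    return hits, total
```

Evaluating the same blob at several candidate bases and comparing the hit fractions gives the "resolves only at the committed base" signal; what counts as a high fraction is a threshold this sketch leaves to the caller.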
The slot-B cluster is heterogeneous. The summon-stager arithmetic range 0903..=0913 (spell ids 0x81..=0x8B under the corrected loader index math param + 0x37F in extraction space - the deep-dived 38-spawn-call stager file is 0905, the 0x83 slot; the historical “Gimard = 0905” label was the + 0x381 off-by-2) is fully capture-pinned per spell id, zero exceptions. 0907 inside it is Nighto's stager - its head title “Hell's Music” is the attack's display name (the SCUS spell table carries the same string); the earlier “Disco King dance-song” identity is refuted (the dance overlay 0980 has no slot-B loader callsite - its music is sequenced BGM). The high summon block is capture-pinned too: spell ids 0x99..=0xA0 drive extractions 0927..=0934 on the same arithmetic - 0927 “Dark Eclipse” is the Evil-Seru-Magic (Juggernaut) stager, its title the attack's display name like Nighto's; the Sim-Seru summons Palma / Mule / Horn / Jedo land 0928..0931 and the Ra-Seru trio Meta / Terra / Ozma 0932..0934 (pre-linked pointer-table heads, no title). 0924 “Ultimate Rave” remains computed - likeliest another Evil-Seru-Magic creature's stager (the creature resolves the loader id under the generic spell), an open thread. The cluster also holds the GAME OVER overlay (0902) and summon-effect data (0957).
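The index arithmetic in this paragraph can be checked mechanically. In the sketch below, the param + 0x37F bias and the pinned spell-id ranges come from the text; the constant spell-id offset (903 - 0x81) is inferred from those pins rather than read from the loader, and the helper names are hypothetical.

```python
LOADER_BIAS = 0x37F  # from the text: extraction index = param + 0x37F


def extraction_index(param: int) -> int:
    """Loader param -> extraction-space index."""
    return param + LOADER_BIAS


# Inferred, not read from the loader: the pinned pair (0x81 -> 0903) fixes a
# constant offset, and the remaining pins are consistent with it.
SPELL_OFFSET = 903 - 0x81

# Capture-pinned spell-id blocks named in the text.
PINNED_BLOCKS = (range(0x81, 0x8B + 1), range(0x99, 0xA0 + 1))


def stager_for_spell(spell_id: int):
    """Extraction index of a pinned spell's stager, or None if unpinned."""
    if any(spell_id in block for block in PINNED_BLOCKS):
        return spell_id + SPELL_OFFSET
    return None
```

Under this reading, params 0 and 1 correspond to extractions 0895 and 0896, the two entries the loader census reports as unreachable, and spell id 0x83 lands on 0905.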
The committed map
crates/asset/data/static-overlays.toml is the entry→base map - one record per overlay: prot_index (the identity), base_va, form (raw / lzs), clean_copy_bytes (RAM-verified prefix length), eligibility (verified / static / ineligible), base_source (jal / capture / cross_ref), an optional anchor_va (a known function VA that must land on a prologue at the base - a capture-free base cross-check for non-jal rows), and the fingerprint_sha256 reproducibility anchor. It spans the slot-A scene family (field/battle/menu + cutscene + the minigames) and the pinned slot-B entries (summon render + the capture-pinned stager blocks 903..913 and 927..934, GAME OVER, the attack-titled 0924, summon-effect data).
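To make the record shape concrete, here is a hedged model of one map row plus the anchor_va cross-check. The field names and the enumerated values are taken from the description above; the Python types, the OverlayRecord class, and anchor_lands_on_prologue are assumptions made for illustration and do not mirror the crate's actual structs or TOML layout.

```python
import struct
from dataclasses import dataclass
from typing import Optional

FORMS = {"raw", "lzs"}
ELIGIBILITY = {"verified", "static", "ineligible"}
BASE_SOURCES = {"jal", "capture", "cross_ref"}


@dataclass(frozen=True)
class OverlayRecord:
    prot_index: int            # the identity
    base_va: int
    form: str
    clean_copy_bytes: int      # RAM-verified prefix length
    eligibility: str
    base_source: str
    fingerprint_sha256: str
    anchor_va: Optional[int] = None

    def __post_init__(self):
        # Reject values outside the sets listed in the description.
        if self.form not in FORMS:
            raise ValueError(f"bad form: {self.form}")
        if self.eligibility not in ELIGIBILITY:
            raise ValueError(f"bad eligibility: {self.eligibility}")
        if self.base_source not in BASE_SOURCES:
            raise ValueError(f"bad base_source: {self.base_source}")


def anchor_lands_on_prologue(rec: OverlayRecord, blob: bytes) -> bool:
    """True if anchor_va maps to an 'addiu sp, sp, -X' word at base_va."""
    if rec.anchor_va is None:
        return True  # nothing to cross-check
    off = rec.anchor_va - rec.base_va
    if off < 0 or off % 4 or off + 4 > len(blob):
        return False
    (word,) = struct.unpack_from("<I", blob, off)
    return (word & 0xFFFF8000) == 0x27BD8000
```

The anchor check needs no capture: it only asks whether a function VA documented elsewhere lands on a prologue when the entry is placed at the committed base.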
CLI
# Inspect the map.
asset overlay list
# Reconnaissance sweep: recover each entry's base + print its leading dev
# string (the identity tell). --base filters to one overlay slot.
asset overlay scan extracted/PROT.DAT --from 895 --to 985 --base 0x801CE818
# Locate a function-head signature across the corpus and infer the host
# overlay's base (the capture-free byte-search that pins an entry).
asset overlay find-sig extracted/PROT.DAT "1e80043c a046838c" --anchor-va 0x801DC6B4
# Re-extract from your PROT.DAT and assert every committed fingerprint
# reproduces (bit-for-bit, from any copy of the disc).
asset overlay verify extracted/PROT.DAT
# Extract each eligible overlay's as-loaded bytes to a gitignored dir.
asset overlay extract extracted/PROT.DAT --out extracted/overlays
# Emit Ghidra import helpers: a per-overlay Jython rename script + a shell
# driver that imports each overlay at its base, program named overlay_<label>.
asset overlay ghidra --out extracted/overlays
# Regenerate map rows (recover bases + hash bytes); review before committing.
asset overlay generate extracted/PROT.DAT --index 897 --index 898
The extracted .bins are Sony overlay code (gitignored); only the map (PROT index → base → label + sha256 hashes) and the docs are committed. A statically-extracted-and-disassembled overlay reproduces the same functions at the same addresses as the captured overlay_<label>_<addr>.txt dumps - asserted against the disc bytes and against live RAM in disc-gated tests.