Asset extraction

Each crate ships its own extraction binary; the top-level legaia-extract orchestrates them end-to-end.

# Full pipeline
./target/release/legaia-extract \
    "/path/to/Legend of Legaia (USA).bin" --out extracted

# Per-stage tools
./target/release/disc-extract  # Mode2/2352 + ISO9660 walk
./target/release/prot-extract  # PROT.DAT TOC + CDNAME → per-entry files
./target/release/lzs-decode    # LZS decompress
./target/release/asset         # Categorize + sub-asset walk
./target/release/tim           # PSX TIM → PNG
./target/release/tmd           # Legaia TMD parser + OBJ export
./target/release/vab           # VAB extraction → WAV
./target/release/mes           # MES dialog parse / disasm / json
./target/release/mdt           # MDT move table parser
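disc-extract starts by turning raw 2352-byte sectors into 2048-byte ISO9660 blocks. Below is a minimal Rust sketch of that step. It assumes the sectors being read are Mode 2 Form 1, which is the layout PSX discs use for their filesystem and data files. XA audio and video streams use Form 2 and would need different handling. The function names are illustrative, not disc-extract's actual API.

```rust
/// Raw sector size in a Mode2/2352 .bin image.
const RAW_SECTOR: usize = 2352;
/// Mode 2 Form 1 user data starts after 12 sync bytes, a 4-byte header and an 8-byte subheader.
const USER_DATA_OFFSET: usize = 24;
/// Size of the Mode 2 Form 1 user-data payload.
const USER_DATA_LEN: usize = 2048;

/// Return the 2048-byte user-data payload of logical sector `lba`,
/// or None if the image is too short to contain it.
fn user_data(image: &[u8], lba: usize) -> Option<&[u8]> {
    let start = lba.checked_mul(RAW_SECTOR)?.checked_add(USER_DATA_OFFSET)?;
    let end = start.checked_add(USER_DATA_LEN)?;
    image.get(start..end)
}

/// ISO9660 puts the Primary Volume Descriptor at logical sector 16:
/// a type byte of 1 followed by the standard identifier "CD001".
fn has_primary_volume_descriptor(image: &[u8]) -> bool {
    match user_data(image, 16) {
        Some(pvd) => pvd[0] == 1 && &pvd[1..6] == b"CD001",
        None => false,
    }
}
```

Because ISO9660 fixes the PVD at sector 16, checking it is a cheap sanity test that the image really is Mode2/2352 before walking any directories.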

Full extraction tooling reference →

Ghidra in Docker

Headless Ghidra runs as a single Docker service. The extracted disc is mounted read-only at /data; the project database (/projects) and the scripts directory (/scripts) are mounted read-write.

docker compose up -d ghidra

# One-time: import SCUS_942.54
docker compose exec ghidra /ghidra/support/analyzeHeadless \
    /projects legaia -import /data/SCUS_942.54

# Run an analysis script
docker compose exec ghidra /ghidra/support/analyzeHeadless \
    /projects legaia -process SCUS_942.54 \
    -postScript dump_funcs.py -noanalysis -scriptPath /scripts

The script catalogue

The Ghidra scripts live in ghidra/scripts/; the two shell wrappers (analyze-overlay.sh, import-overlay-named.sh) are invoked from scripts/:

  • dump_funcs.py: dump a list of functions by entry address.
  • find_lui_writers.py: find LUI+ADDIU pairs that build a specific 32-bit address. Needed because Ghidra's reference manager does not reliably auto-resolve these split constants.
  • function-coverage.py: citation-graph closure; reports which referenced helpers don't have their own dump yet.
  • force_disasm_dump.py: handles jal targets Ghidra hasn't auto-disassembled. It walks forward from the target, requires at least 8 valid instructions, then creates the function and dumps it.
  • analyze-overlay.sh: one shot from a mednafen save state to a labelled overlay program in Ghidra plus an asset-load CSV.
  • import-overlay-named.sh: keeps earlier overlay imports instead of overwriting them, so several game-mode overlays can be captured side by side.
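find_lui_writers.py runs inside Ghidra, but the matching it performs can be sketched on its own. The Rust below is a hypothetical standalone helper, not a port of the script. It decodes MIPS I-type fields from little-endian words and rebuilds the constant from a LUI followed by an ADDIU on the same register. Two simplifying assumptions apply: the `window` lookahead limit, and no tracking of intervening writes to the register.

```rust
const OP_LUI: u32 = 0x0F;
const OP_ADDIU: u32 = 0x09;

// MIPS I-type field extraction.
fn opcode(w: u32) -> u32 { w >> 26 }
fn rs(w: u32) -> u32 { (w >> 21) & 0x1F }
fn rt(w: u32) -> u32 { (w >> 16) & 0x1F }
fn imm(w: u32) -> u32 { w & 0xFFFF }

/// Scan little-endian MIPS code loaded at `base` for `lui rX,hi` followed (within
/// `window` instructions) by `addiu rY,rX,lo` whose combined value equals `target`.
/// Returns (lui_addr, addiu_addr) pairs.
fn find_lui_addiu(code: &[u8], base: u32, target: u32, window: usize) -> Vec<(u32, u32)> {
    let words: Vec<u32> = code
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    let mut hits = Vec::new();
    for (i, &w) in words.iter().enumerate() {
        if opcode(w) != OP_LUI {
            continue;
        }
        let reg = rt(w);
        let hi = imm(w) << 16;
        for (j, &w2) in words.iter().enumerate().skip(i + 1).take(window) {
            if opcode(w2) == OP_ADDIU && rs(w2) == reg {
                // ADDIU sign-extends its 16-bit immediate before adding.
                let lo = (imm(w2) as u16 as i16) as i32 as u32;
                if hi.wrapping_add(lo) == target {
                    hits.push((base.wrapping_add(4 * i as u32), base.wrapping_add(4 * j as u32)));
                }
            }
        }
    }
    hits
}
```

Because ADDIU sign-extends its immediate, an address such as 0x8007B888 is built as LUI 0x8008 plus ADDIU 0xB888 (that is, -0x4778). A naive search for the upper half 0x8007 would miss it.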

Full Ghidra workflow reference →

Runtime overlay capture

Static analysis hits a wall once code lives in RAM overlays at 0x801C0000 and above. Overlays can be captured from three sources: mednafen save states, PCSX-Redux Lua dumps, and DuckStation .sav files. The mednafen and DuckStation paths look like this:

# Mednafen
scripts/analyze-overlay.sh ~/.mednafen/mcs/Legend*Legaia*.mc0 --label level_up

# DuckStation (zstd-compressed .sav)
scripts/extract-duckstation-overlay.py SCUS-94254_1.sav --out /tmp/legaia_overlay_fishing.bin
scripts/import-overlay-named.sh /tmp/legaia_overlay_fishing.bin fishing

The pipeline locates main RAM inside the capture by searching for known SCUS anchor strings (e.g. "---- FIELD PROGRAM -----%d"), slices the 0x801C0000–0x80200000 window (256 KiB, running to the end of the 2 MiB main RAM), and imports it into Ghidra as a named program. Run inventory_overlay.py to get a function CSV, then write a per-overlay dump script.
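Below is a minimal sketch of the locate-and-slice step. It assumes the RAM image is stored uncompressed and contiguous inside the capture. It also assumes the anchor string's virtual address in SCUS_942.54 is already known; that is the hypothetical `anchor_vaddr` parameter, and its real value is not given here. PSX main RAM appears at KSEG0 0x80000000, so a virtual address maps to a RAM offset by subtracting that base.

```rust
/// KSEG0 base of PSX main RAM: RAM offset = vaddr - RAM_BASE.
const RAM_BASE: u32 = 0x8000_0000;
/// PSX main RAM is 2 MiB.
const RAM_SIZE: usize = 2 * 1024 * 1024;

/// Find where the 2 MiB RAM image starts inside `blob`, given an anchor string
/// whose virtual address is known in advance (assumption: supplied by the caller).
fn locate_ram(blob: &[u8], anchor: &[u8], anchor_vaddr: u32) -> Option<usize> {
    if anchor.is_empty() {
        return None;
    }
    let hit = blob.windows(anchor.len()).position(|w| w == anchor)?;
    let ram_off = anchor_vaddr.checked_sub(RAM_BASE)? as usize;
    let start = hit.checked_sub(ram_off)?;
    let end = start.checked_add(RAM_SIZE)?;
    // Only accept the match if a full RAM image fits after `start`.
    (end <= blob.len()).then_some(start)
}

/// Slice the half-open virtual range [start, end) out of a RAM image.
fn slice_window(ram: &[u8], start: u32, end: u32) -> Option<&[u8]> {
    let s = start.checked_sub(RAM_BASE)? as usize;
    let e = end.checked_sub(RAM_BASE)? as usize;
    ram.get(s..e)
}
```

With the overlay bounds used throughout this page, `slice_window(ram, 0x801C0000, 0x80200000)` yields exactly 0x40000 bytes (256 KiB).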

Hex constants passed as jal arguments in the overlay are PROT indices; cross-reference them against extracted/CDNAME.TXT to identify which scene asset bundle the overlay loads.
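One way to list candidate PROT indices is to decode every jal in the overlay and look for a constant loaded into the first argument register next to it. This is a heuristic sketch with several assumptions. It assumes the index travels in $a0. It also assumes the index is set by `addiu $a0,$zero,imm` or `ori $a0,$zero,imm`, either just before the jal or in its delay slot. Indices computed at runtime or passed in other registers are missed. Function names are hypothetical.

```rust
const OP_JAL: u32 = 0x03;
const OP_ADDIU: u32 = 0x09;
const OP_ORI: u32 = 0x0D;
const REG_ZERO: u32 = 0;
const REG_A0: u32 = 4;

/// If `w` is `addiu $a0,$zero,imm` or `ori $a0,$zero,imm`, return the constant it loads.
fn a0_constant(w: u32) -> Option<u32> {
    let (op, rs, rt, imm) = (w >> 26, (w >> 21) & 0x1F, (w >> 16) & 0x1F, w & 0xFFFF);
    if rs != REG_ZERO || rt != REG_A0 {
        return None;
    }
    match op {
        OP_ADDIU => Some((imm as u16 as i16) as i32 as u32), // sign-extended
        OP_ORI => Some(imm),                                 // zero-extended
        _ => None,
    }
}

/// List (jal_addr, callee, a0) for each JAL in code loaded at `base` whose $a0
/// is a constant set in the preceding instruction or in the delay slot.
fn jal_constant_args(code: &[u8], base: u32) -> Vec<(u32, u32, u32)> {
    let words: Vec<u32> = code
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    let mut out = Vec::new();
    for (i, &w) in words.iter().enumerate() {
        if w >> 26 != OP_JAL {
            continue;
        }
        let pc = base.wrapping_add(4 * i as u32);
        // J-type target: top 4 bits of PC+4, then the 26-bit index shifted left by 2.
        let callee = (pc.wrapping_add(4) & 0xF000_0000) | ((w & 0x03FF_FFFF) << 2);
        // The delay slot (i + 1) executes before the callee, so it wins if it sets $a0.
        let arg = words
            .get(i + 1)
            .copied()
            .and_then(a0_constant)
            .or_else(|| i.checked_sub(1).and_then(|p| a0_constant(words[p])));
        if let Some(a0) = arg {
            out.push((pc, callee, a0));
        }
    }
    out
}
```

Each resulting `a0` value is a candidate PROT index to look up in extracted/CDNAME.TXT.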

Captured overlays

All major runtime code paths are now covered:

  • Town / field / dialog / inventory — field VM (FUN_801DE840), MES renderer, inventory hub
  • Battle / battle-action — per-actor SM (FUN_801E295C), effect VM cluster
  • Options / config / menus — item / magic / equipment / status / options UI
  • Shop, save screen, level-up, world map, cutscene — all fully dumped
  • Minigame hub (fishing, slot machine, Baka Fighter, dance, debug menu) — five variants of the same overlay; debug-menu capture is the superset (189 functions)
  • Muscle Dome / Baka card battle — distinct overlay; round dispatcher (FUN_801D8DE8), 6500-byte game SM (FUN_801D5854), card resolution (FUN_801D388C); 148 functions dumped

For preserving multiple overlay imports across runs, use import-overlay-named.sh — it produces overlay_<label>.bin rather than overwriting overlay.bin.

Full overlay capture reference →

Mednafen automation

A scriptable substitute for mednafen's interactive memory-watchpoint debugger. The toolkit treats each .mc{0..9} save state as a frozen RAM snapshot, then uses pairwise diffs and targeted bisection to surface where the runtime wrote between snapshots — the watchpoint-equivalent answer without needing a live emulator session.

The crate

crates/mednafen ships:

  • MDFNSVST parser — gzip wrapper + targeted-scan section indexer. Resolves MAIN.MainRAM.data8 as 2 MiB of PSX main RAM and indexes the GPU / SPU / CDC / MDEC / DMA / TIMER sections.
  • Diff engine — coalesces per-byte changes into PSX-virtual-address regions; emits sortable JSON for downstream tools.
  • Bisect helper — walks an ordered snapshot list and reports which adjacent pair brackets a write to a target address.
  • Scenario manifest — declarative scripts/mednafen/scenarios.toml mapping each save slot to a labelled scenario with overlay slices and watchpoint regions.
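The diff engine's coalescing step can be sketched as a single pass over two equally sized RAM windows. This is an illustration, not the crate's code. Two choices here are assumptions. The first is the `max_gap` parameter, which sets how many unchanged bytes may separate two changes before a new region starts. The second is ranking regions largest-first.

```rust
/// A run of changed bytes as a half-open PSX virtual-address range [start, end).
#[derive(Debug, PartialEq)]
struct Region {
    start: u32,
    end: u32,
}

/// Compare two RAM windows that both begin at `base_vaddr` and coalesce differing
/// bytes into regions. Changes separated by at most `max_gap` unchanged bytes are
/// merged. The result is sorted largest-first, then by address.
fn diff_regions(a: &[u8], b: &[u8], base_vaddr: u32, max_gap: usize) -> Vec<Region> {
    let mut regions: Vec<Region> = Vec::new();
    for (i, (x, y)) in a.iter().zip(b).enumerate() {
        if x == y {
            continue;
        }
        let addr = base_vaddr.wrapping_add(i as u32);
        match regions.last_mut() {
            // Bytes are visited in address order, so addr >= r.end here.
            Some(r) if (addr - r.end) as usize <= max_gap => r.end = addr + 1,
            _ => regions.push(Region { start: addr, end: addr + 1 }),
        }
    }
    regions.sort_by(|p, q| {
        (q.end - q.start)
            .cmp(&(p.end - p.start))
            .then(p.start.cmp(&q.start))
    });
    regions
}
```

With `max_gap` of 0, every unchanged byte splits a region. A small gap keeps a struct whose fields changed non-contiguously together as one region.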

The CLI

# Inspect one save's section table
target/release/mednafen-state info "$HOME/.mednafen/mcs/...mc0"

# Slice an overlay window (replaces extract-mednafen-overlay.py)
target/release/mednafen-state extract SAVE \
    --start 0x801C0000 --end 0x80200000 --out /tmp/overlay.bin

# Diff two saves in the overlay window
target/release/mednafen-state diff mc1.mc mc2.mc \
    --start 0x801C0000 --end 0x80200000 --top 8

# Bisect a sequence of saves for a transition
target/release/mednafen-state bisect --addr 0x8007B888 mc0 mc1 mc2 mc3

# Run all watchpoints for a scenario against its sister states
target/release/mednafen-state watch area_load_early
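The bisect helper walks adjacent pairs of an ordered snapshot list, which reduces to comparing one word across each pair. The sketch below assumes each snapshot is already a decompressed main-RAM image starting at 0x80000000. The names are hypothetical.

```rust
/// KSEG0 base of PSX main RAM.
const RAM_BASE: u32 = 0x8000_0000;

/// Read a little-endian u32 at PSX virtual address `vaddr` from a RAM image.
fn read_u32(ram: &[u8], vaddr: u32) -> Option<u32> {
    let off = vaddr.checked_sub(RAM_BASE)? as usize;
    let b = ram.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Walk snapshots in capture order and return every adjacent index pair (i, i + 1)
/// between which the word at `vaddr` changed value.
fn bracket_writes(snapshots: &[&[u8]], vaddr: u32) -> Vec<(usize, usize)> {
    snapshots
        .windows(2)
        .enumerate()
        .filter(|(_, pair)| read_u32(pair[0], vaddr) != read_u32(pair[1], vaddr))
        .map(|(i, _)| (i, i + 1))
        .collect()
}
```

Called with the saves from the `bisect --addr 0x8007B888 mc0 mc1 mc2 mc3` example, a result of `[(0, 1)]` would mean the value changed between mc0 and mc1 and stayed put afterwards.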

The scenarios

Ten labelled scenarios cover the gameplay states most useful for decompilation work that static analysis alone cannot unblock:

  • mc0 drake_castle — steady-state field with full party (move-table base pointer)
  • mc1/mc2/mc3 area_load_{early,mid,late} — three progressive captures of a scene-load transition (scene_bundle preamble→slot mapping, navmesh candidate)
  • mc4/mc5/mc6 battle_{intro,arts_view,anim_strike} — battle progression (ANM bytecode interpreter, art-strike chain)
  • mc7 status_menu / mc8 options_menu / mc9 load_screen — menu sub-screens

Why scripted snapshots?

Mednafen's TUI debugger has memory breakpoints but no scriptable interface; every interaction is keyboard-driven inside the running window. Save-state diffs answer nearly the same question ("between this point and that point, what addresses got written?") without the live debugger. The one gap is that a diff only sees bytes whose value changed. A write that stores the value already present, or a transient value overwritten again before the second snapshot, is invisible. Cluster the changed bytes into contiguous regions and you have a ranked list of structures whose writers can then be looked up in Ghidra.

Findings unlocked by the toolkit

  • MOVE-table source pinned: tracing _DAT_8007B888 across the 10 saves shows three distinct values, one per scene. In each scene's CDNAME block, the slot-1 PROT entry (a scene_asset_table) carries descriptor[4] = Move, with sizes that match that scene's runtime base. MOVE2 (_DAT_8007B840) is zero in every save; it is populated only by scenes with an alternate move table.
  • Anim dispatcher located in SCUS: diffing the actor pool between battle-intro idle (mc4) and an active somersault (mc6) over 0x801C9594..0x801C9F7F reveals 0x60-byte slots. Each slot's +0x00 is the anim PC pointer (mirroring actor[+0x4C]), and its dispatch byte (actor[+0x5A]) flips between 0x04 (idle) and 0x06 (playing). The dispatcher is FUN_80021DF4 in SCUS_942.54 (already dumped); it has seven opcodes, and its keyframe path is fully ported.
  • Navmesh / region table located: diffing the navmesh candidate window (0x80100000..0x80120000) between mc1 and mc3 isolates a table of 24-byte records at 0x80108EA4..0x801095xx. The records carry sequential record ids, six i16 coordinate fields and a 4-byte ASCII tag; a leading shared bank comes first, followed by per-scene records.
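A record stride like the 24-byte one above can be confirmed by testing candidate strides for a run of sequential ids. The sketch below rests on two assumptions: the ids are little-endian u16, and they sit at a fixed `id_offset` within each record. The source does not say where the id lives or how wide it is, so both are hypothetical parameters.

```rust
/// Count consecutive records, starting at `start`, whose little-endian u16 id at
/// `id_offset` is exactly one more than the previous record's id.
fn sequential_run(buf: &[u8], start: usize, stride: usize, id_offset: usize) -> usize {
    let id_at = |rec: usize| -> Option<u16> {
        let off = start.checked_add(rec.checked_mul(stride)?)?.checked_add(id_offset)?;
        let b = buf.get(off..off.checked_add(2)?)?;
        Some(u16::from_le_bytes([b[0], b[1]]))
    };
    let Some(mut prev) = id_at(0) else { return 0 };
    let mut run = 1;
    while let Some(id) = id_at(run) {
        if id != prev.wrapping_add(1) {
            break;
        }
        prev = id;
        run += 1;
    }
    run
}

/// Try each candidate stride and return (stride, run_length) for the longest run.
fn best_stride(buf: &[u8], start: usize, id_offset: usize, candidates: &[usize]) -> Option<(usize, usize)> {
    candidates
        .iter()
        .map(|&s| (s, sequential_run(buf, start, s, id_offset)))
        .max_by_key(|&(_, run)| run)
}
```

A long run at one stride paired with runs of 1 at its neighbours is good evidence for that stride. Trying several `id_offset` values as well helps when the id field's position is unknown.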

Full mednafen-automation reference →