Asset extraction
Tools for extracting assets from a user-supplied disc image. Per the project's clean-room model, no Sony bytes ship in this repo - the user runs the extraction tools against their own disc.
Top-level pipeline
legaia-extract (in crates/extract) drives the full pipeline:
./target/release/legaia-extract "/path/to/Legend of Legaia (USA).bin" --out extracted
The pipeline runs verify → disc → PROT → categorize → streaming-format extract → TIM → PNG → CD-XA demux → WAV. Use --skip-png to skip the slowest step; --skip-xa to skip the CD-XA audio demux; --skip-verify to skip the SHA verification.
Output lands in ./extracted/ (gitignored):
extracted/
├── PROT.DAT                 - raw archive copy
├── CDNAME.TXT               - entry name map
├── SCUS_942.54              - executable
├── PROT/                    - per-PROT-entry files (1232 entries, named via CDNAME).
│   │                          Includes trailing-overlay sectors for entries
│   │                          whose on-disc footprint extends past their
│   │                          TOC-indexed end (see formats/prot.html).
│   ├── categorize.json      - per-class breakdown
│   └── ####_<name>.BIN
├── streaming/               - DATA_FIELD streaming sub-assets
│   └── ####_<name>/chunk##_<TYPE>/####.tim
├── XA/                      - raw Form-1 .XA dumps (truncated audio; not listenable)
└── XA_WAV/                  - correctly-paced per-channel WAVs (one per (file_no, ch_no))
    └── XAn_fileN_chM.wav
The CD-XA step reads the raw disc directly and demuxes every *.XA file into one WAV per (file_no, ch_no) channel, each decoded at its true per-sector rate / stereo mode. This bypasses the extracted/XA/ Form-1 dumps from the disc-walk step, which truncate the Form-2 audio sectors (2324 → 2048) and collapse a file's multiplexed channels into one shuffled stream. The NA corpus is 34 files / 316 channels, all 4-bit 37.8 kHz; non-4-bit channels are skipped with a warning (the group decoder is 4-bit only). The decoder is bit-exact, so the WAVs are reference-quality.
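As a rough illustration of where the per-sector rate and stereo mode come from, the sketch below parses the CD-XA subheader of one raw 2352-byte sector. The byte offsets and bit fields follow the standard CD-ROM XA layout; the struct and function names are hypothetical and are not taken from the xa crate.

```rust
/// Size of one raw CD sector as stored in a .bin image.
const RAW_SECTOR_SIZE: usize = 2352;

/// Decoded CD-XA subheader fields (names are illustrative only).
#[derive(Debug, PartialEq, Eq)]
struct XaSubheader {
    file_no: u8,
    ch_no: u8,
    is_audio: bool,
    is_form2: bool,
    stereo: bool,
    sample_rate: u32,
    bits_per_sample: u8,
}

/// Parse the subheader of a raw Mode 2 sector.
/// Layout: 12-byte sync, 3-byte address, 1-byte mode, then the 4-byte
/// subheader (file, channel, submode, coding info), which is stored twice.
fn parse_xa_subheader(sector: &[u8]) -> Option<XaSubheader> {
    if sector.len() < RAW_SECTOR_SIZE || sector[15] != 2 {
        return None; // short read, or not a Mode 2 sector
    }
    let (file_no, ch_no) = (sector[16], sector[17]);
    let (submode, coding) = (sector[18], sector[19]);
    Some(XaSubheader {
        file_no,
        ch_no,
        is_audio: (submode & 0x04) != 0, // submode bit 2: audio
        is_form2: (submode & 0x20) != 0, // submode bit 5: Form 2
        stereo: (coding & 0x03) == 1,    // coding bits 0-1: 0 mono, 1 stereo
        // Coding bits 2-3: 0 is 37.8 kHz. This sketch treats every other
        // value as 18.9 kHz and does not reject reserved values.
        sample_rate: if ((coding >> 2) & 0x03) == 0 { 37_800 } else { 18_900 },
        // Coding bits 4-5: 0 is 4-bit. Every other value is reported as 8-bit.
        bits_per_sample: if ((coding >> 4) & 0x03) == 0 { 4 } else { 8 },
    })
}
```

A caller following the behaviour described above would skip, with a warning, any channel whose bits_per_sample is not 4.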
Per-stage tools
When you want to drive a single stage:
Disc → files (disc-extract)
disc-extract extract /path/to/disc.bin extracted/
Walks ISO9660 and writes every file. See disc + ISO9660.
PROT.DAT → entries (prot-extract)
prot-extract extract extracted/PROT.DAT extracted/PROT/ --cdname extracted/CDNAME.TXT
Splits PROT.DAT into 1232 numbered entries with CDNAME-derived filenames. Each extracted file's size is the entry's full on-disc footprint, max(indexed_size, next_start - this_start), so trailing-overlay sectors past the TOC-indexed end (e.g. PROT 899's title-screen overlay code) are visible. See PROT TOC.
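The footprint rule can be sketched as follows. This is a minimal illustration, not the prot-extract implementation. TocEntry and its fields are hypothetical names, offsets are assumed to be in bytes, and treating the archive length as the "next start" of the final entry is an assumption the text above does not confirm.

```rust
/// One TOC entry: where the entry starts and how large the TOC says it is.
/// (Hypothetical type for illustration; units assumed to be bytes.)
#[derive(Debug, Clone, Copy)]
struct TocEntry {
    start: u64,
    indexed_size: u64,
}

/// On-disc footprint of each entry: max(indexed_size, next_start - this_start).
/// Assumption: the last entry has no successor, so `archive_len` stands in
/// for its next_start.
fn footprints(entries: &[TocEntry], archive_len: u64) -> Vec<u64> {
    entries
        .iter()
        .enumerate()
        .map(|(i, e)| {
            let next_start = entries.get(i + 1).map_or(archive_len, |n| n.start);
            // saturating_sub guards against a malformed, non-monotonic TOC.
            e.indexed_size.max(next_start.saturating_sub(e.start))
        })
        .collect()
}
```

In this sketch, an entry indexed at 0x800 bytes whose successor starts 0x1000 bytes later gets a footprint of 0x1000, so the trailing 0x800 bytes are kept in the extracted file.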
LZS decode (lzs-decode)
lzs-decode raw --size N <file> # standalone LZS body
lzs-decode container <file> <out_dir> # multi-section player.lzs container, one file per section
See LZS compression.
TIM → PNG (tim)
tim convert <file> [out.png] # single TIM; out defaults to <file>.png
tim convert-dir <dir> # recursively convert every .tim under <dir>
TMD analysis (tmd)
tmd info <file> # header + object table
tmd dump-obj <file> --out <prefix> # OBJ-with-faces export
tmd validate-prims <DIR> # bulk-walk every prim group, sanity-check
VAB extraction (vab)
vab list <file> # find + describe every VAB
vab extract <file> --out <out_dir> [--wav] [--sample-rate 22050] # VAG bodies (+ optional WAV)
CD-XA demux → WAV (xa)
The streamed-audio (XA*.XA) decoder. The disc-wide demux is the correct-pacing path the top-level pipeline uses:
xa demux-disc-all <disc.bin> --out extracted/XA_WAV # every .XA → per-channel WAV
xa demux-disc <disc.bin> --lba L --size S --out <dir> # one .XA by LBA/size
xa info <file.xa> [--channels stereo] [--sample-rate 37800]
xa convert <file.xa> [-o out.wav] # single Form-1 dump (must guess rate)
Prefer demux-disc-all over convert/convert-dir: it reads the raw 2352-byte sectors, splits by (file_no, ch_no), and takes the true rate / channel mode from each sector's CD-XA subheader instead of guessing a global rate. See XA audio.
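To make the split step concrete, here is a hedged sketch of grouping raw sectors by (file_no, ch_no). It assumes the input slice covers exactly one .XA file's sector range and uses the standard Mode 2 Form 2 layout (2324 payload bytes starting at offset 24). The function name is hypothetical, and the real tool may filter or validate sectors differently.

```rust
use std::collections::BTreeMap;

const RAW_SECTOR_SIZE: usize = 2352;
const PAYLOAD_OFFSET: usize = 24; // sync (12) + header (4) + subheader (8)
const FORM2_PAYLOAD_SIZE: usize = 2324;

/// Split raw sectors into one payload stream per (file_no, ch_no).
/// Only Mode 2 sectors whose submode has both the audio (0x04) and
/// Form 2 (0x20) bits set are kept; everything else is ignored.
fn split_by_channel(raw: &[u8]) -> BTreeMap<(u8, u8), Vec<u8>> {
    let mut streams: BTreeMap<(u8, u8), Vec<u8>> = BTreeMap::new();
    for sector in raw.chunks_exact(RAW_SECTOR_SIZE) {
        let submode = sector[18];
        let is_xa_audio = sector[15] == 2 && (submode & 0x24) == 0x24;
        if !is_xa_audio {
            continue;
        }
        let key = (sector[16], sector[17]); // (file_no, ch_no)
        let payload = &sector[PAYLOAD_OFFSET..PAYLOAD_OFFSET + FORM2_PAYLOAD_SIZE];
        streams.entry(key).or_default().extend_from_slice(payload);
    }
    streams
}
```

Each resulting stream keeps its sectors in disc order, so the interleaved channels of a multiplexed file come out as separate, contiguous byte runs ready for decoding.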
Sub-asset extraction (asset)
The format-aware extractor:
asset categorize <DIR> [--out categorize.json] # per-class breakdown
asset extract <file> --out <out_dir> # streaming-format chunks → individual files
asset stream <file> # walk DATA_FIELD chunks, no extraction
asset describe <file> # asset-descriptor walk
asset effect-bundle <file>
asset tmd-scan <DIR> # bulk byte-search for TMD magic
asset tim-scan <DIR> # bulk byte-search for TIM magic
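For readers unfamiliar with magic scanning, the sketch below shows one plausible shape for a TIM byte-search. It relies on the standard TIM header (a little-endian id word of 0x10 followed by a flags word) and assumes headers are 4-byte aligned. Both the alignment and the flag checks are assumptions; asset tim-scan may apply different or stricter validation.

```rust
/// Read a little-endian u32 at `off`. The caller guarantees the bounds.
fn read_u32_le(data: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([data[off], data[off + 1], data[off + 2], data[off + 3]])
}

/// Offsets of plausible TIM headers in `data`.
/// A TIM starts with the id word 0x0000_0010, followed by a flags word
/// whose low 3 bits are the pixel mode (0..=3 for 4/8/16/24 bpp) and
/// whose bit 3 marks a CLUT section. Any other set bit is rejected here.
fn scan_tim_offsets(data: &[u8]) -> Vec<usize> {
    let mut hits = Vec::new();
    let mut off = 0;
    while off + 8 <= data.len() {
        let id = read_u32_le(data, off);
        let flags = read_u32_le(data, off + 4);
        let mode_ok = (flags & 0x7) <= 3;
        let no_unknown_bits = (flags & !0xF) == 0;
        if id == 0x10 && mode_ok && no_unknown_bits {
            hits.push(off);
        }
        off += 4; // assumption: headers sit on 4-byte boundaries
    }
    hits
}
```

A magic match alone can produce false positives, so a practical scanner would go on to check that the section lengths after the header fit inside the file.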
MES dialog (mes)
mes info <file>
mes disasm <file>
mes json <file>
Move tables (mdt)
mdt classify <file> # detect runtime-buffer vs flat-record layout
mdt records <file> --limit 8
mdt slots <file> --limit 8
Disc-gated tests
Two integration tests touch a real disc and only run when LEGAIA_DISC_BIN points at a valid .bin:
crates/iso/tests/disc_pipeline.rs - disc walk, file count, key file SHA-256s.
crates/extract/tests/validation_suite.rs - full pipeline, PROT entry count, sub-asset totals, TIM round-trip.
LEGAIA_DISC_BIN="/path/to/Legend of Legaia (USA).bin" cargo test --workspace
Without the env var, both tests skip and pass - that's intentional, so CI works without redistributing Sony data. Don't change that gating.
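The gating pattern looks roughly like the sketch below. It is not copied from the two test files: the helper names are hypothetical, and treating "valid" as "the path exists and is a regular file" is an assumption.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Turn the raw environment value into a usable disc path.
/// Returns None when the variable is unset or does not name a file.
fn disc_from_env_value(value: Option<OsString>) -> Option<PathBuf> {
    let path = PathBuf::from(value?);
    if path.is_file() {
        Some(path)
    } else {
        None
    }
}

/// Body of a hypothetical disc-gated test. When no disc is configured it
/// returns true immediately, so the test is reported as passing.
fn disc_gated_check() -> bool {
    let disc = match disc_from_env_value(env::var_os("LEGAIA_DISC_BIN")) {
        Some(path) => path,
        None => {
            eprintln!("LEGAIA_DISC_BIN not set or not a file; skipping");
            return true;
        }
    };
    // Placeholder check; a real test would compare SHA-256s, entry counts, etc.
    disc.metadata().map(|m| m.len() > 0).unwrap_or(false)
}
```

Keeping the decision in a function that takes the value as an argument makes the skip path testable without mutating the process environment.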
Asset viewer
Once assets are extracted, browse them interactively:
# Single TIM
asset-viewer tim extracted/PROT/tim/<entry>.TIM
# TIM at a non-zero offset within a larger file. Use this for TIMs in
# the unindexed pre-`init_data` gap of PROT.DAT (system-UI sprite
# sheet, menu-glyph atlas, etc.) - these aren't reachable through
# the `prot` browser because no TOC entry covers them.
asset-viewer tim extracted/PROT.DAT --offset 0x018E0 --clut 2 # system-UI panel CLUT
asset-viewer tim extracted/PROT.DAT --offset 0x018E0 --clut 7 # system-UI cursor CLUT
asset-viewer tim extracted/PROT.DAT --offset 0x11218 --clut 13 # menu-glyph atlas, "Load" text CLUT
# A Legaia TMD as a 3D mesh (auto-rotating)
asset-viewer tmd extracted/streaming/<entry>/chunk##_TMD/####.tmd
# Directory of TMDs (N/P/PgDn/PgUp to cycle)
asset-viewer tmd extracted/streaming
# Battle bundle (paired TIMs auto-loaded for correct CLUTs)
asset-viewer tmd extracted/streaming/<character> --bundle battle
# A VAB sample
asset-viewer vab extracted/PROT/<entry>.BIN --offset 0xN --sample N
# PROT entry browser (auto-detects format per entry)
asset-viewer prot extracted/PROT.DAT --cdname extracted/CDNAME.TXT
See subsystems/engine.md for the engine port architecture.