Scene-prefixed asset bundles Confirmed
Four related shapes account for the dominant per-scene asset layouts on the disc. All of them lead with a 4-byte chunk0 header in the form (type << 24) | size, with type = 0x00 - the same encoding as a DATA_FIELD streaming chunk header. The standard streaming walker would interpret type=0x00 as the TIM dispatcher slot; specialised loaders in the runtime know to dispatch chunk0 differently based on the content magic at offset +4.
scene_tmd_stream - bare-TMD prefix
The dominant scene-asset layout. Implementation: crates/asset/src/scene_tmd_stream.rs. ~12% of all PROT entries match. Walked by FUN_8001FE70 (the battle-init custom walker) - not by FUN_8002541C / FUN_8001F05C despite the chunk-header packing matching the standard format.
+0x00 u32 chunk0_header ; (type=0x00 << 24) | size
+0x04 Legaia TMD ; magic 0x80000002, fills `size` bytes
+0x04 + size streaming chunks ; FUN_8001FE70-style tail until
; terminator OR end-of-file
Strict structural detection:
buf.len() >= 32.buf[4..8] == 0x80000002(Legaia TMD magic).buf[8..12] == 0(TMD on-disc flags; runtime sets to 1 after pointer fixup).buf[12..16]=nobj,1 <= nobj <= 64.- Chunk0 header at
buf[0..4]has type byte 0. - TMD body size (low 24 bits of chunk0 header) is 4-aligned and at least
12 + nobj * 28;4 + size <= buf.len(). - Streaming tail at offset
4 + tmd_sizewalks at least one valid chunk header OR a terminator.
Streaming tail - FUN_8001FE70 walker
Past the leading TMD, each chunk is [u32 header][payload] with header packed as (type << 24) | (size & 0x00FFFFFF). The retail walker (FUN_8001FE70, called from FUN_800520F0 battle scene loader) dispatches:
| Type byte | Action |
|---|---|
0x01 | Upload payload as a single PSX TIM via FUN_800198E0 (LoadImage). |
0x02 | Stop the walk (terminator). |
| any other | Skip silently (advance to next chunk). |
| size = 0 | Stop the walk (zero-size header is the canonical terminator). |
The type-byte semantics differ from the standard FUN_8001F05C dispatcher: there type = 0x01 means TIM_LIST (a [count + offsets + TIMs] pack), but here it means "single bare TIM". So although the chunk-header packing is identical, calling FUN_8002541C on a scene_tmd_stream entry would mis-dispatch and crash. The runtime knows to use FUN_8001FE70 for these entries.
Two-list continuation
Some entries (e.g. 0006_town01.BIN) carry a second set of type-0x01 chunks past the first terminator, separated by a zero-padded gap:
+0x3840 type=0x01 TIM chunk (in FUN_8001FE70's reach)
+0xba64 type=0x01 TIM chunk (in FUN_8001FE70's reach)
+0x13c88 terminator (zero-size header)
+0x13c8c..0x16c23: zero padding
+0x16c24 type=0x01 TIM chunk (continuation - past terminator)
+0x1ee48 type=0x01 TIM chunk (continuation - past terminator)
+0x2706c terminator
FUN_8001FE70 exits at the first terminator, so the continuation list isn't visited by the standard battle-init dispatch. The consumer for the continuation is not yet pinned; the bytes are reachable as WalkSource::Continuation in scene_tmd_stream::battle_tim_chunks and the in-tail chunks alone supply the same (fb_y, fb_x) regions, so the continuation may be a cold-loaded scenario variant.
Reading:
use legaia_asset::scene_tmd_stream;
if let Some(s) = scene_tmd_stream::detect(&buf) {
let tmd = legaia_tmd::parse(&buf[s.tmd_range()])?; // bare TMD, no wrapper
for chunk in &s.tail_chunks {
// each chunk is (type, size, payload) per data-field.md
}
}
// Surface every type-0x01 TIM upload chunk - both in-tail and continuation.
for c in scene_tmd_stream::battle_tim_chunks(&buf) {
// c.payload_offset is the inner PSX TIM magic offset.
// c.source distinguishes Tail (FUN_8001FE70-reachable) from
// Continuation (past the first terminator).
}
scene_vab_stream - VAB-prefix
Same outer wrapper as scene_tmd_stream but the leading chunk carries a Sony VAB sound bank instead of a TMD. The single largest distributed-VAB carrier in the corpus. Implementation: crates/asset/src/scene_vab_stream.rs. ~17% of all PROT entries match.
+0x00 u32 chunk0_header ; LE: type=0x00 in high byte, size=N in low 24 bits
+0x04 u32 magic ; 0x56414270 ('VABp' read as LE u32 = 'p' 'B' 'A' 'V')
+0x08 u32 version ; 7 in retail (must be ≤ 10)
+0x0C.. VAB header tail + programs[] + tones[] + VAG offsets
+0x04 + N streaming chunks ; standard DATA_FIELD chunks until terminator
; OR end-of-file
Strict gate validates the VAB header: version <= 10, program_count <= 128, tone_count <= 128. The chunk0_size low byte is consistently 0x20 (sector-aligned to 32-byte boundary).
Cluster anatomy:
- 119 of 123 entries in the CDNAME
vab_01cluster (1072..1194) match - the standard distributed-bank layout. - 53 entries in
sound_data2(878..890), 19 inmusic_01, 14 inmonster_data/battle_data, plus scattered hits inteien,other5,player_data.
Reading:
use legaia_asset::scene_vab_stream;
use legaia_vab::parse_header;
if let Some(s) = scene_vab_stream::detect(buf) {
let header = parse_header(buf, s.vab_range().start)?;
println!("VAB v{} ps={} ts={}", header.version, header.ps, header.ts);
}
scene_v12_table - scene header + event-script bundle
A scene-named container that bundles a small runtime-fixup header with a full event-script prescript at sector offset 0x800. Implementation: crates/asset/src/scene_v12_table.rs. 97 PROT entries match — one per scene. Detailed reference: scene-v12-table.
+0x000 u16 N + 4 ; runtime fixup-slot offset; header field
+0x002 u16 0x0012 ; constant magic
+0x004 u16 0x0000 ; constant
+0x006 u16 0x0014 ; constant magic (= byte offset of records)
+0x008 u16 param ; count of inline records (0..=192 in retail)
+0x00A u16 N ; runtime fixup-slot offset; header field
+0x00C u16 0x0000 ; constant
+0x00E u16 N + 2 ; runtime fixup-slot offset; header field
+0x010 u32 0 ; padding to 0x14
+0x014 param × 4 bytes ; inline record table
+end_records (= 0x14 + 4*param) ; runtime writes three pointers here
; (slots at +N, +N+2, +N+4 — zero on disc).
+end_records .. 0x800 ; zero padding
+0x800 u16 script_count ; scene event-scripts prescript
+0x802 script_count × u16 ; offset table relative to +0x800
+0x800 + offsets[i] ; per-record field-VM bytecode
The header's u16[0], u16[5], u16[7] are algebraically tied to a single per-scene constant N: u16[0] = N + 4, u16[5] = N, u16[7] = N + 2, and N = 4 * param + 22 (= byte distance from the file head to the first runtime fixup slot). Strict structural checks combine the three constant words, the algebraic ties, and the N/param algebra. Across the entire 1234-entry PROT corpus this matches 97 entries with zero false positives — and every match parses cleanly as a scene-event-scripts prescript at +0x800.
The post-header dense data is the scene_event_scripts prescript, parsed by the field VM (FUN_801DE840) at scene entry. The pre-header table at +0x14 is per-scene runtime metadata: param records of 4 bytes each, grouped by the third byte (b2) into 1..N scene regions; the last byte is always 0x01 (probably a "live" flag). See scene-v12-table for the per-byte semantics.
Each scene block on the disc carries both a v12 entry (this format, prescript at +0x800) and a sister scene_event_scripts entry (prescript at offset 0, no v12 header). The two scripts likely serve different runtime phases — scene-enter triggers vs. per-actor / per-region triggers — though the exact split isn't pinned down yet; both are walked by the same field VM.
scene_asset_table - count-prefixed asset bundle
The on-disc form of the scene asset table that the field loader reads when entering a town/dungeon. Implementation: crates/asset/src/scene_asset_table.rs.
+0x00 u32 count ; descriptor count (6 or 7)
+0x04 u32 meta1 ; varies - purpose unknown
+0x08 count × (u32 type_size, u32 data_offset)
; each pair packs `(type<<24)|size`
+H asset payload region ; LZS-compressed in some entries, raw in the rest
(H = 8 + count*8: 0x40 for count 7, 0x38 for count 6)
The table is count-prefixed, not fixed-7: the runtime walker FUN_80020224 reads count from +0x00 and loops that many descriptors, calling the dispatcher FUN_8001F05C with source = table_base + descriptor.data_offset. Two count values appear:
count = 7- kingdom-bundle scenes (first descriptorTimList); firstdata_offset=0x40.count = 6- early standalone towns (town01= Rim Elm,town0c, …); firstdata_offset=0x38. These previously looked like they "had no MAN in the static bundle"; in fact their table sits in the block's 2nd PROT entry (town01= entry 4, MAN at descriptor 1;town0c= entry 22, MAN at descriptor 2) and iscount=6, so a strictcount==7 && first_offset==0x40detector skipped it. Pinned via a runtime write-watchpoint on the MAN buffer_DAT_8007b898(autorun_man_source.lua) and byte-verified against the live RAM MAN.
Each descriptor is (type_size, data_offset):
type_sizepacks(type_byte << 24) | (size & 0x00FF_FFFF)- the same packing the asset-type dispatcher accepts directly.data_offsetis a file-relative byte position of that descriptor's own LZS stream, addressed against the bundle entry's extended on-disc footprint, not the TOC-indexed sub-region. Descriptor 0's offset is the header end8 + count*8; later descriptors frequently fall past the indexed end into the trailing-overlay sectors the per-PROT TOC crops off.sizeis the decompressed byte count.
The Tmd descriptor (type 2) carries the scene's environment geometry — an asset::pack of Legaia TMDs (terrain, buildings, props) inside that descriptor's LZS stream (town01 = 121 meshes, ≈8041 verts). Because the meshes are LZS-packed, a raw-only TMD scan misses them; the engine's SceneResources walks each entry's LZS-decompressed sections (tmd_scan::scan_entry) to load them, then renders the field with every TIM uploaded (upload_all_tims, matching retail's field loader). The per-mesh world placement + mesh selection for this static geometry come from the field map file's object table (FUN_8003a55c; parser legaia_asset::field_objects, which resolves each object's pack_index into this pack) — see field-locomotion; legaia-engine play-window renders the town from it.
Type-sequence variants (count=7 unless noted):
| Tuple | Notes |
|---|---|
(1, 2, 3, 4, 5, 6, 7) |
Standard count-7 bundle: (TimList, Tmd, Man, Mes, Move, Anm, Vdf). |
(1, 3, 4, 5, 6, 7, 0x14) |
Skips Tmd; trailing 0x14 = Flag(0x14) sentinel. |
(2, 3, 4, 5, 6, 7, 0x14) |
Skips TimList. |
(10, 2, 3, 4, 5, 6, 7) |
Leading Flag(0xA) sentinel. |
(1, 2, 3, 4, 6, 7, 0x14) |
Skips Move. |
(2, 3, 5, 6, 7, 0x14) |
count-6 early-town variant (town01): MAN at index 1. |
(10, 2, 3, 5, 6, 7) |
count-6 early-town variant (town0c): leading Flag(0xA), MAN at index 2. |
All scene-named (izumi, cave01, bylon, dolk, vell, urudre1, chitei2, nilboa, keikoku, concnow, jouina/b/c/e, …). Sizes ~60 KB to ~452 KB.
Reading:
use legaia_asset::scene_asset_table;
if let Some(t) = scene_asset_table::detect(buf) {
for (i, d) in t.descriptors.iter().enumerate() {
println!("desc[{}]: type={:#04x} size={} off={:#x}",
i, d.type_byte, d.size, d.data_offset);
}
}
The runtime consumer is the field-loader chain documented under the asset-loader subsystem: FUN_8001F7C0 + FUN_800255B8 plus the dispatcher at FUN_8001F05C consumes each descriptor pair after LZS-decoding the payload region into a working buffer.
scene_scripted_asset_table - scripted prefix + canonical bundle
A composite shape that pairs a [u16 count][u16 offsets[count]] script prescript at offset 0 with a canonical 7-asset scene table at the next 0x800 sector boundary. Implementation: crates/asset/src/scene_scripted_asset_table.rs. ~6% of all PROT entries match.
+0x00 u16 count ; 1..=4096 - number of script records
+0x02 u16 offsets[count] ; offsets[0] = 2 + count*2,
; monotonically non-decreasing
+offsets[i] record bytecode ; per-record opcodes (typical
; opener: 0xFFFF 0x0000 sentinel +
; field-VM-shaped frame ops)
+0x800-aligned u32 count = 7 ; canonical scene-asset-table lead
... ; same layout as scene_asset_table
Strict gate validates both the prescript and the inner asset table:
u16[0]is the record count (1..=4096).u16[1]algebraically ties to the count:offsets[0] = 2 + count*2.- All offsets monotonic, in-bounds.
- Past the last record offset, the next
0x800-aligned position carriesu32 count = 7plus a validscene_asset_tableheader (first descriptor at+0x40, all type bytes<= 0x14).
The two-level gate is what makes this detector zero-false-positive: the prescript shape alone occasionally matches arbitrary [count][offsets]-shaped data, but the asset-table check at the next sector boundary is a strong second signal.
The prescript is plausibly the scene event-script bytecode that the field VM (FUN_801DE840) executes when the scene loads. The 0xFFFF 0x0000 sentinels at record starts strongly resemble field-VM frame-divider opcodes. The runtime is presumed to walk the prescript first (loading scene-specific event scripts), then load the asset bundle from the sector-aligned position. The exact prescript opcode set is unconfirmed pending more reverse work.
tmd_size_prefix - truncated TMD-prefix
Sister to scene_tmd_stream for the truncated case: same outer shape ([u32 prefix][TMD magic at +4][zero flags][nobj]), but the on-disc payload is shorter than the prefix claims. Implementation: crates/asset/src/tmd_size_prefix.rs. ~3% of all PROT entries match.
+0x00 u32 total_size ; claimed total in-memory size, > on-disc len
+0x04 u32 0x80000002 ; Legaia TMD magic
+0x08 u32 0x00000000 ; TMD flags (on-disc; runtime sets to 1)
+0x0C u32 nobj ; small (typically 2 or 4)
+0x10 object_table[nobj] ; 28 bytes per object (PsyQ TMD layout)
+0x10 + nobj*0x1C ; primitive data (truncated at sector boundary)
All object pointers (vert_top, norm_top, prim_top) point within the prefix-claimed total size - so the on-disc file is genuinely a prefix of a larger logical resource, not a malformed header.
Strict structural checks:
- TMD magic at
+4, flags == 0 at+8,1 <= nobj <= 8. claimed_total > buf.len()- distinguishes fromscene_tmd_streamwhich catches the complete case.- Object table fits on disc.
- Each object's vert / normal / primitive ranges fit within the claimed total.
The 34 hits are all 12 KB files (6 sectors). The runtime consumer hasn't been located; likely the loader allocates claimed_total bytes of RAM and either (a) zero-fills the missing tail, or (b) streams the remainder from another PROT entry.
scene_event_scripts - prescript-only
Sister of scene_scripted_asset_table for the case where the same [u16 count][u16 offsets] prescript exists at offset 0, but the post-prescript payload is not a canonical 7-asset table. Implementation: crates/asset/src/scene_event_scripts.rs. ~20 PROT entries match.
+0x00 u16 count ; 3..=4096
+0x02 u16 offsets[count] ; offsets[0] = 2 + count*2,
; monotonically non-decreasing,
; all <= file size
+offsets[i] record bytecode ; per-record opcodes; the bulk
; of records open with the
; field-VM frame sentinel
; `0xFFFF 0x0000`
... ; bulk asset payload after the
; prescript (per-scene secondary
; header; format unconfirmed -
; appears to be a small `(count,
; descriptor[count])` table at
; the next 0x800 boundary, with
; alternating `(type, size)` and
; runtime-buffer offset pairs)
Strict structural detection:
- Prescript shape valid (count
3..=4096,offsets[0] == 2 + count*2, monotonic, in-bounds). - Frame-opener rate ≥ 50 % of records start with the field-VM
0xFFFF 0x0000sentinel.
The frame-opener rate is what makes this detector zero-false-positive on its own. Random [count][offsets]-shaped data carries no 0xFFFF opener at the record positions; real scene-event-script bundles carry it on the majority of records (50–92 %).
The prescript records are field-VM (FUN_801DE840) event scripts - the same per-frame bytecode shape used by scene_scripted_asset_table (0xFFFF 0x0000 is the field VM's frame divider opcode). Records likely encode: scene-enter triggers, NPC dialogue scripts, cut-scene sequences, pickup / interaction scripts. The per-scene asset payload that follows is loaded by these scripts at runtime.
Detection runs after scene_scripted_asset_table and scene_asset_table, so any composite layouts those detectors recognize claim their entries first.