PROT.DAT / DMY.DAT TOC Confirmed
PROT.DAT is the main asset archive - 1232 numbered entries containing every TIM, TMD, VAB, MES, ANM, MDT, DATA_FIELD streaming buffer, scene asset table, and runtime overlay. Some entries also carry trailing-overlay sectors past their TOC-indexed end (e.g. PROT 899's trailing 60 sectors are the title-screen overlay code); the extractor surfaces these so the on-disc footprint matches what the SCUS boot loader actually reads. DMY.DAT is a sibling archive that turns out to be developer fixtures (memory-bus test pattern + paired random blobs); see DMY.DAT.
Overview
Implementation: crates/prot/src/archive.rs.
Header (8 bytes at offset 0x000 OR 0x800)
u32 file_count_minus_1
u32 header_sectors // size of TOC in 0x800-byte sectors
The detector tries offset 0x000 first, then 0x800, accepting whichever yields plausible values. PROT.DAT uses 0x000.
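The two-offset probe can be sketched as follows. This is a minimal illustration, not the code in crates/prot/src/archive.rs: the names `Header`, `detect_header`, and `plausible` are hypothetical, the words are assumed to be little-endian, and the plausibility test (a non-zero TOC size that fits in the file and can hold one u32 word per entry) is an assumption, since the exact criteria are not stated here.

```rust
const SECTOR: usize = 0x800;

/// The two header fields described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Header {
    pub file_count_minus_1: u32,
    pub header_sectors: u32,
}

/// Read a little-endian u32 at `off`, or None if the slice is too short.
fn read_u32_le(data: &[u8], off: usize) -> Option<u32> {
    let b = data.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// ASSUMED plausibility test: the TOC is non-empty, fits inside the file,
/// and is large enough for the 8-byte header plus one word per entry.
fn plausible(h: &Header, base: usize, file_len: usize) -> bool {
    let toc_bytes = (h.header_sectors as usize).saturating_mul(SECTOR);
    let min_bytes = 8usize.saturating_add((h.file_count_minus_1 as usize).saturating_mul(4));
    h.header_sectors != 0
        && toc_bytes >= min_bytes
        && base.saturating_add(toc_bytes) <= file_len
}

/// Try offset 0x000 first, then 0x800; return the first plausible header
/// together with the offset it was found at.
pub fn detect_header(data: &[u8]) -> Option<(usize, Header)> {
    [0x000usize, 0x800].iter().find_map(|&base| {
        let h = Header {
            file_count_minus_1: read_u32_le(data, base)?,
            header_sectors: read_u32_le(data, base + 4)?,
        };
        plausible(&h, base, data.len()).then_some((base, h))
    })
}
```

Returning the offset alongside the header lets a caller locate the TOC relative to wherever the header was actually found.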
TOC (immediately after header)
The TOC is a sequence of u32 words. Each on-disc entry occupies multiple TOC slots. For entry index p:
start_lba = toc[p + 2] // absolute LBA into PROT.DAT
indexed_size_sectors = toc[p + 5] - toc[p + 3] + 4 // TOC-declared payload size
footprint_sectors = toc[p + 3] - toc[p + 2] // on-disc span to next entry
size_sectors = max(indexed_size_sectors, footprint_sectors)
byte_offset = start_lba * 0x800
size_bytes = size_sectors * 0x800
toc[p+5] is the absolute LBA of entry p+3 (an end-marker that aliases the next-entry's start), so toc[p+5] - toc[p+3] + 4 recovers the indexed size in sectors.
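The formulas above translate directly into a small helper. The sketch below is illustrative only: `EntryExtent` and `entry_extent` are hypothetical names, `toc` is assumed to be the same u32 word array that the formulas index, and out-of-range indices or non-monotonic LBAs are reported as `None` rather than handled the way the real extractor might.

```rust
const SECTOR_BYTES: u64 = 0x800;

/// Extents of one entry, all in 0x800-byte sectors.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EntryExtent {
    pub start_lba: u32,
    pub indexed_size_sectors: u32,
    pub footprint_sectors: u32,
}

impl EntryExtent {
    /// size_sectors = max(indexed_size_sectors, footprint_sectors)
    pub fn size_sectors(&self) -> u32 {
        self.indexed_size_sectors.max(self.footprint_sectors)
    }

    /// byte_offset = start_lba * 0x800
    pub fn byte_offset(&self) -> u64 {
        u64::from(self.start_lba) * SECTOR_BYTES
    }

    /// size_bytes = size_sectors * 0x800
    pub fn size_bytes(&self) -> u64 {
        u64::from(self.size_sectors()) * SECTOR_BYTES
    }
}

/// Apply the TOC formulas for entry index `p`.
/// Returns None if a slot is missing or a subtraction would underflow.
pub fn entry_extent(toc: &[u32], p: usize) -> Option<EntryExtent> {
    let start_lba = *toc.get(p + 2)?;
    let next_start = *toc.get(p + 3)?;
    let end_marker = *toc.get(p + 5)?;
    Some(EntryExtent {
        start_lba,
        // toc[p + 5] - toc[p + 3] + 4
        indexed_size_sectors: end_marker.checked_sub(next_start)?.checked_add(4)?,
        // toc[p + 3] - toc[p + 2]
        footprint_sectors: next_start.checked_sub(start_lba)?,
    })
}
```

For a synthetic word array `[0, 0, 100, 174, 0, 184]` and `p = 0`, this yields an indexed size of 14 sectors and a footprint of 74 sectors, so `size_sectors()` is 74. These values are made up to mirror the 14/74 split quoted for entry 899; they are not read from the disc.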
Trailing-overlay sectors
For ~24% of entries the on-disc contiguous range to the next entry's start LBA is larger than the indexed payload — the trailing sectors carry overlay content the SCUS boot loader reads via a multi-sector ReadN past the TOC-claimed end. PROT entry 899 is the canonical example: indexed payload is 14 sectors (28 KiB, the options menu), but the on-disc footprint is 74 sectors — the trailing 60 sectors are the title-screen overlay code (see boot.md).
Archive::read_entry reads the full footprint; Archive::read_entry_indexed reads only the indexed sub-region. Scene-side parsers were designed for the indexed view and use read_entry_indexed via ProtIndex::entry_bytes; asset-viewer / disc-browser consumers use the full footprint so trailing-overlay content is visible.
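The relationship between the two views can be illustrated with a small helper that splits a full-footprint buffer into its indexed payload and its trailing-overlay sectors. `split_footprint` is a hypothetical function written for this illustration; it is not part of the Archive API, whose actual signatures are not shown here.

```rust
const SECTOR_BYTES: usize = 0x800;

/// Split a full-footprint buffer into (indexed payload, trailing overlay).
/// If the indexed size meets or exceeds the buffer, the trailing slice is empty.
pub fn split_footprint(full: &[u8], indexed_size_sectors: usize) -> (&[u8], &[u8]) {
    let cut = indexed_size_sectors
        .saturating_mul(SECTOR_BYTES)
        .min(full.len());
    full.split_at(cut)
}
```

With a 74-sector buffer and an indexed size of 14 sectors (the figures quoted for entry 899), the first slice is 28 KiB and the second covers the remaining 60 sectors.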
> Historical note. An earlier Python proof-of-concept used start_lba = toc[p+5] - toc[p+2]. That subtraction actually computes the SIZE in sectors and was misinterpreted as the start LBA — under that math start_lba collapsed to a small relative offset within "block 0" of the file, and ~80% of PROT entries ended up reading the SAME few low-LBA byte ranges. Anything written using that formula's outputs is artefacted; trust only post-toc[p+2] extractions. The max(indexed, footprint) size extension is a later correction (the indexed formula alone misses trailing-overlay sectors for entries like 899).
In-RAM TOC
SCUS_942.54 keeps a transformed copy of the TOC at RAM address 0x801C70F0. FUN_8003E8A8 (the LBA resolver) reads it as follows:
start_lba = TABLE[(idx + 2) * 4 + 0x801C70F0]
end_lba = TABLE[(idx + 3) * 4 + 0x801C70F0]
size_sectors = end_lba - start_lba
The stride differs from the on-disc TOC. The on-disc-to-in-RAM transformation runs once at boot (FUN_8003E4E8 reads the first three sectors of PROT.DAT into 0x801C70F0).
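The resolver arithmetic can be modelled against a RAM snapshot. This sketch reads the TABLE[...] notation above as a 32-bit load from address 0x801C70F0 + slot * 4, which is an interpretation rather than a decompilation. The names `slot_addr` and `resolve` are hypothetical, and `table` is assumed to be the words starting at 0x801C70F0 dumped into a slice.

```rust
/// RAM address of the in-RAM TOC copy.
pub const TABLE_BASE: u32 = 0x801C_70F0;

/// Byte address of the u32 word in table slot `slot`.
pub fn slot_addr(slot: u32) -> u32 {
    TABLE_BASE + slot * 4
}

/// Model of the lookup: `table` holds the u32 words found at TABLE_BASE.
/// Returns (start_lba, size_sectors), or None if a slot is out of range
/// or end_lba is below start_lba.
pub fn resolve(table: &[u32], idx: usize) -> Option<(u32, u32)> {
    let start_lba = *table.get(idx + 2)?;
    let end_lba = *table.get(idx + 3)?;
    Some((start_lba, end_lba.checked_sub(start_lba)?))
}
```

Under this reading, the start word for index 0 sits at 0x801C70F8 and the start word for index 899 at 0x801C7F04, which may be useful when setting memory watchpoints in an emulator.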
Resolving entries by name vs by index
Two entry points:
- FUN_8003E8A8: index-based (consumed directly by the streaming loader and the dev-build sound branch).
- FUN_8003E6BC: path-based; resolves dev paths like data\battle\efect.dat or h:\PROT\FIELD\<scene>\… into an index via the CDNAME-driven name map, then delegates to the LBA resolver. Most retail-build code paths land here.
Names come from CDNAME.TXT, which lives at the top level of the disc.