Overview

Implementation: crates/prot/src/archive.rs.

Header (8 bytes at offset 0x000 OR 0x800)

u32 file_count_minus_1
u32 header_sectors      // size of TOC in 0x800-byte sectors

The detector tries offset 0x000 first, then 0x800, accepting whichever yields plausible values. PROT.DAT uses 0x000.
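The two-offset probe can be sketched as follows. This is an illustration only, not the code in archive.rs: it assumes the header words are little-endian (as on the PlayStation's CPU), and the `plausible` check is a hypothetical stand-in, since the detector's exact criteria are not listed here. `Header`, `read_u32_le`, and `detect_header` are names invented for the sketch.

```rust
/// Size of one disc sector in bytes.
const SECTOR: usize = 0x800;

/// The two header words described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Header {
    pub file_count_minus_1: u32,
    pub header_sectors: u32,
}

/// Read a little-endian u32 at `off` (assumed byte order), or None if out of range.
fn read_u32_le(data: &[u8], off: usize) -> Option<u32> {
    let b = data.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hypothetical plausibility test: a non-empty TOC that fits inside the file
/// and has room for at least one u32 per file.
fn plausible(h: &Header, base: usize, len: usize) -> bool {
    let toc_bytes = (h.header_sectors as usize).saturating_mul(SECTOR);
    let files = (h.file_count_minus_1 as usize).saturating_add(1);
    h.header_sectors != 0
        && base.saturating_add(toc_bytes) <= len
        && files.saturating_mul(4) <= toc_bytes
}

/// Try offset 0x000 first, then 0x800; return the first plausible header
/// together with the offset it was found at.
pub fn detect_header(data: &[u8]) -> Option<(usize, Header)> {
    for base in [0usize, SECTOR] {
        let words = (read_u32_le(data, base), read_u32_le(data, base + 4));
        if let (Some(count), Some(sectors)) = words {
            let h = Header { file_count_minus_1: count, header_sectors: sectors };
            if plausible(&h, base, data.len()) {
                return Some((base, h));
            }
        }
    }
    None
}
```

Returning the matching offset alongside the header lets the caller locate the TOC relative to wherever the header was actually found.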

TOC (immediately after header)

The TOC is a sequence of u32 words. Each on-disc entry occupies multiple TOC slots. For entry index p:

start_lba             = toc[p + 2]                       // absolute LBA into PROT.DAT
indexed_size_sectors  = toc[p + 5] - toc[p + 3] + 4      // TOC-declared payload size
footprint_sectors     = toc[p + 3] - toc[p + 2]          // on-disc span to next entry
size_sectors          = max(indexed_size_sectors, footprint_sectors)
byte_offset           = start_lba * 0x800
size_bytes            = size_sectors * 0x800

toc[p+5] is the absolute LBA of entry p+3 (an end marker that aliases the next entry's start), so toc[p+5] - toc[p+3] + 4 recovers the indexed size in sectors.
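A minimal Rust sketch of the formulas above, assuming the TOC has already been decoded into a slice of u32 words. `EntrySpan` and `entry_span` are hypothetical names for this illustration and are not taken from archive.rs. Checked arithmetic is used so that a malformed TOC yields None rather than a wrapped value.

```rust
/// Sector size in bytes.
const SECTOR: u64 = 0x800;

/// Values derived for one entry, named as in the formulas above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EntrySpan {
    pub start_lba: u32,
    pub indexed_size_sectors: u32,
    pub footprint_sectors: u32,
    pub size_sectors: u32,
}

impl EntrySpan {
    /// byte_offset = start_lba * 0x800
    pub fn byte_offset(&self) -> u64 {
        u64::from(self.start_lba) * SECTOR
    }

    /// size_bytes = size_sectors * 0x800
    pub fn size_bytes(&self) -> u64 {
        u64::from(self.size_sectors) * SECTOR
    }
}

/// Apply the on-disc formulas for entry index `p`.
/// Returns None if a slot is missing or a subtraction would underflow.
pub fn entry_span(toc: &[u32], p: usize) -> Option<EntrySpan> {
    let start_lba = *toc.get(p.checked_add(2)?)?;
    let next_start = *toc.get(p.checked_add(3)?)?;
    let end_marker = *toc.get(p.checked_add(5)?)?;

    // indexed_size_sectors = toc[p + 5] - toc[p + 3] + 4
    let indexed = end_marker.checked_sub(next_start)?.checked_add(4)?;
    // footprint_sectors = toc[p + 3] - toc[p + 2]
    let footprint = next_start.checked_sub(start_lba)?;

    Some(EntrySpan {
        start_lba,
        indexed_size_sectors: indexed,
        footprint_sectors: footprint,
        size_sectors: indexed.max(footprint),
    })
}
```

For example, with the synthetic words toc = [0, 0, 100, 174, 0, 184] and p = 0, the sketch gives start_lba = 100, an indexed size of 14 sectors, a footprint of 74 sectors, and therefore size_sectors = 74. These values are made up for the illustration and are not read from a real PROT.DAT.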

Trailing-overlay sectors

For ~24% of entries the on-disc contiguous range to the next entry's start LBA is larger than the indexed payload. The trailing sectors carry overlay content that the SCUS boot loader reads via a multi-sector ReadN past the TOC-claimed end. PROT entry 899 is the canonical example: its indexed payload is 14 sectors (28 KiB, the options menu), but its on-disc footprint is 74 sectors, and the trailing 60 sectors are the title-screen overlay code (see boot.md).

Archive::read_entry reads the full footprint; Archive::read_entry_indexed reads only the indexed sub-region. Scene-side parsers were designed for the indexed view and use read_entry_indexed via ProtIndex::entry_bytes; asset-viewer / disc-browser consumers use the full footprint so trailing-overlay content is visible.
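The relationship between the two views can be sketched as a split of the full-footprint buffer. This assumes the indexed sub-region is the leading part of the footprint, which is what "trailing sectors" implies. `split_footprint` is a hypothetical helper for illustration and is not part of the Archive API.

```rust
/// Sector size in bytes.
const SECTOR: usize = 0x800;

/// Split a full-footprint buffer into (indexed payload, trailing overlay).
/// If the indexed size covers the whole buffer, the overlay slice is empty.
pub fn split_footprint(footprint: &[u8], indexed_size_sectors: usize) -> (&[u8], &[u8]) {
    // Clamp the cut point so an oversized indexed size cannot panic.
    let cut = indexed_size_sectors
        .saturating_mul(SECTOR)
        .min(footprint.len());
    footprint.split_at(cut)
}
```

With the figures quoted for entry 899 (a 74-sector footprint and a 14-sector indexed payload), the first slice is 28 KiB and the second holds the remaining 60 sectors.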

> Historical note. An earlier Python proof-of-concept used start_lba = toc[p+5] - toc[p+2]. That subtraction actually computes a size in sectors, which was misinterpreted as the start LBA. Under that math start_lba collapsed to a small relative offset within "block 0" of the file, and ~80% of PROT entries ended up reading the same few low-LBA byte ranges. Anything derived from that formula's outputs is an artefact; trust only extractions made after the switch to start_lba = toc[p+2]. The max(indexed, footprint) size extension is a later correction: the indexed formula alone misses trailing-overlay sectors for entries like 899.

In-RAM TOC

SCUS_942.54 keeps a transformed copy of the TOC at RAM address 0x801C70F0. FUN_8003E8A8 (the LBA resolver) reads it as follows:

start_lba    = TABLE[(idx + 2) * 4 + 0x801C70F0]
end_lba      = TABLE[(idx + 3) * 4 + 0x801C70F0]
size_sectors = end_lba - start_lba

The stride differs from that of the on-disc TOC. The on-disc-to-in-RAM transformation runs once at boot (FUN_8003E4E8 reads the first three sectors of PROT.DAT into 0x801C70F0).
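The resolver arithmetic can be sketched in Rust, assuming the in-RAM table has been captured (for example from a RAM dump) as a slice of u32 words whose element 0 sits at 0x801C70F0. `word_address` and `resolve_lba` are hypothetical names; this models the lookup described above and is not a decompilation of FUN_8003E8A8.

```rust
/// RAM address of the in-RAM TOC copy, as stated above.
pub const TOC_RAM_BASE: u32 = 0x801C_70F0;

/// Address of the u32 word at table slot `slot`: slot * 4 + 0x801C70F0.
pub fn word_address(slot: u32) -> u32 {
    TOC_RAM_BASE.wrapping_add(slot.wrapping_mul(4))
}

/// Model of the lookup: returns (start_lba, size_sectors) for `idx`.
/// `table[0]` is assumed to correspond to address 0x801C70F0.
/// Returns None if a slot is missing or end_lba < start_lba.
pub fn resolve_lba(table: &[u32], idx: usize) -> Option<(u32, u32)> {
    let start_lba = *table.get(idx.checked_add(2)?)?;
    let end_lba = *table.get(idx.checked_add(3)?)?;
    Some((start_lba, end_lba.checked_sub(start_lba)?))
}
```

For idx = 0 the two words read are at 0x801C70F8 and 0x801C70FC.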

Resolving entries by name vs by index

Two entry points:

  • FUN_8003E8A8 - index-based (consumed directly by the streaming loader and the dev-build sound branch).
  • FUN_8003E6BC - path-based; resolves dev paths like data\battle\efect.dat or h:\PROT\FIELD\<scene>\… into an index via the CDNAME-driven name map, then delegates to the LBA resolver. Most retail-build code paths land here.

Names come from CDNAME.TXT, which lives at the top level of the disc.

See also