How it works

The PSX has a dedicated sound chip (the SPU, Sound Processing Unit) with its own 512 KB of memory. Sample data has to be transferred into that memory before it can play, and games don't normally program the SPU's registers by hand. Instead they use Sony's PsyQ libraries (libsnd for sequencer playback, libspu for low-level voice control), which drive the SPU's memory-mapped registers and its transfers. Legaia links both statically, so the libraries are part of the game binary and can be inspected like any other code.

PSX games typically split audio into three concerns: sound effects (short one-shot samples), BGM / music (sequence-driven, longer), and streaming audio (XA / .pac voice). Legaia is no exception, but its plumbing is layered:

  • A path-string cluster at 0x8007B380 holds the file extensions the sound subsystem appends to scene-asset paths: .spk, .LZS, .dpk, .MAP, .PCH, .pac, STR, plus the master file bse.dat.
  • Three SCUS-side consumers turn those paths into actual loads: a sound-init / .dpk loader, a streaming-asset loader for XA-family files, and a mode-aware extension dispatcher.
  • Underneath that is Sony's PsyQ SsAPI sequencer (libsnd, statically linked) for SEQ playback, plus the PsyQ libspu primitives that actually move bytes into SPU RAM.

The only on-disc audio format documented in full so far is the VAB sound bank (Sony's standard instrument bank format, identified by its VABp magic). The .dpk / .MAP / .PCH / .spk / .pac family is still TBD: the dispatch chain into them is fully traced, but their byte-level layouts haven't been reversed yet.

BGM dispatch is genuinely surprising: there's no literal "BGM ID → file" table. A field-VM script writes a BGM ID, and a per-frame poller resolves it to a PROT index using the current scene's CDNAME block layout. So "the table" isn't in the executable — it's in the on-disc filesystem layout itself.

Path-string cluster

The string cluster at 0x8007B380 holds the file extensions the sound subsystem appends to scene-asset paths. It contains eight strings: seven extensions (.spk, .LZS, .dpk, .MAP, .PCH, .pac, STR) plus the master file name bse.dat. Full layout in the sound-driver format spec.

Three SCUS consumers

Function | Role
FUN_8001FA88 | Sound subsystem init / .dpk loader. Loads the bse.dat master bank, then the per-scene .dpk from h:\main\bg\domepack\….
FUN_8001FC00 | Streaming-asset loader. Builds paths under the sound\ prefix; the XA / .pac / STR consumer.
FUN_8001EBEC | Mode-aware extension dispatcher. Reads _DAT_8007B824 as a mode index, then uses small per-mode tables to pick which extension to hit.

Both FUN_8001FA88 and FUN_8001FC00 carry a dev/retail split via _DAT_8007B8C2. The dev branch loads via PROT indices directly; the retail branch uses dev-style paths through FUN_8003E6BC (the path-based opener) which resolves h:\main\bg\domepack\… into the appropriate PROT entry through the CDNAME-driven name map. Both paths land at the same files.
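A minimal sketch of how a reimplementation might model the two branches so that both converge on the same PROT entry. The `LoadRequest` enum, `resolve`, and the `HashMap` standing in for the CDNAME-driven name map are all hypothetical; the real map's key format is not documented on this page.

```rust
use std::collections::HashMap;

/// The two branches selected by _DAT_8007B8C2 (hypothetical model).
pub enum LoadRequest<'a> {
    /// Dev branch: the caller already holds the PROT index.
    Dev { prot_index: u32 },
    /// Retail branch: a dev-style path (h:\main\bg\domepack\…) that
    /// FUN_8003E6BC resolves through the CDNAME-driven name map.
    Retail { path: &'a str },
}

/// Resolve either branch to a PROT index; `None` if the path is unknown.
pub fn resolve(req: &LoadRequest, name_map: &HashMap<String, u32>) -> Option<u32> {
    match req {
        LoadRequest::Dev { prot_index } => Some(*prot_index),
        LoadRequest::Retail { path } => name_map.get(*path).copied(),
    }
}
```

The point of the shape is the invariant the page states: whichever branch runs, the lookup lands on the same file.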

VAB sound banks

VAB is Sony's standard instrument bank format, identified by its VABp magic and documented at formats/vab. The dominant on-disc carrier is the scene-VAB-prefixed streaming shape, in which the VAB body is preceded by a 4-byte chunk0 header. Implementation: crates/vab (header parser, extractor, and ADPCM decoder).

A bulk scan finds 1191 VABp headers across 239 PROT entries. Multi-bank archives sit at 0889_sound_data2, 0890_sound_data2, and 0891_level_up. The vab_01 cluster (CDNAME indices 1072–1194) is the standard distributed-bank layout.
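A scan like this only needs a byte-wise search for the header magic. The sketch below assumes the VABp magic is the 32-bit value 0x56414270 stored little-endian (so the bytes read "pBAV" on disc) and treats the 4-byte chunk0 prefix as opaque; `find_vab_headers` and `prefixed_vab_body` are illustrative names, not the crates/vab API.

```rust
/// VAB header magic as it appears in the file: u32 0x5641_4270 ('VABp'), little-endian.
pub const VAB_MAGIC: [u8; 4] = *b"pBAV";

/// Length of the opaque chunk0 header in the scene-VAB-prefixed shape.
pub const CHUNK0_LEN: usize = 4;

/// Every byte offset in `buf` at which a VAB header begins (no alignment assumed).
pub fn find_vab_headers(buf: &[u8]) -> Vec<usize> {
    buf.windows(VAB_MAGIC.len())
        .enumerate()
        .filter(|(_, w)| w[..] == VAB_MAGIC[..])
        .map(|(i, _)| i)
        .collect()
}

/// For a scene-VAB-prefixed blob, skip chunk0 and return the VAB body,
/// provided the magic sits immediately after it.
pub fn prefixed_vab_body(buf: &[u8]) -> Option<&[u8]> {
    let body = buf.get(CHUNK0_LEN..)?;
    if body.starts_with(&VAB_MAGIC) {
        Some(body)
    } else {
        None
    }
}
```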

Per-actor sound effects

Function: FUN_800250D4(sound_id, voice)
Called from: the actor tick (FUN_80021DF4), when actor[+0xb4] != 0 (one-shot pulse) or actor[+0xac] is staged (continuous).

Looks up a sound entry at &DAT_8006F198 + sound_id*8 for sound_id < 0x200, or in the runtime-allocated table at _DAT_8007B8D0 for higher IDs (the .dpk consumer's bank). The entry's byte[3] & 0x1F is the voice count; the helper then calls FUN_800653C8 (libSPU SpuKeyOn-equivalent) for each of voice..voice+count-1.

Move-VM and field-VM opcodes write actor[+0xac] (sound ID) and actor[+0xb0] (voice); the move-VM tick re-fires the SFX whenever the trigger flag at actor[+0xb4] is set.
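The lookup-and-key-on logic of FUN_800250D4 can be sketched as follows. Only the 8-byte stride, the 0x200 split, and the `byte[3] & 0x1F` voice count come from the dump; indexing the runtime table at _DAT_8007B8D0 by `sound_id - 0x200` is an assumption, and every Rust name here is hypothetical.

```rust
use std::ops::Range;

/// IDs below this index the static table at &DAT_8006F198.
pub const STATIC_LIMIT: u16 = 0x200;

/// One 8-byte sound entry; only byte[3] is interpreted in this sketch.
#[derive(Clone, Copy)]
pub struct SoundEntry(pub [u8; 8]);

impl SoundEntry {
    /// Voice count: low five bits of byte[3].
    pub fn voice_count(&self) -> u8 {
        self.0[3] & 0x1F
    }
}

/// Retail address of a static-table entry (8-byte stride).
pub fn static_entry_addr(sound_id: u16) -> u32 {
    0x8006_F198 + u32::from(sound_id) * 8
}

/// Voices the helper would key on: voice..voice+count (end exclusive).
/// ASSUMPTION: the runtime (.dpk) table is indexed by sound_id - 0x200.
pub fn voices_to_key_on(
    sound_id: u16,
    voice: u8,
    static_table: &[SoundEntry],
    runtime_table: &[SoundEntry],
) -> Option<Range<u8>> {
    let entry = if sound_id < STATIC_LIMIT {
        static_table.get(usize::from(sound_id))?
    } else {
        runtime_table.get(usize::from(sound_id - STATIC_LIMIT))?
    };
    Some(voice..voice.saturating_add(entry.voice_count()))
}
```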

Monster sound bank — h:\mpack\monster.snd

Battle-time monster sound banks live in a single packed monster.snd file. The loader is FUN_8003E104(monster_idx, slot, dst_buf) — called twice from the battle scene loader FUN_800520F0 (slots 7 and 8, for the active battle's two monster sound banks). It reads the file's per-monster TOC at 0x801C8980 - 0x10 (4-byte stride, paired entries giving [start_lba, end_lba+1]), computes the LBA range, and dispatches:

  • Dev path (_DAT_8007B8C2 != 0) — uses the standard library file API: FUN_800608F0 (fopen) → FUN_80060920 (fseek to record × 0x800) → FUN_80060944 (fread) → FUN_80060910 (fclose). Path string: h:\mpack\monster.snd.
  • Retail path — stages (size, dst) into the gp window at +0x97c / +0x894, kicks the async CD read via FUN_8003F128. Sets a 120-frame timeout at +0x91c.

The same pattern (h:\mpack\… paths plus a per-record TOC in a small data structure) is the shape we expect the remaining still-TBD audio formats to follow; read the FUN_8003E104 dump as the canonical example.

BGM dispatch

The field VM's opcode 0x35 writes the BGM ID to _DAT_8007BAC8. FUN_800243F0 (the per-frame asset poller) resolves it to a PROT index — bgm_id < 2000 is scene-local, bgm_id ≥ 2000 is a global pool. There's no literal BGM table; the resolution is a PROT-relative offset into the CDNAME per-scene block.

See script VM → "BGM lookup table" for the resolver code.
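A hedged sketch of the split the poller makes. The 2000 threshold comes from the dump; the base PROT indices, the use of `bgm_id` directly as the scene-local offset, and `bgm_id - 2000` as the global-pool offset are assumptions for illustration, not the resolver's confirmed arithmetic.

```rust
/// IDs at or above this value come from the global pool.
pub const GLOBAL_POOL_START: u16 = 2000;

#[derive(Debug, PartialEq, Eq)]
pub enum BgmSource {
    /// Offset into the current scene's CDNAME block (ASSUMPTION: bgm_id itself).
    SceneLocal(u16),
    /// Offset into the global pool (ASSUMPTION: bgm_id - 2000).
    Global(u16),
}

pub fn classify(bgm_id: u16) -> BgmSource {
    if bgm_id < GLOBAL_POOL_START {
        BgmSource::SceneLocal(bgm_id)
    } else {
        BgmSource::Global(bgm_id - GLOBAL_POOL_START)
    }
}

/// Hypothetical resolver: both base PROT indices are inputs, not known constants.
pub fn resolve_prot(bgm_id: u16, scene_block_base: u32, global_pool_base: u32) -> u32 {
    match classify(bgm_id) {
        BgmSource::SceneLocal(off) => scene_block_base + u32::from(off),
        BgmSource::Global(off) => global_pool_base + u32::from(off),
    }
}
```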

SsAPI sequencer (0x80061-0x80067 cluster)

Legaia statically links Sony's PsyQ libsnd / SsAPI sequencer for .SEQ-driven music. The cluster lives in SCUS at 0x80061B18..0x800681D8 and uses the standard SsAPI globals.

Globals

Global | Role
_DAT_801CD2B8 | 16-bit slot-allocation bitmap (MAX_SEQ_SLOTS = 16).
_DAT_801CD2C0[16] | Per-slot pointer table; each entry points at a 0xB0-byte SsAPI sequence-state struct.
_DAT_801CD2C0[i] + 0x58/0x5A | Per-slot vol/pan, clamped 0..0x7F.
_DAT_801CD2C0[i] + 0x88 | Running tick (advanced by the varint delta-time decoder).
_DAT_801CD2C0[i] + 0x98 | Per-slot status flags (bit 0 = paused, bit 1 = active, bit 2 = stopped, bit 3 = end-of-sequence, bit 4/5 = volume-ramp scheduling, bit 8 = ramp lock, bit 0xA = repeat).
_DAT_801CE060 | Per-voice flag bank (32 voices, bit-packed).
_DAT_801CE080..AC | Voice-attribute slots (per-voice pitch + vol working state).
_DAT_8007A940 | 12-entry MIDI-key pitch table (used by FUN_80066E50).
_DAT_801CE564 / _DAT_801CE574 | Function-pointer hooks installed by Legaia. _564 resolves the active script-VM seq context; _574 is a worker-availability check. Distinct from the standard PsyQ in-line slot lookup: the actor / field VM is wiring callbacks here.
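The running tick at +0x88 advances by delta times read from the SEQ stream. Assuming SEQ uses the standard MIDI variable-length quantity (7 data bits per byte, high bit set on every byte except the last, at most 4 bytes), the decoder reduces to this; `read_delta_time` is an illustrative name, not a libsnd symbol.

```rust
/// Decode one variable-length delta time from the start of `buf`.
/// Returns (value, bytes consumed), or None if the buffer ends mid-number
/// or the quantity runs past the 4-byte MIDI limit.
pub fn read_delta_time(buf: &[u8]) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    for (i, &b) in buf.iter().enumerate().take(4) {
        value = (value << 7) | u32::from(b & 0x7F); // append 7 data bits
        if b & 0x80 == 0 {
            return Some((value, i + 1)); // high bit clear: last byte
        }
    }
    None
}
```

A caller would add the returned value to the slot's running tick and advance its read pointer by the consumed byte count.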

Public SEQ API

Function | Role
FUN_80062340(seq_data, slot_hint) | SsSeqOpen: walks the slot bitmap, marks the first free slot, calls FUN_80062410. Returns slot ID or -1.
FUN_80061D18(slot) | SsSeqClose: calls FUN_80067E9C(slot,0,0,1) + FUN_800684CC, clears the bitmap bit, memsets all 16 channel records (size 0xB0) to defaults.
FUN_8006275C(slot,0) | SsSeqPlay: clears flags 0/3 in +0x98, sets bit 1.
FUN_800628F0(slot,_,mode,_) | _SsSeqCtrl: mode==1 resets the read pointer + sets flag 0x1; mode==0 sets flag 0x2; otherwise clears both. Stop / Pause / Resume state core.
FUN_800641EC(slot, channel) | SsSeqRewind / SsSeqReplay: clears flags 0x1/0x2/0x8/0x400, sets 0x4, full slot reset.
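The bitmap allocation and the +0x98 flag edits described above reduce to a few bit operations. The flag constants follow the globals table (bit 0 paused, bit 1 active, bit 2 stopped, bit 3 end-of-sequence, bit 0xA repeat); the function names are hypothetical, and side effects such as `_SsSeqCtrl`'s read-pointer reset are not modeled.

```rust
pub const PAUSED: u32 = 1 << 0;
pub const ACTIVE: u32 = 1 << 1;
pub const STOPPED: u32 = 1 << 2;
pub const END_OF_SEQ: u32 = 1 << 3;
pub const REPEAT: u32 = 1 << 10;

/// SsSeqOpen-style: claim the lowest free slot (None plays the role of retail's -1).
pub fn alloc_slot(bitmap: &mut u16) -> Option<u32> {
    let free = !*bitmap;
    if free == 0 {
        return None; // all 16 slots taken
    }
    let slot = free.trailing_zeros();
    *bitmap |= 1u16 << slot;
    Some(slot)
}

/// SsSeqClose-style: release a slot's bitmap bit.
pub fn free_slot(bitmap: &mut u16, slot: u32) {
    *bitmap &= !(1u16 << slot);
}

/// SsSeqPlay: clear paused and end-of-sequence, set active.
pub fn seq_play(flags: u32) -> u32 {
    (flags & !(PAUSED | END_OF_SEQ)) | ACTIVE
}

/// _SsSeqCtrl flag core: mode 1 sets 0x1, mode 0 sets 0x2, anything else clears both.
pub fn seq_ctrl(flags: u32, mode: u32) -> u32 {
    match mode {
        1 => flags | PAUSED,
        0 => flags | ACTIVE,
        _ => flags & !(PAUSED | ACTIVE),
    }
}

/// SsSeqRewind / SsSeqReplay: clear 0x1/0x2/0x8/0x400, set 0x4.
pub fn seq_rewind(flags: u32) -> u32 {
    (flags & !(PAUSED | ACTIVE | END_OF_SEQ | REPEAT)) | STOPPED
}
```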

Voice / mixer (audible-output critical path)

Function | Role
FUN_80067550 | _SsVoNoteOn: master-vol × velocity × channel vol/pan × four expression sliders × stereo-pan square law (uV*uV/0x3FFF).
FUN_80067E9C | _SsSeqNoteOn: sequence-driven keyon. Iterates DAT_801CE344, calls FUN_80068B98 (program-change?).
FUN_80065978 | _SsVoKeyOnDirect: allocates a voice from _DAT_801CE208, looks up the region in _DAT_801CE334 (stride 0x10), writes pitch + base note.
FUN_80066E50(key, fine) | _SsPitchFromKey: indexes the 12-entry pitch table &DAT_8007A940, octave-shift by (oct-5). Returns a 16-bit SPU PITCH register value.
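A sketch of the `_SsPitchFromKey` shape: semitone lookup, then an octave shift by `(octave - 5)`. The contents of the table at DAT_8007A940 aren't given here, so this version assumes an equal-tempered table anchored at 0x1000 (the SPU pitch value that plays a sample at its recorded rate) and MIDI key 60 as the unshifted reference. It ignores the `fine` argument and clamps to 0x3FFF, as `_SsKey2Pitch` is described doing.

```rust
/// Equal-tempered stand-in for DAT_8007A940: entry n = round(0x1000 * 2^(n/12)).
fn semitone_table() -> [u32; 12] {
    let mut table = [0u32; 12];
    for (n, entry) in table.iter_mut().enumerate() {
        *entry = (4096.0 * 2f64.powf(n as f64 / 12.0)).round() as u32;
    }
    table
}

/// Key -> SPU pitch. ASSUMPTIONS: key 60 (octave 5) is the reference,
/// `fine` is not modeled, and the result is clamped to 0x3FFF.
pub fn pitch_from_key(key: u8) -> u16 {
    let base = semitone_table()[usize::from(key % 12)];
    let octave = i32::from(key / 12);
    let shifted = if octave >= 5 {
        base << (octave - 5) // higher octaves double the step rate
    } else {
        base >> (5 - octave) // lower octaves halve it
    };
    shifted.min(0x3FFF) as u16
}
```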

The cluster previously appeared in the renderer / GPU primitives inventory; that was a citation artefact (battle / field code triggers SFX cues during render passes). None of these functions is libgpu / libgs — they're all libsnd.

Engine reimpl can stub the entire cluster behind a legaia-engine-audio::Sequencer trait without touching the per-note math.

libspu / SPU control (0x80068-0x8006D cluster)

Sits underneath the SsAPI sequencer and drives the SPU hardware directly. PsyQ libspu is statically linked here.

SPU globals

Global | Role
_DAT_8007AF40 | SPU register base pointer (SPU MMIO at 0x1F801C00..0x1F801E00).
_DAT_8007AF40 + 0x180/0x182 | MAIN_VOL_L/R.
_DAT_8007AF40 + 0x1AA | SPUCNT (control register).
_DAT_8007AF40 + 0x1B0/0x1B2 | Labelled REVERB_VOL_L/R here. On the standard PSX SPU register map these offsets hold the CD-audio input volume L/R, with reverb output volume at +0x184/+0x186, so this label needs rechecking.
_DAT_8007AF40 + 0x1C0..0x1FE | Reverb config block (APF1, COMB1-4, IIR_ALPHA, …).
_DAT_8007AFA4 | Block table base. Each entry: bit 0x80000000 = free, 0x40000000 = end-of-table.
_DAT_8007AFF8 | Master attribute struct: 10 modes × 0x44 bytes = 0x2A8 bytes total.

libspu primitives

Function | PsyQ name | Notes
FUN_8006A728 | SpuFree | Block-table free: flips the matching address's high bit, calls FUN_8006A420 (compactor).
FUN_8006ACBC | SpuSetVoiceAttr | Mask-driven dispatcher (mask=0..9 selects defaults from _DAT_8007AFF8 + i*0x44). 1272 bytes.
FUN_8006B1B4 | SpuSetReverbModeParam | 30-attr reverb commit; writes regs 0x1C0..0x1FE.
FUN_8006BCB4 | SpuSetCommonAttr | Master vol L/R + reverb regs + SPUCNT bits. 7-mode jump table.
FUN_8006C048 | SpuSetVoiceAttr (24-voice broadcaster) | Loops i=0..23 over the 1<<i mask, writes per-voice regs at +i*0x10. 1548 bytes.
FUN_8006C6E4 | _SsKey2Pitch | Two-octave-table pitch math; returns 14-bit SPU PITCH (clamps 0x3FFF).

SPU DMA transfer engine

Sits between the SsApi seq layer and the libspu register primitives. This is the path SEQ/VAG bytes take when moving from PSX RAM into SPU RAM.

Function | PsyQ name | Notes
FUN_80069B18(mode, addr, len) | _spu_t core | 4-mode SPU transfer state machine. mode=0: arm READ (xfer-mode bits = 0x30); mode=1: arm WRITE (0x20); mode=2: stage the start address into SPU +0x1A6; mode=3: COMMIT (wait for SPUCNT bits, kick the DMA channel, flip command-register direction). Times out at 0xF00 poll iterations.
FUN_800697E0(buf, len) | _SpuTransfer outer wrapper | Saves the SPUCNT mask, sets the transfer addr, loops over the transfer block in 0x40-byte chunks. Alternative path to FUN_80069B18 for non-DMA copies.
FUN_80069DA8(addr, len) | SpuWrite (top-level) | Picks between DMA and CPU paths based on _DAT_8007AF5C.
FUN_8006A020 / FUN_8006A04C | _spu_a (read / write direction) | Sets SPU command register bits 24..27 to 0x2 or 0x22.
FUN_8006A158 | SsSpuMalloc core | Block allocator. Walks the block table, returns the start of the first free run of size ≥ request.
FUN_8006A420 | SpuFree compactor | Coalesces adjacent free entries.
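The allocator trio (first-fit malloc, free, compactor) can be modeled in a few dozen lines. This is a hypothetical sketch, not the libspu code: the retail entry flags (0x80000000 free, 0x40000000 end-of-table) become a `bool` per block plus the end of the `Vec`, and any size rounding libspu applies is not modeled.

```rust
/// One block-table entry; `free` replaces the retail 0x80000000 bit.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Block {
    pub addr: u32,
    pub size: u32,
    pub free: bool,
}

/// Hypothetical model of the SPU RAM heap behind SsSpuMalloc / SpuFree.
pub struct SpuHeap {
    pub blocks: Vec<Block>,
}

impl SpuHeap {
    pub fn new(base: u32, size: u32) -> Self {
        SpuHeap { blocks: vec![Block { addr: base, size, free: true }] }
    }

    /// First fit: take the first free run with size >= request, splitting off the tail.
    pub fn malloc(&mut self, size: u32) -> Option<u32> {
        let i = self.blocks.iter().position(|b| b.free && b.size >= size)?;
        let b = self.blocks[i];
        if b.size > size {
            self.blocks.insert(i + 1, Block { addr: b.addr + size, size: b.size - size, free: true });
        }
        self.blocks[i] = Block { addr: b.addr, size, free: false };
        Some(b.addr)
    }

    /// Mark the live block at `addr` free, then coalesce. False if none starts there.
    pub fn free(&mut self, addr: u32) -> bool {
        match self.blocks.iter_mut().find(|b| b.addr == addr && !b.free) {
            Some(b) => b.free = true,
            None => return false,
        }
        self.compact();
        true
    }

    /// Compactor: merge neighbouring free blocks that are contiguous in SPU RAM.
    fn compact(&mut self) {
        let mut merged: Vec<Block> = Vec::with_capacity(self.blocks.len());
        for b in self.blocks.drain(..) {
            if let Some(prev) = merged.last_mut() {
                if prev.free && b.free && prev.addr + prev.size == b.addr {
                    prev.size += b.size;
                    continue;
                }
            }
            merged.push(b);
        }
        self.blocks = merged;
    }
}
```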

SsApi seq-management layer (above libspu)

Function | Role
FUN_800684CC(vab_id) | SsVabClose (by VAB-ID search): iterates 0x801CDB60 + i*0x36, matches +0x0, calls FUN_80067480(0).
FUN_80068D94(seq_data, mode) | SsSepOpen / SEP loader core. Validates 0x564150 ('VAP' magic), reads SEQ header numTracks at +0x12, calls SsSpuMalloc, patches the per-track pointer table, writes the MIDI body to SPU.
FUN_8006CA7C | SsSeqGetStatus: resolves ctx via _DAT_801CE564, returns ctx +0x49 with state-code normalisation.
FUN_8006CDB0 | SsSeqSetCallback: resolves ctx via _DAT_801CE564, tail-calls FUN_8006DDC8.
FUN_8006DDC8 | SsSeqSetMarkCallback: installs trampolines at ctx +0x14/+0x18, sets the active flag at +0x46.

Slot bitmap @ _DAT_801CD2B8 → ptr table @ 0x801CD2C0 → per-slot record (stride 0x36) at 0x801CDB60 → VAB program-attr (stride 0xB0) at 0x801CD2C0[i] + prog*0xB0.

File-API leaf cluster

The dev/retail split for sound + monster-bank loading routes the dev branch through libapi-style file primitives at FUN_800608E0..FUN_80060A04: fopen / fseek / fread / fclose plus a vsync_wait (FUN_8005FCCC) and a BREAK 0x105 trap at FUN_80060A04. These are PsyQ kernel-call wrappers around the BIOS A() table — FUN_80056738 / 80056748 / 80056768 / 80057014 / 8005ACE8 are all jr 0xA0 BIOS dispatchers.

Engine reimpl can map the entire cluster to std::fs + a frame-paced sleep.
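A sketch of that mapping for the dev-branch read in FUN_8003E104: each libapi call becomes one `std::fs` / `std::io` operation, and the vsync wait becomes a sleep. `read_record` and `wait_frames` are hypothetical names, and the 60 Hz frame period approximates NTSC timing.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;
use std::thread;
use std::time::Duration;

/// CD sector size used for the record -> byte-offset conversion.
pub const SECTOR: u64 = 0x800;

/// fopen -> fseek(record * 0x800) -> fread -> fclose, expressed with std.
pub fn read_record(path: &Path, record: u64, len: usize) -> io::Result<Vec<u8>> {
    let mut f = File::open(path)?; // fopen
    f.seek(SeekFrom::Start(record * SECTOR))?; // fseek
    let mut buf = vec![0u8; len];
    f.read_exact(&mut buf)?; // fread; a short read is an error here
    Ok(buf) // fclose happens when `f` is dropped
}

/// Frame-paced wait standing in for the vsync_wait leaf (~60 Hz assumed).
pub fn wait_frames(frames: u32) {
    thread::sleep(Duration::from_micros(16_667 * u64::from(frames)));
}
```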

Engine-audio model — clean-room SPU port

crates/engine-audio ports the SPU side of the audio stack as a clean-room model. No Sony bytes; the spec is this page plus the libspu API surface and the standard PSX SPU register layout.

Module | Maps to
spu::Spu | The 24-voice mixer (one Voice per slot) + master volume + a live Reverb processor (see below).
spu::voice::Voice | Per-voice state: sample address, loop point, pitch, ADSR, L/R volume, reverb-send flag; the libspu SpuSetVoiceAttr + SpuSetVoiceReverb surface.
spu::reverb | Perceptual reverb: ReverbMode enum (Off / Room / StudioA-C / Hall / Space / Echo / Delay / Pipe), per-channel circular delay buffer, libspu-style mode-byte API (Spu::write_reverb_mode_byte).
spu::adsr | The 5-phase ADSR envelope (Attack / Decay / Sustain / Release / Off) with linear / exponential / increase / decrease modes per the standard PSX formula.
spu::adpcm | Streaming SPU-ADPCM block decoder (28 samples per 16-byte block). One stateful instance per voice carries the inter-block prev1/prev2 history.
spu::ram | 512 KB SPU RAM model + libspu-shaped transfer engine (SpuRam::set_direction / write / read + SpuAllocator for SsSpuMalloc / SpuFree).
vab_bind::VabBank | Bridges legaia_vab::VabReport into the SPU: upload(spu, alloc, report, buf) drops every VAG body into SPU RAM through the allocator, and play_note(spu, voice, prog, note, velocity) translates a MIDI key into voice config + key-on. Pitch math matches _SsKey2Pitch / libspu key-to-pitch.
AudioOut | Owns a single cpal output stream that drains the Spu at 44.1 kHz and resamples to the host device rate (linear). Engines call with_spu(|spu| ...) from outside the audio thread to push voice attributes / key-on masks.
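Two of these modules are small enough to sketch. The decoder follows the commonly documented SPU-ADPCM layout (byte 0 = shift in the low nibble and filter in bits 4-6, byte 1 = loop flags, 14 data bytes holding 28 nibbles low nibble first, standard filter coefficient pairs); the resampler is a stateless linear interpolator standing in for the one AudioOut uses. Neither is the crate's code, and the names are illustrative.

```rust
/// Standard SPU-ADPCM prediction coefficients (filters 0..4), in 1/64 units.
const POS: [i32; 5] = [0, 60, 115, 98, 122];
const NEG: [i32; 5] = [0, 0, -52, -55, -60];

/// One decoder per voice; prev1/prev2 carry across block boundaries.
#[derive(Default)]
pub struct AdpcmDecoder {
    prev1: i32,
    prev2: i32,
}

impl AdpcmDecoder {
    /// Decode one 16-byte block into 28 samples (loop flags in byte 1 ignored).
    pub fn decode_block(&mut self, block: &[u8; 16]) -> [i16; 28] {
        let mut shift = i32::from(block[0] & 0x0F);
        if shift > 12 {
            shift = 9; // reserved shift values 13..15 behave like 9
        }
        // Filters 5..7 are reserved; clamping to 4 is this sketch's choice.
        let filter = usize::from((block[0] >> 4) & 0x07).min(4);
        let mut out = [0i16; 28];
        for (j, sample) in out.iter_mut().enumerate() {
            let byte = block[2 + j / 2];
            let nibble = if j % 2 == 0 { byte & 0x0F } else { byte >> 4 };
            let t = (i32::from(nibble) << 28) >> 28; // sign-extend 4 bits
            let predicted = (self.prev1 * POS[filter] + self.prev2 * NEG[filter] + 32) >> 6;
            let s = (((t << 12) >> shift) + predicted).clamp(-0x8000, 0x7FFF);
            self.prev2 = self.prev1;
            self.prev1 = s;
            *sample = s as i16;
        }
        out
    }
}

/// Stateless linear resampler (e.g. 44_100 Hz -> host rate). A streaming
/// version would carry the fractional read position between callbacks.
pub fn resample_linear(input: &[f32], src_hz: u32, dst_hz: u32) -> Vec<f32> {
    if input.is_empty() || src_hz == 0 || dst_hz == 0 {
        return Vec::new();
    }
    let step = f64::from(src_hz) / f64::from(dst_hz);
    let out_len = ((input.len() - 1) as f64 / step).floor() as usize + 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = (pos.floor() as usize).min(input.len() - 1);
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            let b = input.get(idx + 1).copied().unwrap_or(a);
            a + (b - a) * frac
        })
        .collect()
}
```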

What this does not model (out of scope for the first port pass):

  • Pitch modulation (the SPU's FM feature) and noise. Neither is used by Legaia (verified against the libspu calls in the SCUS dumps; SpuSetPitch is the only pitch path).
  • Asynchronous DMA timing. The transfer engine here is synchronous (queue + drain collapsed) — fine because the playback layer reads SPU RAM directly during voice ticks. The model preserves the API shape (set_transfer_start_units_8 / set_direction / write) so the libspu callers map cleanly.

Reverb model (engine-audio)

The retail SPU implements reverb as a comb-filter + allpass network with a configurable work buffer at the top (high-address end) of SPU RAM. The 9 standard libspu modes (Room / StudioA-C / Hall / Space / Echo / Delay / Pipe), plus Off, each select a preset for the SPU's reverb configuration registers.

The engine-audio clean-room port models reverb perceptually rather than at the register level: a per-channel circular delay buffer with a single feedback tap and a wet/dry mix. Each ReverbMode maps to a (delay_samples, feedback_q14, wet_q14) triple tuned by ear against retail recordings.
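The delay-line model can be sketched in a few lines, assuming Q14 fixed point (16384 = 1.0) for both the feedback and wet gains and 16-bit samples on the send bus. `DelayReverb` is a hypothetical name, and the per-mode triples tuned against retail recordings are not reproduced here.

```rust
/// Single-tap feedback delay, one instance per channel.
pub struct DelayReverb {
    buf: Vec<i32>,
    pos: usize,
    feedback_q14: i32,
    wet_q14: i32,
}

impl DelayReverb {
    pub fn new(delay_samples: usize, feedback_q14: i32, wet_q14: i32) -> Self {
        DelayReverb { buf: vec![0; delay_samples.max(1)], pos: 0, feedback_q14, wet_q14 }
    }

    /// Feed one send-bus sample; returns the wet sample to mix into master.
    pub fn process(&mut self, send: i16) -> i16 {
        let delayed = self.buf[self.pos];
        // Write input plus scaled feedback back into the circular buffer.
        let fed = i32::from(send) + ((delayed * self.feedback_q14) >> 14);
        self.buf[self.pos] = fed.clamp(-0x8000, 0x7FFF);
        self.pos = (self.pos + 1) % self.buf.len();
        ((delayed * self.wet_q14) >> 14).clamp(-0x8000, 0x7FFF) as i16
    }
}
```

With a 3-sample delay, 0.5 feedback, and full wet gain, an impulse of 1000 comes back as 1000 after three samples and 500 after six.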

Per-voice routing is opt-in: Voice::reverb_send = true (libspu SpuSetVoiceReverb analogue) sums the voice's pre-master output into the reverb send bus; the wet output is mixed back into the master in Spu::tick. Spirit Arts and echo-flagged sound effects opt in; everything else stays dry.

Trade-offs: mode selection via Spu::write_reverb_mode_byte(raw) matches the libspu byte API (1=Room, 2=StudioA, …, 9=Pipe). Out-of-range bytes fall back to Off. The retail SpuSetReverbModeParam (FUN_8006B1B4, 30-attribute commit) is stored but not interpreted — the perceptual presets win. Goal is "Spirit Arts have an echo," not bit-exact reproduction.
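The byte-to-mode mapping is a plain match. This sketch assumes the values between Room and Pipe follow the mode order listed for spu::reverb, which matches the commonly documented libspu SPU_REV_MODE_* ordering.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ReverbMode {
    Off,
    Room,
    StudioA,
    StudioB,
    StudioC,
    Hall,
    Space,
    Echo,
    Delay,
    Pipe,
}

impl ReverbMode {
    /// 1 = Room .. 9 = Pipe; 0 and out-of-range bytes fall back to Off.
    pub fn from_mode_byte(raw: u8) -> Self {
        match raw {
            1 => Self::Room,
            2 => Self::StudioA,
            3 => Self::StudioB,
            4 => Self::StudioC,
            5 => Self::Hall,
            6 => Self::Space,
            7 => Self::Echo,
            8 => Self::Delay,
            9 => Self::Pipe,
            _ => Self::Off,
        }
    }
}
```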

Engine-audio model — Sequencer port

The legaia-engine-audio::Sequencer is the runtime side of the SsAPI sequencer cluster above. Its surface mirrors SsSeqOpen / SsSeqPlay / SsSeqClose / SsSeqSetVol without copying any Sony bytes:

Method | Maps to
Sequencer::new(seq, bank) | SsSeqOpen: bind one SEQ + one VAB bank, allocate channel state.
Sequencer::tick_us(spu, dt_us) | Per-frame poller: advances time by dt_us and drains every event whose scheduled time (at the current tempo) has elapsed.
Sequencer::set_master_vol(vol) | SsSeqSetVol master.
Sequencer::set_loop_to(idx) | Loop point (equivalent of the repeat bit in _DAT_801CD2C0[i] + 0x98).
Sequencer::stop(spu) | _SsSeqCtrl(mode=1): silences and freezes.
Sequencer::rewind_to(idx, spu) | SsSeqRewind.
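Inside something like `tick_us`, elapsed microseconds have to become whole sequencer ticks at the current tempo. This sketch assumes a tempo in microseconds per quarter note (as carried by the 0xFF 0x51 meta-event's 3-byte big-endian payload) and a ticks-per-quarter resolution from the SEQ header; `TickClock` and its methods are hypothetical. The sub-tick remainder is carried between calls so frame-sized steps don't drift.

```rust
/// Converts elapsed microseconds into whole sequencer ticks.
pub struct TickClock {
    us_per_quarter: u64,
    ticks_per_quarter: u64,
    /// Sub-tick remainder, in units of microseconds * ticks_per_quarter.
    carry: u64,
}

impl TickClock {
    pub fn new(us_per_quarter: u32, ticks_per_quarter: u32) -> Self {
        TickClock {
            us_per_quarter: u64::from(us_per_quarter.max(1)),
            ticks_per_quarter: u64::from(ticks_per_quarter),
            carry: 0,
        }
    }

    /// Apply a tempo payload (3 bytes, big-endian microseconds per quarter note).
    /// The remainder is kept as-is, so a tempo change may shift timing by under a tick.
    pub fn set_tempo_bytes(&mut self, b: [u8; 3]) {
        let us = (u64::from(b[0]) << 16) | (u64::from(b[1]) << 8) | u64::from(b[2]);
        self.us_per_quarter = us.max(1);
    }

    pub fn us_per_quarter(&self) -> u64 {
        self.us_per_quarter
    }

    /// Advance by `dt_us`; returns the number of whole ticks that elapsed.
    pub fn advance(&mut self, dt_us: u64) -> u64 {
        self.carry += dt_us * self.ticks_per_quarter;
        let ticks = self.carry / self.us_per_quarter;
        self.carry %= self.us_per_quarter;
        ticks
    }
}
```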

Voice allocation is round-robin over the 24 SPU voices, with the sequencer tracking (channel, key) → voice so the matching key-off can shut down the right slot. Tempo events from the SEQ override the running tempo at the event's absolute tick (matching libsnd's mid-stream 0xFF 0x51). PitchBend / Aftertouch are accepted by the parser and ignored by the playback path until the engine wires per-voice modulation.
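A minimal sketch of round-robin allocation with the (channel, key) → voice map. `VoiceAlloc` is a hypothetical name; stealing a voice simply drops the old note's mapping, and a retrigger of the same (channel, key) remaps it to the new voice, which is one reasonable policy among several.

```rust
use std::collections::HashMap;

pub const SPU_VOICES: u8 = 24;

/// Round-robin allocator that remembers which voice sounds each (channel, key).
#[derive(Default)]
pub struct VoiceAlloc {
    next: u8,
    held: HashMap<(u8, u8), u8>,
}

impl VoiceAlloc {
    /// Note-on: take the next voice in rotation, stealing whatever it held.
    pub fn note_on(&mut self, channel: u8, key: u8) -> u8 {
        let v = self.next;
        self.next = (self.next + 1) % SPU_VOICES;
        self.held.retain(|_, held| *held != v); // forget the stolen note
        self.held.insert((channel, key), v);
        v
    }

    /// Note-off: the voice to key off, if that note is still mapped.
    pub fn note_off(&mut self, channel: u8, key: u8) -> Option<u8> {
        self.held.remove(&(channel, key))
    }
}
```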

Engine-audio model — SFX bank + scheduler

The SFX bank maps battle / field cue IDs (the kind byte the art-record HitCue / overlay scripts emit) to per-cue SfxEntry descriptors that describe how to fire a one-shot through the SPU. Engines populate the catalog at startup, then forward ScheduledCue-like requests through SfxScheduler, which queues each request with its retail timing offset and dispatches it when the per-frame tick reaches the firing frame.

Cue ID | Meaning
0x1A | Generic SFX trigger ("play sound" hit cue). Catalog typically maps to per-strike weapon impact tones.
0x4C | Hit-effect visual (no sound on its own; engines that fold the visual into a synced sound use this slot).
0x80..=0xFE | Reserved per-character / per-art SFX IDs. Indexed from the per-actor +0x9C0 table at retail.

SfxBank::play_one_shot delegates to the existing VabBank::play_note for tone lookup, pitch math, and ADSR setup. The scheduler is a frame-driven queue that returns an SfxFireBatch per tick_frame call, so engines can dispatch through the same VabBank they already wired for the BGM sequencer. A PendingCue with frames_remaining = 0 fires on the next tick rather than immediately, which gives the host a chance to clear render state first. This matches the retail timing, where a HitCue::timing_frames = 1 cue plays one frame after the strike begins.
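A simplified model of the scheduler's timing rule. It borrows the page's PendingCue / frames_remaining vocabulary but returns plain cue IDs instead of an SfxFireBatch, so it is an illustration of the countdown semantics, not the code in sfx.rs.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PendingCue {
    pub cue_id: u8,
    pub frames_remaining: u32,
}

#[derive(Default)]
pub struct SfxScheduler {
    pending: Vec<PendingCue>,
}

impl SfxScheduler {
    /// Queue a cue; even delay_frames = 0 waits for the next tick_frame call.
    pub fn schedule(&mut self, cue_id: u8, delay_frames: u32) {
        self.pending.push(PendingCue { cue_id, frames_remaining: delay_frames });
    }

    /// Advance one frame: fire cues whose countdown is already 0, decrement the rest.
    pub fn tick_frame(&mut self) -> Vec<u8> {
        let mut fired = Vec::new();
        self.pending.retain_mut(|cue| {
            if cue.frames_remaining == 0 {
                fired.push(cue.cue_id);
                false // fired: drop from the queue
            } else {
                cue.frames_remaining -= 1;
                true // still counting down
            }
        });
        fired
    }
}
```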

Implementation: crates/engine-audio/src/sfx.rs.

XA-ADPCM (in-progress)

crates/xa correctly decodes synthetic inputs built to the format spec. The on-disc .XA files, however, appear to use a non-standard interleave: ~90% of groups don't pass standard validation. They are likely a custom event-trigger scheme rather than streamed audio; pinning down the actual format needs runtime tracing.
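For reference, one structural check commonly applied to XA-ADPCM sound groups (not necessarily the validation crates/xa runs): each 128-byte group starts with a 16-byte parameter header in which bytes 0..4 duplicate bytes 4..8 and bytes 8..12 duplicate bytes 12..16. `xa_group_header_ok` is a hypothetical name.

```rust
/// True if `group` is at least 128 bytes and its 16-byte header carries
/// the duplicated parameter bytes a standard XA sound group has.
pub fn xa_group_header_ok(group: &[u8]) -> bool {
    if group.len() < 128 {
        return false;
    }
    let h = &group[..16];
    h[0..4] == h[4..8] && h[8..12] == h[12..16]
}
```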

What's left

The byte-level layouts of .MAP / .PCH / .spk / .dpk / .pac are still TBD. The dispatch chain into them is fully traced; the next move is to read the body of FUN_8001FA88 for the .dpk byte layout (specifically the field accesses on _DAT_8007B8D0 after the path-based opener returns — _DAT_8007B8D0 + 2 is read as a ushort and used as a divisor, almost certainly a record count).

Eventual home: a crates/sound companion to crates/vab.