How it works

The PSX has a dedicated sound chip (the SPU, Sound Processing Unit) with its own 512 KB of memory. SPU RAM is not mapped into the CPU's address space, so games can't touch it directly from main RAM - they go through Sony's PsyQ libraries (libsnd for sequencer playback, libspu for low-level voice control), which drive the SPU through its memory-mapped registers and DMA transfers. Legaia links both statically, so the libraries are part of the game binary and can be inspected like any other code.

PSX games typically split audio into three concerns: sound effects (short one-shot samples), BGM / music (sequence-driven, longer), and streaming audio (XA / .pac voice). Legaia is no exception, but its plumbing is layered:

  • A path-string cluster at 0x8007B380 holds the file extensions the sound subsystem appends to scene-asset paths: .spk, .LZS, .dpk, .MAP, .PCH, .pac, STR, plus the master file bse.dat.
  • Two SCUS-side consumers turn those paths into actual loads: a sound-init / .dpk loader and a streaming-asset loader for XA-family files.
  • Underneath that is Sony's PsyQ SsAPI sequencer (libsnd, statically linked) for SEQ playback, plus the PsyQ libspu primitives that actually move bytes into SPU RAM.

The fully-documented formats on disc are the VAB sound bank (Sony's standard VABp-magic instrument bank) and the SEQ sequence. The per-scene .dpk / sound_data2 pack is now decoded as a VAB + SEQ bundle (type-0 VAB header + type-1 VAB samples + type-2 SEQ); the .MAP / .PCH / .spk / .pac PsyQ intermediates are dev-side inputs, not separate retail chunks here.

BGM dispatch is genuinely surprising: there's no literal "BGM ID → file" table. A field-VM script writes a BGM ID, and a per-frame poller resolves it to a PROT index using the current scene's CDNAME block layout. So "the table" isn't in the executable - it's in the on-disc filesystem layout itself.

Path-string cluster

The string cluster at 0x8007B380 holds the strings the sound subsystem appends to scene-asset paths. Eight entries: seven extensions (.spk, .LZS, .dpk, .MAP, .PCH, .pac, STR) plus the master file bse.dat. Full layout in the sound-driver format spec.

SCUS consumers

Function | Role
FUN_8001FA88 | Sound subsystem init / .dpk loader. Loads bse.dat master bank, then per-scene .dpk from h:\main\bg\domepack\….
FUN_8001FC00 | Streaming-asset loader. Builds paths under the sound\ prefix; the XA / .pac / STR consumer.

FUN_8001EBEC was previously listed here as a third "mode-aware extension dispatcher" - a misread. The decomp shows it is the graphics-side character-TMD equipment-conditional group-transform swap (it reads DAT_8007C018[_DAT_8007B824 + 0..2], the loaded battle-character TMD pointers), not a sound consumer. See character-mesh.

Both FUN_8001FA88 and FUN_8001FC00 carry a dev/retail split via _DAT_8007B8C2. The dev branch loads via PROT indices directly; the retail branch uses dev-style paths through FUN_8003E6BC (the path-based opener) which resolves h:\main\bg\domepack\… into the appropriate PROT entry through the CDNAME-driven name map. Both paths land at the same files.

VAB sound banks

Sony's standard VABp-magic instrument bank format, documented at formats/vab. The dominant on-disc carrier is the scene-VAB-prefixed streaming shape - the VAB body is preceded by a 4-byte chunk0 header. Implementation: crates/vab (header parser + extractor + ADPCM decoder).

Bulk scan finds 1191 VABp headers across 239 PROT entries. Multi-bank archives at 0889_sound_data2, 0890_sound_data2, 0891_level_up. The vab_01 cluster (CDNAME indices 1072–1194) is the standard distributed-bank layout.

Per-actor sound effects

Function: FUN_800250D4(sound_id, voice)
Called from: the actor tick (FUN_80021DF4) when actor[+0xb4] != 0 (one-shot pulse) or actor[+0xac] is staged (continuous)

Looks up a sound entry at &DAT_8006F198 + sound_id*8 for sound_id < 0x200, or in the runtime-allocated table at _DAT_8007B8D0 for higher IDs (the .dpk consumer's bank). The entry's byte[3] & 0x1F is the voice count; the helper then calls FUN_800653C8 (libSPU SpuKeyOn-equivalent) for each of voice..voice+count-1.
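As a minimal sketch of the voice-count rule above: the helper below reads byte[3] & 0x1F from an 8-byte entry and builds the set of voices voice..voice+count-1. The names SfxEntry, voice_count and key_on_mask are hypothetical, returning a bitmask instead of calling a key-on routine per voice is an illustration choice, and clipping to 24 voices is an assumption.

```rust
/// One 8-byte record from the static SFX table. Only byte 3 is interpreted
/// here; the meaning of the other bytes is not modelled in this sketch.
pub type SfxEntry = [u8; 8];

/// Number of consecutive voices the entry requests: low 5 bits of byte 3.
pub fn voice_count(entry: &SfxEntry) -> u8 {
    entry[3] & 0x1F
}

/// Bitmask covering voices `voice..voice+count-1`.
/// Assumption: voices past the 24 hardware voices are silently dropped.
pub fn key_on_mask(entry: &SfxEntry, voice: u8) -> u32 {
    let mut mask = 0u32;
    for v in voice..voice.saturating_add(voice_count(entry)) {
        if v < 24 {
            mask |= 1u32 << v;
        }
    }
    mask
}
```

An entry with byte[3] = 0x23 asks for three voices, so starting at voice 4 the mask covers voices 4, 5 and 6.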

Move-VM and field-VM opcodes write actor[+0xac] (sound ID) and actor[+0xb0] (voice); the move-VM tick re-fires the SFX whenever the trigger flag at actor[+0xb4] is set.

Monster sound bank - h:\mpack\monster.snd

Battle-time monster sound banks live in a single packed monster.snd file. The loader is FUN_8003E104(monster_idx, slot, dst_buf) - called twice from the battle scene loader FUN_800520F0 (slots 7 and 8, for the active battle's two monster sound banks). It reads the file's per-monster TOC at 0x801C8980 - 0x10 (4-byte stride, paired entries giving [start_lba, end_lba+1]), computes the LBA range, and dispatches:

  • Dev path (_DAT_8007B8C2 != 0) - uses the standard library file API: FUN_800608F0 (fopen) → FUN_80060920 (fseek to record × 0x800) → FUN_80060944 (fread) → FUN_80060910 (fclose). Path string: h:\mpack\monster.snd.
  • Retail path - stages (size, dst) into the gp window at +0x97c / +0x894, kicks the async CD read via FUN_8003F128. Sets a 120-frame timeout at +0x91c.
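The TOC arithmetic can be sketched as follows. This assumes that "paired entries" means entry i holds start_lba and entry i+1 holds end_lba+1, and that LBAs are file-relative sectors of 0x800 bytes (matching the dev path's seek to record × 0x800). The function name bank_span and the slice-based TOC are hypothetical.

```rust
/// Bytes per CD sector as used by the dev-path seek (record * 0x800).
pub const SECTOR_BYTES: u32 = 0x800;

/// Returns (byte_offset, byte_length) for one monster's sound bank.
/// Assumption: toc[i] = start_lba and toc[i + 1] = end_lba + 1.
pub fn bank_span(toc: &[u32], monster_idx: usize) -> Option<(u32, u32)> {
    let start = *toc.get(monster_idx)?;
    let end_plus_one = *toc.get(monster_idx + 1)?;
    // A TOC whose end precedes its start is treated as invalid.
    let sectors = end_plus_one.checked_sub(start)?;
    Some((
        start.checked_mul(SECTOR_BYTES)?,
        sectors.checked_mul(SECTOR_BYTES)?,
    ))
}
```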

The same pattern (h:\mpack\… paths plus a per-record TOC held in a small data structure) is the shape we expect for the rest of the still-TBD audio formats - read the FUN_8003E104 dump as the canonical example.

BGM dispatch

The field VM's opcode 0x35 writes the BGM ID to _DAT_8007BAC8. FUN_800243F0 (the per-frame asset poller) resolves it to a PROT index - bgm_id < 2000 is scene-local, bgm_id ≥ 2000 is a global pool. There's no literal BGM table; the resolution is a PROT-relative offset into the CDNAME per-scene block.
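A small sketch of the scene-local / global split described above. It only classifies the ID; the actual PROT-relative offset computation is not reproduced here. The enum BgmScope, the function classify_bgm and the u16 width are assumptions for illustration.

```rust
/// IDs at or above this value address the global pool.
pub const GLOBAL_POOL_BASE: u16 = 2000;

/// Which pool a BGM ID refers to. The raw ID is carried through unchanged,
/// since how the resolver rebases it is not modelled in this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum BgmScope {
    SceneLocal(u16),
    GlobalPool(u16),
}

pub fn classify_bgm(bgm_id: u16) -> BgmScope {
    if bgm_id < GLOBAL_POOL_BASE {
        BgmScope::SceneLocal(bgm_id)
    } else {
        BgmScope::GlobalPool(bgm_id)
    }
}
```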

See script VM → "BGM lookup table" for the resolver code.

The engine port reuses this same dispatch for the Battle↔Field music swap: World::set_battle_bgm configures a battle track id, and the live gameplay loop queues an ordinary FieldEvent::Bgm{sub_op: 1} start for it on encounter (swap_to_battle_bgm) and resumes the stashed field track on battle end (restore_field_bgm). The host's AudioBgmDirector cross-fades both transitions over ~0.5 s through its existing start_inner path - no separate battle-audio code path. The battle id must resolve in the current scene's BGM table since the live loop doesn't load a distinct battle audio bundle.

SsAPI sequencer (0x80061-0x80067 cluster)

Legaia statically links Sony's PsyQ libsnd / SsAPI sequencer for .SEQ-driven music. The cluster lives in SCUS at 0x80061B18..0x800681D8 and uses the standard SsAPI globals.

Globals

Global | Role
_DAT_801CD2B8 | 16-bit slot-allocation bitmap (MAX_SEQ_SLOTS = 16).
_DAT_801CD2C0[16] | Per-slot pointer table - each entry points at a 0xB0-byte SsAPI sequence-state struct.
_DAT_801CD2C0[i] + 0x58/0x5A | Per-slot vol/pan, clamped 0..0x7F.
_DAT_801CD2C0[i] + 0x88 | Running tick (advanced by the varint delta-time decoder).
_DAT_801CD2C0[i] + 0x98 | Per-slot status flags (bit 0 = paused, bit 1 = active, bit 2 = stopped, bit 3 = end-of-sequence, bit 4/5 = volume-ramp scheduling, bit 8 = ramp lock, bit 0xA = repeat).
_DAT_801CE060 | Per-voice flag bank (32 voices, bit-packed).
_DAT_801CE080..AC | Voice-attribute slots (per-voice pitch + vol working state).
_DAT_8007A940 | 12-entry MIDI-key pitch table (used by FUN_80066E50).
_DAT_801CE564 / _DAT_801CE574 | Function-pointer hooks installed by Legaia. _564 resolves the active script-VM seq context; _574 is a worker-availability check. Distinct from the standard PsyQ in-line slot lookup - the actor / field VM is wiring callbacks here.

Public SEQ API

Function | Role
FUN_80062340(seq_data, slot_hint) | SsSeqOpen - walks the slot bitmap, marks the first free slot, calls FUN_80062410. Returns slot ID or -1.
FUN_80061D18(slot) | SsSeqClose - calls FUN_80067E9C(slot,0,0,1) + FUN_800684CC, clears bitmap bit, memsets all 16 channel records (size 0xB0) to defaults.
FUN_8006275C(slot,0) | SsSeqPlay - clears flags 0/3 in +0x98, sets bit 1.
FUN_800628F0(slot,_,mode,_) | _SsSeqCtrl - mode==1 resets read pointer + sets flag 0x1; mode==0 sets flag 0x2; otherwise clears both. Stop / Pause / Resume state core.
FUN_800641EC(slot, channel) | SsSeqRewind / SsSeqReplay - clears flags 0x1/0x2/0x8/0x400, sets 0x4, full slot reset.
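The status word at +0x98 is described both by bit index and by mask; the sketch below spells out the masks and the SsSeqPlay / _SsSeqCtrl flag transitions as stated in the tables. The constant names, the u32 width and the function names are hypothetical, and only the flag arithmetic is modelled (no read-pointer reset).

```rust
// Status-flag masks for the per-slot word at +0x98 (names are illustrative).
pub const PAUSED: u32 = 1 << 0;
pub const ACTIVE: u32 = 1 << 1;
pub const STOPPED: u32 = 1 << 2;
pub const END_OF_SEQ: u32 = 1 << 3;
pub const RAMP_LOCK: u32 = 1 << 8; // 0x100
pub const REPEAT: u32 = 1 << 0xA; // 0x400

/// SsSeqPlay flag effect: clear bits 0 and 3, set bit 1.
pub fn seq_play(flags: u32) -> u32 {
    (flags & !(PAUSED | END_OF_SEQ)) | ACTIVE
}

/// _SsSeqCtrl flag effect: mode 1 sets 0x1, mode 0 sets 0x2,
/// any other mode clears both.
pub fn seq_ctrl(flags: u32, mode: u32) -> u32 {
    match mode {
        1 => flags | 0x1,
        0 => flags | 0x2,
        _ => flags & !0x3,
    }
}
```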

SEQ internals

Function | Role
FUN_80062410(seq_data) | _SsSeqInit - validates 'S'/'p' magic + version byte 0x01, reads PPQN base (0x393_8700 = 60 000 000), BPM, ticks-per-quarter from the SEQ header.
FUN_80061C68(slot) | _SsSeqGetVar - MIDI-style 7-bit-with-continuation varint decode for delta-time bytes; accumulates into +0x88 running tick.
FUN_80061EDC(slot, channel, vol, …) | SsSeqSetVol - calls FUN_800683D8 to fetch (vol_l, vol_r), clamps target ≥ requested, calls FUN_8006206C (slewer), sets bit 0x20, clears bit 0x10 in +0x98.
FUN_8006206C(…) | _SsSetSlideVolume - ramp from→to over N ticks. Signed-divide per-tick delta; gated by flags 4 & 0x100 in +0x98.
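The delta-time decode in _SsSeqGetVar is the usual MIDI variable-length quantity. A minimal sketch, assuming the standard MIDI limit of four bytes per value (whether libsnd enforces that limit is not confirmed here); read_varint is a hypothetical name.

```rust
/// Decode one MIDI-style variable-length quantity starting at `*pos`.
/// Each byte contributes its low 7 bits; bit 7 set means "more bytes follow".
/// Returns None on truncated input or if more than four bytes are chained.
pub fn read_varint(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut value = 0u32;
    for _ in 0..4 {
        let b = *bytes.get(*pos)?;
        *pos += 1;
        value = (value << 7) | u32::from(b & 0x7F);
        if b & 0x80 == 0 {
            return Some(value);
        }
    }
    None
}
```

The caller would add the returned delta to the running tick (the +0x88 field in the retail struct).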

Voice / mixer (audible-output critical path)

Function | Role
FUN_80067550 | _SsVoNoteOn - master-vol × velocity × channel vol/pan × four expression sliders × stereo-pan square law (uV*uV/0x3FFF).
FUN_80067E9C | _SsSeqNoteOn - sequence-driven keyon. Iterates DAT_801CE344, calls FUN_80068B98 (program-change?).
FUN_80065978 | _SsVoKeyOnDirect - allocates a voice from _DAT_801CE208, looks up region in _DAT_801CE334 (stride 0x10), writes pitch + base note.
FUN_80066E50(key, fine) | _SsPitchFromKey - indexes 12-entry pitch table &DAT_8007A940, octave-shift by (oct-5). Returns 16-bit SPU PITCH register value.
FUN_80065B88 | SsResetTranspose - single-store stub: zeros _DAT_801CE2E8 (a base-note offset shifted in by FUN_80065978).

SPU command shims (*0x81 scaling = 0..127 → 0..16383)

Function | Role
FUN_80062AA0(x, y) | SsSetMVol - packs [cmd=3, x*0x81, y*0x81], calls FUN_8006BCB4 (SPU-cmd dispatcher).
FUN_80065440(p1, p2) | Single-shot SPU command (likely SsUtKeyOn / SsUtPitchBend) - [cmd=6, p1*0x81, p2*0x81], calls FUN_8006ACBC (sister of FUN_8006BCB4).
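The *0x81 scaling works because 127 × 0x81 = 16383 = 0x3FFF, so the 7-bit range lands exactly on the 14-bit range. A one-function sketch; the name scale_7bit_to_14bit is hypothetical, and clamping inputs above 127 is an assumption rather than confirmed shim behaviour.

```rust
/// Scale a 7-bit volume (0..=127) to the SPU's 14-bit range (0..=0x3FFF)
/// by multiplying with 0x81. Inputs above 127 are clamped (assumption).
pub fn scale_7bit_to_14bit(v: u8) -> u16 {
    u16::from(v.min(127)) * 0x81
}
```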

The cluster previously appeared in the renderer / GPU primitives inventory in functions.md (FUN_80061EDC / FUN_80067E9C / FUN_80066E50 / FUN_80067550 were listed under the renderer); that was a citation artefact (battle / field code triggers SFX cues during render passes). None of these functions is libgpu / libgs - they're all libsnd and belong here.

Interpretation: _DAT_8007BAC8 = bgm_id written by field-VM 0x35 is consumed by FUN_800243F0 to load a .SEQ payload via the streaming-asset path, and that payload is then handed to FUN_80062340 for sequencer playback.

Engine reimpl can stub the entire cluster behind a legaia-engine-audio::Sequencer trait without touching the per-note math.

libspu / SPU control (0x80068-0x8006D cluster)

Sits underneath the SsAPI sequencer and drives the SPU hardware directly. PsyQ libspu is statically linked here.

SPU globals

Global | Role
_DAT_8007AF40 | SPU register base pointer (SPU MMIO at 0x1F801C00..0x1F801E00).
_DAT_8007AF40 + 0x180/0x182 | MAIN_VOL_L/R.
_DAT_8007AF40 + 0x1AA | SPUCNT (control register).
_DAT_8007AF40 + 0x1B0/0x1B2 | REVERB_VOL_L/R.
_DAT_8007AF40 + 0x1C0..0x1FE | Reverb config block (APF1, COMB1-4, IIR_ALPHA, …).
_DAT_8007AFA4 | Block table base. Each entry: bit 0x80000000 = free, 0x40000000 = end-of-table.
_DAT_8007AFF8 | Master attribute struct - 10 modes × 0x44 bytes = 0x2A8 bytes total.

libspu primitives

Function | PsyQ name | Notes
FUN_8006A728 | SpuFree | Block-table free - flips matching addr's high bit, calls FUN_8006A420 (compactor).
FUN_8006ACBC | SpuSetVoiceAttr | Mask-driven dispatcher (mask=0..9 selects defaults from _DAT_8007AFF8 + i*0x44). 1272 bytes.
FUN_8006B1B4 | SpuSetReverbModeParam | 30-attr reverb commit; writes regs 0x1C0..0x1FE.
FUN_8006BCB4 | SpuSetCommonAttr | Master vol L/R + reverb regs + SPUCNT bits. 7-mode jump table.
FUN_8006C048 | SpuSetVoiceAttr (24-voice broadcaster) | Loops i=0..23 over 1<<i mask, writes per-voice regs at +i*0x10. 1548 bytes.
FUN_8006C6E4 | _SsKey2Pitch | Two-octave-table pitch math; returns 14-bit SPU PITCH (clamps 0x3FFF).
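The 24-voice broadcast loop in FUN_8006C048 can be pictured as a walk over a voice bitmask, yielding one register-block offset per selected voice. A sketch only: selected_voice_offsets is a hypothetical helper, and it returns offsets instead of performing register writes.

```rust
/// Number of hardware voices and the size of each voice's register block.
pub const VOICE_COUNT: u32 = 24;
pub const VOICE_STRIDE: u32 = 0x10;

/// For every bit i (0..24) set in `voice_mask`, return the offset of that
/// voice's register block relative to the SPU register base (i * 0x10).
pub fn selected_voice_offsets(voice_mask: u32) -> Vec<u32> {
    (0..VOICE_COUNT)
        .filter(|&i| voice_mask & (1u32 << i) != 0)
        .map(|i| i * VOICE_STRIDE)
        .collect()
}
```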

SPU DMA transfer engine

Sits between the SsAPI seq layer and the libspu register primitives. This is the path SEQ/VAG bytes take when moving from PSX RAM into SPU RAM.

Function | PsyQ name | Notes
FUN_80069B18(mode, addr, len) | _spu_t core | 4-mode SPU transfer state machine. mode=0: arm READ (xfer-mode bits = 0x30); mode=1: arm WRITE (0x20); mode=2: stage start address into SPU +0x1A6; mode=3: COMMIT - wait for SPUCNT bits, kick the DMA channel, flip command-register direction. Times out at 0xF00 poll iterations.
FUN_800697E0(buf, len) | _SpuTransfer outer wrapper | Saves SPUCNT mask, sets transfer addr, loops over the transfer block in 0x40-byte chunks. Alternative path to FUN_80069B18 for non-DMA copies.
FUN_80069DA8(addr, len) | SpuWrite (top-level) | Picks between DMA and CPU paths based on _DAT_8007AF5C.
FUN_8006A020 / FUN_8006A04C | _spu_a (read / write direction) | Sets SPU command register bits 24..27 to 0x2 or 0x22.
FUN_8006A158 | SsSpuMalloc core | Block allocator. Walks the block table, returns the start of the first free run of size ≥ request.
FUN_8006A420 | SpuFree compactor | Coalesces adjacent free entries.

SsAPI seq-management layer (above libspu)

Function | Role
FUN_800684CC(vab_id) | SsVabClose (by VAB-ID search) - iterates 0x801CDB60 + i*0x36, matches +0x0, calls FUN_80067480(0).
FUN_80068D94(seq_data, mode) | SsSepOpen / SEP loader core. Validates 0x564150 ('VAP' magic), reads SEQ header numTracks at +0x12, calls SsSpuMalloc, patches per-track pointer table, writes MIDI body to SPU.
FUN_8006CA7C | SsSeqGetStatus - resolves ctx via _DAT_801CE564, returns ctx +0x49 with state-code normalisation.
FUN_8006CDB0 | SsSeqSetCallback - resolves ctx via _DAT_801CE564, tail-calls FUN_8006DDC8.
FUN_8006DDC8 | SsSeqSetMarkCallback - installs trampolines at ctx +0x14/+0x18, sets active-flag at +0x46.

Slot bitmap @ _DAT_801CD2B8 → ptr table @ 0x801CD2C0 → per-slot record (stride 0x36) at 0x801CDB60 → VAB program-attr (stride 0xB0) at 0x801CD2C0[i] + prog*0xB0.

File-API leaf cluster

The dev/retail split for sound + monster-bank loading routes the dev branch through libapi-style file primitives at FUN_800608E0..FUN_80060A04: fopen / fseek / fread / fclose plus a vsync_wait (FUN_8005FCCC) and a BREAK 0x105 trap at FUN_80060A04. These are PsyQ kernel-call wrappers around the BIOS A() table - FUN_80056738 / 80056748 / 80056768 / 80057014 / 8005ACE8 are all jr 0xA0 BIOS dispatchers.

Engine reimpl can map the entire cluster to std::fs + a frame-paced sleep.

Engine-audio model - clean-room SPU port

crates/engine-audio ports the SPU side of the audio stack as a clean-room model. No Sony bytes; the spec is this page plus the libspu API surface and the standard PSX SPU register layout.

Module | Maps to
spu::Spu | The 24-voice mixer (one Voice per slot) + master volume + a live Reverb processor (see below).
spu::voice::Voice | Per-voice state: sample address, loop point, pitch, ADSR, L/R volume, reverb-send flag - the libspu SpuSetVoiceAttr + SpuSetVoiceReverb surface.
spu::reverb | Faithful register-driven reverb network (same/different-side IIR + 4-tap comb + 2 all-pass) with the 9 standard libspu mode presets; ReverbMode enum + libspu-style mode-byte API (Spu::write_reverb_mode_byte).
spu::adsr | The 5-phase ADSR envelope (Attack – Decay – Sustain – Release – Off) with linear / exponential / increase / decrease modes per the standard PSX formula.
spu::adpcm | Streaming SPU-ADPCM block decoder (28 samples per 16-byte block). One stateful instance per voice carries the inter-block prev1/prev2 history.
spu::ram | 512 KB SPU RAM model + libspu-shaped transfer engine (SpuRam::set_direction / write / read + SpuAllocator for SsSpuMalloc / SpuFree).
vab_bind::VabBank | Bridges legaia_vab::VabReport into the SPU: upload(spu, alloc, report, buf) drops every VAG body into SPU RAM through the allocator, and play_note(spu, voice, prog, note, velocity) translates a MIDI key into voice config + key-on. Pitch math matches _SsKey2Pitch / libspu key-to-pitch.
AudioOut | Owns a single cpal output stream that drains the Spu at 44.1 kHz and resamples to the host device rate (linear). Engines call with_spu(|spu| ...) from outside the audio thread to push voice attributes / key-on masks.
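For the spu::adpcm row above, a sketch of how a stateful 28-samples-per-16-byte-block decoder can look. This is not the crate's code: AdpcmState and decode_block are hypothetical names, the filter pairs are the standard PSX SPU-ADPCM coefficients (in 1/64 units) from public hardware documentation, and treating shift values above 12 as 12 is a simplification.

```rust
/// Standard SPU-ADPCM predictor coefficient pairs, in units of 1/64.
const FILTERS: [(i32, i32); 5] = [(0, 0), (60, 0), (115, -52), (98, -55), (122, -60)];

/// Per-voice decoder state: the two most recent output samples.
#[derive(Default)]
pub struct AdpcmState {
    prev1: i32,
    prev2: i32,
}

impl AdpcmState {
    /// Decode one 16-byte block (header byte, flag byte, 14 data bytes)
    /// into 28 PCM samples, carrying prev1/prev2 across blocks.
    pub fn decode_block(&mut self, block: &[u8; 16]) -> [i16; 28] {
        // Simplification: out-of-range shift/filter values are clamped.
        let shift = u32::from(block[0] & 0x0F).min(12);
        let filter = usize::from((block[0] >> 4) & 0x07).min(4);
        let (f0, f1) = FILTERS[filter];
        let mut out = [0i16; 28];
        for (i, slot) in out.iter_mut().enumerate() {
            let byte = block[2 + i / 2];
            // Low nibble first, then high nibble.
            let nibble = if i % 2 == 0 { byte & 0x0F } else { byte >> 4 };
            // Place the nibble in the top 4 bits to sign-extend, then shift down.
            let raw = i32::from(((u16::from(nibble) << 12) as i16) >> shift);
            let predicted = (self.prev1 * f0 + self.prev2 * f1 + 32) >> 6;
            let sample = (raw + predicted).clamp(-32768, 32767);
            self.prev2 = self.prev1;
            self.prev1 = sample;
            *slot = sample as i16;
        }
        out
    }
}
```

Keeping the history inside the struct is what lets one instance per voice stream blocks without the caller threading prev1/prev2 by hand.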

What this does not model (out of scope for the first port pass):

  • Pitch modulation, noise, FM. None used by Legaia (verified against the libspu calls in the SCUS dumps - SpuSetPitch is the only pitch path).
  • Asynchronous DMA timing. The transfer engine here is synchronous (queue + drain collapsed) - fine because the playback layer reads SPU RAM directly during voice ticks. The model preserves the API shape (set_transfer_start_units_8 / set_direction / write) so the libspu callers map cleanly.

Reverb model (engine-audio)

The retail SPU implements reverb as a same-side / different-side IIR reflection pair feeding a 4-tap comb early-echo and two all-pass stages, run at 22050 Hz over a work buffer at the top of SPU RAM (mBASE = 0x80000 - work_size). The 9 standard libspu modes (Room / StudioA-C / Hall / Space / Echo / Delay / Pipe) plus Off each select a 32-register set (work-area size + IIR/comb/all-pass coefficients + tap addresses).

The engine-audio clean-room port reproduces that network register-for-register in spu::reverb: each ReverbMode loads the standard libspu preset (public PSX hardware-reference constants - the same tables every open SPU emulator ships, not Sony game data) into a recirculating i16 work buffer sized to that mode's work area. Address registers are in 8-byte units, taps wrap within the work area, and the reverb multiply is (sample × coeff) / 0x8000 (signed Q15, so a 0x8000 coefficient inverts phase exactly as the hardware does). So the wet tail carries the real comb/all-pass colouration the retail modes produce, not a flat slap-back.
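The Q15 multiply and its phase-inverting 0x8000 case can be sketched in a few lines. reverb_mul is a hypothetical name; writing the division by 0x8000 as an arithmetic right shift, and saturating the result to i16, are assumptions about rounding and overflow handling rather than confirmed engine behaviour.

```rust
/// Signed Q15 multiply: (sample * coeff) / 0x8000, saturated to i16.
/// A coefficient of 0x8000 is -32768 as i16, i.e. exactly -1.0.
pub fn reverb_mul(sample: i16, coeff: i16) -> i16 {
    let product = (i32::from(sample) * i32::from(coeff)) >> 15;
    product.clamp(i32::from(i16::MIN), i32::from(i16::MAX)) as i16
}
```

With coeff = 0x8000 a sample of 1000 comes back as -1000, which is the phase inversion described above.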

Reverb routing is per-voice: Voice::reverb_send = true (libspu SpuSetVoiceReverb analogue) sums the voice's pre-master output into the reverb send bus; the wet output is mixed back into the master in Spu::tick.

Retail reverb routing - Studio C, always on

A pure-Rust sweep of the save-state corpus (reading the SPU register shadow via PsxSpu::reverb_registers / voice_reverb_mask / reverb_master_enabled; CLI mednafen-state spu) pins what retail actually runs - and it falsifies the earlier “Spirit-Arts / echo cues selectively opt in, everything else dry” reading:

  • The reverb network is master-enabled in every captured state (SPUCNT bit 7) - field, town, battle, summon, title, minigames. No scene toggles it.
  • The mode is Studio C everywhere. The 32 reverb registers (0x1F801DC0..) are byte-identical across all captured states and match the StudioC libspu preset exactly (dAPF1=0x00E3, work area 0x6FE0); ReverbMode::identify resolves the captured block to StudioC.
  • Per-voice reverb-send (EON) is broad - typically 15–22 of 24 voices, BGM + SFX alike. Reverb is the default routing, not a per-cue effect.

So there is no per-cue reverb-enable source: the live engine matches retail by calling Spu::set_retail_reverb() once at SPU init (the live mixer does this) - ReverbMode::StudioC + every voice routed.

Boundaries: mode selection via Spu::write_reverb_mode_byte(raw) matches the libspu byte API (1=Room, 2=StudioA, …, 9=Pipe; out-of-range falls back to Off) - the engine half of SpuSetReverbModeParam (FUN_8006B1B4). The hardware's 39-tap FIR input/output resampler (44.1 kHz ↔ 22.05 kHz) is approximated by decimation + zero-order hold, and output volume (vLOUT/vROUT, not part of the mode preset) uses a fixed depth overridable via Reverb::set_output_volume.
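The mode-byte mapping can be illustrated with a local stand-in enum. ReverbPreset and from_mode_byte are hypothetical names used only for this sketch (the crate's own ReverbMode is not reproduced); the numbering follows the libspu order quoted above, with Off for anything out of range.

```rust
/// Stand-in for the nine libspu reverb modes plus Off.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReverbPreset {
    Off,
    Room,
    StudioA,
    StudioB,
    StudioC,
    Hall,
    Space,
    Echo,
    Delay,
    Pipe,
}

/// Map a raw libspu-style mode byte to a preset; out-of-range falls back to Off.
pub fn from_mode_byte(raw: u8) -> ReverbPreset {
    match raw {
        1 => ReverbPreset::Room,
        2 => ReverbPreset::StudioA,
        3 => ReverbPreset::StudioB,
        4 => ReverbPreset::StudioC,
        5 => ReverbPreset::Hall,
        6 => ReverbPreset::Space,
        7 => ReverbPreset::Echo,
        8 => ReverbPreset::Delay,
        9 => ReverbPreset::Pipe,
        _ => ReverbPreset::Off,
    }
}
```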

Engine-audio model - Sequencer port

The legaia-engine-audio::Sequencer is the runtime side of the SsAPI sequencer cluster above. Surface mirrors SsSeqOpen / SsSeqPlay / SsSeqClose / SsSeqSetVol without copying any Sony bytes:

Method | Maps to
Sequencer::new(seq, bank) | SsSeqOpen - bind one SEQ + one VAB bank, allocate channel state.
Sequencer::tick_sample(spu) | Production playback clock - advance exactly one SPU sample (44.1 kHz).
Sequencer::tick_us(spu, dt_us) | Wall-clock / per-frame poller (parity oracles, tests) - converts µs to whole samples with a carry.
Sequencer::set_master_vol(vol) | SsSeqSetVol master.
Sequencer::set_loop_to(idx) | External loop-point fallback (_DAT_801CD2C0[i] + 0x98 repeat-bit equivalent) for tracks with no in-stream markers.
Sequencer::stop(spu) | _SsSeqCtrl(mode=1) - silences and freezes.
Sequencer::rewind_to(idx, spu) | SsSeqRewind.

Voice allocation is round-robin over the 24 SPU voices, with the sequencer tracking (channel, key) → voice so the matching key-off can shut down the right slot. Tempo events from the SEQ override the running tempo at the event's absolute tick (matching libsnd's mid-stream 0xFF 0x51). PitchBend / Aftertouch are accepted by the parser and ignored by the playback path until the engine wires per-voice modulation.

Loop points are read from the stream: the NRPN-style control changes on 0xB0 (controller 99 value 20 = Loop Start, value 30 = Loop Forever). A Loop Start records the position immediately after the marker; a later Loop Forever - or an end-of-track that follows a Loop Start - rewinds there rather than to event 0, so looped BGM repeats from the correct bar instead of restarting the whole track. The rewind resets the integer sample-clock, so the looped body re-fires on the same sample offset every pass. set_loop_to is the fallback for the four retail tracks with no markers.
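Recognising the loop markers comes down to one match on a control-change event. A sketch under the assumption that any 0xBn status qualifies (the text only names 0xB0); LoopMarker and loop_marker are hypothetical names.

```rust
/// The two in-stream loop markers carried on controller 99.
#[derive(Debug, PartialEq, Eq)]
pub enum LoopMarker {
    Start,
    Forever,
}

/// Classify a MIDI-style event as a loop marker, if it is one.
/// Assumption: the channel nibble of the status byte is ignored.
pub fn loop_marker(status: u8, controller: u8, value: u8) -> Option<LoopMarker> {
    if status & 0xF0 != 0xB0 || controller != 99 {
        return None;
    }
    match value {
        20 => Some(LoopMarker::Start),
        30 => Some(LoopMarker::Forever),
        _ => None,
    }
}
```

On Start the player would record the position just after the event; on Forever it would rewind to that recorded position.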

Timebase. The production playback path ticks the sequencer once per SPU sample (tick_sample), so the music clock is locked to the audio clock. Timing is computed with an exact integer accumulator (units of sample × ppqn × 1_000_000; an event of delta d fires when the accumulator reaches d × tempo_us × 44100) - no per-tick float, no long-track drift, and bit-deterministic for the replay oracle. Note the SEQ tempo gotcha: the header tempo is a 240 BPM placeholder, immediately overridden by the first body 0xFF 0x51 (which, in PSX SEQ, carries its 3 tempo bytes with no MIDI length prefix). Mis-parsing that override pinned playback at the placeholder rate (~3x too fast).
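The integer accumulator can be sketched as below: each sample adds ppqn × 1_000_000, and an event of delta d fires once the total reaches d × tempo_us × 44100. SampleClock is a hypothetical type, and carrying the remainder after a fire is an assumption about how the real sequencer avoids drift.

```rust
/// SPU output rate in samples per second.
pub const SAMPLE_RATE: u64 = 44_100;

/// Integer music clock in units of sample * ppqn * 1_000_000.
pub struct SampleClock {
    acc: u64,
    ppqn: u64,
}

impl SampleClock {
    pub fn new(ppqn: u16) -> Self {
        Self { acc: 0, ppqn: u64::from(ppqn) }
    }

    /// Advance one SPU sample. Returns true when an event `delta_ticks`
    /// after the previous one is due at tempo `tempo_us` (µs per quarter).
    pub fn tick_sample(&mut self, delta_ticks: u64, tempo_us: u64) -> bool {
        self.acc += self.ppqn * 1_000_000;
        let threshold = delta_ticks * tempo_us * SAMPLE_RATE;
        if self.acc >= threshold {
            // Keep the remainder so fractional samples carry forward.
            self.acc -= threshold;
            true
        } else {
            false
        }
    }
}
```

At ppqn = 480 and tempo_us = 500_000 (120 BPM), a 480-tick delta is half a second, so the event fires on the 22 050th sample.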

Engine-audio model - SFX bank + scheduler

Maps battle / field cue IDs (the kind byte the art-record HitCue / overlay scripts emit) to per-cue SfxEntry descriptors that describe how to fire a one-shot through the SPU. Engines populate the catalog at startup, then forward ScheduledCue-like requests through SfxScheduler which queues each request with its retail timing offset and dispatches when the per-frame tick reaches the firing frame.

SfxBank::from_descriptors builds the catalog from the disc-decoded static SFX table (DAT_8006F198). The bank those programs index is not a dedicated SFX VAB - it is the active scene's music VAB: FUN_80065034 reads the libsnd current-bank globals (_DAT_801ce33c/_DAT_801ce334/_DAT_801ce340), which point at the per-scene scene_vab_stream bank the BGM sequencer has open (13 distinct banks across the save catalogue; byte-identical to the disc music_01 VAB for a music_01-scene state). So the engine fires a cue with SfxBank::play_one_shot(spu, scene_vab) against the BGM VabBank it already loaded - no separate SFX bank.

Cue ID | Meaning
0x1A | Generic SFX trigger ("play sound" hit cue). Catalog typically maps to per-strike weapon impact tones.
0x4C | Hit-effect visual (no sound on its own; engines that fold the visual into a synced sound use this slot).
0x80..=0xFE | Reserved per-character / per-art SFX IDs. Indexed from the per-actor +0x9C0 table at retail.

SfxBank::play_one_shot delegates to the existing VabBank::play_note for tone lookup, pitch math, and ADSR setup; the scheduler is a frame-driven queue that returns an SfxFireBatch per tick_frame call so engines can dispatch through the same VabBank they already wired for the BGM sequencer. A PendingCue with frames_remaining = 0 fires on the next tick, so a cue queued mid-frame doesn't fire immediately and gives the host a chance to clear render state first - matching the retail timing where a HitCue::timing_frames = 1 cue plays one frame after the strike begins.
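The "frames_remaining = 0 fires on the next tick" rule can be shown with a small frame-driven queue. This is an illustrative stand-in, not the crate's SfxScheduler: CueQueue, its PendingCue struct and the Vec<u8> return type are hypothetical.

```rust
/// A queued cue and the number of ticks to wait before it fires.
pub struct PendingCue {
    pub cue_id: u8,
    pub frames_remaining: u32,
}

/// Frame-driven queue: cues never fire at queue time, only from tick_frame.
#[derive(Default)]
pub struct CueQueue {
    pending: Vec<PendingCue>,
}

impl CueQueue {
    pub fn queue(&mut self, cue_id: u8, delay_frames: u32) {
        self.pending.push(PendingCue { cue_id, frames_remaining: delay_frames });
    }

    /// Advance one frame and return the cue IDs that fire on this tick.
    pub fn tick_frame(&mut self) -> Vec<u8> {
        let mut fired = Vec::new();
        self.pending.retain_mut(|cue| {
            if cue.frames_remaining == 0 {
                fired.push(cue.cue_id);
                false // fired cues leave the queue
            } else {
                cue.frames_remaining -= 1;
                true
            }
        });
        fired
    }
}
```

A cue queued with delay 0 comes out of the first tick_frame call after it was queued, and one queued with delay 1 comes out of the call after that.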

Implementation: crates/engine-audio/src/sfx.rs.

XA-ADPCM (in-progress)

crates/xa decodes correctly against the format spec on synthetic inputs. The on-disc .XA files, however, use a non-standard interleave - ~90% of groups fail standard validation. This is likely a custom event-trigger scheme rather than streamed audio; pinning down the actual format needs runtime tracing.

Audio-trace parity oracle

Mirror of the VRAM-byte and mode-trace parity oracles on a third axis: per-frame voice activity. The retail side has two capture shapes, both emitting the same AudioTraceFrame JSONL wire format:

  1. Single-cycle snapshot lifted from a mednafen save state's SPU section via legaia_mednafen::PsxSpu (24 voice records, master volume sweep, voice-on/-off masks, reverb mode, 512 KiB SPU RAM). One .mc{slot} save → one retail AudioTraceFrame. Convergence: "did any engine frame in the window match retail's voice mask?".
  2. Multi-frame trace captured by autorun_audio_trace.lua running inside PCSX-Redux: per-vsync PCSX.createSaveState() calls, the SPU sub-message sliced out via FFI pointer arithmetic, decoded offline into JSONL by extract_audio_trace_from_sstates.py. Convergence becomes "for every retail vsync with audio playing, did the engine ever match?", applied frame-by-frame via first_audio_trace_divergence_multi.

The engine side runs a standalone legaia_engine_audio::Spu + optional Sequencer alongside a headless BootSession::tick, sampling voice / master / reverb state after each frame. JSONL is one AudioTraceFrame { frame, sequencer_playhead_ticks, sequencer_finished, master_volume, reverb_mode, active_voice_mask, voices[24] } per line. Convergence rule per retail frame: at least one engine frame's active_voice_mask is a superset of retail's mask AND for every retail-active voice the engine matches start_addr (when both sides report it).
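The convergence rule reads naturally as a mask-superset test plus a per-voice address check. The sketch below uses a cut-down TraceFrame with only the two fields the rule needs (the real AudioTraceFrame carries more), and it assumes both conditions must hold on the same engine frame; frame_matches and converges are hypothetical names.

```rust
/// Cut-down trace frame: the active-voice bitmask and, per voice,
/// the sample start address when that side reports one.
pub struct TraceFrame {
    pub active_voice_mask: u32,
    pub start_addr: [Option<u32>; 24],
}

/// True when the engine frame's mask is a superset of retail's and every
/// retail-active voice agrees on start_addr where both sides report it.
pub fn frame_matches(engine: &TraceFrame, retail: &TraceFrame) -> bool {
    if engine.active_voice_mask & retail.active_voice_mask != retail.active_voice_mask {
        return false;
    }
    (0..24)
        .filter(|&v| retail.active_voice_mask & (1u32 << v) != 0)
        .all(|v| match (engine.start_addr[v], retail.start_addr[v]) {
            (Some(e), Some(r)) => e == r,
            _ => true, // one side has no address: not compared
        })
}

/// A retail frame converges if at least one engine frame matches it.
pub fn converges(engine_frames: &[TraceFrame], retail: &TraceFrame) -> bool {
    engine_frames.iter().any(|e| frame_matches(e, retail))
}
```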

PCSX-Redux's Lua API does not expose the SPU register file directly (SPUInterface::lockSPURAM is C++-internal, not bound). The probe leans on PCSX.createSaveState(), which returns the full state as a protobuf slice (~20 MiB); the autorun script walks the slice in-place via FFI and writes only the ~600 KiB SPU sub-message to disk so per-vsync GC pressure doesn't disrupt GPU::Vsync event delivery. The SPU schema is the one declared in PCSX-Redux's src/core/sstate.h + src/spu/types.h: Channel.Data.on || .stop is the retail-side "audible" criterion (ADSRInfoEx.state reads as Sustain even for unused voices, so it's not a reliable audibility signal).

Two known asymmetries the diff function explicitly models:

  1. Headless engine SPU. BootSession only attaches a real cpal AudioOut when enable_audio = true, which fails in CI. The oracle constructs a standalone Spu in parallel and routes scene-resolved BGM events into it. Not bit-identical to the retail SPU, but the voice-activity envelope is.
  2. Retail capture shape. The single-snapshot case freezes one SPU cycle; the multi-frame case carries per-vsync state. The engine produces frames + 1 records either way. NoFrameMatched stays tolerable drift in both modes; VoiceStartAddrMismatch and MasterVolumeMismatch are hard failures.

The engine drives BGM through a private TraceBgmDirector that routes field-VM op 0x35 events into a headless Sequencer in lock-step with SceneHost::route_bgm_events - so the engine trace does play music. NoFrameMatched is tolerable because the scene prescript may not emit op 0x35 within the trace window, or may target a different track than retail captured.

Entry points:

  • Library: engine_shell::audio_trace_oracle - build_engine_audio_trace, load_runtime_audio_trace_from_save, load_runtime_audio_trace_jsonl, first_audio_trace_divergence, first_audio_trace_divergence_multi, JSONL round-trip.
  • CLI: legaia-engine audio-trace --scene NAME (explicit), --scenario LABEL (single-snapshot vs .mc{slot} SPU), or --retail-jsonl PATH (multi-frame vs PCSX-Redux capture).
  • Disc-gated tests: audio_trace.rs auto-discovers scenarios with both expected_active_scene and an on-disk .mc{slot} save; audio_trace_multi.rs walks the same scenarios but skips unless LEGAIA_AUDIO_TRACE_JSONL_DIR points at a directory of <label>.jsonl files from the PCSX-Redux probe.

What's left

The per-scene .dpk / sound_data2 pack is decoded: it's a VAB + SEQ bundle (type-0 VAB header + type-1 VAB samples reconstitute one VAB; type-2 = SEQ), parsed by legaia_asset::sound_pack. The bse.dat master bank header is also pinned (the +2 ushort is an even byte-offset splitting it into two sections, not a record-count divisor). Residual: the .MAP / .PCH PsyQ SoundArtist intermediates are dev-side inputs that produce these VABs, not separate on-disc retail chunks.

See also