Reverse Engineering the PSP Media Engine: From Empty Stubs to Stable Video Streaming

On ARK-4 CFW, H.264 video decode from PSP homebrew fails with a cryptic error — the official sceVideocodec API returns 0x806201FE and nobody has documented why. We traced the problem to empty firmware stubs, discovered hidden kernel modules during UMD playback, reverse engineered the Media Engine’s RPC protocol, found undocumented parameters in PMPlayer’s source code, achieved hardware-accelerated H.264 decode, hit a firmware deadlock after 90 frames, reverse engineered the loaded PRX to find the blocking kernel call, and built three increasingly robust solutions — from runtime binary patching to a kernel watchdog hook to P/B-frame skipping that runs stable for minutes.

The Problem: Error 0x806201FE on ARK-4 CFW

The PSP contains a dedicated coprocessor called the Media Engine (ME) — a second MIPS CPU clocked at up to 333 MHz, designed specifically for multimedia decode. Sony’s games and UMD movies use it for H.264, MPEG-4, AAC, and ATRAC3+ decoding. Early homebrew video players like PMPlayer and PMP Mod successfully used the ME on older CFW configurations, proving it was accessible from homebrew in principle.

On our hardware — a PSP-3001 running ARK-4 CFW on firmware 6.61 — the official sceVideocodec API returns error 0x806201FE on every call. The higher-level sceMpeg API fails too, as it internally delegates to sceVideocodec and hits the same wall. The root cause turned out to be CFW-specific: the kernel modules that bridge user-mode codec calls to the Media Engine are not loaded by sceUtilityLoadModule on this configuration.

We set out to understand exactly why the API fails on ARK-4 and whether there was a way to communicate with the ME directly, bypassing the broken stubs entirely. What we found was a complete, previously undocumented kernel driver architecture and RPC protocol that governs how Sony’s codec firmware communicates with the ME.

The Two CPUs Inside Every PSP

┌─────────────────────────────────────────────────┐
│                 Tachyon SoC                      │
│                                                  │
│  ┌──────────────┐        ┌──────────────┐       │
│  │  Main CPU     │        │  Media Engine │       │
│  │  Allegrex     │        │  ME (VME)    │       │
│  │  MIPS R4000   │        │  MIPS R4000  │       │
│  │  1-333 MHz    │        │  1-333 MHz   │       │
│  │              │        │              │       │
│  │  Runs games, │        │  H.264 decode│       │
│  │  homebrew,   │  RPC  │  AAC decode  │       │
│  │  OS kernel   │<------>│  MPEG-4      │       │
│  │              │        │  ATRAC3+     │       │
│  └──────────────┘        └──────────────┘       │
│         │                       │                │
│    Main RAM (64MB)       eDRAM (2MB)            │
└─────────────────────────────────────────────────┘
The ME is a full MIPS CPU with its own firmware, communicating via shared memory RPC.

Runtime Module Discovery

Our kernel PRX (oasis-plugin-psp) hooks sceDisplaySetFrameBuf to draw an overlay UI on top of running games. We added a “Dump ME FW” menu item that enumerates all loaded kernel modules during UMD Video playback — specifically while playing a Spider-Man 2 UMD movie disc.

The initial dump returned only 32 modules. A subtle API quirk was responsible: sceKernelGetModuleIdList’s second parameter is the buffer size in bytes, not the element count. With a 128-entry buffer of SceUID (4 bytes each), the correct size parameter is 512, not 128. After fixing this, we found 68 modules loaded during UMD video playback.
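The bytes-versus-elements mix-up is easy to reproduce on paper. The following is a minimal host-side sketch, not our plugin code: `SceUid` is a stand-in alias for the SDK's 32-bit `SceUID`, and both helper names are hypothetical.

```rust
use std::mem::size_of;

/// Stand-in for the SDK's `SceUID` (assumed here to be a 32-bit integer handle).
type SceUid = i32;

/// Byte size to pass as the second argument when the ID buffer
/// holds `capacity` entries.
fn id_buffer_bytes(capacity: usize) -> usize {
    capacity * size_of::<SceUid>()
}

/// Number of IDs that fit when a caller passes `byte_size` bytes.
/// Passing the element count by mistake shrinks the usable buffer.
fn ids_that_fit(byte_size: usize) -> usize {
    byte_size / size_of::<SceUid>()
}
```

With a 128-entry buffer, `id_buffer_bytes(128)` gives 512, while `ids_that_fit(128)` gives 32, which matches the truncated first dump.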

Buried in the list were two modules that do not appear during normal homebrew execution and are not documented in any public PSP SDK or wiki:

Module              Size   Purpose
------------------  -----  ----------------------------------------------------------------------
sceMeCodecWrapper   11 KB  ME firmware loader + RPC bridge between main CPU and ME
sceAvcodec_wrapper  19 KB  Kernel-mode avcodec with ME driver imports (replaces the stub version)

This was the first clue. The avcodec.prx loaded during normal homebrew execution contains empty stubs — functions that exist but do nothing. During UMD boot, the firmware loads a different set of modules that include the actual ME driver code. The error 0x806201FE comes from these empty ME submission stubs returning failure.

ME Driver Architecture

With the module names in hand, we used sctrlHENFindFunction (an ARK-4 CFW API that resolves kernel function pointers by library name and NID) to map the imports of sceMeCodecWrapper. It imports from six internal ME driver libraries:

Library              Functions  Role
-------------------  ---------  ------------------------------------------------
sceMeWrapper_driver  23         Master interface; orchestrates all ME operations
sceMeVideo_driver    7          H.264 and MPEG-4 video decode
sceMeAudio_driver    5          AAC and ATRAC3+ audio decode
sceMeMemory_driver   3          ME-side memory allocation (eDRAM management)
sceMeCore_driver     4          ME lifecycle: boot, halt, reset, RPC dispatch
sceMePower_driver    5          ME clock frequency and power state control

In total, we extracted 47 NIDs for internal ME driver interfaces. These NIDs are not present in any public NID database. The ME firmware itself is loaded from flash0:/kd/resource/me_t2img.img — a 392 KB encrypted image.

Inter-processor communication uses two kernel synchronization primitives: a semaphore named SceMeRpc for mutual exclusion and an event flag named SceMediaEngineRpc for completion signaling. All communication flows through a command buffer at a fixed uncached memory address: a 40-byte command area (command ID, padding, and eight parameters) followed by the return-value word at +0x28:

ME RPC Command Buffer: 0xBFC00600 (40 bytes)

Offset  Size   Field
------  -----  -------------------------------
+0x00   4B     Command ID
+0x04   4B     (reserved/padding)
+0x08   4B     Parameter 1
+0x0C   4B     Parameter 2
+0x10   4B     Parameter 3
+0x14   4B     Parameter 4
+0x18   4B     Parameter 5
+0x1C   4B     Parameter 6
+0x20   4B     Parameter 7
+0x24   4B     Parameter 8
+0x28   4B     Return value (ME writes back here)
+0x2C   4B     (reserved)
The main CPU writes command + params, the ME writes back the return value at +0x28.
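To make the offsets concrete, here is a small host-side model of the buffer. It is a sketch under assumptions: the real buffer lives at a fixed hardware address and would be accessed through volatile pointers, whereas this version uses a plain array so it can run anywhere. The type and function names are ours, not Sony's.

```rust
/// Base address of the buffer as given in the text (uncached KSEG1).
const ME_RPC_BASE: u32 = 0xBFC0_0600;

/// Word indices derived from the offset table (byte offset / 4).
const WORD_COMMAND: usize = 0x00 / 4;
const WORD_PARAM1: usize = 0x08 / 4;
const WORD_RETURN: usize = 0x28 / 4;
const WORD_COUNT: usize = 0x30 / 4;

/// In-memory model of the command buffer (hypothetical type).
#[derive(Debug, Default, Clone, PartialEq)]
struct MeRpcBuffer {
    words: [u32; WORD_COUNT],
}

impl MeRpcBuffer {
    /// Clears the buffer, then writes the command ID and up to eight parameters.
    fn fill(&mut self, command: u32, params: &[u32]) {
        assert!(params.len() <= 8, "the layout has room for eight parameters");
        self.words = [0; WORD_COUNT];
        self.words[WORD_COMMAND] = command;
        for (i, p) in params.iter().enumerate() {
            self.words[WORD_PARAM1 + i] = *p;
        }
    }

    /// Reads the word the ME is described as writing back.
    fn return_value(&self) -> u32 {
        self.words[WORD_RETURN]
    }
}

/// Absolute address of the return-value word.
fn return_value_address() -> u32 {
    ME_RPC_BASE + (WORD_RETURN as u32) * 4
}
```

`return_value_address()` evaluates to 0xBFC00628, the address read in step 6 of the dispatch sequence.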

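The address 0xBFC00600 is in the uncached KSEG1 region, so accesses to the command buffer itself bypass the data cache and need no explicit flush or invalidate; the cache write-back in step 3 below presumably covers the cached data buffers that the parameters point to. Every ME operation follows the same seven-step sequence, extracted from the disassembled wrapper functions: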

ME RPC Dispatch Sequence

1. WaitSema(SceMeRpc)              // Acquire exclusive ME access
2. Write cmd + params to 0xBFC00600  // Fill command buffer
3. DcacheWritebackInvalidate()       // Flush CPU data cache
4. sceSysregMeResetEnable()          // Trigger ME interrupt (!)
5. WaitEventFlag(SceMediaEngineRpc,  // Block until ME signals
                  1, WAIT_AND)       //   completion via event flag
6. Read return value from 0xBFC00628 // ME wrote result here
7. SignalSema(SceMeRpc)              // Release exclusive access
Step 4 is the critical trigger — a SysReg reset pulse acts as the interrupt doorbell.

The use of sceSysregMeResetEnable as the interrupt trigger was unexpected. Rather than a dedicated IPC interrupt line, Sony repurposed a SysReg reset signal as a doorbell mechanism. The ME firmware watches for this pulse and reads the command buffer in response.
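The seven-step sequence can be sketched as ordinary code. This is an illustrative model only: the `MeKernel` trait and its method names are hypothetical stand-ins for the kernel calls listed above, and `RecordingKernel` is a test double that logs the call order instead of touching hardware.

```rust
/// Hypothetical abstraction over the kernel operations in the dispatch sequence.
trait MeKernel {
    fn wait_sema(&mut self);
    fn write_command(&mut self, command: u32, params: &[u32]);
    fn dcache_writeback_invalidate(&mut self);
    fn ring_doorbell(&mut self);
    fn wait_completion(&mut self);
    fn read_return_value(&mut self) -> u32;
    fn signal_sema(&mut self);
}

/// Runs one RPC in the order described: lock, fill, flush, doorbell,
/// wait, read result, unlock.
fn me_rpc<K: MeKernel>(k: &mut K, command: u32, params: &[u32]) -> u32 {
    k.wait_sema();
    k.write_command(command, params);
    k.dcache_writeback_invalidate();
    k.ring_doorbell();
    k.wait_completion();
    let result = k.read_return_value();
    k.signal_sema();
    result
}

/// Test double that records the order of calls and returns a canned result.
#[derive(Default)]
struct RecordingKernel {
    log: Vec<&'static str>,
    canned_result: u32,
}

impl MeKernel for RecordingKernel {
    fn wait_sema(&mut self) { self.log.push("wait_sema"); }
    fn write_command(&mut self, _command: u32, _params: &[u32]) { self.log.push("write_command"); }
    fn dcache_writeback_invalidate(&mut self) { self.log.push("dcache_writeback_invalidate"); }
    fn ring_doorbell(&mut self) { self.log.push("ring_doorbell"); }
    fn wait_completion(&mut self) { self.log.push("wait_completion"); }
    fn read_return_value(&mut self) -> u32 {
        self.log.push("read_return_value");
        self.canned_result
    }
    fn signal_sema(&mut self) { self.log.push("signal_sema"); }
}
```

Note that `wait_completion` has no timeout in this model either, which is exactly the property that becomes a problem later in the article.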

Command ID Map

From the disassembly, we mapped 22 RPC command IDs to their functions. The table below lists 12 of them:

Category  Command      ID           Description
--------  -----------  -----------  ---------------------------------------
Video     Open         0x02         Open video codec instance
          Init         0x24         Initialize decoder with SPS/PPS
          ScanHeader   0x25         Parse H.264 NAL unit headers
          Decode       0x26         Decode one video frame
          Release      0xE1         Release video codec resources
Audio     Init         0x09 / 0x60  Initialize audio decoder (two variants)
          Decode       0x64         Decode one audio frame
          Release      0x61         Release audio codec resources
Memory    Alloc        0x180        Allocate ME-side eDRAM
          Free         0x181        Free ME-side eDRAM
CSC       MpegbaseCSC  0x6A         Color space conversion (YCbCr → RGB)
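For tooling that logs or filters RPC traffic, the table translates naturally into an enum. The sketch below only encodes the IDs listed in the table; the enum and its variant names are ours and are not taken from Sony's firmware.

```rust
/// RPC commands from the table above (variant names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MeCommand {
    VideoOpen,
    VideoInit,
    VideoScanHeader,
    VideoDecode,
    VideoRelease,
    AudioInit,
    AudioDecode,
    AudioRelease,
    MemoryAlloc,
    MemoryFree,
    Csc,
}

impl MeCommand {
    /// Maps a raw command ID to a known command, or `None` if it is
    /// not one of the IDs listed in the table.
    fn from_id(id: u32) -> Option<Self> {
        Some(match id {
            0x02 => Self::VideoOpen,
            0x24 => Self::VideoInit,
            0x25 => Self::VideoScanHeader,
            0x26 => Self::VideoDecode,
            0xE1 => Self::VideoRelease,
            0x09 | 0x60 => Self::AudioInit, // two variants share one meaning
            0x64 => Self::AudioDecode,
            0x61 => Self::AudioRelease,
            0x180 => Self::MemoryAlloc,
            0x181 => Self::MemoryFree,
            0x6A => Self::Csc,
            _ => return None,
        })
    }

    /// True for the commands in the Video category.
    fn is_video(self) -> bool {
        matches!(
            self,
            Self::VideoOpen
                | Self::VideoInit
                | Self::VideoScanHeader
                | Self::VideoDecode
                | Self::VideoRelease
        )
    }
}
```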

Every video function checks a version constant 0x05100601 before dispatching. This appears to be a firmware compatibility marker — the ME firmware must report a matching version during its boot handshake. The module entry point in sceMeCodecWrapper performs the full ME boot sequence: enabling SysCtrl bus gates at 0xBC100000, configuring the ME PLL at 0xBCC00010, flushing caches, setting up COP0 interrupt routing, decrypting the ME firmware from flash0:/kd/resource/me_t2img.img via the Kirk crypto engine, and releasing the ME from reset.

The Breakthrough: mpeg_vsh370.prx

With direct ME driver calls producing initialization errors (the ME responded but needed the correct multi-step initialization sequence), we pivoted to the sceMpeg API — the higher-level interface that Sony’s own games use. We tried syscall hooks to redirect calls through the VSH (Visual Shell) library, loaded mpeg_vsh.prx directly, and tested every combination of AvMpegBase and mpeg_vsh370.prx. All returned 0x80628002 (AVC_DECODE_FATAL).

A critical discovery: mpeg_vsh370.prx imports 59 functions from the sceMpeg library (self-referencing). When the kernel links these at sceKernelStartModule time, it registers sceMpeg as an available library, which triggers re-linking of the EBOOT’s weak import stubs. This meant we could call sceMpeg functions through mpeg_vsh370 without any kernel PRX or bridge module. But decode still returned 0x80628002.

PMPlayer source code: the missing parameters

PMPlayer Advance (by cooleyes, DavisDev/pmplayer-advance on GitHub) successfully decodes H.264 on this exact PSP. Reading its source code (/ppa/mod/mp4avcdecoder.c) revealed five critical differences in how it sets up the sceMpeg decoder:

Parameter                    Our Code               PMPlayer                                  Impact
---------------------------  ---------------------  ----------------------------------------  ------------------------------------------------------------------
sceMpegQueryMemSize mode     0                      4 (≤480p) / 5 (>480p)                     Different buffer size, NAL decode path not enabled
sceMpegCreate unk1           0                      mpeg_mode (4/5)                           ME doesn’t know which decode mode to use
sceMpegCreate unk2           0                      DDR top pointer (4MB-aligned 2MB buffer)  ME has no workspace for decoded frames (root cause of 0x80628002)
AU buffer for sceMpegInitAu  sceMpegMallocAvcEsBuf  ddrtop + 0x10000                          AU not in DDR top region where ME expects it
AU struct init               zeroed                 memset(0xFF)                              Sentinel values required for proper state tracking

The “unknown” parameters in sceMpegCreate were never documented in PSPSDK or any public reference — they were only visible in PMPlayer’s source. Without the DDR top pointer, the ME’s codec firmware had no output buffer to write decoded YCbCr frames, causing every sceMpegAvcDecode call to return 0x80628002 (AVC_DECODE_FATAL).
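The address arithmetic behind the DDR top pointer is simple but easy to get wrong. The following is a hedged sketch of one way to derive it from an over-sized raw allocation: the constants come from the table above, while `DdrLayout`, `align_up`, and `new_au_bytes` are hypothetical helpers, not PMPlayer's or our production code.

```rust
/// Required alignment and size of the DDR top workspace, per the table.
const DDR_TOP_ALIGN: u32 = 4 * 1024 * 1024;
const DDR_TOP_SIZE: u32 = 2 * 1024 * 1024;
/// Offset of the AU buffer inside the workspace (ddrtop + 0x10000).
const AU_OFFSET: u32 = 0x10000;

/// Rounds `addr` up to the next multiple of `align` (a power of two).
/// Assumes `addr + align` does not overflow a 32-bit address.
fn align_up(addr: u32, align: u32) -> u32 {
    debug_assert!(align.is_power_of_two());
    (addr + align - 1) & !(align - 1)
}

/// Bytes to request from the allocator so an aligned 2 MB window always fits.
fn raw_allocation_size() -> u32 {
    DDR_TOP_SIZE + DDR_TOP_ALIGN
}

/// Addresses derived from an arbitrary raw allocation base.
#[derive(Debug, PartialEq, Eq)]
struct DdrLayout {
    ddr_top: u32,
    au_buffer: u32,
}

impl DdrLayout {
    fn from_raw(raw_base: u32) -> Self {
        let ddr_top = align_up(raw_base, DDR_TOP_ALIGN);
        DdrLayout { ddr_top, au_buffer: ddr_top + AU_OFFSET }
    }
}

/// AU structure bytes pre-filled with 0xFF sentinels, mirroring memset(0xFF).
fn new_au_bytes(len: usize) -> Vec<u8> {
    vec![0xFF; len]
}
```

A raw base of 0x08F12340 yields a DDR top of 0x09000000 and an AU buffer at 0x09010000.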

H.264 decode: working

After applying all of these fixes, the full decode pipeline worked on real PSP hardware. Confirmed with a 656×480 H.264 stream from archive.org (Bits and Bytes, 1983). The working init sequence:

sceMpegInit() = 0x0
sceMpegQueryMemSize(5) = 49535        // mode 5 for >480p
DDR top: 2MB @ 0x9000000 (4MB-aligned)
sceMpegRingbufferConstruct(8 packets) = 0x0
sceMpegCreate(mpeg, data, 49535, &rb, 512, 5, 0x9000000) = 0x0
sceMpegRegistStream(mpeg, 0, 0) = OK
sceMpegInitAu(mpeg, 0x9010000, &au) = 0x0   // ddrtop + 0x10000, 0xFF-filled
sceMpegAvcDecodeMode(mpeg, {-1, Psm8888}) = 0x0

The per-frame decode sequence feeds AVCC-format NAL data (length-prefixed, directly from MP4 container) via sceMpegGetAvcNalAu, then decodes via sceMpegAvcDecode. The ME outputs YCbCr which is converted to ABGR via sceMpegBaseCscAvc (hardware color space converter). Everything initializes. Frames decode. Video appears on screen.

Then, after roughly 90 frames, the video freezes and never recovers.

Streaming Performance

Before investigating the deadlock, we eliminated a massive performance bottleneck in the streaming pipeline. The PSP’s 333 MHz Allegrex CPU and slow heap allocator were being hammered by per-frame allocations:

Optimization                 What Changed                                                 Savings
---------------------------  -----------------------------------------------------------  --------------------------------------
Pre-allocated video texture  Reuse persistent GU texture buffer across frames             ~30 MB/s alloc churn eliminated
Remove StreamFrame copies    Send raw AVCC directly, clone SPS/PPS only on keyframes      ~3 MB/s alloc churn eliminated
Static RGBA double-buffer    decode_into() writes to pre-allocated buffers                ~15 MB/s alloc churn eliminated
Semaphore-based wakeup       Replace 5ms polling with kernel semaphore                    CPU waste eliminated
Audio thread re-pop          Immediately check queue after output_blocking                ~20 wasted wakeups/frame eliminated
Range D-cache flush          Replace full D-cache invalidate with range ops in AAC init   16KB D-cache thrash avoided
Conditional vlog             Suppress Memory Stick I/O during active decode               5–20ms stall per log eliminated

Total: ~48 MB/s of heap allocation churn eliminated on a 333 MHz MIPS CPU with a basic allocator. These optimizations make the first ~3 seconds of video smooth and responsive, and they remain valuable regardless of the deadlock fix.

The 90-Frame Deadlock

The content is 656×480 Main profile H.264 (mode 5 in the ME’s terminology). The first ~75 decoded frames are perfect — zero errors, zero corruption. Then sceMpegAvcDecode enters and never returns. The video thread blocks permanently. Audio continues on its own thread, but the I/O thread eventually stalls too (blocked on video queue backpressure), and the stream dies.

We added a DECODE_STEP atomic watchdog: the decode function sets step=1 before sceMpegGetAvcNalAu, step=2 before sceMpegAvcDecode, step=3 before sceMpegAvcDecodeDetail2, step=4 before sceMpegBaseCscAvc, and step=0 on return. The main thread checks this every 2 seconds. Result: step=2 every time. The hang is inside sceMpegAvcDecode itself.
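The watchdog idea can be sketched with a single atomic. This is a simplified host-side illustration rather than our actual decode loop: the `stage` closure stands in for the four sceMpeg calls, and the helper names are hypothetical.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Last step the decode thread announced; 0 means "not inside decode".
static DECODE_STEP: AtomicU32 = AtomicU32::new(0);

/// Human-readable name of each step, following the numbering in the text.
fn step_name(step: u32) -> &'static str {
    match step {
        1 => "sceMpegGetAvcNalAu",
        2 => "sceMpegAvcDecode",
        3 => "sceMpegAvcDecodeDetail2",
        4 => "sceMpegBaseCscAvc",
        _ => "idle",
    }
}

/// Publishes the step the decode thread is about to enter.
fn mark(step: u32) {
    DECODE_STEP.store(step, Ordering::SeqCst);
}

/// Runs the four decode stages, marking each one before it starts
/// and resetting the marker to 0 on return.
fn decode_frame<F: FnMut(u32)>(mut stage: F) {
    for step in 1..=4 {
        mark(step);
        stage(step); // placeholder for the real call at this step
    }
    mark(0);
}

/// Called periodically from another thread: reports where the decode
/// thread currently is, or `None` if it is not inside a decode.
fn watchdog_report() -> Option<&'static str> {
    match DECODE_STEP.load(Ordering::SeqCst) {
        0 => None,
        step => Some(step_name(step)),
    }
}
```

If a stage never returns, the marker stays at that step's number and the periodic check keeps reporting the same function name, which is how a hang inside sceMpegAvcDecode shows up as step=2.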

We then systematically tested every mitigation we could think of:

Approach                                            Result
--------------------------------------------------  --------------------------------
Full D-cache flush before every decode              No effect
DDR workspace 2 MB → 4 MB                           No effect
Decode rate throttling (16ms / 33ms)                No effect
SPS max_ref_frames patching (3 → 1)                 Made it worse (1 frame vs 90)
Pixel format Psm8888 → Psm5650 (half buffer size)   No effect
sceMpegFlushAllStream mid-stream                    Hard crash (PSP reboot required)
sceMpegAvcDecodeFlush mid-stream                    Hard crash
sceMpegInit mid-stream                              Hard crash
Decoder destroy + recreate                          Hard crash (ME busy)
Mode 4 forced for >480p content                     0x80628002 on every frame
Frame skipping (“ME rest”)                          Deadlocks on resume

The deadlock is not caused by DPB overflow, cache coherency, decode speed, pixel format, or buffer sizes. It’s a hard ~90-frame limit in the ME firmware’s mode 5 code path. Mode 4 (≤480p) works indefinitely but cannot decode >480p content at all.

Inside mpeg_vsh370.prx

The deadlock is inside the loaded PRX, not in our code. To understand it, we needed to trace the full call chain from our Rust sceMpegAvcDecode call through the PRX to the blocking point. We disassembled the 42 KB code segment of mpeg_vsh370.prx using Capstone (MIPS32 little-endian). Matching the loaded function’s prologue bytes against the raw PRX identified it as VA 0x71c0 — one of four identical dispatch wrappers.

sceMpegAvcDecode Call Chain (mpeg_vsh370.prx)

VA 0x71c0  dispatch wrapper
    |  checks init state (< 0x3E0 -> return error)
    |  saves registers, calls real function
    v
VA 0x1078  real sceMpegAvcDecode
    |
    +-- 0x26e4  validate MPEG handle
    +-- 0x616c  AU stream lookup
    +-- 0x8650  ME decode trigger <-- BLOCKING FUNCTION
    |     |
    |     +-- 0x8234  WaitSema(mpeg_data+0x66c)  // mutex acquire
    |     +-- 0x9fd4  KERNEL IMPORT STUB <-- INFINITE WAIT
    |     +-- 0x82c4  SignalSema(mpeg_data+0x66c) // mutex release
    |
    +-- 0x8880  pre-CSC setup
    +-- 0x8a80  ME decode data copy
    +-- 0x9f6c  kernel syscall (in loop)
Full call chain traced via Capstone disassembly + runtime instruction dumps.

The blocking point: kernel import stub 0x9fd4

Function 0x8650 is the critical wrapper. It acquires a mutex semaphore at mpeg_data+0x66c, calls the kernel import at VA 0x9fd4, then releases the mutex:

// VA 0x8650 -- ME decode trigger
lw    $a0, 0x240($s0)      // load semaphore ID
jal   0x8234               // WaitSema (acquire mutex)
move  $a0, $s1             // decode context
jal   0x9fd4               // KERNEL CALL -- blocks forever
move  $a1, $zero
lw    $a0, 0x240($s0)      // load semaphore ID
jal   0x82c4               // SignalSema (release mutex)
move  $v0, $s1             // return result

The import stub at 0x9fd4 is patched by the kernel at module load time to jump into the kernel’s ME RPC handler. That handler calls WaitEventFlag(SceMediaEngineRpc, 1, WAIT_AND|CLEAR, &out, 0); the final argument is the timeout pointer, and passing 0 (null) means wait indefinitely. When the ME coprocessor deadlocks after ~90 frames, it never sets the event flag, and the wait blocks forever. Signalling the mutex from another thread has no effect, because the current holder is already past the mutex and stuck in the kernel.

Three Solutions

With the root cause identified — the ME firmware hangs after ~90 mode 5 frames and the kernel waits forever for a completion signal that will never arrive — we developed three solutions of increasing robustness.

Solution A: Runtime Binary Patching

The first approach: prevent the deadlock by patching the loaded PRX in memory. After the decoder initializes, we compute the PRX base address from the resolved import stub (a j <target> instruction that the kernel patched at load time). The jal 0x9fd4 instruction at PRX VA 0x8678 is the call to the blocking kernel function. After 85 frames (safely before the ~90 deadlock threshold), we overwrite it:

// Before patch (normal ME decode):
0x8678: jal 0x9fd4           // call kernel ME RPC
                              // -> WaitEventFlag(infinite) -> DEADLOCK

// After patch (skip ME call):
0x8678: addiu $v0, $zero, -1 // return error immediately
                              // -> caller sees error, continues

The patch is a single MIPS instruction replacement: 0x2402ffff (addiu $v0, $zero, -1) overwrites the jal instruction. We flush the D-cache after patching so the new instruction reaches memory; on MIPS the matching I-cache lines generally must be invalidated as well before the CPU is guaranteed to fetch it. Result: ~70 decoded video frames (~2.3 seconds at 30fps), then the patch activates and every subsequent decode returns an error. The video thread handles errors gracefully. Audio plays uninterrupted for the full stream duration. Tested with 940+ frames processed, zero crashes.
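The instruction words involved can be checked with a few lines of encoding logic. The sketch below uses the standard MIPS32 I-type and J-type layouts; the helper names are ours, and `patch_word` operates on a plain byte slice standing in for the loaded PRX image. In a relocated module the jal would encode the module base plus 0x9fd4 rather than the bare VA used here.

```rust
const OP_JAL: u32 = 0x03;
const OP_ADDIU: u32 = 0x09;
const REG_ZERO: u32 = 0;
const REG_V0: u32 = 2;

/// Encodes `addiu rt, rs, imm` (I-type: opcode, rs, rt, 16-bit immediate).
fn encode_addiu(rt: u32, rs: u32, imm: i16) -> u32 {
    (OP_ADDIU << 26) | (rs << 21) | (rt << 16) | (imm as u16 as u32)
}

/// Encodes `jal target` (J-type: opcode plus target address bits 27..2).
fn encode_jal(target: u32) -> u32 {
    (OP_JAL << 26) | ((target >> 2) & 0x03FF_FFFF)
}

/// Absolute target of a `j`/`jal` word located at `pc`. The upper four
/// bits come from the address of the delay slot.
fn jump_target(pc: u32, instr: u32) -> u32 {
    (pc.wrapping_add(4) & 0xF000_0000) | ((instr & 0x03FF_FFFF) << 2)
}

/// Replaces the little-endian word at `offset` only if it currently
/// equals `expected`. Returns whether the patch was applied.
fn patch_word(image: &mut [u8], offset: usize, expected: u32, replacement: u32) -> bool {
    let slot = match image.get_mut(offset..offset + 4) {
        Some(slot) => slot,
        None => return false,
    };
    let current = u32::from_le_bytes([slot[0], slot[1], slot[2], slot[3]]);
    if current != expected {
        return false;
    }
    slot.copy_from_slice(&replacement.to_le_bytes());
    true
}
```

Verifying the expected word before overwriting is a cheap guard against patching the wrong offset if the module layout differs from the one we disassembled.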

Downside: video stops updating after ~2 seconds. The patch is a permanent kill switch — once applied, no more frames can be decoded until the decoder is destroyed and recreated.

Solution B: Kernel PRX Watchdog Hook

The ideal fix would add a timeout to the infinite WaitEventFlag call. We could not enumerate the event flag UID directly (scanning UIDs 1–65536 from both user mode and kernel plugin threads returned zero results). Instead, we took a different approach: hook sceKernelWaitEventFlag itself from the kernel PRX.

The key insight was applying the timeout selectively. The PSP kernel calls WaitEventFlag frequently for many subsystems — graphics, audio, file I/O. We cannot add a timeout to all of them. Instead, we use the bit pattern parameter to identify ME RPC calls: the ME completion flag uses bit pattern 0x1 with WAIT_AND|CLEAR mode and infinite timeout. When the hook detects this signature, it replaces the infinite timeout with a 5-second deadline.
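The selection logic inside the hook boils down to a small predicate. This sketch models only the decision, not the hook installation: the wait-mode values are an assumption based on the PSPSDK event flag constants as we recall them, the timeout is assumed to be in microseconds, and `WaitArgs` is a hypothetical struct holding the intercepted arguments.

```rust
/// Wait-mode bits (assumed values, following the PSPSDK definitions).
const WAIT_AND: u32 = 0x00;
const WAIT_OR: u32 = 0x01;
const WAIT_CLEAR: u32 = 0x20;

/// Bit pattern used by the ME completion flag, per the text.
const ME_RPC_BITS: u32 = 0x1;
/// Replacement deadline: 5 seconds, expressed in microseconds.
const WATCHDOG_TIMEOUT_US: u32 = 5_000_000;

/// Arguments of an intercepted wait call (hypothetical representation).
/// `timeout_us == None` models a null timeout pointer, i.e. wait forever.
struct WaitArgs {
    bits: u32,
    mode: u32,
    timeout_us: Option<u32>,
}

/// True when the call matches the ME RPC signature described above.
fn looks_like_me_rpc_wait(args: &WaitArgs) -> bool {
    args.bits == ME_RPC_BITS
        && args.mode == (WAIT_AND | WAIT_CLEAR)
        && args.timeout_us.is_none()
}

/// Timeout the hook forwards to the original function: the watchdog
/// deadline for ME RPC waits, the caller's own value for everything else.
fn effective_timeout(args: &WaitArgs) -> Option<u32> {
    if looks_like_me_rpc_wait(args) {
        Some(WATCHDOG_TIMEOUT_US)
    } else {
        args.timeout_us
    }
}
```

Any other subsystem that happens to wait on bit 0x1 with the same mode and no timeout would also match this signature, so the predicate is a heuristic rather than a precise identification of the ME flag.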

When the timeout fires, sceKernelWaitEventFlag returns an error instead of blocking forever. sceMpegAvcDecode propagates the error to our video thread, which handles it gracefully: log the timeout, skip the frame, and continue. The video thread never blocks permanently.

Downside: after the first timeout, the ME is in an undefined state. Subsequent decode attempts may also time out, effectively ending video decode. But the system remains stable and audio continues.

Solution C: P/B-Frame Skipping (Production)

The production solution avoids the deadlock entirely by managing what gets sent to the ME. The insight: the ~90-frame deadlock occurs because the ME accumulates internal state (reference frame buffers, DPB entries) that eventually exhausts some firmware resource. If we prevent that accumulation, the deadlock never triggers.

The approach: decode the first ~50 frames per keyframe interval normally, then skip all P-frames and B-frames until the next keyframe (IDR NAL unit). This creates a repeating cycle:

Keyframe 0   --> decode frames 0..50   --> skip P/B until keyframe 1
Keyframe 1   --> decode frames 0..50   --> skip P/B until keyframe 2
Keyframe 2   --> decode frames 0..50   --> skip P/B until keyframe 3
  ...

Each keyframe resets the ME’s internal reference state, preventing accumulation. The frame threshold of ~50 is well below the ~90-frame deadlock point, providing a safety margin. Video updates on every keyframe (typically every 1–3 seconds depending on the encoder’s GOP structure), creating a slideshow-like but functional display. Audio plays continuously without interruption.
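A minimal sketch of the gating logic follows. It assumes 4-byte big-endian AVCC length prefixes and treats NAL unit type 5 (IDR slice) as the keyframe marker; the budget of 50 frames per keyframe interval counts the keyframe itself. `FrameGate` and `contains_idr` are illustrative names, not our production types.

```rust
/// H.264 NAL unit type for an IDR slice.
const NAL_TYPE_IDR: u8 = 5;
/// Frames submitted to the ME per keyframe interval before skipping starts.
const FRAMES_PER_KEYFRAME: u32 = 50;

/// Returns true if any NAL unit in a length-prefixed access unit is an IDR
/// slice. Assumes 4-byte big-endian length prefixes; stops at malformed data.
fn contains_idr(au: &[u8]) -> bool {
    let mut pos = 0usize;
    while pos + 4 <= au.len() {
        let len = u32::from_be_bytes([au[pos], au[pos + 1], au[pos + 2], au[pos + 3]]) as usize;
        pos += 4;
        if len == 0 || pos + len > au.len() {
            return false;
        }
        if au[pos] & 0x1F == NAL_TYPE_IDR {
            return true;
        }
        pos += len;
    }
    false
}

/// Decides, frame by frame, whether to submit data to the decoder.
struct FrameGate {
    seen_keyframe: bool,
    decoded_since_keyframe: u32,
}

impl FrameGate {
    fn new() -> Self {
        FrameGate { seen_keyframe: false, decoded_since_keyframe: 0 }
    }

    /// Returns true if this frame should be decoded. A keyframe resets the
    /// budget; once the budget is spent, frames are skipped until the next one.
    fn should_decode(&mut self, is_keyframe: bool) -> bool {
        if is_keyframe {
            self.seen_keyframe = true;
            self.decoded_since_keyframe = 0;
        }
        if !self.seen_keyframe || self.decoded_since_keyframe >= FRAMES_PER_KEYFRAME {
            return false;
        }
        self.decoded_since_keyframe += 1;
        true
    }
}
```

In use, each access unit would be passed through `contains_idr` first and the result fed to `should_decode`; frames arriving before the first keyframe are skipped because the decoder has no reference to start from.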

Result: verified 2+ minutes stable on real PSP hardware with zero deadlocks, zero crashes, continuous audio, and periodic video updates. This is the solution shipped in production.

H.264 video from archive.org decoded on real PSP hardware via the Media Engine — captured remotely via TCP screencap command.

The Pixel Pipeline

Getting decoded frames from the ME to the screen required solving several hardware-specific problems. The ME’s sceMpegBaseCscAvc performs hardware color space conversion (YCbCr to RGB), but the output format and memory requirements are tightly coupled to the decode mode.

The CSC output uses Psm8888 pixel format (ABGR, 32 bits per pixel). We discovered that the CSC output format is determined by the sceMpegAvcDecodeMode setting passed during initialization — not by any parameter to the CSC function itself. Setting the wrong decode mode produces garbled output even if the CSC call succeeds.

The CSC output buffer must be in uncached memory (ptr | 0x4000_0000) to ensure D-cache coherency between the ME and the main CPU. Without this, the main CPU reads stale cache lines and displays corrupted frames. After the CSC completes and we upload the pixel data to a GU texture, a D-cache flush on the texture buffer is required for the GU (Graphics Unit) to see the updated data. One additional fix: the CSC outputs alpha=0x00 on every pixel, so a post-conversion alpha fixup sets every pixel’s alpha to 0xFF for correct GU compositing.
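Both fixes are one-liners once the bit positions are known. The sketch below assumes that a Psm8888 pixel read as a little-endian 32-bit word keeps alpha in the top byte (0xAABBGGRR); the function names are hypothetical, and the address helper only computes the alias rather than dereferencing it.

```rust
/// Bit that selects the uncached alias of a PSP address, per the text.
const UNCACHED_BIT: u32 = 0x4000_0000;
/// Alpha channel mask, assuming alpha occupies the top byte of each word.
const ALPHA_MASK: u32 = 0xFF00_0000;

/// Returns the uncached alias of `addr` (ptr | 0x4000_0000).
fn uncached_alias(addr: u32) -> u32 {
    addr | UNCACHED_BIT
}

/// Forces every pixel to fully opaque, leaving the color channels untouched.
fn force_opaque(pixels: &mut [u32]) {
    for pixel in pixels.iter_mut() {
        *pixel |= ALPHA_MASK;
    }
}
```

For example, `uncached_alias(0x09000000)` yields 0x49000000, and a pixel 0x00112233 becomes 0xFF112233 after the fixup.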

Graceful Degradation

In a streaming application, reliability matters more than maximum decode quality. The production system implements a three-tier fallback chain:

  1. Normal video decode — P/B-frame skipping keeps the ME under the deadlock threshold. Video updates on keyframes, audio plays continuously.
  2. Watchdog catches deadlock — if the ME does deadlock (edge case: keyframe interval longer than expected), the kernel hook’s 5-second timeout fires. sceMpegAvcDecode returns an error. The video thread sets an ME_LEAKED flag and does not call sceMpegDelete — calling delete on a busy ME causes a hard crash. The decoder resources are intentionally leaked.
  3. Audio-only continues — with the ME_LEAKED flag set, the video thread stops attempting decode. Audio streaming continues via sceAudiocodec on a separate thread (audio decode does not use the ME’s video path). When the user switches to a different channel, the ME_LEAKED flag prevents reinitializing the video decoder, avoiding a crash from creating a new sceMpeg instance while the old one’s ME state is corrupt.
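The three tiers can be modeled as a small state machine. This is a sketch of the decision logic only, with hypothetical type names; the `me_leaked` field plays the role of the ME_LEAKED flag described above.

```rust
/// What a single decode attempt produced.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DecodeOutcome {
    FrameReady,
    FrameSkipped,
    WatchdogTimeout,
}

/// What the video thread should do next.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    PresentFrame,
    KeepLastFrame,
    AudioOnly,
}

/// Tracks whether the ME has been abandoned in a hung state.
#[derive(Debug, Default)]
struct VideoPipeline {
    me_leaked: bool,
}

impl VideoPipeline {
    /// Applies the fallback chain: once a watchdog timeout has been seen,
    /// every later outcome maps to audio-only.
    fn on_outcome(&mut self, outcome: DecodeOutcome) -> Action {
        if self.me_leaked {
            return Action::AudioOnly;
        }
        match outcome {
            DecodeOutcome::FrameReady => Action::PresentFrame,
            DecodeOutcome::FrameSkipped => Action::KeepLastFrame,
            DecodeOutcome::WatchdogTimeout => {
                self.me_leaked = true;
                Action::AudioOnly
            }
        }
    }

    /// Whether it is safe to create or delete a decoder instance.
    /// After a leak, both are refused to avoid touching a busy ME.
    fn may_touch_decoder(&self) -> bool {
        !self.me_leaked
    }
}
```

The flag is deliberately one-way in this model: nothing clears it, which mirrors the decision to stay audio-only even across channel switches.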

This design ensures that no matter what the ME firmware does, the application never crashes and audio never stops. The worst case is a frozen video frame with continuous audio — acceptable for a streaming TV app on 2005 hardware.

Lessons Learned

Key Hardware Reference

Address     Purpose                Notes
----------  ---------------------  -------------------------------
0xBFC00600  ME RPC command buffer  40 bytes, uncached KSEG1
0xBFC00628  ME return value        Offset +0x28 in command buffer
0xBC100000  SysCtrl registers      ME bus gates and clock dividers
0xBCC00010  ME clock controller    PLL configuration

ME Driver NID Summary

Library              Count  Key Functions
-------------------  -----  ----------------------------------
sceMeWrapper_driver  23     Master orchestration interface
sceMeVideo_driver    7      Init, ScanHeader, GetEdram, Decode
sceMeAudio_driver    5      Audio init, decode, release
sceMeMemory_driver   3      Alloc, Free (eDRAM management)
sceMeCore_driver     4      Boot, Halt, RpcDispatch, RpcSimple
sceMePower_driver    5      Clock and power state control