Taking Photon Voice Further with FMOD & Unity: A Custom Integration Guide

Voice chat is essential for multiplayer experiences, but integrating it properly with a professional audio middleware like FMOD requires careful consideration. In this guide, I'll walk through a custom FMOD integration for Photon Voice that goes beyond the stock implementation, using FMOD Studio Events, DSP callbacks, and proper signal routing to give you complete control over your voice chat audio pipeline.

**Note:** This article describes a custom implementation that extends Photon Voice's FMOD integration with event-based architecture, configurable buffers, and spatial audio support. The source code is not publicly available, but feel free to reach out if you're implementing something similar and get stuck - I'm happy to help!
Dara (dara@SkywardSonicStudios.com)

Why Build a Custom FMOD Integration?

While Photon Voice includes basic FMOD support, a custom implementation provides significant advantages:

  • Unity audio engine can be disabled - Improved performance and reliability by going fully FMOD

  • Event-based architecture - Use FMOD Studio Events instead of raw Sound objects for better mixer integration

  • Real-time DSP processing - Process audio on the DSP thread before transmission

  • Configurable buffers - Dynamic jitter buffer and ring buffer sizing

  • Spatial audio support - Full 3D positioning for voice chat

  • Professional signal routing - Complete control through FMOD's mixer with effects, sends, and returns

  • Robust error handling - Production-ready defensive programming patterns

The Architecture

The custom integration uses three main components:

Key Components

  • FMODMicrophoneEventReader - Custom event-based microphone capture with DSP callbacks

  • FMODRecorderSetup - Photon Recorder setup with factory pattern for device selection

  • FMODSpeaker - Enhanced speaker with spatial audio and configurable jitter buffers

Why Event-Based Instead of Raw Sounds?

The stock Photon FMOD integration uses raw FMOD Sound objects. This custom implementation uses FMOD Studio Events with Programmer Instruments, which provides:

  1. Full mixer integration - Events flow through your mixer hierarchy

  2. Real-time processing - Apply effects within the event before broadcast

  3. DSP callback access - Process audio on FMOD's DSP thread

  4. Better debugging - See audio flow in FMOD Studio profiler

Unity Setup

Microphone (Recorder)

Add these components to your voice recorder GameObject:

  • Recorder (Photon Voice)

  • FMODRecorderSetup (Custom)

The custom FMODRecorderSetup uses a factory pattern to create the event-based reader:

[RequireComponent(typeof(Recorder))]
public class FMODRecorderSetup : VoiceComponent
{
    [Tooltip("The FMOD event to use for recording.")]
    public EventReference recordEvent;

    [Tooltip("The FMOD recording device ID to use. System Default is 0.")]
    public int fmodDeviceID = 0;

    private Recorder _recorder;

    protected override void Awake()
    {
        base.Awake();
        _recorder = GetComponent<Recorder>();
        _recorder.SourceType = Recorder.InputSourceType.Factory;
        _recorder.InputFactory = CreateFmodReader;
    }

    private FMODMicrophoneEventReader<short> CreateFmodReader()
    {
        return new FMODMicrophoneEventReader<short>(
            RuntimeManager.CoreSystem,
            recordEvent,
            fmodDeviceID,
            (int)_recorder.SamplingRate);
    }
}


This creates an instance of the custom event-based reader that captures through FMOD Studio.

Speaker (Playback)

Add the custom FMODSpeaker component for voice playback:

[AddComponentMenu("Photon Voice/FMOD/FMOD Speaker")]
public class FMODSpeaker : Speaker
{
    [SerializeField]
    [Tooltip("Playback the speaker in 3D space or 2D")]
    private bool spatialSpeaker;
    
    [SerializeField]
    [Tooltip("The FMOD Studio Event to use for playback.")]
    private EventReference eventReference;

    protected override IAudioOut<float> CreateAudioOut()
    {
        // Get jitter buffer config from VoiceChatManager
        var configuredPlayDelayConfig = GetConfiguredPlayDelayConfig();
        
        var instance = RuntimeManager.CreateInstance(eventReference);

        if (spatialSpeaker)
        {
            RuntimeManager.AttachInstanceToGameObject(instance, gameObject);
        }
        
        instance.start();
        
        return new AudioOutEvent<float>(
            RuntimeManager.CoreSystem, 
            instance, 
            configuredPlayDelayConfig, 
            Logger, 
            string.Empty, 
            true);
    }
    
    private PlayDelayConfig GetConfiguredPlayDelayConfig()
    {
        if (VoiceChatManager.Instance?.VoiceSettings != null)
        {
            var settings = VoiceChatManager.Instance.VoiceSettings.Value;
            return new PlayDelayConfig
            {
                Low = settings.jitterBufferLow,
                High = settings.jitterBufferHigh,
                Max = settings.jitterBufferMax
            };
        }
        
        // Fallback defaults
        return new PlayDelayConfig { Low = 200, High = 200, Max = 1000 };
    }
}

Key features:

  • Spatial audio toggle - Attach to GameObject for 3D positioning

  • Centralized settings - Jitter buffer config from VoiceChatManager

  • Graceful fallback - Default values if config unavailable

FMOD Mixer Setup

This is where the magic happens. Here's the signal flow architecture:

Microphone Signal Path

  1. Programmer Instrument Event - Receives mic input from FMOD recording device

  2. DSP Callback - Process audio in real-time on DSP thread (optional)

  3. Event Effects - Apply processing within the event (compression, EQ, etc.)

  4. Group Bus - Routes to mixer for additional processing

  5. Pre-Fader Send to Return Bus - For local monitoring (optional)

  6. Gain Plugin (Pre-Fader) - Mutes the main signal

  7. Photon reads from event - Captures processed audio for transmission

FMOD Microphone Event (Setup as an action).

Microphone Event Programmer Instrument (Set to loop).

Critical Detail: Pre-Fader Gain vs Post-Fader Volume

Important: Use a Gain plugin to mute the recorder signal, NOT the channel fader. Here's why:

  • Muting with the post-fader volume causes FMOD to stop processing the channel entirely, including everything pre-fader

  • The Gain plugin (pre-fader) keeps the signal chain alive while muting the output

  • This allows you to tap the signal via a send before it's muted

  • The send goes to a return bus for local monitoring

Monitoring/Feedback Control

To allow players to hear themselves (optional, see image above):

  1. Create a Return Bus for monitor audio

  2. Add a Send from the microphone group before the muting Gain plugin

  3. Control monitor level via the Return Bus fader or a VCA (see the code sketch after this list)

  4. The main signal remains muted via the Gain plugin
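
If you expose the monitor level through a VCA, you can drive it from game code. Below is a minimal sketch, assuming a hypothetical VCA path of vca:/VoiceMonitor - match whatever path you actually create in FMOD Studio:

using FMOD.Studio;
using FMODUnity;
using UnityEngine;

public class VoiceMonitorLevel : MonoBehaviour
{
    [SerializeField]
    [Tooltip("Path of the VCA that controls the monitor return level.")]
    private string monitorVcaPath = "vca:/VoiceMonitor"; // hypothetical path

    private VCA _monitorVca;

    private void Start()
    {
        _monitorVca = RuntimeManager.GetVCA(monitorVcaPath);
    }

    // Call with 0..1; 0 keeps local monitoring off.
    public void SetMonitorLevel(float level)
    {
        if (_monitorVca.isValid())
        {
            _monitorVca.setVolume(Mathf.Clamp01(level));
        }
    }
}

Because the main mic path stays muted by the Gain plugin, raising this VCA only changes what the local player hears, not what gets transmitted.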

Speaker Signal Path

  1. Programmer Instrument Event - Receives voice from other players

  2. 3D Spatializer (optional) - For positional audio

  3. Group Bus - Routes through mixer

  4. Multiband EQ - Voice clarity (cut below 80 Hz to remove boom and unnecessary low-end noise)

  5. Limiter - Prevents clipping from voice peaks

  6. Master Output - Mixed with game audio

Signal Routing Decision: Where Does Photon Listen?

The custom implementation reads directly from the event before it hits the group bus. This means:

  • Processing within the event (compression, EQ, pitch) is captured and transmitted

  • Processing on the group bus (sends, reverb) is NOT transmitted

  • This gives you control over what gets sent vs what's local-only

The Custom Event-Based Reader

Architecture Overview

The FMODMicrophoneEventReader is the core innovation. Here's how it works:

public class FMODMicrophoneEventReader<T> : IAudioReader<T>
{
    // Dual sound system
    private AudioDataBundle _dataBundle;
    
    private class AudioDataBundle
    {
        public Sound Recorded;      // Captures from FMOD recording device
        public Sound Manipulated;   // Assigned to programmer instrument
        public float[] DataBuffer;  // DSP processing buffer
    }
    
    // FMOD Event integration
    private EventInstance _eventInstance;
    private EVENT_CALLBACK _soundEventCallback;
    private DSP_READ_CALLBACK _dspReadCallback;
}

Why dual sounds?

  • Recorded - Captures microphone input via FMOD's recording system

  • Manipulated - Assigned to the programmer instrument in the event

  • This separation allows DSP processing between capture and playback

Initialization Flow

public FMODMicrophoneEventReader(
    FMODLib.System coreSystem, 
    EventReference eventRef, 
    int device, 
    int suggestedFrequency)
{
    // 1. Validate parameters and cache the core system
    _coreSystem = coreSystem;

    // 2. Initialize device info
    // 3. Create recording sound (looping ring buffer)
    _coreSystem.createSound("Photon AudioIn", MODE.OPENUSER | MODE.LOOP_NORMAL, 
                            ref exinfo, out _dataBundle.Recorded);
    
    // 4. Start recording from device
    _coreSystem.recordStart(device, _dataBundle.Recorded, true);
    
    // 5. Create manipulated sound for event
    _coreSystem.createSound("EventRec", MODE.OPENUSER | MODE.LOOP_NORMAL, 
                            ref exinfo, out _dataBundle.Manipulated);
    
    // 6. Setup DSP buffer
    _coreSystem.getDSPBufferSize(out var bufferLength, out _);
    _dataBundle.DataBuffer = new float[bufferLength * bufferMultiplier];
    
    // 7. Create event instance with callbacks
    _eventInstance = RuntimeManager.CreateInstance(eventRef);
    _eventInstance.setUserData(GCHandle.ToIntPtr(_dataBundleHandle));
    _eventInstance.setCallback(_soundEventCallback);
    _eventInstance.start();
}

The Programmer Instrument Callback

This is where the recorded audio is assigned to the FMOD event:

[AOT.MonoPInvokeCallback(typeof(EVENT_CALLBACK))]
private static RESULT EventCallback(EVENT_CALLBACK_TYPE type, IntPtr instancePtr, IntPtr parameterPtr)
{
    var instance = new EventInstance(instancePtr);
    instance.getUserData(out IntPtr dataPtr);
    var dataHandle = GCHandle.FromIntPtr(dataPtr);

    switch (type)
    {
        case EVENT_CALLBACK_TYPE.CREATE_PROGRAMMER_SOUND:
        {
            // Note the braces: each case gets its own scope so the
            // 'audioData' pattern variable can be declared per case.
            if (dataHandle.Target is AudioDataBundle audioData)
            {
                var parameter = (PROGRAMMER_SOUND_PROPERTIES)Marshal.PtrToStructure(
                    parameterPtr, typeof(PROGRAMMER_SOUND_PROPERTIES));
                
                // Assign our recorded sound to the programmer instrument
                parameter.sound = audioData.Recorded.handle;
                parameter.subsoundIndex = -1;
                
                Marshal.StructureToPtr(parameter, parameterPtr, false);
            }
            break;
        }
            
        case EVENT_CALLBACK_TYPE.SOUND_STOPPED:
        {
            // Clean up when stopped
            if (dataHandle.Target is AudioDataBundle audioData)
            {
                audioData.Recorded.release();
            }
            break;
        }
    }
    
    return RESULT.OK;
}

This callback is critical - FMOD asks "what sound should play?" and we provide our recording buffer. The event then plays this through its DSP graph, applying any effects in the event.

DSP Read Callback (Optional Advanced Feature)

For real-time processing on the DSP thread:

[AOT.MonoPInvokeCallback(typeof(DSP_READ_CALLBACK))]
private static RESULT CaptureDSPReadCallback(
    ref DSP_STATE dspState, 
    IntPtr inBuffer, 
    IntPtr outBuffer, 
    uint length, 
    int inChannels, 
    ref int outChannels)
{
    // Get our data bundle from user data
    var functions = (DSP_STATE_FUNCTIONS)Marshal.PtrToStructure(
        dspState.functions, typeof(DSP_STATE_FUNCTIONS));
    functions.getuserdata(ref dspState, out IntPtr userData);
    var objHandle = GCHandle.FromIntPtr(userData);
    
    if (objHandle.Target is AudioDataBundle obj)
    {
        int lengthElements = (int)length * inChannels;
        
        // Copy input to our buffer
        Marshal.Copy(inBuffer, obj.DataBuffer, 0, lengthElements);
        
        // *** PROCESS AUDIO HERE ***
        // Apply custom DSP effects to obj.DataBuffer
        
        // Copy processed audio to output
        Marshal.Copy(obj.DataBuffer, 0, outBuffer, lengthElements);
    }
    
    return RESULT.OK;
}

This runs on FMOD's DSP thread at audio rate - perfect for low-latency processing like noise gates or ducking.
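
As an illustration, here is a minimal hard gate that could sit at the *** PROCESS AUDIO HERE *** marker above. The threshold is an arbitrary example value, and a production gate would add attack/release smoothing to avoid clicks:

// Minimal hard gate over the current block (drop-in at "PROCESS AUDIO HERE")
const float gateThreshold = 0.02f; // roughly -34 dBFS, example value

// Measure the block's peak level
float peak = 0f;
for (int i = 0; i < lengthElements; i++)
{
    float abs = System.Math.Abs(obj.DataBuffer[i]);
    if (abs > peak) peak = abs;
}

// Below the threshold: silence the block. Above it: pass through untouched.
if (peak < gateThreshold)
{
    System.Array.Clear(obj.DataBuffer, 0, lengthElements);
}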

Reading Audio for Photon

The Read method delivers frames to Photon Voice:

public bool Read(T[] readBuf)
{
    // Get current recording position
    _coreSystem.getRecordPosition(_device, out uint micPos);
    
    // Track buffer wraparound
    if (micPos < _micPrevPos)
        _micLoopCnt++;
    _micPrevPos = micPos;
    
    var micAbsPos = _micLoopCnt * _bufLengthSamples + micPos;
    var nextReadPos = _readAbsPos + readBuf.Length / _deviceInfo.Channels;
    
    // Only read if we have enough data
    if (nextReadPos >= micAbsPos)
        return false;
    
    // Lock and copy from ring buffer
    var lockOffset = (uint)(_readAbsPos % _bufLengthSamples * _sizeofT * _deviceInfo.Channels);
    var lockSize = (uint)(readBuf.Length * _sizeofT);
    
    _dataBundle.Recorded.@lock(lockOffset, lockSize, 
        out IntPtr ptr1, out IntPtr ptr2, 
        out uint len1, out uint len2);
    
    // Copy to Photon buffer
    Marshal.Copy(ptr1, readBuf as short[], 0, (int)len1 / _sizeofT);
    if (ptr2 != IntPtr.Zero)
        Marshal.Copy(ptr2, readBuf as short[], (int)len1 / _sizeofT, (int)len2 / _sizeofT);
    
    _dataBundle.Recorded.unlock(ptr1, ptr2, len1, len2);
    
    _readAbsPos = (uint)nextReadPos;
    return true;
}

The ring buffer approach ensures continuous capture without dropouts.

Configurable Settings

The implementation uses a centralized VoiceChatManager for settings:

public struct VoiceSettings
{
    // Microphone settings
    public int microphoneRecorderRingBuffer;  // Default: 2000ms
    public int microphoneDspBuffer;           // Default: 8x multiplier
    
    // Jitter buffer settings
    public int jitterBufferLow;   // Default: 200ms
    public int jitterBufferHigh;  // Default: 400ms
    public int jitterBufferMax;   // Default: 1000ms
}
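
The VoiceChatManager itself isn't shown in this article; the FMODSpeaker above only relies on a static Instance and a nullable VoiceSettings property. Here is a minimal sketch of that shape - the default values mirror the struct above, everything else is illustrative:

using UnityEngine;

public class VoiceChatManager : MonoBehaviour
{
    public static VoiceChatManager Instance { get; private set; }

    // Mark the VoiceSettings struct [System.Serializable] if you want to tune it in the inspector.
    private VoiceSettings _settings = new VoiceSettings
    {
        microphoneRecorderRingBuffer = 2000,
        microphoneDspBuffer = 8,
        jitterBufferLow = 200,
        jitterBufferHigh = 400,
        jitterBufferMax = 1000
    };

    // Nullable so consumers can detect "no manager in the scene" and fall back to defaults.
    public VoiceSettings? VoiceSettings => _settings;

    private void Awake()
    {
        if (Instance != null && Instance != this)
        {
            Destroy(gameObject);
            return;
        }

        Instance = this;
        DontDestroyOnLoad(gameObject);
    }

    private void OnDestroy()
    {
        if (Instance == this) Instance = null;
    }
}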

Why configurable?

  • Ring buffer size - Larger = more latency tolerance, smaller = lower latency

  • DSP buffer multiplier - Affects real-time processing capacity

  • Jitter buffers - Network condition adaptation

FMOD Studio Project Setup

Microphone Event Configuration

  1. Create a new event: Events/Voice/Microphone

  2. Add a Programmer Instrument to the timeline

  3. Set the event to loop (right-click timeline → Loop)

  4. Optional: Add effects to the event (compression, EQ, noise gate)

  5. Route to a new Group bus: Groups/Microphone

Microphone Group Bus Configuration

  1. Pre-fader effects: Additional processing if needed

  2. Send (pre-fader): Route to Groups/Monitor Return at -∞ dB by default

  3. Gain plugin (pre-fader): Set to -∞ dB to mute the direct path

  4. Post-fader effects: None needed (signal is muted)

Monitor Return Bus

  1. Create return bus: Groups/Monitor Return

  2. Add a VCA to control monitor level from game code

  3. Route to master

Speaker Event Configuration

  1. Create a new event: Events/Voice/Speaker

  2. Add a Programmer Instrument to the timeline

  3. Set the event to loop

  4. Optional: Add 3D Spatializer if using spatial audio

  5. Route to a new Group bus: Groups/Speaker

Speaker Group Bus Configuration

  1. Add a Multiband EQ for voice clarity:

    • High-pass filter: 80-100 Hz (removes rumble)

    • Presence boost: +3 to +6 dB around 2-4 kHz (clarity)

    • De-ess: -2 to -4 dB around 6-8 kHz (reduces sibilance)

  2. Add a Limiter:

    • Ceiling: -2.0 dB

    • Release: 10ms

    • Prevents clipping from loud voices

  3. Route to master

How the Audio Pipeline Works

Complete Microphone Flow:

  1. FMOD captures microphone input to Recorded Sound

  2. Event callback assigns Recorded to Programmer Instrument

  3. FMOD plays the sound through the event's DSP graph

  4. Effects process audio (compression, EQ, etc. in event)

  5. Event outputs to Group Bus

  6. Send taps signal for local monitoring (pre-mute)

  7. Gain plugin mutes the direct signal

  8. Photon reads processed audio from the event

  9. Photon encodes and transmits over network

Complete Playback Flow:

  1. Photon receives compressed audio from network

  2. Photon decodes to PCM float samples

  3. AudioOutEvent writes to FMOD Sound buffer

  4. Programmer instrument callback assigns Sound to event

  5. FMOD plays through event (with spatializer if enabled)

  6. Event outputs to Speaker Group Bus

  7. EQ enhances voice clarity

  8. Limiter prevents clipping

  9. Master outputs mixed with game audio

Performance Considerations

  • Use short (PCM16) for microphone input to reduce bandwidth

  • Use float (PCMFLOAT) for speaker output for better quality

  • A ring buffer size of 2000 ms provides a good latency/stability balance (see the sizing sketch after this list)

  • DSP buffer multiplier of 8x handles most processing needs

  • Jitter buffers should adapt to network conditions

  • FMOD's MODE.LOOP_NORMAL enables continuous ring buffer operation
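
To make the ring buffer numbers concrete, here is a sketch of how a millisecond setting translates into the size of the FMOD recording sound. The helper name and locals are illustrative - the actual reader does this inside its constructor:

using System.Runtime.InteropServices;
using FMOD;

// Create a looping PCM16 recording sound sized to hold ringBufferMs of audio.
static Sound CreateRecordingRingBuffer(FMOD.System coreSystem, int device, int ringBufferMs)
{
    // Ask FMOD for the device's native sample rate and channel count
    coreSystem.getRecordDriverInfo(device, out _, 64, out _, out int systemRate,
                                   out _, out int channels, out _);

    var exinfo = new CREATESOUNDEXINFO
    {
        cbsize           = Marshal.SizeOf(typeof(CREATESOUNDEXINFO)),
        numchannels      = channels,
        defaultfrequency = systemRate,
        format           = SOUND_FORMAT.PCM16, // short samples on the mic path
        // bytes = rate * channels * bytes per sample * seconds
        length           = (uint)(systemRate * channels * sizeof(short) * (ringBufferMs / 1000f))
    };

    coreSystem.createSound("Photon AudioIn", MODE.OPENUSER | MODE.LOOP_NORMAL,
                           ref exinfo, out Sound recorded);
    coreSystem.recordStart(device, recorded, true); // loop = true keeps the ring buffer rolling
    return recorded;
}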

Common Gotchas

  1. Don't forget to enable the define: Make sure PHOTON_VOICE_FMOD_ENABLE is set in your project scripting defines

  2. Muting strategy matters: Always use pre-fader Gain plugin, never post-fader volume (this is likely a performance optimization in FMOD that disables processing when a channel is muted)

  3. Event must loop: The programmer instrument event MUST be looped for continuous playback

  4. Monitor feedback: When enabling local monitoring, be careful of feedback loops - always mute the main signal with the Gain plugin

  5. GC handles: Ensure proper cleanup of GCHandle allocations to prevent memory leaks

  6. Callback marshalling: Use [AOT.MonoPInvokeCallback] attribute for IL2CPP compatibility

Troubleshooting

No audio playing

  • Check that PHOTON_VOICE_FMOD_ENABLE is defined in your project settings

  • Verify the programmer instrument event is set to loop

  • Ensure the event reference is properly assigned in the inspector

  • Check FMOD profiler to see if the event is running

Hearing myself (local feedback)

  • Check that the Gain plugin on the microphone bus is set to -∞ dB

  • Verify the monitor return bus is muted by default

  • Ensure you're not accidentally routing the mic signal to master

Choppy or distorted audio

  • Increase ring buffer size in VoiceChatManager settings

  • Increase jitter buffer values for worse network conditions

  • Check the FMOD DSP buffer size (typically 512 or 1024 samples; the snippet below prints it)

  • Verify sample rates match (48kHz recommended)

  • Check CPU usage - DSP thread may be overloaded
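
A quick way to verify the buffer and sample rate items is to log what the FMOD mixer is actually running at, using only core API getters:

using FMODUnity;
using UnityEngine;

// Log the mixer format FMOD is currently running with.
void LogFmodMixerInfo()
{
    var core = RuntimeManager.CoreSystem;

    core.getDSPBufferSize(out uint bufferLength, out int numBuffers);
    core.getSoftwareFormat(out int sampleRate, out var speakerMode, out _);

    Debug.Log($"FMOD DSP buffer: {bufferLength} samples x {numBuffers} buffers, " +
              $"mixer rate: {sampleRate} Hz, speaker mode: {speakerMode}");
}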

Crashes in callbacks

  • Ensure GCHandle is allocated before event starts

  • Check that GCHandle.Target is the correct type before casting

  • Use defensive null checks in all callback code

  • Never throw exceptions in callbacks - log and return OK (see the wrapper sketch after this list)
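
One way to guarantee that last point is to route every native callback through a try/catch wrapper that logs and swallows the exception, for example:

[AOT.MonoPInvokeCallback(typeof(EVENT_CALLBACK))]
private static RESULT SafeEventCallback(EVENT_CALLBACK_TYPE type, IntPtr instancePtr, IntPtr parameterPtr)
{
    try
    {
        // Delegate to the real handler shown earlier
        return EventCallback(type, instancePtr, parameterPtr);
    }
    catch (Exception e)
    {
        // Never let an exception cross the managed/native boundary
        UnityEngine.Debug.LogError($"FMOD event callback threw: {e}");
        return RESULT.OK;
    }
}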

Memory leaks

  • Always call Dispose() when cleaning up (a cleanup sketch follows this list)

  • Free GCHandles in the DESTROYED callback

  • Release all FMOD sounds and event instances

  • Stop recording before releasing sounds
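
For reference, a minimal cleanup sketch for the reader, assuming the field names used earlier (_coreSystem, _device, _eventInstance, _dataBundle, _dataBundleHandle):

public void Dispose()
{
    // Stop recording before releasing the sound it writes into
    _coreSystem.recordStop(_device);

    // Stop and release the FMOD Studio event instance
    _eventInstance.stop(FMOD.Studio.STOP_MODE.IMMEDIATE);
    _eventInstance.release();

    // Release both sounds (if the SOUND_STOPPED callback already released
    // Recorded, this simply returns an invalid-handle result)
    _dataBundle.Recorded.release();
    _dataBundle.Manipulated.release();

    // Free the GCHandle that was passed to the callbacks via setUserData
    if (_dataBundleHandle.IsAllocated)
    {
        _dataBundleHandle.Free();
    }
}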

Advanced: Spatial Audio

The custom FMODSpeaker supports 3D spatial audio:

// In your FMODSpeaker component
spatialSpeaker = true;  // Enable in inspector

This attaches the FMOD event instance to the GameObject, enabling:

  • Distance attenuation - Volume decreases with distance

  • 3D panning - Sound position based on GameObject location

  • Occlusion/obstruction - If configured in FMOD event

  • Doppler effect - If configured in FMOD event

Configure spatializer settings in the FMOD event:

  1. Add 3D Spatializer to the event

  2. Set Min/Max distance curves

  3. Configure Sound size for realistic positioning

  4. Add Occlusion parameter if needed

Summary

This custom FMOD integration for Photon Voice provides professional-level control over voice chat:

Key innovations:

  • Event-based architecture for full mixer integration

  • Dual-sound system with DSP callback support

  • Configurable buffers for performance tuning

  • Spatial audio support for 3D games

  • Robust error handling for production use

The signal flow:

  1. FMOD records microphone → Event processes → Photon transmits

  2. Photon receives → Event plays → FMOD mixes → Master output

Critical mixer technique:

  • Use pre-fader Gain plugin to mute while keeping processing alive

  • Send pre-mute signal to return bus for optional monitoring

  • Apply effects within the event for transmitted processing

  • Apply effects on group bus for local-only processing

This architecture ensures your voice chat sounds professional, integrates seamlessly with your game audio, and gives you complete control over every aspect of the signal flow.

Resources

Source Code:

  • Assets/Audio/Voice Chat/Scripts/FMODMicrophoneEventReader.cs - Custom event-based reader

  • Assets/Audio/Voice Chat/Scripts/FMODRecorderSetup.cs - Recorder setup component

  • Assets/Audio/Voice Chat/Scripts/FMODSpeaker.cs - Enhanced speaker component

This post covers a custom FMOD integration for Photon Voice with event-based architecture, DSP callbacks, and spatial audio support. For more audio development insights and VR game development content, stay tuned.