Hello Simon
A good starting point is the PsychToolboxSound protocol in the repository.
When the state machine reaches the state that triggers the sound, a USB message containing 1 byte is sent to the MATLAB side. That byte is called a "soft code". In this case, it is either "255" (stop playback), or 1-254 (the index of the sound to play).
The soft code byte is handled by a SoftCodeHandler function in your protocol folder, which can map the byte to any MATLAB function. In this case, it is mapped by SoftCodeHandler_PlaySound to trigger the PsychToolboxSoundServer plugin's "Play" and "StopAll" functions.
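The dispatch logic described above can be sketched roughly as follows. This is written in Python rather than MATLAB, purely so it runs standalone. The name `handle_soft_code` and the `play` / `stop_all` callbacks are hypothetical stand-ins for the plugin's "Play" and "StopAll" functions, not Bpod's actual API.

```python
STOP_CODE = 255  # soft code meaning "stop playback"

def handle_soft_code(code, play, stop_all):
    """Route one soft code byte to a playback action.

    `play` and `stop_all` are hypothetical callbacks standing in for the
    sound server's "Play" and "StopAll" functions.
    """
    if code == STOP_CODE:
        stop_all()
    elif 1 <= code <= 254:
        play(code)  # the byte is the index of the sound to play
    else:
        raise ValueError(f"unexpected soft code: {code}")

# Usage: record what would have been triggered for two soft codes.
log = []
on_play = lambda index: log.append(("play", index))
on_stop = lambda: log.append(("stop",))
handle_soft_code(3, on_play, on_stop)
handle_soft_code(255, on_play, on_stop)
```

After these two calls, `log` holds one "play" entry for sound 3 followed by one "stop" entry.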
Bpod updates the state machine and processes all outputs every 100 microseconds - so in theory, "play" and "stop" can be sent to MATLAB 100 microseconds apart. Unlike Bpod's onboard channels and modules, the computer's OS is not real-time, so there will be latency (typically ~8ms) and jitter (typically ~1.5ms). In your task, if the subject pokes out 1 millisecond after it pokes in, the instructions to play and stop will be sent 1ms apart - but the OS jitter will make the actual start and stop times harder to predict.
PsychToolboxSoundServer gets you partially around this issue for a reaction time task. When you load a sound with PsychToolboxSoundServer('load', SoundID, Waveform), the left and right audio channels are loaded with the sound in the "Waveform" vector (which can be a 2xN matrix if you need stereo). Fortunately, the Xonar DX has 6 more audio channels. Alongside your sound data, the 'load' function loads 1ms of max voltage to channels 3-4, to create a TTL pulse coincident with sound onset. The Xonar DX comes with a 1/8" stereo -> RCA cable. If you attach an RCA-to-BNC adapter to the end of the RCA cable, you can connect the sound-onset signal directly to one of Bpod's BNC input channels - so you have the moment the sound actually played, recorded on Bpod's clock as a BNC1High event.
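To make the 'load' step more concrete, here is a minimal Python sketch of the idea: sound on channels 1-2, plus a 1ms full-scale pulse at the start of channels 3-4. It is not the plugin's actual code. The function name `build_playback_buffer`, the 192 kHz sampling rate and the full-scale value of 1.0 are all assumptions for illustration.

```python
def build_playback_buffer(waveform, sample_rate_hz, pulse_ms=1.0):
    """Return 4 channel rows: [left, right, ttl, ttl].

    `waveform` is either a flat list of samples (mono, copied to both
    ears) or a pair of equal-length lists (stereo). Hypothetical helper.
    """
    if waveform and isinstance(waveform[0], (list, tuple)):
        left, right = list(waveform[0]), list(waveform[1])
    else:
        left, right = list(waveform), list(waveform)
    n = len(left)
    # Number of samples spanned by the sync pulse (capped at sound length).
    pulse_samples = min(n, round(sample_rate_hz * pulse_ms / 1000.0))
    # Full scale (assumed 1.0) for the pulse, silence afterwards.
    ttl = [1.0] * pulse_samples + [0.0] * (n - pulse_samples)
    return [left, right, ttl, list(ttl)]

# Usage: 5 ms of silence at an assumed 192 kHz sampling rate.
SAMPLE_RATE_HZ = 192000  # assumption, check your sound server settings
buffer = build_playback_buffer([0.0] * 960, SAMPLE_RATE_HZ)
pulse_length = int(sum(buffer[2]))  # samples held at full scale
```

Because the pulse lives in the same buffer as the sound, it leaves the card on the same sample as sound onset. That is why the BNC1High timestamp is a trustworthy onset marker.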
This would serve all of your needs if the task had a fixed-duration stimulus. Unfortunately, for a reaction time task, we cannot predict in advance when the port-out event will occur, so we can't load a waveform with a "sound-off" TTL pulse at the future time the animal will withdraw. The best you can do for now is to characterize the distribution of sound onset latencies on your system (the delay between the time Bpod sent the "play" soft code and the resulting BNC1High event), add its mean to the time Bpod instructed the sound to stop on each trial, and keep in mind that your trial-by-trial estimates of offset time have ~1.5ms of jitter.
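The correction above is simple arithmetic, sketched here in Python. The timestamps are made-up numbers in seconds, and the function names are hypothetical. The sketch also assumes that the "stop" command sees the same latency distribution as the "play" command, which is the working assumption behind the suggestion.

```python
from statistics import mean, stdev

def onset_latencies(command_times, bnc_high_times):
    """Per-trial delay between the 'play' soft code and the BNC1High event."""
    return [bnc - cmd for cmd, bnc in zip(command_times, bnc_high_times)]

def estimate_offset_time(stop_command_time, latencies):
    """Estimated time the sound actually stopped: command time + mean latency."""
    return stop_command_time + mean(latencies)

# Illustrative (made-up) timestamps on Bpod's clock, in seconds.
play_commands = [1.000, 2.000, 3.000, 4.000]
bnc1_high = [1.008, 2.009, 3.007, 4.008]

latencies = onset_latencies(play_commands, bnc1_high)
mean_latency = mean(latencies)   # systematic delay to add to each stop time
jitter = stdev(latencies)        # trial-to-trial uncertainty that remains
estimated_offset = estimate_offset_time(5.250, latencies)
```

The mean removes the systematic part of the delay. The standard deviation is the uncertainty you carry into each trial's offset estimate.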
I hope this helps - please write in if you run into any challenges setting up!
Cheers,
Josh