Hello Yuri,
I'd choose any camera that has a TTL output that toggles with each frame (also called a "strobe" GPIO pin). The camera's TTL output can be patched into one of the state machine's BNC input channels, or into the 'S' terminal on a port interface board, to create a timestamp in your behavior data for each frame. If frames will be acquired continuously, I'd structure your protocol using TrialManager (e.g. this) so that you don't lose frame sync pulses during the downtime between trials.
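As a rough illustration of what you can do with those timestamps afterwards, here is a minimal Python sketch. It assumes you have exported your behavior data as a list of (event name, time in seconds) pairs, and it uses BNC1High / BNC1Low as stand-ins for the rising- and falling-edge event names; treat both the export format and the names as assumptions to adapt to your own data. If the strobe toggles once per frame, every edge is a frame; if your camera instead emits one pulse per frame, count only rising edges. A gap much longer than the frame period suggests sync pulses were missed (e.g. during inter-trial downtime).

```python
from typing import List, Sequence, Tuple

# Hypothetical edge event names; replace with the names in your own data.
RISING = "BNC1High"
FALLING = "BNC1Low"


def frame_times(events: Sequence[Tuple[str, float]], rising_only: bool = False) -> List[float]:
    """Return one timestamp per camera frame from (event_name, time_s) pairs.

    rising_only=False: the strobe toggles once per frame, so every edge is a frame.
    rising_only=True: the strobe pulses once per frame, so only rising edges count.
    """
    edges = {RISING} if rising_only else {RISING, FALLING}
    return sorted(t for name, t in events if name in edges)


def find_gaps(times: Sequence[float], frame_period: float, tolerance: float = 1.5) -> List[int]:
    """Return indices i where times[i+1] - times[i] > tolerance * frame_period,
    i.e. places where one or more frame sync pulses were probably missed."""
    return [
        i for i in range(len(times) - 1)
        if times[i + 1] - times[i] > tolerance * frame_period
    ]


if __name__ == "__main__":
    # Made-up example: a 100 Hz camera (10 ms period) with one missing edge.
    events = [("BNC1High", 0.000), ("BNC1Low", 0.010), ("BNC1High", 0.020), ("BNC1Low", 0.050)]
    times = frame_times(events)
    print(times)                    # [0.0, 0.01, 0.02, 0.05]
    print(find_gaps(times, 0.010))  # [2]
```

The tolerance of 1.5 frame periods is just a convenient default for flagging a single skipped frame while ignoring small timing jitter; pick whatever fits your camera.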
The RJ45 jacks on the state machine are not Ethernet ports; they are wired for RS-485 and are used to exchange event bytes between the state machine and its peripheral modules.
I hope this helps,
-Josh