In your case, streaming the voice data is probably more efficient. If you send it through your game server as just another type of data, the server can look at the player objects to work out where the voice is coming from and encode it accordingly. The big thing in games these days is 3D sound. In your scenario, this could in theory be handled using the players' current position vectors.
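To make the vector idea a bit more concrete, here is a rough sketch of how the position vectors could be turned into left/right volumes for a voice. Everything in it is my own assumption rather than anything from your game: the function name `spatialize`, the flat 2D top-down coordinates (x to the right, y up), the linear falloff, and the `max_distance` cutoff are all made up for illustration.

```python
import math

def spatialize(listener_pos, listener_forward, speaker_pos, max_distance=50.0):
    """Return (left_gain, right_gain) for a voice heard by the listener.

    Hypothetical helper: positions are (x, y) tuples seen from above,
    and max_distance is an assumed range beyond which the voice is silent.
    """
    dx = speaker_pos[0] - listener_pos[0]
    dy = speaker_pos[1] - listener_pos[1]
    distance = math.hypot(dx, dy)
    if distance >= max_distance:
        return (0.0, 0.0)

    # Simple linear falloff: full volume at distance 0, silent at max_distance.
    volume = 1.0 - distance / max_distance

    fx, fy = listener_forward
    forward_len = math.hypot(fx, fy)
    if distance == 0.0 or forward_len == 0.0:
        pan = 0.0  # No usable direction, so keep the voice centered.
    else:
        # The listener's "right" is the forward vector rotated 90 degrees clockwise.
        right_x, right_y = fy / forward_len, -fx / forward_len
        # Project the direction to the speaker onto "right": -1 is left, +1 is right.
        pan = (dx * right_x + dy * right_y) / distance
    pan = max(-1.0, min(1.0, pan))

    # Constant-power panning so the overall loudness stays steady across the arc.
    angle = (pan + 1.0) * math.pi / 4.0
    return (volume * math.cos(angle), volume * math.sin(angle))
```

For example, `spatialize((0, 0), (0, 1), (10, 0))` describes a listener facing "up" with the speaker directly to the right, so nearly all of the gain ends up in the right channel. A real engine would work in three dimensions and probably use a sound API's own positional audio instead, but the math is the same idea.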
I look forward to seeing more responses here, though you will probably get a better response in the general programming section.
Kressilac
------------------
Derek Licciardi