Inquiry on Voice-Over MMORPG Feature
As I have recently stated, I am working with a fledgling film production company to create a new kind of MMORPG. I have been charged with gathering ideas on how to implement some of our planned features. Without further ado, I'd like to explain this feature and ask the developers in the community what they feel is the best way to achieve it, and for any ideas on how to improve the system.
Our 3D MMORPG is set in a medieval/fantasy setting. One of our key features will be the ability to chat with other players over a microphone, somewhat similar to voice-over programs such as TeamSpeak / Ventrilo, and in-client features in games like Counter-Strike. What makes our idea of the system different is that you must be near another player to hear him/her speak.
When a character speaks, they hold down a bound hotkey and begin to talk. An icon will appear over their head to show other players that they are speaking. In one of the game's options you can choose to hear up to x-y clients at the same time. In this case, if you are nearby you'll hear that player speak with (thanks to a standardized codec) excellent clarity. If you have our second option selected, you'll see their red speech icon, and if you click it, they will toggle to unmuted and you'll hear them speaking as well.
Now, everyone's voice should be adjusted to the same balanced volume. As your character comes closer to the speaker, the sound will increase until it is as if you were sitting next to the person. As you back away, however, the voice will become fainter.
Characters of different races will have a filter applied to their speech which modifies the voice so that it sounds like a foreign language. As your character hears it more and more, over a long period of time and with study of the said language, the filter becomes less and less prevalent until that character can understand the new language. Additionally, caves will echo sounds, certain spells will silence you, and so on.
We also hope to include certain voice-recognition features. For example: in order for a casting character to cast a spell, they must actually speak out the incantation (assuming of course that they have trained with the client-side voice recognition program). If they make a mistake, the spell does not work. In this case another filter will be necessary, since as the incantation nears completion the voice should sound deep and multilayered.
I would love to hear the community's feedback on how this might be achieved, and any ideas or other suggestions are more than welcome. Please post constructively. I thank you for your time.
*edited for silly spelling error
A big problem with voice chat in MMORPGs is that you will likely want to stream voice data between the players in a peer-to-peer fashion, because of the amount of data.
This, however, is a huge security risk, since players will be able to sniff out the addresses of other players. They could then do things like DoS attacks to drop other players.
Shields up! Rrrrred alert!
Yep, I think you are biting off a little bit too much here.
Basically, you are saying that every player could potentially be talking at once. That means quite a few incoming voice streams (500-1000ish?). Then every one of those streams may be sent out to other people, so let's say each person is near only 5 other people. That is 6 streams (1 in, 5 out) per player. So per server, that's around 3000-6000 streams. I believe you are going to need either a lot of bandwidth or very good compression to pull that off.
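To put rough numbers on the estimate above, here is a minimal sketch of the arithmetic. The 16 kbit/s codec bitrate is purely an assumption for illustration (it is not from this thread), and packet overhead is ignored; the function names are hypothetical.

```python
def stream_count(players: int, listeners_per_speaker: int) -> int:
    """Worst case where every player talks at once: the server handles
    one inbound stream per player plus one outbound copy per nearby listener."""
    return players * (1 + listeners_per_speaker)


def server_bandwidth_mbps(players: int, listeners_per_speaker: int,
                          codec_kbps: float) -> float:
    """Total payload bandwidth in Mbit/s for the worst case above.
    codec_kbps is an assumed per-stream bitrate; headers are not counted."""
    return stream_count(players, listeners_per_speaker) * codec_kbps / 1000.0
```

With 500 to 1000 players and 5 listeners each, `stream_count` gives the 3000 to 6000 streams mentioned above. In practice far fewer players talk at the same moment, so this is an upper bound rather than a typical load.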
Next problem is the computation you want to do on the voices. Voice recognition is slow enough running on my brand new laptop only trying to recognize my one voice. Imagine if you are trying to do the same with 500 speakers at once? I don't know if it is possible or not, but either way it is going to end up being a large part of the project budget to support.
Turring Machines are better than C++ any day ^_~
Thanks for the post, Pete. As I recall, this was an issue that the devs anticipated. :D
Would some sort of non-p2p system be possible, such as a variety of voice servers for an area, let's say one map, where characters simply enter or leave certain channels depending on their location? Or perhaps there are free servers in an area that players belong to by default, and when a player comes close to you or toggles your speech icon, they in effect invite you to a personal channel of sorts? If I'm not making sense here, feel free to stop me. Thank you, and keep the feedback a comin'.
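One way to picture the location-based channel idea is a server-side directory that maps each player's position to a grid cell and treats each cell as a voice channel. This is only a sketch under assumptions: the 50-unit cell size, the class name `ChannelDirectory`, and its methods are all made up for illustration.

```python
from collections import defaultdict

CELL_SIZE = 50.0  # hypothetical size of one voice "area" in world units


def channel_for(x: float, y: float, cell_size: float = CELL_SIZE) -> tuple:
    """Map a world position to the grid cell that acts as its voice channel."""
    return (int(x // cell_size), int(y // cell_size))


class ChannelDirectory:
    """Tracks which players are in which location-based channel."""

    def __init__(self):
        self.members = defaultdict(set)   # channel -> set of player ids
        self.current = {}                 # player id -> channel

    def update_position(self, player_id, x, y):
        """Move the player between channels when they cross a cell border."""
        new = channel_for(x, y)
        old = self.current.get(player_id)
        if old != new:
            if old is not None:
                self.members[old].discard(player_id)
            self.members[new].add(player_id)
            self.current[player_id] = new
        return new

    def listeners(self, player_id):
        """Everyone else in the speaker's current channel."""
        return self.members[self.current[player_id]] - {player_id}
```

A plain grid like this has an obvious weakness: two players standing on either side of a cell border would not hear each other. Subscribing each player to neighbouring cells as well would be one way around that.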
Quote:
Original post by intrest86
Yep, I think you are biting off a little bit too much here.
Basically, you are saying that every player could potentially be talking at once. That means quite a few incoming voice streams (500-1000ish?). Then every one of those streams may be sent out to other people, so let's say each person is near only 5 other people. That is 6 streams (1 in, 5 out) per player. So per server, that's around 3000-6000 streams. I believe you are going to need either a lot of bandwidth or very good compression to pull that off.
Next problem is the computation you want to do on the voices. Voice recognition is slow enough running on my brand new laptop only trying to recognize my one voice. Imagine if you are trying to do the same with 500 speakers at once? I don't know if it is possible or not, but either way it is going to end up being a large part of the project budget to support.
Indeed, the task is daunting, but we think the concept can revolutionize how players interact with games and, as a result, how immersive the game may become.
What if a lot of the streaming was p2p, but with servers that simply directed the players to one another and acted as a buffer to prevent sniffing? In that scenario, you'd set your game settings to whatever you can handle. If you can't handle more than 5 characters speaking to you at once, you could limit it in the options. Also, if the system is in part p2p, it would divide the burden equally among the players. Wouldn't it?
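The per-player limit described above could be sketched as a simple selection step: out of everyone currently talking, keep only those within hearing range, nearest first, up to the player's configured cap. The function name, the 2D positions, and the parameters are assumptions for illustration only.

```python
import math


def audible_speakers(listener_pos, speakers, hearing_radius, max_streams):
    """Return the ids of at most max_streams speakers within hearing_radius,
    nearest first. speakers maps speaker id -> (x, y) position."""
    in_range = []
    for speaker_id, pos in speakers.items():
        distance = math.dist(listener_pos, pos)
        if distance <= hearing_radius:
            in_range.append((distance, speaker_id))
    in_range.sort()  # closest speakers win when the cap is hit
    return [speaker_id for _, speaker_id in in_range[:max_streams]]
```

Whoever does the routing (a relay server or the peers themselves) would only forward the streams this function returns, so a player on a slow connection never receives more than they asked for.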
intrest86, in regard to our voice recognition features, we plan on having it be much cruder than current systems for comprehensive dictation. The game need not understand exactly what is being said, only whether it falls under a category such as "Player wants to buy", "Wants to sell", "How are you?" or "Wants to repair item" in order to interact with the voice-recorded NPCs on a script, or whether "Xeu Xeiu Chiet" is recognized as a spell. We're obviously in early stages and trying to explore how much client-side processing power would actually be necessary for such a task. Thanks for the thoughts, btw. I appreciate all the feedback. =]
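As a rough illustration of how crude the matching could be, the sketch below assumes some recognizer has already produced a text transcript (that part is not shown and is the hard part). The keyword lists, the category names, and the spell id are invented for the example.

```python
import string

# Hypothetical keyword sets for a few of the categories mentioned above.
INTENT_KEYWORDS = {
    "wants_to_buy": {"buy", "purchase"},
    "wants_to_sell": {"sell"},
    "wants_to_repair": {"repair", "fix"},
}

# Spells must match the whole incantation exactly; the id is made up.
SPELLS = {"xeu xeiu chiet": "example_spell"}


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def classify(transcript: str):
    """Return ("spell", id), ("intent", name), or ("unknown", None)."""
    text = normalize(transcript)
    if text in SPELLS:
        return ("spell", SPELLS[text])
    words = set(text.split())
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return ("intent", intent)
    return ("unknown", None)
```

Requiring an exact match for incantations gives the "a mistake means the spell fails" behaviour for free, while NPC dialogue only needs a single keyword to land in the right category.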
The problem is, if you want to apply filters and cast spells by speaking the words, the server will have to receive the sound, process it (see if it's a spell), add the appropriate filter for EVERY player in the range, then send the modified sound (each player hears it differently, depending on race, language skill, etc.) to the players in the range.
Now, unless you make broadband a requirement for players (i.e. no dial-up or ISDN) and have a LOT of bandwidth on your server, this will be impossible.
There is no way to make the system p2p, as players will find a way to cheat the filter and send the sound as-is, unencrypted.
The idea as a whole is nice, but I don't think today is the right time to implement it. Maybe in 5-10 years, when broadband internet is as normal as a telephone.
We implement in-world voice chat in our platform (OLIVE) and it works quite well. If you want to try it out, you can go to There and sign up for a free trial account. We send voice chat through our servers, to increase reliability and be able to implement necessary controls; the bandwidth cost really isn't that big of a deal.
Note that There uses an older version of our code with higher latencies and not as good voice level management as our current military project, but it should give you a good idea of what you can do (or license :-) in this area.
For getting voice attenuation by distance, we use Direct3D or similar spatialization technologies, and they work fine. We have also worked hard on integration between avatars to make the voice-in-3d-world integration come out as a whole new medium, rather than just a "CB radio" kind of feature.
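For readers who have not used a spatialization API, the distance part boils down to scaling the voice samples by a gain that falls off with distance. The sketch below is not how OLIVE or any particular API does it; it is a generic inverse-distance rolloff with a fade to silence at a cutoff, and the reference and maximum distances are arbitrary assumptions.

```python
def distance_gain(distance: float, ref_distance: float = 1.0,
                  max_distance: float = 30.0) -> float:
    """Gain in [0, 1]: full volume inside ref_distance, silent beyond
    max_distance, inverse-distance rolloff with a linear fade in between."""
    if distance <= ref_distance:
        return 1.0
    if distance >= max_distance:
        return 0.0
    inverse = ref_distance / distance
    fade = (max_distance - distance) / (max_distance - ref_distance)
    return inverse * fade


def apply_gain(samples, gain: float):
    """Scale a block of audio samples (floats) by the given gain."""
    return [s * gain for s in samples]
```

A real spatializer would also pan the voice left or right according to the speaker's direction, which this sketch leaves out.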
When it comes to voice recognition, I'd suggest doing it on the speaking client's end, and sending the recognized voice commands along using side-band signalling. The reason for this is that otherwise, each client would have to run recognition on every other client's incoming stream, multiplying the processing required. Also, the signal is always clearer at the recording end, before compression. Maybe that's what you're thinking already.
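A minimal sketch of what side-band signalling could look like on the wire: the speaking client recognizes the command locally and attaches a small command id to the voice packet, so nobody downstream has to run recognition on the audio. The header layout, field sizes, and function names here are assumptions made up for illustration, not an actual protocol.

```python
import struct

# Hypothetical header: speaker id (uint32), command id (uint16),
# audio payload length (uint16), all in network byte order.
HEADER = struct.Struct("!IHH")

NO_COMMAND = 0  # command id meaning "plain speech, nothing recognized"


def pack_voice_packet(speaker_id: int, encoded_audio: bytes,
                      command_id: int = NO_COMMAND) -> bytes:
    """Build one packet: header followed by the compressed audio bytes."""
    return HEADER.pack(speaker_id, command_id, len(encoded_audio)) + encoded_audio


def unpack_voice_packet(packet: bytes):
    """Split a packet back into (speaker_id, command_id, encoded_audio)."""
    speaker_id, command_id, length = HEADER.unpack_from(packet)
    audio = packet[HEADER.size:HEADER.size + length]
    return speaker_id, command_id, audio
```

Since the command id comes from the client, a server would presumably still need to decide how far to trust it, which ties back to the cheating concern raised earlier in the thread.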
Regarding speech filters, it's really quite hard to garble voice in such a way that it still sounds like speech but isn't recognizable. The best I can think of involves a combination of formant shifting and time-based overlap-add with jittering of the windows, or maybe even running (some?) windows backwards. Music-DSP is a good mailing list for these kinds of things.
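The overlap-add part of that suggestion can be sketched in a few lines. This is only the time-domain half (jittered windows, some of them reversed); formant shifting is not attempted here. The window size, jitter range, and reversal probability are arbitrary assumptions, and whether the result actually sounds like a foreign language would need listening tests.

```python
import math
import random


def hann(n: int):
    """Periodic Hann window; at 50% overlap the windows sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def garble(samples, window=512, jitter=128, reverse_prob=0.5, seed=0):
    """Overlap-add resynthesis where each window is read from a randomly
    jittered position and is sometimes played backwards."""
    rng = random.Random(seed)
    hop = window // 2
    win = hann(window)
    out = [0.0] * len(samples)
    for start in range(0, len(samples) - window + 1, hop):
        # Read the source chunk from a nearby, randomly shifted position.
        src = start + rng.randint(-jitter, jitter)
        src = max(0, min(src, len(samples) - window))
        chunk = samples[src:src + window]
        if rng.random() < reverse_prob:
            chunk = chunk[::-1]  # run this window backwards
        # Window the chunk and add it into the output at the unshifted slot.
        for i in range(window):
            out[start + i] += chunk[i] * win[i]
    return out
```

Lowering `jitter` and `reverse_prob` over time would be one way to make the filter gradually fade as a character learns the language, since with both at zero the interior of the signal is reconstructed unchanged.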
enum Bool { True, False, FileNotFound };
A little offtopic:
I went to There and tried the Sign up page, and I got this:
Quote:
We've noticed that you're currently using a non-supported browser.
Please switch to Internet Explorer v. 5.0.1 or later to continue.
You can get the latest version of IE free at
http://www.microsoft.com/windows/ie/default.asp.
After downloading and installing Internet Explorer, please launch it and go to:
http://webapps.prod.there.com/register
in order to continue the registration process (you should cut and paste or write this link down for when you're ready to return).
You do not need to switch your default browser settings.
Return to There.com
Member Agreement | Privacy Policy | Behavior Guidelines
© Copyright 2005. There Inc. All rights reserved.
I think it is poor taste to allow only IE in order to just sign up. Are you using ActiveX for that?
I am having some difficulty with the said program. Chiefly, I suspect I'm required to purchase a membership in order to test out the voice features. Perhaps you could contact me so that I might be able to test it without any of the other features.
If it is what it appears to be, though, we'll definitely look into implementing something similar, or perhaps contacting your people to help us develop the system.
Once again, thank you for the input.