MMOs and modern scaling techniques

Started June 10, 2014 01:26 PM
65 comments, last by wodinoneeye 10 years, 4 months ago

Ping times are a source of complexity and gameplay challenges, but they are not a source of scalability problems.

Ping times for wired connections will not drop dramatically in the future, because they are bound by the speed of light -- current internet is already within a factor of 50% of the speed of light, so the maximum possible gains are quite well bounded.

enum Bool { True, False, FileNotFound };
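
As a rough sanity check on the claim above (all figures approximate; light in fibre travels at about two thirds of c, so the theoretical floor is easy to compute):

```cpp
// Back-of-the-envelope bound on ping times, using a ~5,600 km
// New York - London path as an illustrative example.
#include <cstdio>

int main() {
    const double distance_km  = 5600.0;    // NY-London great circle, roughly
    const double c_vacuum_kms = 299792.0;  // speed of light in vacuum, km/s
    const double c_fibre_kms  = c_vacuum_kms * 2.0 / 3.0; // refractive index ~1.5

    const double rtt_vacuum_ms = 2.0 * distance_km / c_vacuum_kms * 1000.0;
    const double rtt_fibre_ms  = 2.0 * distance_km / c_fibre_kms  * 1000.0;

    std::printf("theoretical RTT in vacuum: %.0f ms\n", rtt_vacuum_ms); // ~37 ms
    std::printf("theoretical RTT in fibre:  %.0f ms\n", rtt_fibre_ms);  // ~56 ms
    // Measured NY-London pings are commonly ~70 ms: an effective speed of
    // roughly half of c, which is what the claim above refers to.
}
```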

Do these relativity effects show up in the gameplay? ;/

As I understand it, the way these problems "show" is the response delay the server gives to the client, and the art is to keep it below some threshold. Does anyone know what that threshold is? Is it the sum of the time to send input to the server + the server processing time + the time to send the response back to the client? And what values does it have to stay under for the game to feel really fine?

(Sorry for the basic questions in a more advanced thread, but if I get the opportunity I would like to understand things a bit; maybe some thoughts will also appear. ;/)

Since there is always a delay in the information passing through the server (the ping), in order not to show it to the player, you need to either:

-Show the server's answer to the player after a set time X.

**Play some animation of length X before showing the player whether they succeeded at something or not (which is dictated by the server).

**The player cannot tell the lag, because the only way to tell lag is the time it takes for the server to answer, and we have hidden that information.

**This breaks down if the ping gets higher than X.

-Predict the server's answer accurately.

**This can probably never work to 100% accuracy, but it can be used to hide most of the lag resulting from the ping.

**E.g. predict the actions of other players at the current time, even though you cannot actually know them until X seconds later, because this info comes from the server (and thus is delayed).

**You can also predict the result of your own actions if an authoritative response is required from the server. E.g. if you open a chest, the client can ASSUME that there's nothing in there (with the client's luck), which is most often correct, to hide the ping; but this might be wrong, and then it has to be corrected when the 'real' information is available.

So you can either hide the lag of obtaining information, or predict the information before obtaining it.

o3o
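
A minimal sketch of the fixed-delay trick described above, assuming a hypothetical client where kRevealDelay plays the role of X; none of these names come from a real engine:

```cpp
#include <optional>

// One in-flight action: the player pressed a button, an animation of
// length kRevealDelay is playing, and the server's verdict may or may
// not have arrived yet.
struct PendingAction {
    double startTime;                 // when the player acted
    bool   predictedSuccess;          // client-side guess
    std::optional<bool> serverResult; // filled in when the reply arrives
};

constexpr double kRevealDelay = 0.25; // seconds of animation hiding the ping

// Called every frame: returns the outcome to display, or nothing while
// the animation is still running.
std::optional<bool> outcomeToShow(const PendingAction& a, double now) {
    if (now - a.startTime < kRevealDelay)
        return std::nullopt;          // still animating; lag is invisible
    if (a.serverResult)
        return a.serverResult;        // authoritative answer arrived in time
    return a.predictedSuccess;        // ping exceeded X: show the prediction
                                      // and correct it when the reply lands
}
```

If the reply usually arrives inside kRevealDelay, the player never perceives the round trip at all; only when the ping exceeds X does the prediction (and a possible correction) become visible.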


Do these relativity effects show up in the gameplay? ;/

Absolutely. Mostly because the mass increase of the data packets starts to become significant at about one third of light speed. If you throw a stone at another player, it will magically produce more hit points. This is why many people move to countries that are closer to the game servers.

@up

I'm curious what the response times of today's internet infrastructure are. Assume this model: a world with 1000 players on it.

There is a NOW frame in which everything on the map is perfectly consistent. Each player moves and sends their new position to the server, which takes a different time for each of them, say +10, +30, +70 ms (I don't know the real numbers). When the server has received the last position, we can compute the FUTURE frame (also perfectly consistent), then send this future state back to the players, and then do it all again.

(This is a kind of imaginary model of mine, but it could probably be realized.)

This kind of loop will "pulse" at the frequency of the laggiest player, so let's take the approach that when the laggiest players fall behind some threshold, we throw them out of the game.

I wonder what threshold in milliseconds could be set that would keep at least half of the fastest connections alive? (I have no idea, as I don't program networking, or even play network games.)

It would be interesting to estimate this time. I worry that if each send time has some variation (I mean something like Gaussian, 10 ± 300 ms), this connection-killing approach would kill off most of the players; but I would very much like to get some idea of how many players would survive such a test.

(I know games use techniques that allow masking delays etc., but I'm curious how much it would take in the raw, perfect state of things.)

Does anyone have, as they say, some "intuition" about how many players would stay alive at which delay threshold here?
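
To make the model concrete, here is a rough sketch of that lockstep loop with a kick threshold; the deadline, the strike count, and the helper stubs are all made up for illustration:

```cpp
#include <chrono>
#include <unordered_map>
#include <unordered_set>

using Clock    = std::chrono::steady_clock;
using PlayerId = int;

struct Input {};
struct WorldState {};

constexpr auto kDeadline  = std::chrono::milliseconds(100); // lag threshold
constexpr int  kMaxMisses = 3;  // strikes before a laggy player is dropped

// Stubs standing in for real networking/simulation code:
std::unordered_map<PlayerId, Input>
receiveInputsUntil(Clock::time_point) { return {}; }            // gather inputs
WorldState simulate(const WorldState& w,
                    const std::unordered_map<PlayerId, Input>&) { return w; }
void broadcast(const WorldState&) {}
void kick(PlayerId) {}

void runTick(WorldState& world,
             std::unordered_set<PlayerId>& players,
             std::unordered_map<PlayerId, int>& misses) {
    // Wait at most kDeadline for this tick's inputs; the loop therefore
    // "pulses" no slower than the threshold, not at the laggiest player.
    auto inputs = receiveInputsUntil(Clock::now() + kDeadline);

    for (auto it = players.begin(); it != players.end();) {
        if (inputs.count(*it) == 0 && ++misses[*it] >= kMaxMisses) {
            kick(*it);                  // repeatedly over threshold: drop
            it = players.erase(it);
        } else {
            ++it;
        }
    }
    world = simulate(world, inputs);    // one consistent FUTURE frame
    broadcast(world);                   // everyone receives the same state
}
```

As for how many players survive a given threshold: if the deadline sits below the median round trip plus its jitter, almost everyone accumulates strikes, which is why real games keep X comfortably above typical ping variation.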


Do these relativity effects show up in the gameplay? ;/

Absolutely. Mostly because the mass increase of the data packets starts to become significant at about one third of light speed. If you throw a stone at another player, it will magically produce more hit points. This is why many people move to countries that are closer to the game servers.

As far as I know, further relativistic effects could appear. For example, if you send the data at near light speed and it comes back, that data should be much younger than the data that stayed at home.

Ping times are a source of complexity and gameplay challenges, but they are not a source of scalability problems.

Ping times for wired connections will not drop dramatically in the future, because they are bound by the speed of light -- current internet is already within a factor of 50% of the speed of light, so the maximum possible gains are quite well bounded.


I'm not sure if I'm misreading you, but I feel like what you said is very misleading. The real-world performance of network infrastructure is not even slightly approaching 50% of light speed. We typically max out at 20% in best-case scenarios.

The majority of transit time is eaten up by protocol encoding/decoding in hardware, and improving the hardware or the protocol can dramatically reduce transit latency. E.g. going from TCP to InfiniBand inside a cluster can cut latency from about 2 milliseconds down to nanoseconds.

Not saying it's practical by any means, but we're bound by switches/protocols far more than by light.

I'm not experienced with MMOs at all; however, I am experienced with scalability, games and networking, and I have a couple of comments.

In recent discussions with web and app developers, one thing has become quite clear to me - the way they tend to approach scalability these days is somewhat different to how game developers do it. They are generally using a purer form of horizontal scaling - fire up a bunch of processes, each mostly isolated, communicating occasionally via message passing or via a database. This plays nicely with new technologies such as Amazon EC2, and is capable of handling 'web-scale' amounts of traffic - e.g. clients numbering in the tens or hundreds of thousands - without problem. And because the processes only communicate asynchronously, you might start up 8 separate processes on an 8-core server to make best use of the hardware.
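
A minimal sketch of that share-nothing, process-per-core pattern (POSIX pre-fork; illustrative only, error handling omitted):

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <thread>

// Each worker owns all of its state and talks to its siblings only via
// asynchronous messages (e.g. an external queue or a database), never
// via shared memory -- that is what makes this scale horizontally.
void runWorkerEventLoop() {
    for (;;) { /* accept a request, process it, reply */ }
}

int main() {
    const unsigned workers = std::thread::hardware_concurrency(); // e.g. 8
    for (unsigned i = 0; i < workers; ++i) {
        if (fork() == 0) {          // child: become one isolated worker
            runWorkerEventLoop();   // never returns
        }
    }
    while (wait(nullptr) > 0) {}    // parent: reap children if they exit
}
```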

I wouldn't put so much trust in "how web developers approach scalability".
Not too long ago, we had the C10K problem. Browsing through the net, the recommendation was "just use fork, it scales very well". It turns out fork has an initialization overhead (which, like you said, you can preallocate to avoid). Then they said "use one socket per thread, super scalable! The Linux kernel does the magic for you", and thus select and poll were recommended; then someone dug into the Linux kernel source and found that the kernel linearly walks the list of sockets to figure out which process/thread each packet needs to be delivered to. Some of the algorithms had O(N^2) complexity.
Someone fixed this, and then we got epoll.
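
For reference, the epoll pattern looks roughly like this: interest is registered once with the kernel, and each wait returns only the sockets that are actually ready, instead of rescanning every descriptor the way select/poll do. (Linux-only sketch; error handling omitted.)

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

void eventLoop(int listenFd) {
    int ep = epoll_create1(0);

    epoll_event ev{};
    ev.events  = EPOLLIN;
    ev.data.fd = listenFd;
    epoll_ctl(ep, EPOLL_CTL_ADD, listenFd, &ev);  // register interest once

    epoll_event ready[64];
    for (;;) {
        int n = epoll_wait(ep, ready, 64, -1);    // O(ready), not O(watched)
        for (int i = 0; i < n; ++i) {
            if (ready[i].data.fd == listenFd) {
                int client = accept(listenFd, nullptr, nullptr);
                epoll_event cev{};
                cev.events  = EPOLLIN;
                cev.data.fd = client;
                epoll_ctl(ep, EPOLL_CTL_ADD, client, &cev);
            } else {
                char buf[4096];
                ssize_t got = read(ready[i].data.fd, buf, sizeof buf);
                if (got <= 0) close(ready[i].data.fd);  // peer went away
                // else: hand buf off to the protocol layer
            }
        }
    }
}
```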

But the problem still remains that we have one socket per TCP connection, and that sucks hard. A C10M-problem article points out these problems and points at the driver stack as the biggest bottleneck.

The first time I dug deeply into scalable networking for a client project, I met this bizarre architectural flaw, and no one talked about it, as if there were no problem at all. Then that C10M article appeared, and I was relieved to hear a voice finally pointing out the same problems I saw.

So, no, I don't trust the majority of web developers to do highly scalable web development. Most of the time they just get lucky that their servers don't come under enough stress to amount to a (D)DoS. But my gut feeling is that if they were better at that job, they could handle the same server load with a far smaller farm budget.

Sure, at a very high level with distributed servers like Amazon EC2, these paradigms work. But beware: a user waiting 5 seconds for the search results for their long-lost friend on Facebook is acceptable(*). A game with a 5-second lag for casting a spell is not.
Half an hour of delay until my APK gets propagated across Google Play Store servers is acceptable and reasonable. Half an hour of delay until my newly created character gets propagated so I can start playing is not.

(*) Many giants (e.g. Google, Amazon) are actively working on solutions, since Amazon (or was it Apple?) found that a couple of milliseconds' improvement in page loading correlated with higher sales.

Web content has a much higher consumption rate than production rate. Games have the annoying property of both frequent read and frequent write access to everything (you can mitigate this by isolating things, but there's a limit).
Frequent write access hinders task division, which is necessary for scaling across cores/machines/people/whatever.

So this is an interesting topic, actually. The trend is to move reliability back up to the layer that defined the need in the first place, instead of relying on a subsystem to provide it.

Just because the network layer guarantees that the packets arrive doesn't mean they get delivered to the business logic correctly, or processed correctly. If you think 'reliable' UDP or TCP makes your system reliable, you are lying to yourself.

http://www.infoq.com/articles/no-reliable-messaging

http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.txt

http://doc.akka.io/docs/akka/2.3.3/general/message-delivery-reliability.html
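
To make the argument concrete, here is a toy sketch of reliability implemented at the business layer on top of any transport: the sender keeps a message until the application (not the socket) acknowledges it was processed, and the receiver de-duplicates retransmissions by sequence number. All names are illustrative.

```cpp
#include <cstdint>
#include <map>
#include <string>

struct Message { std::uint64_t seq; std::string payload; };

class ReliableSender {
    std::uint64_t nextSeq_ = 1;
    std::map<std::uint64_t, Message> unacked_; // kept until business-level ack
public:
    Message send(std::string payload) {
        Message m{nextSeq_++, std::move(payload)};
        unacked_[m.seq] = m;       // remember until the *application* acks
        return m;                  // hand off to whatever transport you use
    }
    void onBusinessAck(std::uint64_t seq) { unacked_.erase(seq); }

    // Called on a timer: anything still here was never *processed*, even
    // if TCP swears it was *delivered* -- so send it again.
    template <class Fn> void retransmit(Fn transportSend) {
        for (auto& [seq, m] : unacked_) transportSend(m);
    }
};

class ReliableReceiver {
    std::uint64_t highestSeen_ = 0; // naive in-order de-duplication
public:
    // Returns true if the message is new and should reach the game logic;
    // the caller acks only after the logic has actually processed it.
    bool accept(const Message& m) {
        if (m.seq <= highestSeen_) return false; // duplicate retransmission
        highestSeen_ = m.seq;
        return true;
    }
};
```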

OK, but I'm still not following. Of course reliable transport doesn't mean correct business logic. But these are two separate issues.

Your first link basically makes the argument for application-level sequence numbers because it wants to preserve business logic even when the transport is down. That's reasonable, but it's not a key concern for most games. If game server 1 has lost its connection to game server 2 for some reason, you probably have bigger problems than making sure Jimmy's Magic Missile is propagating across servers properly. There's no guarantee you can recover in a meaningful way so it is arguably best to abort entirely. The exception would be for any sort of real-money transaction or similar, but as I hope was made clear in earlier posts, I'm happy with those being done in a more complex and yet more robust way, as they are a small minority of what a game server normally handles.

The other two links elaborate on this and say things like "The only meaningful way for a sender to know whether an interaction was successful is by receiving a business-level acknowledgement". That's fine, but not necessarily relevant. As above, a reliable transport layer will give you (short of astronomically unlikely occurrences) a guarantee that one of two things happened:

  1. The message arrived at the intended place in an undamaged form. All is well if you coded the receiving method properly.
  2. The message didn't arrive, and your game is therefore broken.

With this in mind, the reliability guarantee given by the OS is going to be sufficient for any typical gameplay functionality.

So the problem with a pure message-passing approach remains that of whether it is practical to code all gameplay features to work in that way, given that it's not necessary.

First: Try writing a robust real-time physics engine that can support even a small number like 200 players with vehicles on top of the web architecture that @snacktime and @VFe describe. I don't think that's the right tool for the job.


I'd be inclined to agree. :) I was going to add the caveat of no real-time Newtonian physics, but I left that out to keep things simple. Perhaps that was a mistake.

You'd think, given how much has been said on the matter, that there would be at least one instance of people talking about using different methods, but I've not seen one.


Personally, I've actually talked a lot about this in this very forum for the last ten to fifteen years. For reference, the first one I worked on was There.com, which was a full-on single-instance, physically-based virtual world. It supported full client- and server-side physics rewind; a procedural-and-customized plane the size of Earth; fully customizable/composable avatars with user-generated-content commerce; voice chat in world; vehicles ridable by multiple players; and a lot of other things; timeframe about 2001. The second one (where I still work) is IMVU.com, where we eschew physics in the current "room based" experience because it's so messy. IMVU.com is written almost entirely on top of web architecture for all the transactional stuff, and on top of a custom low-latency ephemeral message queue (written in Erlang) for the real-time stuff. Most of that's sporadically documented in the engineering blog: http://engineering.imvu.com/


I have seen you post a lot about the There.com architecture, and in general the games I've worked on have shared many of its characteristics, so that side is known to me.

However I don't recall seeing much detail on the IMVU stuff. I've started taking a look at that link (thanks) but I've skimmed through the whole thing and I can't see anything specific to the handling of state changes, or multi-node transactions, or decomposing complex behaviour into messages, etc.
