Advertisement

MMO Network Layer Architecture

Started by September 17, 2013 09:08 AM
15 comments, last by wodinoneeye 11 years, 4 months ago

Hi,

I'm developing a game server for an online game, and I have some questions regarding how to handle threading from the network layer to the main loop. Some of the articles I have read regarding high performance servers actually raised the questions I'm about to ask. Here are the two articles if anyone is interested:

http://pl.atyp.us/content/tech/servers.html

http://www.kegel.com/c10k.html#books

My main concern after reading these articles is the context switching. I've previously developed a prototype game server using python using microthreads/green threads to avoid context switches (to a large extent anyway). In this prototype I used two processes where they shared a msg queue. The network process would enter a message into the queue and then wait (its really one of the green threads waiting,..) for the game loop process to finish its processing. Once the game loop is finished it notifies the waiting green threads about the result (this has to be done, because the client expects a new state and a result from the previous operation). This sending of messages back and forth between processes/threads is one of the items the first articles discusses as a performance killer for servers. My question is now how to best remedy the situation when I'm developing the real server using C# 5.0.

The articles describes a way to remedy the situation of too many context switches by allowing the same thread that handles the incoming network connection to also take care of the processing and then send back result to client. In this model we may have say 10-15 threads (just a suggestion), waiting for a connection on the network side, once a connection has been established and message received, the thread turns into a worker. The worker then tries to aquire a lock (shared by the 10-15 network threads), and processes the message and sends back result. Afterwards the lock is released and the thread waits for next connection etc.

Does anyone have any experience with this kind of setup? Pros/Cons? One of the obvious pros is the ease of knowing who sender/receiver is as you're responsible for the current connection.

My goal is to be able to serve 50k messages/second. The game is not heavy when it comes to math, so the game loop won't have an issue crunching 50k messages/second. I'm more concerned with the network issue...

Thanks in advance!

Given you mention C#, you already have some very good tools in place you should be leveraging for for this. Specifically the async socket functions and the thread pool work together to give you a very fast way to process connections without blocking a large number of threads. While waiting for an async function to complete, .NET will release the thread back to the thread pool so that it can be used to process other connections. I suggest you have a read up on them and make sure you understand how the two interact.

If your server is running a standard main loop type arrangement for the actual game simulation you will want some way to communicate with it from the async handlers. Personally I make use of .NETs ConcurrentQueue. The items posted into the queue include metadata on which client sent the message etc.

This is not necessarily the fastest way to process a single message, but it is designed to scale nicely to a large number of them. Here are some starting points for you:

http://msdn.microsoft.com/en-us/library/5w7b7x5f.aspx

http://msdn.microsoft.com/en-us/library/bbx2eya8.aspx

Between these you get much of what the article is discussing.
Advertisement

Given you mention C#, you already have some very good tools in place you should be leveraging for for this. Specifically the async socket functions and the thread pool work together to give you a very fast way to process connections without blocking a large number of threads. While waiting for an async function to complete, .NET will release the thread back to the thread pool so that it can be used to process other connections. I suggest you have a read up on them and make sure you understand how the two interact.

If your server is running a standard main loop type arrangement for the actual game simulation you will want some way to communicate with it from the async handlers. Personally I make use of .NETs ConcurrentQueue. The items posted into the queue include metadata on which client sent the message etc.

This is not necessarily the fastest way to process a single message, but it is designed to scale nicely to a large number of them. Here are some starting points for you:

http://msdn.microsoft.com/en-us/library/5w7b7x5f.aspx

http://msdn.microsoft.com/en-us/library/bbx2eya8.aspx

Between these you get much of what the article is discussing.

What you are suggesting is what I'm trying to remedy. smile.png

Your suggestion (or assumption) with a main thread and a lockless queue that each connection will post messages to might run into problems related to context switching. This is outlined in one of the articles I referenced. Another issue with this design is who returns the result to the client? You will most likely have another queue for this task, which will incur another context switch. I'm trying to get rid of these context switches by using the threads as workers, as recommended by the article. This design, however, might not conform to best practise (atleast not when seen from high cohesion/low coupling point of view..), but I'm trying to make the server as efficient as possible and am thus willing to make a few sacrifices for this.

I intend on using what you mention in your post (async), but the question is related to who actually makes the computation. Is it a dedicated thread (by your assumption) or should every thread that handles a connection take care of this themselves (to avoid context switches and data transfers among threads).

Thanks for the links!

While I agree that context switching and data copies are a problem for scaling, taking things to far trying to avoid them is equally problematic in other ways. Let's just look at the initial networking layer and what you are considering doing to it. First off, under low load, a typical async network model is going to be something along the following:

1) Issue 1..n async begin's.
2) Eventually go into a wakeable wait state.
a) If there are any waiting async results, there is no context switch, the thread just starts processing them. Assume that this simply reads the message, posts it to a queue and issues a new async begin.
b) Sleep until there is an async result.

Nothing surprising here but unfortunately defined in the manner you seem to want to avoid. The question is if the avoidance makes sense? First off, you need to consider other locations of bottlenecks in an MMO server. One is directly related to the things you are trying to solve, the context switches and data copy/moves. Another is generally something which is often attributed to other things incorrectly and that is over utilization of the receiver thread. I actually had a very specific discussion about the context switch costs and reasons why some of those costs are a "good" thing in several cases. The cost itself is never a "good" thing, the decoupling is, but not simply in the software architecture but also in the server behavior. The little testbed I wrote modified things in the following manner:

1) Issue 1..n async begin's.
2) Eventually go into a wakeable wait state.
a) If there are any waiting async results, there is no context switch, the thread just starts processing them. Simulate 1-10ms processing time for each message, i.e. moving the processing to be in the receiver thread so there is no context switching nor data copies being made.
b) Sleep until there is an async result.

As the old saying goes, everything is fine, until it is not. Initially this sort of thing looks great because the lower CPU utilization and easily provable lower buss traffic. Unfortunately where it fails, it not only fails but it completely falls on it's face because of the unpredictable nature of having the simulation (sim of a sim in this case smile.png) in the same thread. As you scale up, the CPU utilization in the server goes up fairly smooth, the latency on message processing is fairly consistent and everything looks great. Unfortunately as soon as the numbers get high enough the CPU utilization starts varying massively due to the numbers of longer running processing items which just happen to start lining up on the threads, which in turn means more messages pile up which in turn causes the threads to start falling behind, then playing catchup, falling behind again etc. This is unfortunately a self reinforcing issue and if the CPU hits 100% utilization for any extended time, boom, the entire house of cards falls flat on it's face until the server is locked at 100% CPU, the network buffers are completely full, and even the clients end up stalling because their outgoing buffers are full.

The problem with such a system is the inherent lack of predictability tied in at two locations. One location is the network traffic itself, it often comes in in bursts for various reasons ranging from game play events causing everyone to start moving all of a sudden (flash crowds to see something interesting for instance) or the internet itself having to reroute due to a busy router somewhere etc. The other lack of predictability is in trying to process the messages as part of the same thread, some messages require more work than others, that's just a fact of life in games. The combination of the two items self reinforce the failure point on the threads/networking layer and I'd argue that the cure in this case is worse than anything else. Using the queue solution removes one side of predictability and "buffers" the burst data in between in hopes that the actual server processing can play catch up before the queue in between explodes. You can also watch the queue inbetween as a control system to say speed up the processing side (allocate more threads if possible), decrease sending rates of the clients before things explode, etc etc.

All this and I've not even really gotten into the discussion of the context switching. In reality, with a MMO, message processing is/should not be a cause of constant contention on synchronization and as such context switching should be minimal. As with the network side, the actual switches that do take place should be because the overall system is working 'well' and is not stressed out. It is the combination of high frequency switching and a stressed CPU/memory buss environment which will cause scaling fails, normal operation *does* include a reasonable amount of context switching though, that is unavoidable and actually desirable. And, as to the memory copies, if the messages are processed into individual reference counted objects when placed on the queue, there is no copy/move cost beyond that of a pointer.

So, long story short. I would argue that you are worrying about the wrong things. There are many gotcha's involved in this but an over abundance of context switches and/or memory copies is caused by other things than the separation of the network code and the processing code.

Thanks for the detailed explanation of your own results!

How would you solve the output/result issue with queues though?

We have one queue for the main thread, and one for output I presume? But would you have separate threads in the network layer that would only handle the output part, sleeping while no result is available? In this pipe-line approach the only thing bothering me is the output part.

I'd appreciate any ideas.

Thanks!

Context switching is not a problem until it is a problem.

Assuming that you're using a modern server on a gigabit Ethernet link, you are highly unlikely to be able to see context switching in the top of your profile.

Assuming your Internet link is smaller to begin with (say, 100 Mbit) this is even less likely.

That being said, the modern Async I/O functions in C# are great, because they allow you to avoid a much more likely source of problems: high garbage collector pressure.

So, if you use the Async APIs on sockets, and let the system deal with worker threads, you'll probably do just fine!

enum Bool { True, False, FileNotFound };

For the most part what you have learned about threading is 10 year old stuff. Times have changed. Thread per connection/user is pretty much a dead approach.

The right way to do threading is to have a rather small thread pool,and multiplex jobs between them. This is usually done with message passing and immutable objects.

Take a look at Akka for a good example of how to do concurrency well. It's probably the simplest abstraction for concurrency I've used, have been fairly impressed with it for this kind of task.

Trying to do concurrency in C# is an exercise in frustration, especially on mono. Every language has things it does well and things it does badly. Both will work up to a point, but when you hit that point, you will start swearing up and down at C#.

Chris

Advertisement

For the most part what you have learned about threading is 10 year old stuff. Times have changed. Thread per connection/user is pretty much a dead approach.

The right way to do threading is to have a rather small thread pool,and multiplex jobs between them. This is usually done with message passing and immutable objects.

Take a look at Akka for a good example of how to do concurrency well. It's probably the simplest abstraction for concurrency I've used, have been fairly impressed with it for this kind of task.

Trying to do concurrency in C# is an exercise in frustration, especially on mono. Every language has things it does well and things it does badly. Both will work up to a point, but when you hit that point, you will start swearing up and down at C#.

Chris

I never suggested a thread/connection. Quite the contrary. The discussion is about the message passing (and work to be done with said message) after it has been received by the network layer. I'm merely inquiring about what experience other people have with this type of approach, or if a "standard" shared queue is the way to go.

I'll look into Akka.

Luckily I won't be using .NET on Mono. :)

/Philip

Given the way the async works, you generally have to consider things a bit differently between input and output. The "input" (i.e. reading from client) side is basically something like:

AsyncBegin
Sleep (or wait for run control messages)
-> AsyncResult, associate message data with internal server object Id, post to queue
-- Issue another AsyncBegin

Now, given how async *write* works, this pattern is not proper for the output side. If you do an Async Write Begin, it typically triggers a result instantly, at least to start with because the buffer is empty until you start pushing stuff onto it of course. So, you don't issue the begins until there is something to actually be sent to the specific session/connection. Even then, you generally don't want to start triggering begin/waits in this case because it is a waste if there is no more data to be posted. (I.e. doing so would cause a mass of those context switches you don't like. smile.png) So the write side is a bit more complicated and delves into architecture bits. Let me just generalize things at a high level, this is not a perfect picture but hopefully gives us common terminology to work with:

Read threads:
1..n network clients --> messages received on server are mapped to the "session" for each client
--> session id/object/whatever + message put on queue (I.e. multiplexing all messages to a queue)

MMO guts, more threads:
message queue -->
. process the available messages into the various objects
.. By nature this is demultiplexing the messages back to internal server objects represented by the session.
. run game loop --> produces messages to "sessions" somewhere in here
. repeat at some simulation rate

Send threads, much more painful than read threads:
Sessions get messages from the game loop/simulation/whatever. I.e. the game loop itself demultiplexed everything.
. Is there an outstanding async in progress?
. Yes- buffer the data.
. No- try a direct send, if would block, send a portion, buffer the rest and kick off the async begin for writing.
.. on async results, if no more data in buffer, don't issue a new begin.

Actual send threads only contain async's outstanding for sessions with buffered data. This means a communications system to wake such threads and tell them, start a send async for xxx session. This has to be carefully designed or you will be beating on mutex's and all that constantly and this can/will cause your nasty context switching issues. (NOTE: this is actually a problem for most AIO networking API's, they are designed with TCP bulk transmission in mind but a game only sends a bit here and there usually.)

Now, a network server can/should be more asynchronous than a standard client game loop so I generalized that considerably. The point of the walk through though is that with three threads in this simple (yeah, right smile.png) outline, you only have two locations to worry about context switching with and they are easy to remove as problems. For instance, the queue between the reads and the game loop does not actually need to be a queue, nor does it need to be shared between threads. Instead of a queue, each "read" thread simply writes to it's own copy of a vector. Once a game loop the server says, let me have all the contents, you start filling this new empty vector. This does impose a mutex lock/contention (potential context switch) but once a loop is very acceptable. (With a centralized queue, things are MUCH more expensive.) The output side is much the same using a different system, posted items would be in pooled blocks of a fixed size, locklessly post new blocks into the outgoing queue (remember, per session, so the likely hood of hitting actual contention is near zero so even a mutex lock would be acceptable) only if there are async operations already in progress, otherwise attempt to send it without bothering to touch the writer thread. (NOTE: May want a queue to let the writer thread actually do the send even if there is no buffer, would have to see if there are any notable costs in calling the API directly.)

Hopefully this gives you a full high level picture of things. This is not the "only" way to do things nor even necessarily the best, but there is nothing here which causes any outstandingly high number of memory copies or context switches unless you implement one of the pieces wrong. A test client and do nothing server using all of this can be written in a couple days. Removing any remaining performance issues a couple more days. Generalizing, getting clean startup/shutdown, yet another couple days. Throwing 1k simulated client connections at it, a day. Fixing things when it falls on it's face with 1k connections sending/receiving 5kb/s each, a couple more days. Making an MMO out of the framework by yourself, see you in your next life. smile.png

I'm with you on the details of the IO part, and yes the network layer for my game is probably not something I'll write on my own either way (I'd much rather focus on the game engine :) ). Any recommendations with regards to open source alternatives?

Just to clarify exactly how the game server/client will be used in practise: The game server won't be exposed directly to the players. The players will play the game on mobile devices and desktop web browsers (it's a browser based game). The front-end servers will in turn connect to the game server and then return result to player in either html or json (depending on device). In other words, the communication between the front-end and back-end need not handle authentication or be encrypted in any way (they are both on the same network behind a firewall).

Trying to do concurrency in C# is an exercise in frustration, especially on mono.


I don't quite understand this statement.

C#, the language, actually has a few useful features for concurrency (lock, async/await.)

The .NET system libraries have some *excellent* features for concurrency, especially with the later "Async" versions of I/O instead of the "Begin" versions. (The Begin versions are still better than Java, but do generate garbage per operation.)
I've found the mono implementation of the system libraries to be shoddy, though, just like the mono implementation of pretty much any other library feature. Perhaps this is what you're referring to?

The WinForms UI framework, by contrast, is not so great for concurrent and threaded operations. Luckily, most servers don't need to worry about this UI problem much.
enum Bool { True, False, FileNotFound };

I think you are really trying to "put the cart before the horse". If you're *really* making a MMO, then the funding you have for content-creation should mean that you can afford to hire some experts for the network parts.

On the other hand, if you're making a smaller online multiplayer game (which is fine, too!) then you really don't need to worry about "the C10k problem". Pretty much any approach will definitely work.

Personally I can't really see a situation where any amateur-developed game is likely to hit the limitations of networking, unless you've made some incredibly bad decisions on the protocol design - which is something you should really think about early on.

There are a lot of hard things about network game programming, and the things you've mentioned are not part of them.

This topic is closed to new replies.

Advertisement