Responsive mobile multiplayer - UDP or TCP?

Started by July 25, 2013 11:03 PM
7 comments, last by lerno 11 years, 4 months ago

I'm taking a second look at the architecture for my multiplayer game, and I'm revisiting pros/cons of using TCP vs UDP.

The game is currently using TCP/IP and that's what I'm used to. I used to work at one of the major poker networks and we had game servers running about 5000 players on each server instance without problems.

However, poker is very forgiving in terms of latency.

My current game is a sort of RPG/Strategy hybrid, where you move around on a world map of a fantasy world (reminiscent of the world map of old school RPGs such as Ultima 4 or early Final Fantasy).

Bandwidth usage / Simultaneous connections

Since the typical play time will be around 5-10 minutes a day(!), even a huge player base will only show a rather modest amount of simultaneous players.

In addition, the game is split into game worlds that each hold only around 100 players, so 1-2 simultaneous players in a single game world would probably be typical. Add to this that you generally only see the actions of players in the exact same grid square as yourself, and that the world is 80x80, and it's easy to see that broadcasting the results of a player's actions adds only very modestly to the bandwidth.

The hunt for responsiveness

Given the low requirements on simultaneous connections, we can look at the network as if there was just a single player.

Typically, a player will want to navigate around the map as fast as possible. This is especially true when "trading", i.e. moving from one well known village to another nearby village, trading goods. This could be something like "move west, move west, move north, trade, move south, move east, move east, trade" repeated over and over again.

Now, using a phone to play the game, even when the phone is connected to the same wifi router as the development computer, the time it takes to complete an action varies greatly in a seemingly random manner.

Since turns are taken in quick succession, this random delay feels even more jarring.

Animation will help a bit

As long as the delay isn't too long, using an animation (as opposed to directly updating the position, as in the development version) will help smooth out the experience.

The longer the animation, the more of the network delay it hides, but at the same time all movement becomes slower.

Game design

The current design basically collects the commands from all players in a world and processes them one by one, sending notifications and state updates after each command has been processed successfully.

A client is guaranteed to either see a successful result, or get an "ActionFailed(<action type>, <reason>)" message back.

One of the fundamentals of the game is the fog of war. Initially, the entire map is hidden, and throughout a game (which may last over a month), each player gradually explores it. Secondly a player only sees other players in the exact same square.

Trying to imagine this game with UDP

Regardless of the transport used, it's essential that the client receives the server response to a move, as there are four things that might happen that the client cannot predict:

1. The player clears part of the fog of war and reveals more of the map.

2. The player walks into a map location with one or more players, revealing their current sizes and more.

3. The player enters a part of the map that just changed due to another player's action (magic changed the environment, another player conquered that area, or built a fortification there)

4. Another player attacked, or in some other way constrained the player before the movement could be processed.

These issues are there regardless of transmission protocol.

(1) could be fixed by sending the whole game map up front, but we still have (3), which changes the base game map.

(2) could be handled by sending all players' locations and data, but there is still the chance that someone attacked or did something else to invalidate those values.

(3) and (4) can't be known beforehand.

In other words, it looks like reliable UDP would be necessary.

How could reliable UDP be better than TCP?

One thing is that the game doesn't really care how the game changes are sent, so if we timed out the ack for update n, and we're in the process of sending n + 4, then it's ok to send [n, n + 1, n + 2, n + 3, n + 4] instead of [n + 4] and ignore timeouts for n + 1, n + 2, n + 3.
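
To make the "send [n, ..., n + 4] instead of [n + 4]" idea concrete, here is a minimal sketch in Python. The class names and the cumulative-ack scheme are my own assumptions for illustration, not something the game already has: the sender keeps every unacknowledged update and puts all of them in each outgoing batch, and the receiver applies each sequence number at most once.

```python
from collections import OrderedDict


class UpdateSender:
    """Keeps every unacknowledged update and resends them all together."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = OrderedDict()  # seq -> payload, oldest first

    def queue(self, payload):
        """Register a new update and return the full batch to transmit."""
        self.unacked[self.next_seq] = payload
        self.next_seq += 1
        return list(self.unacked.items())

    def ack(self, seq):
        """Cumulative ack: the client has applied everything up to seq."""
        for s in [s for s in self.unacked if s <= seq]:
            del self.unacked[s]


class UpdateReceiver:
    """Applies updates in order, exactly once, skipping ones already seen."""

    def __init__(self):
        self.applied = []
        self.last_seq = -1

    def receive(self, batch):
        for seq, payload in batch:
            if seq == self.last_seq + 1:
                self.applied.append(payload)
                self.last_seq = seq
        return self.last_seq  # value to send back as the ack
```

If the datagram carrying update 0 is lost, the next call to `queue` returns a batch containing both update 0 and update 1, so the receiver catches up from a single packet without waiting for a separate retransmission timer.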

Mobile and wifi networks are fragile, so UDP might have a better chance of recovering from connection issues, as we don't need a reconnect.

But is it really?

UDP won't solve the original problem of variable latency over the network, and it might not even help much in the cases where packets are lost. It also definitely makes things more complex.

Help me - what's the verdict?

Implementing UDP would be fun, but I wouldn't mind using a low-level library such as ENet either. The question is whether it will pay off.

I've tried to write down as much detail as I could. Hopefully someone has done something similar and has advice.

P.S. Computer and phone latency are two quite different things. Even using wifi, the phone is much slower with a significantly worse quality connection.

I'm mostly coming from the PC side, but IMHO TCP is the winner, for three reasons.

1. Less work. It's already reliable and in-order so you have no worries about reinventing those wheels.
2. Stateful connections built in. You can tell if your TCP socket goes away fairly easily, which may well be important on mobile devices.
3. UDP may not even be an option due to security restrictions on real carrier networks.


You're also not sending so much data that the difference in having a reliable/unreliable split layer really matters. You can just make everything reliable and not worry about drowning your bandwidth. Latency is going to be a much bigger issue and UDP won't fix that automatically, especially not if you go the route of building/using a reliable UDP layer.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Since you write that you are seeing performance variations even on local wifi, I suspect that you have other issues. Performance through local wifi should be rock solid. There are the usual culprits like the Nagle algorithm, but it seems like it could be something deeper. That part seems like an implementation bug.
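
For reference, ruling out Nagle is cheap. A minimal sketch in Python (the thread doesn't say what language the game is written in, so the language and the helper name `make_low_latency_socket` are assumptions); the `TCP_NODELAY` option itself is the standard socket option for this:

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm turned off."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY sends small writes immediately instead of holding them
    # back to coalesce with later data.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

The option has to be set on both the client and the server socket, since each side's small writes are delayed independently.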

The TCP vs UDP debate really boils down to this: When things are flowing smoothly over the wire there will be very little performance difference between the two. When data exceptions happen, UDP allows partial information to flow, while TCP blocks until all the data arrives.


The exceptional case data flow --- partial and out-of-order vs stalled but complete and in order --- is what you must consider.


I'd ask if your game is designed so that partial information is enough. If you are trading goods, and you say "sell a, sell b, buy c", are you okay if a and b don't go through but c does? How does that affect your gameplay?

For some games this is not an issue. In a fast-paced FPS or RTS you can update some units if you have their information, then correct for missing data later after it gets resent. In this design UDP's partial information is useful. Proceeding with partial information is a little more work, but provides a better user experience than stalling until all data has arrived.

For other games where ordering is important, where you cannot reasonably proceed with item b until item a has been processed, this will be an issue. You cannot proceed with missing data. In this design UDP provides no benefit but incurs a cost. Using TCP in this scenario is easier than writing your own reliable protocol.


For a trading game, I would very carefully study your design to ensure UDP's partial data approach would be acceptable.

Performance through local wifi should be rock solid.


I have friends who live in apartment buildings with > 30 WiFi access points fighting for spectrum. I'm "lucky" in that I can only see five others from my place. Even so, I think the Netgear I'm using sometimes actually hiccups, as there are drop-outs that shouldn't happen. Or maybe it's Comcast ISP. But, from a game's point of view, it doesn't matter -- crappy consumer internet will generate temporary hiccups, be it TCP or UDP.
enum Bool { True, False, FileNotFound };

I'm mostly coming from the PC side, but IMHO TCP is the winner, for three reasons.

1. Less work. It's already reliable and in-order so you have no worries about reinventing those wheels.
2. Stateful connections built in. You can tell if your TCP socket goes away fairly easily, which may well be important on mobile devices.
3. UDP may not even be an option due to security restrictions on real carrier networks.

You're also not sending so much data that the difference in having a reliable/unreliable split layer really matters. You can just make everything reliable and not worry about drowning your bandwidth. Latency is going to be a much bigger issue and UDP won't fix that automatically, especially not if you go the route of building/using a reliable UDP layer.

What I've noticed is that there's quite a difference between mobile connections and regular PC connections. TCP is mostly smooth for PCs, even to the point that 3G -> phone -> Bluetooth -> PC gives a smoother experience than browsing directly on 3G using the phone. I can only speculate about the reason for that.

Regarding your points:

1. There's the option of going with some fairly low-level protocol on top of it, such as ENet. That would make this quite a bit less of a hassle.

2. Statefulness isn't necessarily great. In unstable network conditions - imagine something like 50% packet loss - TCP would have a really difficult time re-establishing a connection, while for UDP there's no reconnection and no need to do the login handshake again. That could be quite advantageous, especially if you keep the same ip+port when moving from one 3G antenna to another - but I suspect that isn't the case. Still, one could use the ping to signal that one's ip has changed and get a fairly fast reconnect anyway.

3. This is a concern, but from what I heard it's not all that much of a problem in a server-client architecture where the server sits on a static IP.
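
To illustrate point 2, here is a rough sketch of what I mean by letting a ping update the address. Everything in it is hypothetical (the `SessionTable` name, the token scheme); a real version would also need to authenticate the datagram rather than trust a bare token.

```python
import secrets


class SessionTable:
    """Maps a session token to the most recent (ip, port) it was seen from."""

    def __init__(self):
        self.addr_by_token = {}

    def login(self, addr):
        """Done once per session; returns the token the client must echo."""
        token = secrets.token_hex(16)
        self.addr_by_token[token] = addr
        return token

    def on_datagram(self, token, addr):
        """Accept a datagram (for example a ping) and refresh the address.

        Returns False for unknown tokens so the caller can drop the packet.
        """
        if token not in self.addr_by_token:
            return False
        self.addr_by_token[token] = addr  # client may have changed network
        return True
```

With this, a client that hops from wifi to 3G just keeps pinging with the same token, and the server starts sending updates to the new address without a new login handshake.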

I should mention that my wifi situation is a bit interesting (and probably accounts for part of my pathological latency). Basically the wireless router is centrally placed, but my development computer and most other places where I test are near the bounds of the wifi's reach.

This gives me a lot of insight into how [poorly] the phone behaves when working with a bad wifi connection (and what happens when it switches back and forth between wifi and 3G)...

Since you write that you are seeing performance variations even on local wifi, I suspect that you have other issues. Performance through local wifi should be rock solid. There are the usual culprits like the Nagle algorithm, but it seems like it could be something deeper. That part seems like an implementation bug.

The exceptional case data flow --- partial and out-of-order vs stalled but complete and in order --- is what you must consider.

I'd ask if your game is designed so that partial information is enough. If you are trading goods, and you say "sell a, sell b, buy c", are you okay if a and b don't go through but c does? How does that affect your gameplay?

For a trading game, I would very carefully study your design to ensure UDP's partial data approach would be acceptable.

Nagle's algorithm is definitely disabled on both ends and things run smoothly locally (well, actually I send to the wifi router's external ip, which forwards things to my computer - so I think I actually get that router roundtrip) without any delays whatsoever.

As I wrote, I'm using fairly bad wifi conditions to test, but even so I want to have as non-annoying an experience as possible.

In the case of trade (and other actions), they're actually each encapsulated. You don't send "sell food 10, sell ore 10, buy cloth 20", instead you send "trade(sell food 10, sell ore 10, buy cloth 20)". This is because the game is fundamentally limited to a number of actions per day (which accumulate if you miss a day of play by the way).

Each action corresponds to (usually) 1 action point. In the end, the natural design is to bundle actions so that they correspond to something that costs a single action point. In the case of trade, then the trade action would cost a single point, regardless of how much you sell or buy. That in turn makes all actions atomic - either all of it succeeds or none of it does.
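
As a sketch of what "atomic" means here (not the actual game code; the gold balance, the price table and the function name are assumptions made up for the example): the server works on a copy of the inventory and only returns the new state if every order in the bundle succeeds.

```python
def apply_trade(inventory, gold, orders, prices):
    """Apply a bundled trade atomically.

    orders is a list of (kind, good, qty) with kind "sell" or "buy".
    Returns (new_inventory, new_gold), or None if any order fails,
    in which case the caller's inventory is left untouched.
    """
    inv = dict(inventory)  # work on a copy so a failure leaves no trace
    for kind, good, qty in orders:
        cost = prices[good] * qty
        if kind == "sell":
            if inv.get(good, 0) < qty:
                return None
            inv[good] = inv.get(good, 0) - qty
            gold += cost
        elif kind == "buy":
            if gold < cost:
                return None
            inv[good] = inv.get(good, 0) + qty
            gold -= cost
        else:
            return None
    return inv, gold
```

A `None` result would map to a single ActionFailed message for the whole trade, so the client never has to reason about a half-applied bundle.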

In fact, this means that on average, a player will not send more than 80 packets (excluding responding to server pings!) during a game session(!).

The problem is that for movement, anything with more than a few hundred milliseconds of delay will feel laggy.

There are a few special circumstances in the game:

1. The client never has more than one message "in-flight". It always waits until the previous action has completed before allowing a new action. Thus, re-ordering of client packets is a non-issue.

2. The client still has to deal with a flaky connection, and detecting TCP disconnects has its quirks. Periodically resending UDP messages until acked is not much harder than detecting a server disconnect.

3. The client can take action without having the latest server state with no ill effect, as long as it eventually gets the updates.

Also, the usual approach is to prevent interaction until the last command is resolved, so the client actually *never* issues (sends) a new action until the previous is resolved.
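
Roughly, the one-in-flight scheme could look like the sketch below. It is a simulation only, with no sockets or timers, and all names are hypothetical. The important part is that the server remembers the last sequence number per player, so a retried command is answered again but executed only once.

```python
class CommandServer:
    """Executes each (player, seq) command once and replays the reply on retries."""

    def __init__(self):
        self.last_seq = {}    # player -> seq of the last executed command
        self.last_reply = {}  # player -> reply that was sent for it
        self.processed = []   # log of commands that were actually executed

    def handle(self, player, seq, command):
        if self.last_seq.get(player) == seq:
            return self.last_reply[player]  # duplicate: resend the old reply
        self.processed.append((player, command))
        reply = ("ok", seq)
        self.last_seq[player] = seq
        self.last_reply[player] = reply
        return reply


def send_with_retry(server, player, seq, command, reply_lost_pattern):
    """Simulate the client loop: resend the same command until a reply arrives.

    reply_lost_pattern holds one boolean per attempt; True means the
    server's reply to that attempt was lost on the way back.
    """
    attempts = 0
    for reply_lost in reply_lost_pattern:
        attempts += 1
        reply = server.handle(player, seq, command)
        if not reply_lost:
            return reply, attempts
    return None, attempts
```

In a real client each loop iteration would be one datagram followed by a wait on a retry timer; the simulation just replaces the network with a list of "was the reply lost" flags.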

What it boils down to is this:

1. Can I use UDP to get a lower latency on commands than with TCP/IP? It would seem that I would have more options, like resending a request every 100 ms until I get a response.

2. Will UDP give me the possibility to do a more graceful reconnect?

Can I use UDP to get a lower latency on commands than with TCP/IP?


Maybe. If the exact guarantees you need to provide with UDP are the same as TCP, then no.
Specifically, if you need to guarantee that things arrive in order (so, loss detection and re-transmission)
AND you need to guarantee that bandwidth is efficiently used (so no over-spamming the network)
AND you need to guarantee that the network won't fall down in case of congestion (so exponential back-off)
then you probably can't do better with UDP than TCP.

The most commonly relaxed requirement is the "must be reliable in order" requirement. If you can relax that, then UDP can perform better than TCP.
I would not recommend relaxing the "network won't fall down in case of congestion" requirement.
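
As a small illustration of the back-off requirement: instead of resending at a fixed interval forever, double the wait after each unanswered attempt, up to a cap. The 100 ms starting point is taken from the question; the factor, cap and function name are assumptions picked for the example.

```python
def retry_delays(base_ms=100, factor=2, cap_ms=3200, attempts=6):
    """Return the wait before each resend, growing exponentially up to a cap."""
    delays = []
    delay = base_ms
    for _ in range(attempts):
        delays.append(min(delay, cap_ms))
        delay *= factor
    return delays


print(retry_delays())  # [100, 200, 400, 800, 1600, 3200]
```

On a healthy link the first retry still goes out after 100 ms, so responsiveness is unchanged, but a congested link sees a handful of packets over several seconds instead of ten per second.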
enum Bool { True, False, FileNotFound };

Can I use UDP to get a lower latency on commands than with TCP/IP?


Maybe. If the exact guarantees you need to provide with UDP are the same as TCP, then no.
Specifically, if you need to guarantee that things arrive in order (so, loss detection and re-transmission)
AND you need to guarantee that bandwidth is efficiently used (so no over-spamming the network)
AND you need to guarantee that the network won't fall down in case of congestion (so exponential back-off)
then you probably can't do better with UDP than TCP.

The most commonly relaxed requirement is the "must be reliable in order" requirement. If you can relax that, then UDP can perform better than TCP.
I would not recommend relaxing the "network won't fall down in case of congestion" requirement.

Retransmissions could compress updates. So for example, say that I send update 1, then want to send update 2. If I don't have an ack, I simply send update 1+2 instead of 2. So I don't really need ordering in the case of server-to-client packets. For client-to-server, only one packet (or the retry of that packet) will be in transit, so again ordering becomes unimportant in the client-to-server case as well.

Client packet frequency is limited by retry interval and server response, so that should be fairly self-regulating. Client packets are small.

Since server packets only occur in response to player actions and the number of simultaneous players is small, it's easy to see that the server packets will be bounded.

E.g. 3 simultaneous players, client limited to 1 command per 100 ms -> theoretical max = 30 updates / s seen by each client.
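
The same bound written out as a tiny Python function, in case someone wants to plug in other numbers (the function name is made up for the example, and it assumes the worst case where every command is visible to every client):

```python
def max_updates_per_second(players, min_command_interval_ms):
    """Upper bound on updates per second that a single client can receive."""
    commands_per_player = 1000 / min_command_interval_ms
    return players * commands_per_player


print(max_updates_per_second(3, 100))  # 30.0
```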

Thanks to all of you for helping out with opinions on this by the way.
