I'm taking a second look at the architecture for my multiplayer game, and I'm revisiting pros/cons of using TCP vs UDP.
The game is currently using TCP/IP and that's what I'm used to. I used to work at one of the major poker networks and we had game servers running about 5000 players on each server instance without problems.
However, poker is very forgiving in terms of latency.
My current game is a sort of RPG/Strategy hybrid, where you move around on a world map of a fantasy world (reminiscent of the world map of old school RPGs such as Ultima 4 or early Final Fantasy).
Bandwidth usage / Simultaneous connections
Since the typical play time will be around 5-10 minutes a day(!), even a huge player base will only show a rather modest amount of simultaneous players.
In addition, the game is split into game worlds that each hold only around 100 players, so 1-2 simultaneous players in a single game world would probably be typical. Add to this that you generally only see the actions of players in the exact same grid square as yourself, and the world is 80x80 squares, so it's easy to see that broadcasting the results of a player's actions will add only very modestly to the bandwidth.
The hunt for responsiveness
Given the low requirements on simultaneous connections, we can look at the network as if there was just a single player.
Typically, a player will want to navigate around the map as fast as possible. This is especially true when "trading", i.e. moving from one well-known village to another nearby village, trading goods. This could be something like "move west, move west, move north, trade, move south, move east, move east, trade" repeated over and over again.
Now, using a phone to play the game, even when the phone is connected to the same wifi router as the development computer, the time it takes to complete an action varies greatly in a seemingly random manner.
Since turns are taken in quick succession, this random delay feels even more jarring.
Animation will help a bit
As long as the delay isn't too long, using an animation (as opposed to directly updating the position, as in the development version) will help smooth out the experience.
The longer the animation, the more of the longer delays are hidden, but at the same time all movement will be slower.
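To make that trade-off concrete, here is a minimal sketch, assuming the client starts the move animation immediately and only stalls if the server response arrives after the animation has finished. The function name and the delay values used with it are purely illustrative, not measurements from the game.

```python
def visible_stalls(delays_ms, animation_ms):
    """Return how long the player visibly waits for each move.

    Assumption: the animation plays while the request is in flight, so only
    the part of the network delay that exceeds the animation is noticeable.
    """
    return [max(0, delay - animation_ms) for delay in delays_ms]


def total_time(delays_ms, animation_ms):
    """Total time for a sequence of moves: each move costs at least the animation."""
    return sum(max(delay, animation_ms) for delay in delays_ms)
```

With made-up delays of 40, 120 and 300 ms, a 150 ms animation hides the first two completely and leaves a 150 ms stall on the third, but every move now takes at least 150 ms.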
Game design
The current design basically collects the commands from all players in a world and processes them one by one, sending notifications and state updates after each command has been processed successfully.
A client is guaranteed to either see a successful result, or get an "ActionFailed(<action type>, <reason>)" message back.
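Roughly, the processing loop looks like the sketch below. This is a simplified illustration rather than the actual server code: only movement is handled, and the `Command` type, the `Moved` message and the failure reasons are hypothetical names. Only `ActionFailed` and the 80x80 world size come from the description above.

```python
from dataclasses import dataclass

WORLD_SIZE = 80  # the world is 80x80 squares

# Hypothetical action names mapped to (dx, dy) offsets.
MOVES = {
    "move_north": (0, -1),
    "move_south": (0, 1),
    "move_west": (-1, 0),
    "move_east": (1, 0),
}


@dataclass
class Command:
    player: str
    action: str


def process_commands(positions, commands):
    """Apply commands one by one; return (player, message) pairs to send out.

    Every command yields either a success message or an ActionFailed message,
    so the client always hears back about each action.
    """
    outbox = []
    for cmd in commands:
        delta = MOVES.get(cmd.action)
        if delta is None:
            outbox.append((cmd.player, ("ActionFailed", cmd.action, "unknown action")))
            continue
        x, y = positions[cmd.player]
        nx, ny = x + delta[0], y + delta[1]
        if not (0 <= nx < WORLD_SIZE and 0 <= ny < WORLD_SIZE):
            outbox.append((cmd.player, ("ActionFailed", cmd.action, "edge of map")))
            continue
        positions[cmd.player] = (nx, ny)
        outbox.append((cmd.player, ("Moved", nx, ny)))
    return outbox
```

The real server would also send notifications to other players in the same square and handle trading, attacks and so on, which are left out here.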
One of the fundamentals of the game is the fog of war. Initially, the entire map is hidden, and throughout a game (which may last over a month), each player gradually explores it. Secondly, a player only sees other players in the exact same square.
Trying to imagine this game with UDP
Regardless of the communication protocol, it's essential that the client receives the server response to a move, as there are four things that might happen that the client cannot predict:
1. The player clears part of the fog of war and reveals more of the map.
2. The player walks into a map location with one or more players, revealing their current sizes and more.
3. The player enters a part of the map that just changed due to another player's action (magic changed the environment, another player conquered that area, or built a fortification there)
4. Another player attacked, or in some other way constrained the player before the movement could be processed.
These issues are there regardless of transmission protocol.
(1) could be fixed by sending the game map up front, but we still have (3), which covers changes to the base game map.
(2) could be handled by sending all players' locations and data, but there is still the chance that someone attacked or did something else to invalidate those values.
(3) and (4) can't be known beforehand.
In other words, it looks like reliable UDP would be necessary.
How could reliable UDP be better than TCP?
One thing is that the game doesn't really care how the game changes are sent, so if we timed out the ack for update n, and we're in the process of sending n + 4, then it's ok to send [n, n + 1, n + 2, n + 3, n + 4] instead of [n + 4] and ignore timeouts for n + 1, n + 2, n + 3.
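A minimal sketch of that idea, assuming sequence-numbered updates and a cumulative ack from the client (both are my assumptions, as are all class and method names here). Sockets and timers are left out; a "packet" is just a list of (sequence number, payload) pairs.

```python
from collections import OrderedDict


class UpdateSender:
    """Server side: every packet carries all updates not yet acknowledged."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = OrderedDict()  # seq -> payload, oldest first

    def queue(self, payload):
        seq = self.next_seq
        self.unacked[seq] = payload
        self.next_seq += 1
        return seq

    def build_packet(self):
        # Resend [n, n + 1, ..., latest] instead of only the latest update.
        return list(self.unacked.items())

    def on_ack(self, acked_seq):
        # Cumulative ack: the client has applied everything up to acked_seq.
        for seq in [s for s in self.unacked if s <= acked_seq]:
            del self.unacked[seq]


class UpdateReceiver:
    """Client side: applies updates in order and ignores duplicates."""

    def __init__(self):
        self.last_applied = -1
        self.applied = []

    def on_packet(self, packet):
        for seq, payload in packet:
            if seq == self.last_applied + 1:
                self.applied.append(payload)
                self.last_applied = seq
        return self.last_applied  # sent back to the server as the ack
```

If the packet carrying update n is lost, the next packet that gets through still contains n, so the client catches up without a separate retransmission timer per update. The cost is larger packets while acks are outstanding, which should be acceptable given how small the updates are.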
Mobile and wifi networks are fragile, and UDP might have a better chance of recovering from connection issues, since we don't need to reconnect.
But is it really?
UDP won't solve the original problem of variable latency over the network, and it might not even help much in the cases where packets are lost. It definitely makes things more complex.
Help me - what's the verdict?
Implementing reliable UDP myself would be fun, but I wouldn't mind using a low-level solution such as eNet either. The question is whether it will pay off.
I've tried to write down as much detail as I could. Hopefully someone has done something similar and has advice.
P.S. Computer and phone latency are two quite different things. Even using wifi, the phone is much slower with a significantly worse quality connection.