- Is TCP fast enough or will I need to build on UDP?
TCP is no “faster” or “slower” than UDP.
TCP has different characteristics than UDP: it requires significant amounts of buffering and per-client session state on the server, where UDP lets you get away with less. TCP also has the “head of line” blocking problem when a packet gets dropped in transit.
RPC seems like one way to go
Most RPC systems have too much overhead, because they track too much state.
RPC systems also solve a separate problem, which is “how do I structure the payload inside a packet so the other end can make sense of it,” but that's not really the challenge you have here.
How much overhead is building up directly on sockets
What do you mean by “overhead?”
You write more code when going down to the underlying sockets layer, but you end up having more control, and can generally end up with lower server load and less packet overhead, assuming you do at least as competent a job with the implementation as your higher-layer library would have done.
Note that almost all “high profile” networked games have people on them who have significant experience and skill in this area. The approach of “I'll use some existing library/technology, and then figure out how to make it good enough” that works so well in web development and enterprise IT is not at all how high-end games are built. And if your game tries to scale up past the “usual envelope” along some particular axis (such as “number of people in a single simulation”) then you're trying to build a high-end game.
What do I need to consider when choosing the programming language
If you want to scale to very high numbers of players, then you need a language with low overhead and a good I/O model. If you want latencies to be low, you want a language without garbage collection. Unfortunately, that rules out all three of the languages you chose. Go is especially bad, because its JSON and Protobuf and other similar packetizing solutions are quite slow, and its GC is much less mature than that of Java and C#. Java has traditionally had a poor asynchronous I/O model, which means C# is probably the least bad of the three, but if you want to push the envelope, know that previous games in this genre have been written in C or C++. (I guess Rust would be a modern, non-garbage-collected option with good io_uring support libraries, if you can't stand the C/C++ combo.)
Is there anything I need to consider when choosing a VPS
Measure the latency of the scheduler over time – the hypervisor may introduce significant random jitter. This is especially true on smaller instances and cheaper hosts (a crude way to measure this is sketched below).
Measure the actual throughput to the networks you're interested in. Some cheap VPSes I've used advertised “free 3 TB bandwidth,” but the achievable throughput per connection from the east coast out to the US west coast was about 50 kB/second.
Make sure the networking infrastructure actually supports the kind of load you want to generate. Some providers focus on web traffic, and do really poorly for UDP and real-time traffic.
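For the scheduler jitter measurement mentioned above, a crude probe is enough – the sketch below (my own, not from any particular library) asks for a 1 ms sleep over and over and records the worst overshoot; the interval and sample count are arbitrary:
#include <chrono>
#include <thread>
#include <cstdio>

// Hypothetical jitter probe: request a 1 ms sleep many times and record how much
// longer than 1 ms the wakeup actually took. On a quiet dedicated core the
// overshoot is small and steady; on an oversold hypervisor you will see
// multi-millisecond spikes.
int main() {
    using namespace std::chrono;
    double worst = 0.0;
    for (int i = 0; i != 10000; ++i) {
        auto before = steady_clock::now();
        std::this_thread::sleep_for(milliseconds(1));
        auto after = steady_clock::now();
        double overshootMs =
            duration_cast<duration<double, std::milli>>(after - before).count() - 1.0;
        if (overshootMs > worst) worst = overshootMs;
    }
    printf("worst wakeup overshoot: %.3f ms\n", worst);
    return 0;
}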
Can you recommend any papers/articles
I mean, it's all right there, in the man pages for UDP sockets …
… the trick being: trying to decipher what the terse, accurate language about kernel and implementation behavior actually means when scaled up to a large system!
If I built a system where “every player” needed to see some kind of state that's derived from “every other player,” then I would build it on UDP. I would start with a single socket. I would set the input and output buffers of that socket really large (10 MB or more). I would use two threads, one for reading and one for writing, with blocking I/O. I would use a non-blocking primitive of some sort between the threads – you have to discard data at some point if you back up, and your choice is “getting an asynchronous failure when writing to the kernel” (non-blocking mode) or “under the control of your program” (blocking I/O, internal non-blocking queues.)
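One possible shape for that non-blocking primitive is a fixed-size single-producer/single-consumer ring that simply refuses the push when it's full – that refusal is the “under the control of your program” point where you decide to drop data. This is just a hypothetical sketch; the code further down doesn't use it (there, the two threads share the game state directly):
#include <atomic>
#include <cstddef>

// Fixed-capacity SPSC ring: push() never blocks, it returns false when full,
// which is where the caller decides to drop the datagram. T and N are placeholders.
template <typename T, size_t N>
class SpscRing {
    T items[N];
    std::atomic<size_t> head{0};   // advanced only by the consumer
    std::atomic<size_t> tail{0};   // advanced only by the producer
public:
    bool push(const T &item) {     // producer thread only
        size_t t = tail.load(std::memory_order_relaxed);
        size_t next = (t + 1) % N;
        if (next == head.load(std::memory_order_acquire)) return false;  // full: drop
        items[t] = item;
        tail.store(next, std::memory_order_release);
        return true;
    }
    bool pop(T &out) {             // consumer thread only
        size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire)) return false;     // empty
        out = items[h];
        head.store((h + 1) % N, std::memory_order_release);
        return true;
    }
};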
OK, the threads would simply receive all the incoming packets with their source addresses, and send all the outgoing state packets to all the known addresses, respectively. Because you will likely get many packets in during the time it takes to cycle through the output address list once, you need to aggregate updates. This is also necessary to avoid N-squared growth as the number of players rises – if every player sends an input, and every other player needs to see that specific input, you have an irredeemable N-squared problem.
So, it might look something like this:
#include <sys/socket.h>   // socket, recvfrom, sendto
#include <unistd.h>       // usleep

// Constants (MAX_PACKET_SIZE, MAX_PLAYER_COUNT, MINIMUM_TIME_BETWEEN_PACKETS) and
// helpers (handle_error, maybe_insert_address, update_state_based_on_packet,
// snapshot_game_state_into, my_clock, to_microseconds) plus the gPlayers table
// are assumed to be defined elsewhere.
int udpSocket;

void reader() {
    char buf[MAX_PACKET_SIZE];
    struct sockaddr_storage addr;   // big enough for an IPv4 or IPv6 source address
    while (true) {
        socklen_t addrsize = sizeof(addr);
        // Blocking read of one datagram plus its sender's address.
        int sizeRecv = recvfrom(udpSocket, buf, MAX_PACKET_SIZE, 0,
                                (struct sockaddr *)&addr, &addrsize);
        handle_error(sizeRecv, "recvfrom");
        // Look the sender up in the player table, inserting it if it's new.
        int playerId = maybe_insert_address(&addr, addrsize);
        update_state_based_on_packet(buf, sizeRecv, playerId);
    }
}

void writer() {
    char buf[MAX_PACKET_SIZE];
    while (true) {
        auto timeThen = my_clock();
        // Aggregate the current game state into a single outgoing packet ...
        int sizeSend = snapshot_game_state_into(buf);
        // ... and broadcast that same packet to every known player.
        for (int i = 0; i != MAX_PLAYER_COUNT; ++i) {
            if (gPlayers[i].active) {
                int w = sendto(udpSocket, buf, sizeSend, 0,
                               (const struct sockaddr *)&gPlayers[i].address,
                               gPlayers[i].addressSize);
                handle_error(w, "sendto");
            }
        }
        // Pace the loop so state packets don't go out faster than the target rate.
        auto timeDelta = my_clock() - timeThen;
        if (timeDelta < MINIMUM_TIME_BETWEEN_PACKETS) {
            usleep(to_microseconds(MINIMUM_TIME_BETWEEN_PACKETS - timeDelta));
        }
    }
}
Is this syntactically correct code that will compile first try? No :-)
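For what it's worth, the socket setup that sketch assumes might look roughly like this – one UDP socket, bound to a port, with the kernel buffers turned way up as described above. The port and the 10 MB figure are placeholders, and Linux caps these buffer sizes via net.core.rmem_max / net.core.wmem_max, so those sysctls may need raising for the full size to take effect:
#include <netinet/in.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <cstdio>

// Hypothetical setup for the global udpSocket used above: bind one UDP socket and
// ask the kernel for very large send/receive buffers (the "10 MB or more" above).
bool setup_udp_socket(uint16_t port) {
    udpSocket = socket(AF_INET, SOCK_DGRAM, 0);
    if (udpSocket < 0) { perror("socket"); return false; }

    int bufSize = 10 * 1024 * 1024;   // 10 MB, per the discussion above
    setsockopt(udpSocket, SOL_SOCKET, SO_RCVBUF, &bufSize, sizeof(bufSize));
    setsockopt(udpSocket, SOL_SOCKET, SO_SNDBUF, &bufSize, sizeof(bufSize));

    struct sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(udpSocket, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return false;
    }
    return true;
}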
If you find that your CPU runs flat out on the two threads you have, and you're still not saturating your network interface (could happen with a 10 Gbps or higher network interface, and/or if your implementation is less than efficient) then you can perhaps open multiple sockets, and run one of these pairs of loops per socket. You'd have to figure out how to make your different clients send to the different ports of those different sockets somehow. You'd still end up being bound on the “update state based on packet” global state update; depending on how complex that function is, that may be easy to scale across cores, or not.
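If you do go the multiple-socket route, the fan-out could look roughly like the sketch below – this assumes reader() and writer() are changed to take the socket descriptor as a parameter instead of using the global udpSocket, and assumes a make_bound_udp_socket() helper along the lines of setup_udp_socket() above; none of that is in the original sketch:
#include <thread>
#include <vector>
#include <cstdint>

// Hypothetical fan-out: one socket per consecutive port, each with its own
// reader/writer thread pair. Clients have to be told (for example by a
// login/matchmaking step) which port to send to.
void start_socket_pairs(uint16_t basePort, int numSockets) {
    std::vector<std::thread> threads;
    for (int i = 0; i != numSockets; ++i) {
        int fd = make_bound_udp_socket(basePort + i);
        threads.emplace_back(reader, fd);
        threads.emplace_back(writer, fd);
    }
    for (auto &t : threads) {
        t.join();   // in a real server these run for the lifetime of the process
    }
}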
In general, for very high throughput servers, locking is the enemy. You should be able to implement “maybe insert address” and “gPlayers” in a non-blocking manner. Especially if it's OK that some player gets one additional identical packet through a cycle, meaning you don't need locking primitives across management of the gPlayers array, just make sure to keep the “active” flag properly updated (and memory sequenced, if you're on ARM or Itanium or some such where that matters.)
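To make the lock-free gPlayers idea concrete, here is a minimal sketch of what the table and maybe_insert_address() might look like with an atomic “active” flag, assuming only the reader thread ever inserts and MAX_PLAYER_COUNT is defined elsewhere; the field names are mine, not from the original:
#include <atomic>
#include <sys/socket.h>
#include <cstring>

// The reader thread fills in the address first and only then flips `active` with
// release semantics; anyone who sees the flag set with acquire semantics therefore
// also sees a fully written address. Worst case, a player added mid-cycle misses
// or repeats one broadcast, which is fine here.
struct Player {
    std::atomic<bool> active{false};
    struct sockaddr_storage address;
    socklen_t addressSize;
};
Player gPlayers[MAX_PLAYER_COUNT];

int maybe_insert_address(const struct sockaddr_storage *addr, socklen_t addrSize) {
    for (int i = 0; i != MAX_PLAYER_COUNT; ++i) {
        if (gPlayers[i].active.load(std::memory_order_acquire) &&
            gPlayers[i].addressSize == addrSize &&
            memcmp(&gPlayers[i].address, addr, addrSize) == 0) {
            return i;                       // already known
        }
    }
    for (int i = 0; i != MAX_PLAYER_COUNT; ++i) {
        if (!gPlayers[i].active.load(std::memory_order_relaxed)) {
            memcpy(&gPlayers[i].address, addr, addrSize);
            gPlayers[i].addressSize = addrSize;
            gPlayers[i].active.store(true, std::memory_order_release);
            return i;                       // newly inserted
        }
    }
    return -1;                              // table full
}
The writer side then reads the flag with gPlayers[i].active.load(std::memory_order_acquire) instead of a plain read.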
Now, maybe I'm making some bad assumptions. I'm assuming the “simulation” (or “game state update”) is simple – like a “counter” – such that “who affected the counter at what step” generally doesn't matter. I also assume that “a lot of players” is a goal of 10,000 players or more, all affecting a single state. If “a lot” means a hundred players, then efficiency doesn't matter much, unless you're building some kind of fancy physical simulation, similar to a modern FPS game.
Good luck with the game!