Before we begin this journey into the inner workings of networking in games, it's important to define some terms, and get some background on the Internet and how it works. This is of inestimable help later when trying to explain why certain things are done the way they are when coding for the net, plus, if you're anything like me, it's just plain interesting.
I'm going to go over some history of the Internet, with some simple examples of how it works, without getting too technical. This is not intended to be a programming reference document, more an enlightenment of what others are talking about when they talk about latency, pings, TCP/IP and so on. I will avoid those areas of the net that aren't directly related to games, since there is no reason to bore the pants off anyone more than is strictly necessary. This is not a 'how to' document, but more a FYI type of thing. There is nothing in this about making your home system better over the Internet, but more an explanation of why so many Internet games have networking troubles, and where they come from.
[size="5"]So let's begin
The Internet as everyone knows it came about from a much smaller network called ARPANET that the Pentagon created a) because America was on a science kick in the 60's and wanted to get a head start in burgeoning industries and b) because the Pentagon wanted to use and keep tabs on the expensive mainframes it was funding at places like MIT and UCLA without having to use multiple remote terminals.
To cut a long story short, the Pentagon put out a contract to tender that would link multiple mainframes together, for use in real time. This would mean that one man at one terminal would be able to access multiple machines, share data and run programs on different machines.
From this contract, the concepts of packet switching and routers were born. Now everyone bandies those words 'routers' and 'packet switching' around, but what do they actually mean? Well, first up, let's dispel one common misconception. Many people use the phone system as an example when discussing the Internet. "It's like the phone system," they say. "You have an IP address that's like a telephone number." Well, not really. A better example would be the post office. Imagine that when you send a file from one computer to another it's like a letter being sent. It first goes to your post office, where it is examined, and it's decided if it's intended for someone with an address that post office serves. If it's not, then it's forwarded to another post office to be examined again. Eventually it will arrive at a post office that says, "Oh, I know where the post office that this letter is intended for is located" and it's forwarded directly to the correct one, which then sends it on to the intended recipient. Long winded, but you get the idea. Well, a router is effectively a post office. It sorts files that come in and decides what to do with them and where to send them. This is very different from the phone system, where you end up with a direct link between you and whoever you are calling. With routers, there is no direct link.
Incidentally, there is a common myth that the original ARPANET, and by extension the Internet, was designed to withstand a nuclear war, so that if one machine was taken out, the others would still be able to communicate, since there was no one route that everyone depended on. Having researched this, I found no actual proof that this was ever an original requirement. The network would be able to withstand losing a large portion of its connecting machines, but that would appear to be more of a side benefit than an original requirement.
Anyway, back to packet switching. When you send a file over the Internet, you don't actually send the whole thing in one big chunk. It's broken up into small packets - like postcards if you want to continue the post office simile - and each one is transmitted one after the other. The beauty of this is that the routers can handle many, many of these little packets, without ever having to know what's in them (or indeed, the order they were transmitted in). So your packets get mixed in with someone else's, and the data stream gets maximum efficiency. All your machine has to do is create the little packets, number them so they get re-assembled on the other end in the correct order, and send them out to the router.
Of course they need an address too. That's where IP addresses come in. An IP address is a unique address for your machine on the Internet. It's made up of four numbers, each between 0 and 255. For instance 204.57.198.32. All those www.whatever.com names are actually converted into IP addresses when packets are exchanged with another machine on the net. Sometimes these addresses are specific and constant on one machine; more often than not they are dynamically allocated by the host system. Every time you log on to your service provider, they send you an IP address they have free from a range that's been allocated to them. For instance your ISP may have the range 204.198.32.0 to 204.198.32.255, which gives them 256 possible IP addresses. 256 people can all be using the system at once, but no more than that. When you log in, the system looks to see which IP addresses are free, and sends you one. That way more than 256 people can be on the books for this host, but only 256 can use it at once.
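The "number them and re-assemble on the other end" idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how any real network stack does it; the function names `packetize` and `reassemble` and the 8-byte packet size are made up for the example.

```python
import random

def packetize(data, size):
    """Split data into (sequence_number, chunk) pairs of at most `size` bytes."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]

def reassemble(packets):
    """Rebuild the original data by sorting on the sequence number."""
    return b"".join(chunk for _, chunk in sorted(packets))

message = b"Imagine that each packet is a postcard in the mail."
packets = packetize(message, 8)
random.shuffle(packets)           # the network may deliver in any order
restored = reassemble(packets)    # the numbers put things back in order
```

Because every chunk carries its own number, the order of arrival stops mattering, which is exactly what lets routers treat each packet independently.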
The alternative to this would be the phone system approach, which would mean creating dedicated routers that would reserve an entire line for you to send data to and from the other computer, but that line would sit unused most of the time, especially if you are doing stuff like typing in real time. You may think you are a fast typist, but in the time between a message going to and from your machine to another, the network could have transmitted War and Peace several times. A good simile that I heard used would be "like reserving the entire Interstate road system to drive a car from Washington DC to LA". You would never dream of doing that; instead you share it with other car drivers. Just like on the Internet. Maybe that dumb 'super highway' label thing has some merit after all.
I'm sure you can see how sharing lines with others, and breaking messages into small packets, is the most efficient use of network time and data streams. The same system is in use today as was originally designed for the ARPANET way back when. Why? Cos it works real well.
Where do ISPs come into this? Let's think of it this way. The routers are machines that sit attached to mainframes and stuff that we are treating as big post offices. An ISP (Internet Service Provider) is one step removed from that - like the postman himself. They are attached to a machine that often has a router (not always), but they also have a ton of modems attached to them. Your little PC at home uses its modem to call up the modem attached to the ISP's machine, which then accepts your packets and then forwards them, in bulk and mixed in with everyone else's, to the Internet with a capital I.
Cable modems, ISDN and DSL are pretty much the same thing, except that the modem-to-modem part is removed, and faster communication devices with more bandwidth are used instead. In fact DSL is basically just a faster modem with a better phone line anyway.
Ok - so now we know how data gets to and from, and about on the Internet. It all sounds cool and froody, so what's the problem? Why doesn't Quake play well then? Well, there are many potential problems. I'll list some - and by no means all - here.
- The routers. Routers have a finite capacity. They can only examine and forward one packet at a time. The rest sit in a 'queue' waiting to be dealt with. Once the queue is full, any packet that gets submitted will be ignored. Welcome to the world of dropped packets. Actually, this is pretty rare; you have to have a really heavy load on a router before it does this, but it does happen. Another problem with queuing is that it takes time. It delays your packet before it's processed and adds to the round trip time it takes for your packet to get to its destination. More often there are problems with the router itself, or it leads to a dead end. To explain, when a packet hits a router, its destination address (its destination IP) is examined, and the router compares it against its own route tables. These routes come in two flavors, static and dynamic (at least they do now - older routers have only the static lists). Route tables are basically a list of destination addresses the router knows about. For instance router A gets a packet that wants to go to router F. Router A can see routers B and C - but not router F. What does it do? It asks B and C which routers they can see (in this case both know where router F is), and decides based on the info it receives which one to send the packet to. This info includes the loads on routers B and C, the number of hops to get to router F and so on. Since loads can vary second to second, the decision to send it via B or C can change second to second. Hence you can see how multiple paths can be used for packets going to the same destination.
So what's the difference between static and dynamic routes? Static routes are 'programmed in' to the router. It KNOWS these routes exist, and expects them to be there at all times. Dynamic routes are those that it gets from other routers. This list is constantly changing and updating depending on which routers are up, which routes are the fastest and which routers may be inoperable further up the chain. If a line goes down somewhere, or a router breaks down, most routers with only static lists don't know or care. They send the packet on, since they KNOW the route is supposed to be there, but once it gets to the next router, there is nowhere for it to go, because a line is down. Before dynamic routes were around, if a line went down between you and your destination, you could well be SOL. Obviously I've taken some liberties with exactly how routers work and simplified it considerably, but this is more a layman's document than a programmer's guide.
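To make the "router A chooses between B and C" decision concrete, here is a toy Python sketch. Real routing protocols are far more involved; the function name `pick_next_hop`, the report format and the cost formula (one point per hop, up to four extra points for load) are all assumptions invented for this illustration.

```python
def pick_next_hop(reports):
    """Choose which neighbour to forward a packet to.

    `reports` maps a neighbour's name to (hops_to_destination, load),
    where load runs from 0.0 (idle) to 1.0 (saturated). Neighbours that
    cannot reach the destination are simply absent from the dict.
    """
    if not reports:
        return None  # dead end: nobody knows a route
    # Assumed cost: each hop costs 1, a fully loaded router costs 4 extra.
    return min(reports, key=lambda name: reports[name][0] + 4.0 * reports[name][1])

quiet = {"B": (3, 0.1), "C": (2, 0.2)}   # C is closer and lightly loaded
busy = {"B": (3, 0.1), "C": (2, 0.9)}    # a second later C is swamped
```

With the `quiet` reports the packet goes via C, and with the `busy` reports it goes via B, which is how two packets for the same destination can end up on different paths.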
One of the worst things about all this is that there is nothing that you, as the user, can do about this. The Internet was designed to be robust, in real time, but not instant. It's a shame, but Quake wasn't on their minds at design time.
As an aside, you might be interested to know that with the new IP design (IPv6), which uses 128-bit addresses instead of the current 32-bit ones (apparently we are going to run out of IP addresses by 2014), some new addressing schemes include a 'preferred route' in the header for IP packets. This way the router itself won't be doing the decision-making, but will let the packet creator choose the route itself. At least this will gain consistency, and reduce lost packets and out of order packets, but possibly at the cost of transmission speed.
- TCP/IP - UDP. As we learnt in the last bullet point, you can't guarantee that packets are going to be delivered at all. Another drawback to this situation is packet ordering. You may transmit your packets in order, but they may end up going via different paths, and encounter different delays in getting to their destination, with the practical upshot that they get there out of order. This is a problem, and there is not much that the hardware of the Internet can do about it. But a solution is at hand in the shape of Internet protocols. We've all heard about TCP/IP but what does it mean? Well, it stands for Transmission Control Protocol / Internet Protocol. While we are talking about initials let's define UDP as well. That's User Datagram Protocol. So we know what they stand for. Does this make everything clear? No. Ahhh. So let's clarify. TCP/IP and UDP/IP are two layers of systems. The IP bit is the part that figures out the transmission of packets of data to and from the Internet. UDP or TCP hands it a big fat old packet of data, and the IP part splits it up into sub packets, puts an envelope around each one, figures out the IP address of its destination and how it should get to where it's going, then sends it out to your ISP or however you are connected to the Net. It's effectively the bit where you write down what you want to send on a postcard, stamp it, write the address on it, and stuff it in a mail box.
UDP and TCP are higher layers that accept the packet of data from you, the coder or you, the game and decide what to do with it. The difference between UDP and TCP is that TCP guarantees delivery of the packets, in order, and UDP doesn't. UDP is effectively an access way to talk directly to IP, whereas TCP is an interface between you and IP. Complicated, but you should get the drift. It's like having a secretary between you and your mail. With UDP you would type up your letters yourself, put them in an envelope etc. With TCP you would just dictate the letter, give it to her and let her do all the work and follow up to be sure the letter arrived.
You can see IP in action right this second if you want. If you're in Windows, open up an MS-DOS prompt and type PING 205.229.73.43 and press return. What you've just done is sent a message to the machine that runs this website and said "are you there?" And it's replied, "Yes, I am." (Strictly speaking, ping uses ICMP, a small control protocol that rides on top of IP, rather than TCP or UDP, but the packets travel over the same routers.) The values you see there are the times taken for the packets of info to make the round trip - from you to them and back again. This is called Ping time, or Latency. Latency is one of those weird phrases that mean different things to different people. We here at Raven treat it as an average. Ping is the round trip time for one packet; latency is the average round trip time over the last 30 or so packets. As a rule of thumb, the hosts that have the fewest routers between you and them are the ones that will have the lowest ping. Usually these are the closest to you in physical location, but not always. If you want to see the route you have to go through to get to a particular host, type tracert 205.229.73.43 at the MS-DOS prompt. This returns all the routers your packet hit on the way to the host.
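The ping-versus-latency distinction above (one round trip versus an average over the last 30 or so) is easy to sketch. The class name `LatencyTracker` and its methods are hypothetical, chosen only for this example.

```python
from collections import deque

class LatencyTracker:
    """Keeps the most recent round-trip times and averages them."""

    def __init__(self, window=30):
        # A deque with maxlen silently discards the oldest sample.
        self.samples = deque(maxlen=window)

    def add_ping(self, milliseconds):
        self.samples.append(milliseconds)

    @property
    def latency(self):
        """Average round trip over the window, or None with no samples yet."""
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)

tracker = LatencyTracker(window=3)
for ping in (100, 200, 300, 400):
    tracker.add_ping(ping)
# Only the last three pings (200, 300, 400) are still in the window.
```

Averaging like this smooths out the occasional freak ping, so the displayed figure reflects what the connection is doing overall.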
However, all this wonderful work-done-for-you comes at a cost. In order to be sure that packets that are sent via the Internet get there ok, TCP expects an Acknowledgement (an ACK in net parlance) to be sent back from the destination for every packet it sends. If it doesn't get an ACK within a certain time, then it holds up sending any new packets, re-sends the one that was lost, and will continue to do so until the destination responds. We've all seen this in action when you've gone to a web page, and half way through the download it stops for a bit and then restarts. Chances are (assuming it's not an ISP problem) a packet has been lost somewhere, and TCP is demanding it gets resent before any more come down the pipe.
The problem with all this is the delay between the sender realizing something is amiss, and the packet actually getting through. This can sometimes run into seconds, which is not that much of a worry if you are just downloading a file or a web page, but if it's a game packet, of which there are at least 10 a second, then you're in real trouble, especially since it's holding up everything else. This is actually such a problem that almost no games use TCP/IP as their main Internet protocol of choice, unless they are not real-time action games. Most games use UDP - it can't guarantee order or delivery, but it sure is fast. We'll talk about how they handle this later.
- ISPs. Often the bane of a game player's life. Some ISPs get so upset about the idea of people playing games using their precious bandwidth that they actually use packet sniffers. These are programs that scan the packets going through the network looking for Quake game packets, and when they find them, they kill them dead. What a bunch of spoilsports. I'd be interested to know exactly how they know these packets are Quake packets, since packets can contain anything, but apparently there are programs like this out there. Of course the other big bit of bad news about ISPs is their server load. The way that modem banks work is that all the modems tie into one large pipe that goes into the main host machine, which then forwards these packets to the Net itself. Now, the lower the spec of the machine that is used for the hosting, and the larger the bank of modems attached to it, the longer the response time is on packets going both in and out of the machine. Fairly obviously the main pipe is only so wide, with the upshot that once it's full, your modem waits. Of course this doesn't just apply to the ISP machine; it can apply to any of the routers on the way and to the destination machine too - we've all seen those download problems on machines that have something popular on them. This means that you may be connected to a 56k modem, but you're only getting 28.8 performance out of it, due to limitations beyond your control. And this sucks. Some ISPs are worse than others, with cheap crap modems that drop the connection and stuff like that. I won't mention AOL here. Again, all you can do is just shop around.
- Network coding in the game. This is pretty much all we as developers can do to accommodate the intricacies of the Internet, but there is a surprising amount that can be done. Reading all of the other points kind of makes you wonder how online real time gaming can ever be done at all, but it has to be said, the net works more than it doesn't. We'll discuss some of the cool things that can be done programming wise in a second, but first, we'll look at some of the no-no's. First up is the use of TCP/IP as your main protocol. I've already explained why this can be (and usually is) bad news. It is often used during game setup to ensure all players have the correct starting data, before we start the game data flowing.
Secondly, packet bloating. You have to be careful only to transmit that data that is required; otherwise you are just sending data for the sake of it. The larger the packet you give to the UDP system, the more you are asking the network to handle. This has a big impact in client/server setups when your packet gets to the server, since YOU are only transmitting one packet, but the SERVER is receiving many such packets. This also impacts modem bandwidth. If you are running a 28.8 and getting a pretty good sustained throughput, you need to be sure that you are not allowing the packets to exceed what it's possible to push through the modem. Too big = packets getting shunted into a buffer while the modem struggles with what it's got to send, and eventually the buffer overflows and you end up at a crawl, assuming the game hasn't already puked.
Third, packet frequency. Are you expecting packets to be sent faster than the communications infrastructure can really handle? You may be running at 60 frames per second, but you can bet that the Internet will have trouble sustaining that kind of packet rate.
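The packet size and packet frequency points are really one piece of arithmetic: bandwidth divided by packet rate gives the room you have per packet. A rough Python sketch follows. The function name `max_payload_bytes` and the 80% "usable" figure are assumptions for illustration; the 28 bytes of header is the standard size of an IPv4 header plus a UDP header, and any extra modem link overhead is ignored.

```python
def max_payload_bytes(modem_bps, packets_per_second,
                      usable_percent=80, header_bytes=28):
    """Rough per-packet payload budget for a given line speed and packet rate."""
    # Integer arithmetic throughout to keep the sums exact.
    bytes_per_second = modem_bps * usable_percent // (8 * 100)
    per_packet = bytes_per_second // packets_per_second
    return max(per_packet - header_bytes, 0)

at_10_per_second = max_payload_bytes(28800, 10)   # 260 bytes of game data
at_60_per_second = max_payload_bytes(28800, 60)   # 20 bytes of game data
```

Under these assumptions a 28.8 modem gives you about 260 bytes per packet at 10 packets a second, but only about 20 bytes at 60 packets a second, which is why tying the packet rate to the frame rate is a bad idea.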
Fourth is handling out of order packets (assuming you are using UDP) and dropped packets entirely. This is more involved and requires you to be cleverer than you might think. However, if you don't handle it right, you end up with missing events, missing entities, missing effects, and sometimes, completely FUBAR'd games.
Lastly, there is the aspect of online client cheating to consider. With CPL and other frag fests offering cash to winners, this is more important to consider than it used to be. So ok, we've seen the mess that is the Internet, and all the pitfalls, what can we do about them as game developers?
Rats, I knew someone was going to ask that. I thought I was done, check please. But noooo, more stuff to have to type up. Oh well.
Well, the first thing we should do is define the difference between client/server type games and peer to peer games.
Peer to peer involves two or more games talking to each other, each running the game itself and only exchanging input data. This reduces network traffic to a minimum, but brings several other problems to the table, like coping with lost traffic. This is far more important when more than one game is running, since contention occurs over who is correct and who is not. Variance in game play can get very sticky in these situations, as each game must stay synchronized with the others. Additionally, each game must wait for the input from the others before it can simulate the next frame - remember playing DOOM and it would lock up momentarily?
Client/Server involves one machine running the game and dictating to all the clients what the state of play is and what they should be displaying. Effectively the clients become pretty much dumb terminals transmitting the user input to the server, and letting it handle almost everything. They draw the scene the server tells them to display, and play the sounds the server tells them to play. Actually, it's not quite as bad as this, as the server does on occasion tend to offload functionality onto the client, but that's the basic idea.
What I'm going to discuss has more to do with Client/Server type setups than peer to peer since almost all online type games have some degree of Client/Server architecture to them - every game has to have one client that 'hosts' the game and is considered 'correct' in the case of world event contention between peers. (Unless they don't, in which case, you'd just get an "out of synch" error and quit.)
Ok, now on to our problem list - the TCP/IP selection is a no brainer - we don't have to discuss that anymore.
One down.
Packet bloating. This one can be tricky. Obviously a max packet size in the code is in order here to stop modem buffer overloading. We here at Raven are actually implementing a floating max packet size, for those people who are running over a local network, or that have large bandwidth available to them. When you hit a packet that breaks your buffer size, the secret is to split the data into two smaller chunks - only send in the first packet what is really necessary to be there that instant. Data like entity movements and so on. Stuff like chat messages can wait till the next packet, since no one is going to miss that being one packet late. Still, tough decisions need to be made as to what's important and what isn't, and sometimes this can make the game feel a little sluggish and un-responsive. This is where the floating packet size can be helpful, since it should remove that feeling from those with large bandwidth or running local games. Not the best solution, but one that's worth a try.
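One way to picture the "send what matters now, let the rest wait" rule is a priority-ordered packer. This is a minimal sketch, not Raven's actual code; `build_packet`, the priority numbers and the tiny 24-byte limit are all invented for the example.

```python
def build_packet(pending, max_bytes):
    """Fill one packet with the most urgent data that fits.

    `pending` is a list of (priority, payload) pairs, where a lower
    priority number means more urgent. Returns (payloads_to_send,
    items_left_over); the leftovers are retried in the next packet.
    """
    sent, deferred, used = [], [], 0
    # sorted() is stable, so items of equal priority keep their order.
    for priority, payload in sorted(pending, key=lambda item: item[0]):
        if used + len(payload) <= max_bytes:
            sent.append(payload)
            used += len(payload)
        else:
            deferred.append((priority, payload))
    return sent, deferred

pending = [(2, b"chat: gg"), (0, b"move:player1"), (0, b"move:player2")]
sent, deferred = build_packet(pending, max_bytes=24)
```

Here both movement updates go out immediately and the chat line waits for the next packet. A floating max packet size would simply mean passing a larger `max_bytes` for players on a LAN or a fat connection.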
Other stuff that's worth thinking about includes tokenizing text messages. If your server is sending a lot of preset text messages, it makes more sense to have these pre-loaded on the client, and just send them a text string reference number rather than the whole string. This reduces our message traffic considerably. The same trick can be played with sending down filenames when the server asks the client to load something. For instance you can break down the filename into a path and a file name. If you are asking for a bunch of sound files to be loaded, then only send the path once, and from then on, refer to the path as a token in the string. For instance we'll ask the client to load "sound/weapons/death.wav". Once the client receives this string, it will store away the path as a token, and the next time we want a sound, we send "%1pain.wav" and the client knows by the %1 to go away and use that path it got first time to load this sound. Little things, but they all help.
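The path token trick can be sketched as a small class that both the server and the client run. The class name `PathTokens`, the limit of nine tokens (so the token is a single digit after the `%`) and the rule that each side learns a path the first time it appears are assumptions for this example. It also assumes these messages arrive reliably and in order, otherwise the two tables drift apart.

```python
class PathTokens:
    """Mirrored path table; the server and the client each keep one."""

    MAX_TOKENS = 9  # one digit after '%', a simplification for this sketch

    def __init__(self):
        self.paths = []  # token number is the list index plus one

    def _learn(self, directory):
        if directory not in self.paths and len(self.paths) < self.MAX_TOKENS:
            self.paths.append(directory)

    def encode(self, filename):
        """Server side: shorten the filename if its path was sent before."""
        directory, slash, name = filename.rpartition("/")
        if not slash:
            return filename              # no path part to tokenize
        directory += "/"
        if directory in self.paths:
            return "%{}{}".format(self.paths.index(directory) + 1, name)
        self._learn(directory)
        return filename                  # first use goes across in full

    def decode(self, message):
        """Client side: expand a token, or learn a new path."""
        if message.startswith("%"):
            return self.paths[int(message[1]) - 1] + message[2:]
        directory, slash, _ = message.rpartition("/")
        if slash:
            self._learn(directory + "/")
        return message

server, client = PathTokens(), PathTokens()
first = server.encode("sound/weapons/death.wav")
second = server.encode("sound/weapons/pain.wav")
loaded = [client.decode(first), client.decode(second)]
```

The second request shrinks from 22 characters to 10, and the saving grows with every further file loaded from the same directory.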
Something else worth considering is reducing the complexity of floating point data. Traditional floating point is 32 bits long - 4 bytes. The question is, do you really need that degree of accuracy? Reducing 32 bits to 16 of floating point is not out of the question; many games do this, but I'll bet you haven't noticed. While we are on that subject, being very sure of the size of the data you need to transmit is also a necessity here. If you are sending a value of between 0 and 170, do you really need a long word to do it? It would fit in a byte, and you've just saved 3 bytes. Obvious when you think about it, but you'd be surprised at how much it gets forgotten about when you are just getting the game working.
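Here is a small sketch of trimming data sizes using Python's standard `struct` module. Note that it swaps in 16-bit fixed-point numbers rather than true 16-bit floats, since fixed-point makes the precision easy to reason about; the field layout, the function names and the scale of 8 (1/8 of a unit precision, roughly plus or minus 4096 units of range) are assumptions for illustration.

```python
import struct

def pack_full(x, y, z, health):
    """Naive layout: three 32-bit floats and a 32-bit integer (16 bytes)."""
    return struct.pack("<fffI", x, y, z, health)

def pack_small(x, y, z, health, scale=8):
    """Trimmed layout: three 16-bit fixed-point values and one byte (7 bytes)."""
    return struct.pack("<hhhB",
                       round(x * scale), round(y * scale), round(z * scale),
                       health)

def unpack_small(data, scale=8):
    qx, qy, qz, health = struct.unpack("<hhhB", data)
    return qx / scale, qy / scale, qz / scale, health

full = pack_full(100.3, -20.6, 0.0, 170)
small = pack_small(100.3, -20.6, 0.0, 170)
x, y, z, health = unpack_small(small)
```

The position comes back within a sixteenth of a unit of the original, and the health value of 170 fits in its single byte, for less than half the bytes of the naive layout.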
Only sending objects that have relevance to the scene you are displaying is helpful. Remember, the client is dumb, and doesn't need to know about what's out of the view or hearing threshold. Who cares? They aren't being rendered or heard, so what difference does it make? The server knows about them, and it's running the game, not you. This sucks of course if you are out in the open, or in a space sim, since everything is visible, but that's a game design decision that you make based on your technical abilities.
Further to that, offload special effects. Remember the client is pretty dumb, but it's smart enough to do clever effects for you. There is no reason for the server to be sending all the info on effects to the client, wasting both server time and network space. It's enough that the server says "an explosion happens here" and the client does the rest, superimposing that effect on the main display. We did this in Heretic II, which was the major reason it ran so well on the lower end machines.
Of course the biggest thing you can do to help packet size is to delta-compress info. Without giving away all of our (game developers' that is, not just Raven) technical secrets, the idea here is to only transmit data that has changed from one frame to the next. Simply keep a copy of what you sent last time, and on an object-by-object basis, compare what you want to send this frame with what you sent last, and only transmit that which has changed. Of course this doesn't work when you have a new object to transmit, since it all has to go across. But if you work out how often this happens, it comes to somewhere between 5% and 10% of the time. That's some saving.
If you want to, you can implement some compression schemes on the resulting packet to make it even smaller, but in these cases the trade off of time to compress on the server and decompress on the client can be worse than having a slightly larger packet.
Frequency - control over this is a must. Quake actually has a server that runs at 10 frames per second, transmitting data over the net at that rate. Actually, it does transmit faster than that when it's doing stuff like downloading client requested files, or responding to server info requests, but during game time, the client expects data at a 10fps rate. It runs at 10fps a) because of the amount of data it is processing for each client, and b) because this is a nice easy network packet rate to sustain.
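The delta idea can be sketched with plain dictionaries. This is a bare-bones illustration, not any shipping game's format; the names `delta_encode` and `delta_apply` and the object/field layout are made up, and removal of objects is not handled here.

```python
def delta_encode(previous, current):
    """Return only what changed between two frames.

    Each frame is a dict of object_id -> dict of field -> value.
    """
    delta = {}
    for obj_id, fields in current.items():
        old = previous.get(obj_id)
        if old is None:
            delta[obj_id] = dict(fields)   # new object: it all has to go across
            continue
        changed = {key: value for key, value in fields.items()
                   if old.get(key) != value}
        if changed:
            delta[obj_id] = changed
    return delta

def delta_apply(previous, delta):
    """Client side: rebuild the current frame from the last one plus the delta."""
    state = {obj_id: dict(fields) for obj_id, fields in previous.items()}
    for obj_id, changed in delta.items():
        state.setdefault(obj_id, {}).update(changed)
    return state

frame1 = {1: {"x": 10, "y": 5, "hp": 100}, 2: {"x": 0, "y": 0, "hp": 50}}
frame2 = {1: {"x": 12, "y": 5, "hp": 100}, 2: {"x": 0, "y": 0, "hp": 50},
          3: {"x": 7, "y": 7, "hp": 30}}
delta = delta_encode(frame1, frame2)
```

In this example object 1 sends a single field, object 2 sends nothing at all, and only the brand new object 3 goes across in full.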
There, that one was easy.
Out of order and missing packets. The trick here is to only treat one symptom and ignore the other. If you number your game packets (when we talk about packets here, I mean game server frame packets - i.e. the packet that contains a complete frame update from the server) as they go out to the client, the client can know if it gets an out of order packet. The simplest solution is to dump it, and treat it as a missed packet entirely. Doing this is a must if you are dealing with deltaed packets, since the delta values in the packet refer to the frame that came before.
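Dumping late packets takes very little code once the frames are numbered. A minimal sketch, with a hypothetical class name `FrameFilter`; it ignores sequence number wrap-around, which a real game running for hours would need to handle.

```python
class FrameFilter:
    """Accepts server frames only if they are newer than the last one used."""

    def __init__(self):
        self.last_seq = -1

    def accept(self, seq):
        if seq <= self.last_seq:
            return False      # late or duplicate: treat it as a lost packet
        self.last_seq = seq
        return True

frame_filter = FrameFilter()
arrivals = [0, 1, 3, 2, 4]    # frame 2 turns up after frame 3
results = [frame_filter.accept(seq) for seq in arrivals]
```

Frame 2 is thrown away when it finally arrives, so the client only ever has to cope with one problem, missing frames, rather than two.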
If you keep a copy of the last packet you received from the server on the client, you can compare the latest one you got to it and see if an object has been dropped. At that point, you can either just dump the object immediately, or store it off into a list and check a few packets down to be sure it's still gone, and then dump it. The beauty here is that you never actually have to send a 'remove' function to the client from the server, since by omission from the game packet from the server, the object is gone. Even if you have some dropped packets, it doesn't matter since eventually you will get one and that object will still be missing in the latest packet, and thus it will get deleted. Cool eh?
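The "removed by omission" approach, including the option of waiting a few packets before dumping an object, might look something like this. The function name `prune_missing`, the use of sets of object ids and the default grace period of 3 frames are assumptions for the sketch.

```python
def prune_missing(known, snapshot_ids, missing_counts, grace=3):
    """Drop objects that have been absent from `grace` frames in a row.

    `known` is the set of object ids the client currently holds,
    `snapshot_ids` the ids present in the latest frame from the server,
    and `missing_counts` a dict the caller keeps between calls.
    """
    for obj_id in list(known):
        if obj_id in snapshot_ids:
            missing_counts.pop(obj_id, None)   # it's back, reset the count
            continue
        missing_counts[obj_id] = missing_counts.get(obj_id, 0) + 1
        if missing_counts[obj_id] >= grace:
            known.discard(obj_id)              # gone for good
            del missing_counts[obj_id]
    known.update(snapshot_ids)
    return known

known, counts = {1, 2}, {}
after_first = set(prune_missing(known, {1}, counts, grace=2))
after_second = set(prune_missing(known, {1}, counts, grace=2))
```

Object 2 survives the first frame it is missing from and is deleted on the second, without the server ever sending an explicit 'remove' message.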
Now we'll take a moment and talk about client prediction. And what a clever but nasty beast this is. Both when the client misses a packet from the server and in the time between normal game packets (think about it - the server may only be running at 10fps, but that doesn't mean you want the client side representation to), the client needs to be doing something to make it look like it IS still getting data. So we predict the world and events in it. Since we know what's going on with the client's player - after all, we are right there at the input point right? - we can predict what he is going to do. If he fires a weapon, we can show it on screen, since that's what we know he's going to do. We can also predict - to a lesser degree - what the other players are doing, at least enough to complete any animations they may be in, keep applying gravity to them if they are falling, and so on. Now of course this only works for a time measured in seconds, but usually that's enough for the packet system to come back on line, and start receiving stuff from the server again, at which time the client can correct itself for any events that it predicted wrong. At best, it's totally on target, and you will never have known that you were missing data. In the middle case, the client is a bit out, so it starts correcting via a smoothing operation, so that no one 'snaps' really obviously to a new location. And at worst, you are dead via an attack you didn't even see, since it occurred while you were missing packets. However, there is no way around this situation so it's something that has to be lived with, and it's better than jerky motion and snapping updates.
However, what do you do if you miss a baseline packet? I.e. one that has a new object in it that wasn't there before? You've missed all the information that came down initially, but you will be getting updates from that point on. Well, to be honest - that's the trick isn't it? I've given away most of the tricks of the trade already, but some must remain. I'll give you a clue though; it is possible to fix a situation like this.
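The two halves of prediction, guessing forward and then easing back toward the truth, can be sketched with a couple of tiny functions. The names `extrapolate` and `smooth_toward` and the correction factor of 0.25 per frame are assumptions; real games predict far more than position.

```python
def extrapolate(position, velocity, dt):
    """Guess where an object is `dt` seconds after its last known update."""
    return tuple(p + v * dt for p, v in zip(position, velocity))

def smooth_toward(predicted, authoritative, factor=0.25):
    """Move part of the way toward the server's position instead of snapping."""
    return tuple(p + (a - p) * factor for p, a in zip(predicted, authoritative))

# No packet for half a second: keep the player moving along his last velocity.
guess = extrapolate((0.0, 0.0), (10.0, -2.0), dt=0.5)

# A packet finally arrives and disagrees with what we have been drawing.
corrected = smooth_toward((0.0, 0.0), (8.0, 4.0), factor=0.25)
```

Calling `smooth_toward` once per rendered frame closes a quarter of the remaining gap each time, so the object glides to where the server says it is rather than jumping there.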
In every type of game there are some packet types that WILL require a guaranteed delivery. So be prepared to create some kind of structure to cope with this, because UDP doesn't. But be sure you don't use it too much or you will end up back with the same problems that TCP/IP has.
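One common shape for such a structure is an outbox that keeps re-sending a message until the other side acknowledges it. This is a hedged sketch of that general idea only; the class name `ReliableOutbox` and its methods are hypothetical, and the receiver is assumed to ignore message ids it has already seen.

```python
class ReliableOutbox:
    """Holds must-arrive messages until the other side acknowledges them."""

    def __init__(self):
        self.next_id = 0
        self.unacked = {}    # message id -> payload

    def queue(self, payload):
        msg_id = self.next_id
        self.next_id += 1
        self.unacked[msg_id] = payload
        return msg_id

    def pending(self):
        """Everything still unacknowledged, to ride along on the next packet."""
        return sorted(self.unacked.items())

    def ack(self, msg_id):
        self.unacked.pop(msg_id, None)   # harmless if acked twice

outbox = ReliableOutbox()
outbox.queue(b"spawn object 7")
outbox.queue(b"level change")
before_ack = outbox.pending()
outbox.ack(0)                # the client confirmed message 0
after_ack = outbox.pending()
```

Unlike TCP, nothing else is held up while a reliable message is outstanding; the ordinary game data keeps flowing and the important message simply tags along until it is confirmed.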
Online Cheating. There are a few ways to try to deal with this, but be warned, what's man-made is man-hackable. This is not so much a big deal at big frag fests since all the matches there are moderated, but it can have an impact on those that qualify for these fests. And of course, it just plain sucks to be playing on a server where someone is unbeatable because they are cheating. Cheating can occur in many ways: modifying the client to never display walls in the game, adding lights or white skins to other players, displaying a local map (if you want to get really ambitious), modifying your aim so it's always dead on other players, or simply firing a weapon at an opponent with deadly aim the moment they are in sight. All of these are hacks to the client end of the game, and when done properly, are pretty un-observable back at the server. There is some stuff you can do, such as checking the accuracy of each player and dumping those that go over a certain level. You may lose some really good players that way, but it's unlikely that anyone can get over an 80% hit rate all the time. All the checks of the client in the world can really be gotten around, since the result of the check has to be returned to the server at some point, and if it's intercepted there and replaced with what the server expects, the server is fooled. Using the result to decrypt the data that comes from the server is possible, but again, it's done on the client side and with enough patience and a good dis-assembler it can be gotten around. Client integrity is the key here, and keeping it is the aim. The Quake 1 & 3 solution, that of a virtual machine, where instead of the client loading the game up the client 'builds' or 'compiles' the game it's going to run via instructions from the server, is a good start, since re-writing someone else's compiler is beyond all but the very best of hackers. God knows, writing it in the first place is a nightmare I wouldn't want to contemplate. But it is within the bounds of possibility. All the games developer can do is make it as difficult as he possibly can for the budding hacker and be content with that.
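The server-side accuracy check mentioned above is about the simplest anti-cheat measure there is. A minimal sketch follows; the function name `is_suspicious` and the minimum of 50 shots before judging anyone are assumptions added for the example, while the 80% threshold comes from the text.

```python
def is_suspicious(hits, shots, threshold=0.80, min_shots=50):
    """Flag a player whose sustained hit rate is implausibly high."""
    if shots < min_shots:
        return False     # too few shots to judge; anyone can get lucky
    return hits / shots > threshold

sharpshooter = is_suspicious(hits=40, shots=50)   # exactly 80%: allowed
aimbot = is_suspicious(hits=49, shots=50)         # 98%: flagged
lucky_start = is_suspicious(hits=10, shots=10)    # perfect but too few shots
```

Because the figures are counted on the server, the client has no say in them, which is what makes this check harder to fool than anything that relies on the client reporting honestly.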
[size="5"]Last thoughts for Developers
Peter Lincroft, who was involved with X-Wing vs. TIE Fighter, had an article in Game Developer Magazine and did a talk at GDC last year about his experiences with net gaming, and I'd like to reiterate some of his ideas here for completeness' sake.
When testing a game, make sure you find a really horrible ISP to do some real Internet testing. Most games get built and tested to start with on the internal LAN at the developer's offices. This isn't really a fair test, since LANs rarely drop packets and have great PINGs. Find a bad ISP and do some REAL testing. This really works wonders for you later.
Emulate Internet conditions. Stick some code into your code base that emulates packet dropping. Have it settable so you know at what point your game is going to break down - is it 10% drop out or 30%? These are things you should know so you can automatically drop someone from a game if this occurs. Something to bear in mind here is that the Internet doesn't typically just drop one packet. Usually they occur in batches, so don't just dump one at a time, do them several at a time.
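A settable packet dropper that loses packets in batches could be sketched like this. The class name `BurstyDropper`, its parameters and the way the drop rate is spread over bursts are assumptions for illustration; the long-run loss only roughly matches the requested rate.

```python
import random

class BurstyDropper:
    """Decides, packet by packet, whether to pretend the network lost it."""

    def __init__(self, drop_rate, burst_length=3, seed=None):
        # Starting bursts less often keeps overall loss near drop_rate.
        self.start_chance = drop_rate / burst_length
        self.burst_length = burst_length
        self.remaining = 0                 # packets left in the current burst
        self.rng = random.Random(seed)     # seedable, so test runs can repeat

    def should_drop(self):
        if self.remaining > 0:
            self.remaining -= 1
            return True
        if self.rng.random() < self.start_chance:
            self.remaining = self.burst_length - 1
            return True
        return False

dropper = BurstyDropper(drop_rate=0.30, burst_length=3, seed=1)
outcome = [dropper.should_drop() for _ in range(1000)]
```

Call `should_drop()` just before handing each packet to the socket and skip the send when it returns True. Raising `drop_rate` step by step then shows the point at which the game stops being playable.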
Remember your server is going to be sending out far more data than each client has to worry about. If client messages are around the 2k mark a second, and there are 10 clients, then the server is banging out 10x2k packets, which is 20k. Be sure that the communications infrastructure you are using is capable of supporting this.
Well there you have it, some ramblings and thoughts on Networking 101 for Games. I've probably made some mistakes, but the gist of it should be sound. Have fun out there, and be amazed it works at all.
Big thanks must go to James Monroe for adding any inaccuracies this document may have. Blame him and send him mail for any mistakes instead of me.