
MMOG Server-Side. Front-End Servers and Client-Side Random Balancing

Started by January 04, 2016 08:01 AM
34 comments, last by Sergey Ignatchenko 8 years, 9 months ago

Or in other words, when you're handling petabytes per day, a multi-terabyte attack that would flatten a lesser service isn't even a blip on the performance radar.

The "Massive" in "Massively Multiplayer Online" really does mean "Massive". It is a scale that takes getting used to.



I actually don't think you get up to terabits per second for any kind of service in any particular location, no matter whether you are Facebook or Google. Even Netflix is highly distributed; the bandwidth from any one site is likely significantly less than that!

This is why really big DDoS-es can still work; connections fatter than 10 Gbps start needing both fancy hardware and opportunistic data exchange locations -- OC-768 is still the fastest OC backhaul I know of, and that's only 40 Gbps, and carriers only have it in the highest-end co-los and exchanges. You can, of course, pay to pull your own fiber to your own center ... if you're really rich :-) At some point, I imagine trying to put too many of those in parallel becomes a hassle! As far as I know, 100G interfaces don't carry that far yet, so even if you can get connections of that speed, they will have to split at some point...

From what I hear from the mitigation vendors, their bigger customers are the financial companies, because someone from a country with a less well-functioning legal system will decide to try to hold a stock broker / investment bank / money exchange hostage by flooding them. That's where the real money is! Games are, despite being bigger than movies, peanuts by comparison.
enum Bool { True, False, FileNotFound };

So, in one breath, you talk about a $1e9 enterprise, and in the next, you're talking about a start-up.

Exactly. This is one of the Big Challenges I have with this book (and it indeed often brings me to the edge of MPD, as in "Multiple Personality Disorder"): everybody wants to start small and end up big (and to make things worse, I cannot blame them for that; it is a perfectly natural desire). So I need to describe a path which starts from dirt-cheap rental servers and goes all the way to a $1e9 enterprise with multiple Telehouse-hosted ISPs, swimming pools, and headquarters in the Bahamas (all that without rewriting the whole thing from scratch). Which is possible; I've been a part of it myself. On the other hand, the techniques used at each of the stages vary greatly across the industry, so I'm always eager to hear about real-world experiences (which tend to be very different from the marketing materials each and every vendor is throwing at you :-( ).


Unfortunately, they come with enterprise contracts that may prevent me from talking too much about them, but I can talk in general.

Thanks a LOT!


The vendors I know from some level of detail are Verisign, Neustar, and Akamai.

The guy I heard about these things from told me about a game successfully using F5 Silverline, but he wasn't able to provide any details beyond BGP redirection and a tunnel :-( (it wasn't his scope).


We have not measured additional latency. I'd expect dozens but not hundreds of milliseconds.

Are you sure about it (for quite a few games out there, the difference between dozens and hundreds will be a deal-breaker)? It would require the vendor to be really close to your ISPs (though as they have worldwide presence, it should be possible too). Did you notice how far their BGP redirection gets you (in terms of hops, or kilometers, or whatever else)?


Return traffic typically does not go back through them.

Do you know if it also applies to TCP? IIRC, TCP attack mitigation (beyond SYN floods, which are a very different and unpleasant story) is much more efficient if they know about outgoing packets (basically, they can filter out all the non-SYN out-of-existing-connection traffic as illegitimate).
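To make that filtering idea concrete, here is a minimal sketch of a stateful filter that only lets non-SYN packets through when it has already seen outgoing traffic for the same flow. It is an illustration only, not how any particular vendor does it: `Packet`, `ConnectionTracker`, and the "ip:port" string addressing are all hypothetical names made up for this example, and real scrubbing gear also has to deal with timeouts, sequence numbers, and SYN floods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str           # "ip:port" of the sender (hypothetical representation)
    dst: str           # "ip:port" of the receiver
    syn: bool = False  # True for a connection-opening SYN


class ConnectionTracker:
    """Toy stateful filter: drop inbound non-SYN packets for unknown flows."""

    def __init__(self):
        # (client, server) pairs for which outgoing traffic has been seen
        self.flows = set()

    def on_outgoing(self, pkt):
        # Any server-to-client packet (e.g. a SYN-ACK) shows the server
        # actually takes part in this flow.
        self.flows.add((pkt.dst, pkt.src))

    def allow_incoming(self, pkt):
        if pkt.syn:
            # SYN floods are a separate problem (cookies, rate limits);
            # this sketch simply lets SYNs through.
            return True
        return (pkt.src, pkt.dst) in self.flows
```

Without visibility into the outgoing direction, `on_outgoing` never gets called, which is why the question of where return traffic flows matters for this kind of filtering.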


The cost for DDoS mitigation is some amount of yearly costs (thousands per month) and that includes some amount of mitigation events per year (an event might be defined differently per vendor). Additional events cost additional money, on the order of some thousands per event.

This doesn't sound like too much once you have, say, a few Gbit/s of outgoing traffic (well, that's assuming that "thousands" are more like "single-digit thousands" and less like "hundreds of thousands" :-) ).


Curiously, the bigger you are, the fewer "nuisance" attacks you see, because your ingress is big enough that the small DDoS-es don't actually affect you.

Yep, I've seen it too. OTOH, it means that you need to handle these small-scale nuisance attacks yourself (so that the small-scale attack doesn't cause service interruption, or the need to initiate the "event" which is expensive, disruptive, etc. etc.).


I actually don't think you get up to terabits per second for any kind of service in any particular location

FWIW: these days, terabit/second is of the order of the bandwidth of a whole (but single) datacenter (it varies, but for serious guys it is usually from 0.5TBit/s to several TBit/s, and it also depends on how we count it, accounting for redundancy or not, but the order of magnitude remains). So if you have your own datacenter, it is technically possible to have a terabit/second in one single location :-). Paying for it is a different story, though: recently I've seen pricing of EUR500/month for an unmetered 1GBit/s port (with a provider I can trust to deliver on it, within reason, of course), and I should say that's damn cheap, so a terabit is going to cost A DAMN LOT...
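As a back-of-the-envelope check on that last point, here is a small sketch that scales the quoted port price linearly up to a terabit. The linear scaling is purely an assumption for illustration (real pricing at that volume is negotiated and almost certainly not linear), and `monthly_cost_eur` is a made-up helper name.

```python
import math


def monthly_cost_eur(total_gbps, price_per_port_eur=500.0, port_gbps=1.0):
    """Naive estimate: number of ports needed times the per-port price."""
    ports = math.ceil(total_gbps / port_gbps)
    return ports * price_per_port_eur


# 1 Tbit/s = 1000 Gbit/s -> 1000 ports at EUR500 each
print(monthly_cost_eur(1000))  # 500000.0
```

So even at the "damn cheap" rate, a terabit would be on the order of half a million EUR per month under this naive assumption.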

Are you sure about it (for quite a few games out there, the difference between dozens and hundreds will be a deal-breaker)?


I'm not sure, but logic says that it can't really be hundreds of milliseconds. Given the volumes involved, additional buffering/delay is just additional capacity consumed for no real gain on their side. They either keep up, which means they drain incoming traffic right away, or they don't, which means they'll instantly back up and fill any buffer they build.
If you're very geographically challenged, the distance hop might be a problem, but if you're on the same coast of the same continent as a scrubbing center, you'll be fine. One of the things to ask those sales people: Where are the centers?
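The keep-up-or-back-up argument can be sketched with simple rate arithmetic. This is a rough model under stated assumptions (constant rates, a single FIFO buffer, no bursts), and `buffer_behaviour` and its parameters are hypothetical names for illustration, not anything a vendor exposes.

```python
def buffer_behaviour(ingress_gbps, drain_gbps, buffer_gbit):
    """Return (seconds until the buffer is full or None, worst-case queueing delay in seconds)."""
    if ingress_gbps <= drain_gbps:
        # They keep up: in steady state the queue stays empty.
        return None, 0.0
    # They don't keep up: the buffer fills at the rate difference...
    fill_time = buffer_gbit / (ingress_gbps - drain_gbps)
    # ...and a packet at the back of a full buffer waits this long.
    max_delay = buffer_gbit / drain_gbps
    return fill_time, max_delay
```

For example, with 120 Gbit/s arriving, 100 Gbit/s drained, and a 1 Gbit buffer, the buffer is full after 0.05 s and adds at most 0.01 s of delay; after that, packets are simply dropped, so a bigger buffer buys very little.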

recently I've seen pricing of EUR500/month for an unmetered 1GBit/s port


Yeah, we were paying about USD1000 for the same for years. (Cogent being the provider. Big and cheap ...)
There are now 10 G options -- I saw a banner ad for $3k unmetered 10Gbps the other day, which is consistent with what we pay per.

Do you know if it also applies to TCP?


I had the same question when we engaged, and they didn't care. The reason being that they treat any TCP connection they didn't initiate as "bad", which means they track it because they ACKed the SYN. Which means that existing connections drop when you swing to mitigation. This is not a problem for web sites; for our more persistent connections, we've made our clients somewhat smart about trying to reconnect for a little bit when something drops or times out, with a bit of back-off. We also made our view protocol recover state on re-connection, so a sufficiently smart GUI can "hide" the event (in the sense that a 20-second lag spike can ever be "hidden" :-) ). UDP doesn't have that problem.
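For readers who want to picture the client side, here is a minimal sketch of reconnecting with back-off. It is not our actual client code: `backoff_delays`, `reconnect`, and all the default values (0.5 s base, doubling, 10 s cap, six attempts) are assumptions chosen for illustration, and `connect` stands for whatever function opens your connection.

```python
import random
import time


def backoff_delays(base=0.5, factor=2.0, cap=10.0, attempts=6,
                   jitter=0.0, rng=random.random):
    """Yield wait times base, base*factor, ... capped at `cap`, with optional jitter."""
    delay = base
    for _ in range(attempts):
        # jitter=0.2 stretches each wait by up to 20%, so clients that
        # dropped at the same moment don't all reconnect at the same moment.
        yield min(delay, cap) * (1.0 + jitter * rng())
        delay *= factor


def reconnect(connect, delays, sleep=time.sleep):
    """Call connect() until it succeeds; return the connection, or None if all attempts fail."""
    for wait in delays:
        try:
            return connect()
        except OSError:  # covers refused, reset, and timed-out connections
            sleep(wait)
    return None
```

With the defaults, the waits are 0.5, 1, 2, 4, 8, and 10 seconds, which adds up to roughly the 20-second-plus blip described above. The state recovery after reconnecting is protocol-specific and not shown.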

So, in the end, someone with a botnet larger than your summed ingress can cause your players a lag blip on the order of one minute. You'll stay on mitigation for a few days after you start mitigating (a typical definition is "48 hours after attacks cease, the event is over"), so worst case they can do this every 2 days. That ends up not being bad enough to matter, and thus the return on investment isn't there.
enum Bool { True, False, FileNotFound };


They either keep up, which means they drain incoming traffic right away, or they don't, which means they'll instantly back up and fill any buffer they build.

Makes sense. Then (assuming they have a datacenter on the same coast as you, and you're not on opposite coasts) we're in the 50-70ms range, which is pretty good for quite a few games out there. Still, this would need to be checked in the real world, but that goes for pretty much everything anyway :-).


they track it because they ACKed the SYN

I think I got it, THANKS!


This topic is closed to new replies.
