Memory bandwidth on modern x86-64 class hardware is usually a few gigabytes/sec
Xeon E5 memory bandwidth is 59 GB/s, assuming you can make use of entire cache lines when you read them (streaming, aligned objects, and so on). One of the benefits of compacting garbage-collecting allocators is that they tend to allocate objects that are used together close together in memory, which is good for cache and TLB locality. (For desktop enthusiasts, rather than server folks, the Core i9 with 2666 RAM touches 80 GB/s, I'm told.)
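To put those numbers in perspective, here's a quick back-of-the-envelope calculation (assuming the standard 64-byte x86-64 cache line) showing how much the "entire cache lines" caveat matters:

```python
# Back-of-the-envelope numbers from the figures above.
# Assumption: 64-byte cache lines, which is standard on x86-64.
CACHE_LINE = 64            # bytes
xeon_bw = 59e9             # bytes/sec, the Xeon E5 figure quoted above

lines_per_sec = xeon_bw / CACHE_LINE
print(f"{lines_per_sec / 1e6:.0f} M cache lines/sec")   # ~922M

# If your objects are scattered so each access pulls in a whole line
# for only 8 useful bytes, effective bandwidth drops 8x:
effective = xeon_bw * (8 / CACHE_LINE)
print(f"{effective / 1e9:.1f} GB/s of useful data")     # ~7.4 GB/s
```

Which is roughly why "a few gigabytes/sec" is what you observe when your heap is a pointer-chasing mess, even on hardware that can stream far more.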
concatenating strings with the + operator
That's the classic newbie O(n²) trap, yes :-) It's a great example of why algorithmic understanding is a necessity for programmers, even with modern languages and runtimes to help them.
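For the record, here's the trap and the linear fix sketched in Python (CPython sometimes optimizes in-place `+=` on strings, but you can't rely on that across runtimes):

```python
# The O(n^2) trap: each `+` may copy the whole accumulated string,
# so n appends touch roughly n^2/2 characters in total.
def build_slow(parts):
    s = ""
    for p in parts:
        s = s + p          # copies len(s) characters every iteration
    return s

# The linear fix: collect the pieces, then join once at the end.
def build_fast(parts):
    return "".join(parts)  # one sizing pass, one allocation, one copy

parts = ["x"] * 10000
assert build_slow(parts) == build_fast(parts)
```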
The question of "how do I think about performance and resource limitations for servers for games" is actually quite deep.
In general, though, there are two options:
- Pool all the things! Never cause the garbage collector to run. In fact, pre-allocate the maximum number of objects/memory you will ever need, and if you ever run out, decline the new load. (Disconnect the user, deny the spell cast, or whatever.) This is the "hard guarantee" model of systems thinking, and it has the benefit that, once you account for ALL the scarce resources, you can actually make guarantees and stick to them. The drawback is that it's a lot of work, and you will on average run at less than full system utilization, because you always budget for the worst case.
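A minimal sketch of that model in Python, with made-up names: everything is pre-allocated at startup, and when the pool runs dry the caller declines the load rather than allocating more.

```python
# Illustrative object pool: all capacity is allocated up front, and
# acquire() returns None when the budget is exhausted, so the caller
# can deny the request instead of growing the heap.
class Pool:
    def __init__(self, factory, capacity):
        # Pre-allocate everything we will ever hand out.
        self._free = [factory() for _ in range(capacity)]

    def acquire(self):
        if not self._free:
            return None        # out of budget: decline the new load
        return self._free.pop()

    def release(self, obj):
        self._free.append(obj) # objects are recycled, never freed

projectiles = Pool(dict, capacity=2)
a = projectiles.acquire()
b = projectiles.acquire()
assert projectiles.acquire() is None   # budget exceeded: deny the spell cast
projectiles.release(a)
assert projectiles.acquire() is not None
```

The real work is in the "account for ALL the scarce resources" part: sockets, buffers, entities, timers, everything needs a pool and a budget.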
- Let the runtime take care of it! It's useful to reduce and re-use objects where it makes sense -- don't allocate wantonly without any thought -- but live in the managed-heap world, and accept the garbage collection overhead. Ideally, you can stop-and-collect with a known frequency, such as once per tick (!) or once per second or whatever. If you collect once per second on a 30 Hz server, the server essentially has to run 30 ticks of work in the time of 29, then collect, then repeat, so the collector costs you an additional latency of about one tick. When it comes to budgets, keep pushing things in until you blow up. Most of the time, you will run at high system utilization. Once in a while you'll guess wrong, and some memory won't be there when you need it, which means you'll have to blow up an entire instance and let the players reconnect.
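A sketch of the "collect at a known point" idea using Python's `gc` module (the tick body and rates are placeholders): automatic collection is disabled so pauses can't land mid-tick, and a full collection runs once per second at a tick boundary, where the budget accounts for it.

```python
import gc

TICK_HZ = 30  # illustrative tick rate

def run_server(seconds, do_tick):
    gc.disable()                 # no surprise pauses mid-tick
    try:
        for _ in range(seconds):
            for _ in range(TICK_HZ):
                do_tick()        # one tick of game/simulation logic
            gc.collect()         # the one scheduled pause per second
    finally:
        gc.enable()
```

Note this only schedules CPython's cycle collector; reference-counted objects are still freed as you go. In a fully tracing runtime (JVM, .NET) the same shape applies, but you'd be tuning the collector's pause budget rather than invoking it directly.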
If you're a pragmatic developer who mainly cares about delivering "good enough" performance to consumers with a minimum of work, option 2 is your choice.
If you're a systems programmer, or come from the real-time community, or just care about knowing that you deliver exactly what you say you will deliver, then option 1 should be your way.