
Fine tuning TCP connections

Started by March 02, 2016 04:38 AM
5 comments, last by Sergey Ignatchenko 8 years, 8 months ago

My server is serving a couple hundred clients, and I am noticing differences in how these clients are being served. I wonder what fine tuning we can do to make the connections more robust and consistent across all client devices.

Some background: all clients run on embedded devices with limited RAM and CPU and slow storage, but they can be on high-speed connections.

We had a problem earlier when sending large files. We noticed it can take a while for a client to read data from TCP and write it to its local storage. The server sends the data very fast, and from the server's side it all appears to have been sent even while the client is still slowly receiving it. So we implemented an ack packet from the client after it has finished processing the file, so the server can proceed with the next packet.

Now we are seeing a different problem. The server implements a 5-second timeout for certain types of packets because they require an almost-immediate ack from the client. These packets are not very big, around 100 bytes. I thought a 5-second allowance is pretty long for an Internet connection nowadays, especially for 100-byte packets. Some clients never hit the 5-second timeout, but other clients hit it fairly often.

We have also seen both behaviors on the same client, so the client's hardware should not be the issue here. The Internet connection should not be the issue either, as we have tested and reproduced this in the office on a high-speed connection.

I wonder why?

Could the TCP write buffer be so large that data sits in it for more than 5 seconds? Could the server's RAM be under heavy pressure from handling hundreds of concurrent connections (who knows how many of those hundreds are in the middle of a large file transfer at any given time)? I am seeing available memory drop steadily on our production server until it hits a very low number and stays there.

I am currently looking at the TCP write buffer and thinking of reducing its per-connection maximum to something small, say 128 KB per connection. I thought I'd ask for advice here before proceeding, since I hear that is 'no-no-land-unless-you-know-what-you-are-doing' territory.

This question is similar to this thread here, but I don't want to hijack that one:

http://www.gamedev.net/topic/676417-tcpip-fast-writer-slow-reader-algorithm/

The server sends them very fast


No, it doesn't. The speed is negotiated between the server and the client, initially based on socket buffer sizes (typically) and then based on actual measured throughput.
If you want to make the connection low latency, then you want to set the TCP_NODELAY option. You also want to set the buffer size of the connection to very small -- 4 kB, perhaps.
Note, low latency will also mean lower throughput.
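
For what it's worth, here is a minimal sketch in Go (which the OP mentions using later in the thread) of setting those two options on an accepted connection; the listening port and the 4 kB figure are just placeholders:

```go
package main

import (
	"log"
	"net"
)

// tuneForLatency disables Nagle's algorithm and shrinks the kernel send
// buffer, trading throughput for latency as described above.
func tuneForLatency(conn *net.TCPConn) error {
	// Go already defaults to no-delay, but being explicit documents intent.
	if err := conn.SetNoDelay(true); err != nil {
		return err
	}
	// 4 KB is the figure suggested above; the right value is workload-specific.
	return conn.SetWriteBuffer(4 * 1024)
}

func main() {
	ln, err := net.Listen("tcp", ":9000") // placeholder port
	if err != nil {
		log.Fatal(err)
	}
	defer ln.Close()

	conn, err := ln.Accept()
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	if err := tuneForLatency(conn.(*net.TCPConn)); err != nil {
		log.Fatal(err)
	}
	// ... serve the client ...
}
```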

Another thing to watch out for is "network buffer bloat" which can happen at a lower level (IP or Ethernet or even ATM) and can wreak havoc with the TCP throughput calculation/adjustment algorithms.

If buffering is large, then you can easily end up with more than 5 seconds of data buffered in the outgoing stream, and your clients won't actually see the packet until it's too late to respond.
If you have these kinds of timing requirements, TCP is likely the wrong protocol to be using.

If you HAVE to use TCP, then set up a maximum message payload size (1000 bytes?) and a maximum number of outstanding messages (3?) and build code that has queuing in the server, parcels up the data in sliced chunks of the right size, and the client acknowledges at application level to get the next packet. That way, you control the buffering entirely, and if there is a high-priority message to send, there are at most 2 other messages in the way on the pipe.
The draw-back is that your throughput will be low. If the latency is 400 ms and your total outstanding data size is 3 kB, this means you can only send 3 kB every 400 ms and thus only send 7.5 kB per second. (The term to Google for here is "bandwidth delay product")
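
A rough sketch of that scheme in Go, purely for illustration -- the 1000-byte chunk size, the window of 3, the one-byte application-level ack, and the address are all made-up values, not anything from the thread:

```go
package main

import (
	"io"
	"log"
	"net"
)

// sendWithAppWindow slices payload into fixed-size chunks and never lets more
// than `window` chunks be unacknowledged at the application level. The client
// is assumed to reply with a single ack byte per chunk it has fully processed.
func sendWithAppWindow(conn net.Conn, payload []byte, chunkSize, window int) error {
	ack := make([]byte, 1)
	inFlight := 0

	for off := 0; off < len(payload); off += chunkSize {
		end := off + chunkSize
		if end > len(payload) {
			end = len(payload)
		}

		if _, err := conn.Write(payload[off:end]); err != nil {
			return err
		}
		inFlight++

		// Once the window is full, block until the client acks one chunk.
		if inFlight == window {
			if _, err := io.ReadFull(conn, ack); err != nil {
				return err
			}
			inFlight--
		}
	}

	// Drain the remaining acks so we know the client has consumed everything.
	for ; inFlight > 0; inFlight-- {
		if _, err := io.ReadFull(conn, ack); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	conn, err := net.Dial("tcp", "device.example.com:9000") // placeholder address
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	payload := make([]byte, 5<<20) // stand-in for the data being transferred
	if err := sendWithAppWindow(conn, payload, 1000, 3); err != nil {
		log.Fatal(err)
	}
}
```

With a 400 ms round trip and 3 outstanding 1000-byte chunks, this tops out around the 7.5 kB/s figure above, so chunk size and window need to be sized against the latency you actually see.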
enum Bool { True, False, FileNotFound };

Also, I don't see how your server could be running out of memory by just serving files to a few hundred concurrent clients, unless you're running on an Arduino or something; even the weakest raspberry pi could easily achieve that, and more. Are you loading the entire file in memory before serving it to the client? Don't do that, stream it from disk into the socket (unless you need to for some reason, but you probably don't). Then you will never need more than, say, 4 KB of memory per client.

If you're in fact generating these files on the fly through a memory-heavy process, then ignore the above.
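
For example, a minimal Go sketch of streaming from disk straight into the socket; the port and file path are placeholders:

```go
package main

import (
	"io"
	"log"
	"net"
	"os"
)

// serveFile streams the file from disk into the socket, so per-client memory
// stays small no matter how large the file is.
func serveFile(conn net.Conn, path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	// io.Copy moves the data in small chunks (and may even use sendfile for
	// TCP connections) instead of loading the whole file into RAM.
	_, err = io.Copy(conn, f)
	return err
}

func main() {
	ln, err := net.Listen("tcp", ":9000") // placeholder port
	if err != nil {
		log.Fatal(err)
	}
	defer ln.Close()

	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go func(c net.Conn) {
			defer c.Close()
			if err := serveFile(c, "/var/data/payload.bin"); err != nil { // placeholder path
				log.Print(err)
			}
		}(conn)
	}
}
```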

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”


Also, I don't see how your server could be running out of memory by just serving files to a few hundred concurrent clients, unless you're running on an Arduino or something; even the weakest raspberry pi could easily achieve that, and more. Are you loading the entire file in memory before serving it to the client? Don't do that, stream it from disk into the socket (unless you need to for some reason, but you probably don't). Then you will never need more than, say, 4 KB of memory per client.

No, all packets are sent in 16 KB chunks or less. When sending something large like a file, it is split up into 16 KB chunks. At first I thought the server app was leaking memory, but I have profiled and tested it and it doesn't exhibit even a byte of leak.

I had put the memory-leak question aside since it led nowhere, but this issue has revived my suspicion of a possible leak. My tests obviously only run for a few minutes with fast clients; a leak wouldn't really show until the server has been running for a few days. I do see the sawtooth pattern if I zoom the memory usage chart out to around a month.


No, it doesn't. The speed is negotiated between the server and the client, initially based on socket buffer sizes (typically) and then based on actual measured throughput.

Then why do we see the server "finish" sending a file almost instantaneously while the client is still slowly receiving it? We ran the server and client side by side with logs enabled and did a 5 MB file transfer. At the end of the transfer the server logs "file transfer complete". The server printed that line within 1 or 2 seconds, but the client logs show it receiving the file slowly over several minutes, since it has to persist each chunk to local storage. I was expecting the server's speed to match the client's, but that wasn't the case.


If you want to make the connection low latency, then you want to set the TCP_NODELAY option. You also want to set the buffer size of the connection to very small -- 4 kB, perhaps.

I wrote this server app in Go, and Go's docs say TCP_NODELAY is enabled by default.

The data has to be buffered somewhere, hence my suspicion that this is what causes the sawtooth pattern in the memory usage above. Could it be that with so many file transfers going on, even though I split them into 16 KB chunks, those bytes still end up buffered in the outgoing stream for slow clients? That would also sort of explain why some clients don't see the request until more than 5 seconds have passed.

I will look at the "bandwidth delay product" and see if we can implement something like that. It does look like we have to throttle the speed manually if we want to achieve consistency across all clients.

Keep in mind that depending on your underlying implementation of TCP, it is entirely likely that you can "send" N bytes of data that is seen as "done" by your application layer, but is still buffered underneath and waiting to transmit.

You have the right idea of measuring your transmission times, but you should be using a packet capture instead (either provided by the NIC of the server itself or directly on the wire between server and clients). This is the best way to see problems in TCP window configurations because you can deconstruct the PSH/ACK sequences directly.

Another way a packet capture will benefit you is that you'll be able to see traffic in aggregate instead of per-stream. This will reveal potential issues with saturation, for instance.
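
If you would rather stay in Go than drive tcpdump/Wireshark by hand, something along these lines (assuming the gopacket library on top of libpcap, with a placeholder interface and port, and root privileges to capture) will log the advertised receive window and the PSH/ACK flags per segment:

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/google/gopacket"
	"github.com/google/gopacket/layers"
	"github.com/google/gopacket/pcap"
)

func main() {
	// Placeholder interface name; adjust to the server's NIC.
	handle, err := pcap.OpenLive("eth0", 1600, true, pcap.BlockForever)
	if err != nil {
		log.Fatal(err)
	}
	defer handle.Close()

	// Placeholder port; restrict the capture to the service in question.
	if err := handle.SetBPFFilter("tcp port 9000"); err != nil {
		log.Fatal(err)
	}

	src := gopacket.NewPacketSource(handle, handle.LinkType())
	for packet := range src.Packets() {
		tcpLayer := packet.Layer(layers.LayerTypeTCP)
		if tcpLayer == nil {
			continue
		}
		tcp := tcpLayer.(*layers.TCP)
		// Window is the receiver's advertised window; watch it shrink toward
		// zero when a slow client stops draining its buffer.
		fmt.Printf("%s win=%d psh=%v ack=%v\n",
			time.Now().Format(time.StampMilli), tcp.Window, tcp.PSH, tcp.ACK)
	}
}
```

A plain tcpdump capture opened in Wireshark gives you the same information with less work; the point is simply to look at the segments themselves rather than the application logs.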

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Then why are we seeing the server "finished" sending a file almost instantaneously, but client is slowly receiving them?


My guess is your socket buffers are too big. However, depending on topology, there could be other things buffering in the way -- anything from smart proxies to NAT gateways.

Anyway, you can't both have flow control at the application layer (send at most 3 chunks of 16 kB ahead of what the client has acked) and see 5 MB being sent all at once.

sawtooth


Sawtooth in what metric? Note that "free" memory on the system is a useless metric, because the system will start using available RAM for buffering/caching of file system and perhaps other things. This goes somewhat for the "memory used" by your process, too. RSS is a better metric if you want to track "actual" usage. Although with Go, another sawtooth comes from the runtime only cleaning up memory every once in a while -- the system thinks the program is still using it, because the Go runtime hasn't marked it as unused yet. This happens to all garbage collected languages more or less.
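
Since the server is written in Go, a quick sanity check is to log the runtime's own memory statistics and compare the live heap against memory the runtime simply hasn't handed back to the OS yet; a minimal sketch:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// HeapAlloc:    bytes in live heap objects (what the program really uses).
	// HeapSys:      heap memory obtained from the OS (roughly what tools see).
	// HeapReleased: heap memory already returned to the OS.
	fmt.Printf("HeapAlloc=%d HeapSys=%d HeapReleased=%d NumGC=%d\n",
		m.HeapAlloc, m.HeapSys, m.HeapReleased, m.NumGC)
}
```

If HeapAlloc stays bounded over days while the system-level numbers look high, the memory is being held by the runtime rather than leaked by the application.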
enum Bool { True, False, FileNotFound };

If you HAVE to use TCP, then set up a maximum message payload size (1000 bytes?) and a maximum number of outstanding messages (3?) and build code that has queuing in the server, parcels up the data in sliced chunks of the right size, and the client acknowledges at application level to get the next packet. That way, you control the buffering entirely, and if there is a high-priority message to send, there are at most 2 other messages in the way on the pipe.

This. While it is not exactly "controlling buffering entirely", it is the closest thing you can get over a single TCP stream (and BTW, for over-1-second acceptable delays it works pretty well). EDIT: well, for BSD servers they say that SO_SNDLOWAT works, and if it does, it should be a better option, but I've never tried it myself (yet?)


My guess is your socket buffers are too big. However, depending on topology, there could be other things buffering in the way -- anything from smart proxies to NAT gateways.

90+% chance it is socket buffers.

One further thought: one thing which MIGHT happen with TCP in mobile environments (eating up ports and all-important non-paged memory) is the infamous "tons of sockets in TIME_WAIT state" problem. Try running netstat -a (or the equivalent) and see how many sockets you have in this strange state. Most likely it is not a problem, but when it hits, it hits badly :-( . If it does happen, you MAY want to try playing with SO_LINGER (it is universally hated by netadmins as it causes a valid-but-frowned-upon RST instead of FIN, but it does help against TIME_WAIT).
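
If TIME_WAIT does turn out to be the problem, a minimal Go sketch of the SO_LINGER workaround described above (accepting the RST behaviour and the loss of any unsent data); the address is a placeholder:

```go
package main

import (
	"log"
	"net"
)

// closeWithRST sets SO_LINGER to zero before closing. The OS then discards
// any unsent data and resets the connection instead of doing the normal FIN
// handshake, so the socket does not sit in TIME_WAIT. Use with care.
func closeWithRST(conn *net.TCPConn) {
	if err := conn.SetLinger(0); err != nil {
		log.Print(err)
	}
	if err := conn.Close(); err != nil {
		log.Print(err)
	}
}

func main() {
	conn, err := net.Dial("tcp", "device.example.com:9000") // placeholder address
	if err != nil {
		log.Fatal(err)
	}
	closeWithRST(conn.(*net.TCPConn))
}
```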

This topic is closed to new replies.
