My server serves a couple hundred clients, and I am noticing differences in how these clients are being served. I wonder what fine tuning we can do to make the connections more robust and consistent across all client devices.
Some background: all clients run on embedded devices with limited RAM and CPU and slow storage, but they can connect over a high-speed connection.
We had a problem earlier when sending large files. We noticed that it can take a while for a client to read data from TCP and write it to its local storage. The server sends the data very fast, and from the server's side it all appears to have been sent even though the client is still slowly receiving it. So we implemented an ack packet that the client sends after it has finished processing the file, so the server can proceed with the next packet.
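To make the ack scheme concrete, here is a minimal Python sketch of the idea. It is not our actual code: the 4-byte length prefix, the one-byte ACK value, and all function names are assumptions made up for illustration.

```python
import socket
import struct
import threading

ACK = b"\x06"  # hypothetical one-byte ack marker


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return less."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def send_file_and_wait(sock, payload):
    """Server side: send a length-prefixed payload, then block until the ack."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)
    return recv_exact(sock, 1) == ACK


def client_receive(sock, store):
    """Client side: read the payload, 'process' it, and only then ack."""
    (size,) = struct.unpack("!I", recv_exact(sock, 4))
    data = recv_exact(sock, size)
    store.append(data)  # stands in for the slow write to local storage
    sock.sendall(ACK)
```

The point is that sendall() returning only means the data reached the local send buffer. The ack is the only signal that the client has actually finished with the file.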
Now we are seeing a different problem. The server implements a 5-second timeout for certain types of packets because they require an almost-immediate ack from the client. These packets are not very big, around 100 bytes. I thought a 5-second delay was pretty long for an Internet connection nowadays, especially for 100-byte packets. Some clients never hit the 5-second timeout, but other clients hit it more often.
We have definitely seen both behaviors on the same client, so the client's hardware should not be the issue here. The Internet connection should not be the issue either, as we have tested and seen this in the office on a high-speed connection.
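For reference, the timeout logic is roughly equivalent to the following sketch (again hypothetical names, not our real code, and it assumes a one-byte ack). Returning the elapsed time instead of just success or failure makes it possible to log how close each client gets to the limit.

```python
import socket
import time

ACK = b"\x06"  # hypothetical one-byte ack marker


def send_and_time_ack(sock, packet, timeout=5.0):
    """Send a small packet; return seconds until the ack, or None on timeout."""
    sock.settimeout(timeout)
    start = time.monotonic()
    sock.sendall(packet)
    try:
        reply = sock.recv(1)
    except socket.timeout:
        return None  # no ack arrived within the timeout
    if reply != ACK:
        raise ConnectionError("unexpected reply or peer closed")
    return time.monotonic() - start
```

Note that the clock starts when the packet is handed to the socket, not when it leaves the machine. If earlier data is still queued in the send buffer for that connection, the 100-byte packet waits behind it and that wait counts against the 5 seconds.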
I wonder why?
Could the TCP write buffer be so large that a small packet sits in it for more than 5 seconds? Could the server's RAM be under heavy pressure from handling hundreds of concurrent connections (who knows how many of them are sending large files at that moment)? I see the memory usage dropping steadily on our production server until it hits a very low number and stays there.
I am currently looking at the TCP write buffer and thinking of reducing its maximum per connection to something small, say 128 KB per connection. I thought I would ask for advice here before proceeding, since I have heard that this is a "no-no-land-unless-you-know-what-you-are-doing" type of tuning.
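What I have in mind is something like the sketch below, which sets SO_SNDBUF on each accepted connection. The helper name is made up, and 128 KB is just the number from above, not a recommendation.

```python
import socket


def cap_send_buffer(sock, size=128 * 1024):
    """Request a smaller send buffer and return what the OS actually applied."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    # The effective value can differ from the request: Linux, for example,
    # doubles it for bookkeeping overhead and caps it at net.core.wmem_max.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("effective send buffer:", cap_send_buffer(s))
    s.close()
```

As far as I understand, setting SO_SNDBUF explicitly on Linux also turns off the kernel's send buffer autotuning for that socket, which is part of why I am hesitant.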
This question is similar to the thread below, but I don't want to hijack that one:
http://www.gamedev.net/topic/676417-tcpip-fast-writer-slow-reader-algorithm/