Recently I asked a very seasoned developer for his personal recommendation on receiving, queuing, and processing packets. To provide a little context, I asked:
Hi,
I've been studying Uru code. I had a question.
I'm looking at how packets are queued and eventually exposed to the rest of the game. Would you recommend running low-level packet recv and send on a thread, while communicating with the rest of the game via an event queue? In particular I wanted to see how synchronization primitives were used to build a higher-level abstraction for game logic code to use.
I'm assuming there's actor-model-style encapsulation and messaging going on. But like we talked about, it's hard to see the big picture by reading the code due to its distributed nature.
Thanks!
Randy
And got this response:
Assuming that you're going to be running many game contexts in one process, I have a complicated answer to your question, but the short answer is to receive on worker threads and queue messages onto a per-game queue, which is then scheduled to be run when the worker threads run out of input to receive. GetQueuedCompletionStatus with zero timeout works perfectly for this. There is probably a similar but harder way to do this with epoll, which I've read about but have avoided so far because it seems not as well designed as IOCP and kqueue. Packets should be sent directly from the game worker thread, not queued to be sent. This whole design in general avoids context switching.
The games themselves are essentially fat actors that run single threaded.
So it sounds like he's recommending using a thread pool that treats receiving packets as the top priority. If no packets are ready to receive, the worker threads do packet processing. Here packet processing means opening the packets up, decrypting them, or whatever else is needed. The processed packets then go into a queue for the game thread to pick up.
Sounds like a totally solid design: the worker threads are reused as much as possible. A more naive design would submit jobs to a thread pool for packet receives and separate jobs for packet processing, which invites context switches between jobs when the OS does its scheduling. Another naive option is to do the packet receive + processing in a single job, which doesn't prioritize picking up new packets and will likely context switch once each receive + process job completes.
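To make sure I'm picturing it right, here's a minimal sketch of that worker loop, assuming Windows + IOCP. GetQueuedCompletionStatus is the only real API in it; packet_from_overlapped, decrypt_and_parse, game_for_key, enqueue_for_game, run_one_ready_game, and handle_failed_io are made-up placeholders, and re-posting the next receive after each completion plus any idle backoff are left out to keep it short:

```c
#include <windows.h>

typedef struct Packet Packet;   /* one received datagram plus its OVERLAPPED */
typedef struct Game   Game;     /* a single-threaded "fat actor" game context */

/* Hypothetical helpers the rest of the server would provide. */
Packet *packet_from_overlapped(OVERLAPPED *ov, DWORD bytes);
void    decrypt_and_parse(Packet *pkt);
Game   *game_for_key(ULONG_PTR key);
void    enqueue_for_game(Game *game, Packet *pkt);
void    run_one_ready_game(void);
void    handle_failed_io(OVERLAPPED *ov);

DWORD WINAPI worker_thread(LPVOID param)
{
    HANDLE iocp = (HANDLE)param;   /* shared completion port; sockets get associated elsewhere */

    for (;;) {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        OVERLAPPED *ov = NULL;

        /* Zero timeout: always drain completed receives before anything else. */
        BOOL ok = GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, 0);

        if (ok) {
            /* A receive completed: decrypt/parse here on the worker, then
               hand the result to the owning game's queue. */
            Packet *pkt = packet_from_overlapped(ov, bytes);
            decrypt_and_parse(pkt);
            enqueue_for_game(game_for_key(key), pkt);
        } else if (ov == NULL && GetLastError() == WAIT_TIMEOUT) {
            /* Nothing pending: spend the spare time running one game tick.
               Sends happen directly from the game code, not through a send queue. */
            run_one_ready_game();
        } else {
            /* ov != NULL: the I/O itself failed; clean up that operation. */
            handle_failed_io(ov);
        }
    }
    return 0;   /* unreachable */
}
```

If I've read it right, the appeal is that the same threads that pull completions off the port also run the games, so game logic only runs when there's no fresh input to drain, and nothing ever blocks waiting on a queue.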
That's the context. My question is: on Windows, are Berkeley sockets implemented as a slightly higher-level layer compared to IOCP? Sort of like how fscanf does internal buffering on top of read syscalls, is there a similar relationship between the POSIX socket API and the raw Windows IOCP APIs?
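To make the question concrete, here's roughly the difference in call shape I mean (just a sketch, with all the WSAStartup/socket/bind/CreateIoCompletionPort setup omitted; I'm not claiming this is how either one is implemented underneath, since that's exactly what I'm asking):

```c
#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

/* Blocking / readiness style: the thread sits inside the call until data arrives. */
static int recv_blocking(SOCKET sock, char *buf, int len)
{
    return recv(sock, buf, len, 0);
}

/* Completion style: post the receive now; the byte count shows up later via
   GetQueuedCompletionStatus() on whatever IOCP the socket was associated with.
   buf and ov are caller-owned and must stay valid until the completion arrives. */
static int recv_overlapped(SOCKET sock, char *buf, ULONG len, OVERLAPPED *ov)
{
    WSABUF wbuf;
    DWORD flags = 0;
    wbuf.buf = buf;
    wbuf.len = len;
    if (WSARecv(sock, &wbuf, 1, NULL, &flags, ov, NULL) == SOCKET_ERROR &&
        WSAGetLastError() != WSA_IO_PENDING)
        return -1;   /* genuine failure */
    return 0;        /* pending (or already done); result arrives through the port */
}
```

My POSIX code looks like the first shape (plus poll), so what I'm really asking is whether that first shape on Windows is just a convenience layer over something closer to the second.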
I'm trying to gauge how high- or low-level my current POSIX implementation is. I plan on writing some dedicated game server code soon, but I don't really want to pay for Windows servers, so I was hoping to just keep my POSIX stuff as-is. However, I also wanted to learn a little more about IOCP to understand the perf tradeoffs, since I'll be paying for the server costs myself.