IOCP critical section design problem
There are several design solutions to deadlock described in typical undergraduate literature -- check a good textbook like Tannenbaum/Woodhull or similar.
In brief, there are three options:
1) Define a strict ordering for your locks such that to hold lock B, you must always first hold lock A. (Or alternatively, to hold lock a, you must never hold lock B.) This means that threads will be blocked on the earlier shared lock in the operation, and thus won't dead lock -- just wait.
2) Use a deadlock detection algorithm. This is easier in the case of, for example, file locking, or database row locking, but can also be done in mutex locking. Apply some deadlock-breaking solution when a deadlock is detected; for example throwing an error and failing one of the two deadlocking threads, letting the other thread continue.
3) Only hold locks for very short time intervals (to protect mutable data structures) and use some other mechanism to serialize order-of-events. For example, change user mutation to a messaging system -- each user has an incoming queue of messages, and those messages are processed in order. If user A wants to affect user B, post a message to user B, rather than lock the data structure.
Threading is one of the hardest things to do well in programming. Period.
Locks are both prone to hard but complex failures (deadlocks) on the one hand, and extreme performance hits on the other (limited number of locks to prevent deadlocks).
I'm personally fond of option 3 (above) in the case of network code, for a couple of reasons. First, you are spending as little time in the lock as possible, so you might be able to get away with a single "I am about to create a message" lock. Second, you have a single inter-thread communications system. The first one makes it less likely you will have programmer error, and the second gives you a single high-risk component to test the everliving-hell out of.
You need to use thread synchronization using SLIM_READ_WRITE structure it is faster than critical_section structure according to (Jeffrey richter and Christophe Nassare in the book Windows VIA C++ page 224)
Example :
SRWLOCK g_srwMapUsers;
AcquireSRWLockExclusive(&g_srwMapUser); // lock the resource if antother worker thread try to enter this section of code
// he will be in a wait state and your thread will be in safe
// do some instructions to access a shared resource
ReleaseSRWLockExclusive(&g_srwMap); // release the resource and wake up another thread if there is a thread in a wait state!!!!
There are quite a few locks in there, do you really need that many? Also, your lock must be recursive or this will deadlock as it stands (locking current user twice). Not all mutexes are recursive by default.
I would have one lock to protect the list from user adds and removes while you are iterating the list, since this is a pain to get right in a lockfree manner. The messages to individual users, I'd just add to lockfree queues (or, if you can't implement a proper lockfree queue, use a micro-spinlock per user (testing and setting a bool) which is mighty fine for a mini-task such as adding an item to a queue).
If sending a message to a user actually involves sending (that is, the thread doesn't just add the message to a queue and another worker sends all queued messages) then I would probably iterate the user list only to collect shared references to all users (basically making a copy of the list). That keeps the time the lock is held short. Sending something over the network can take indefinite time, and holding a lock for indefinite time is calling for trouble.
Also, if other threads are processing the sends-to-users, consider reader-writer locks as suggested by Taylor Softs.