Thank you guys for the input.
I was reading up on the whole NIO stuff again for the past few days. I even got it working (to some extent) with the game client. But I have to say, it's hard.
(Compared to the thread per client approach.)
So I managed to merge the non-TCP player threads into a single thread which iterates over all registered players and sends data back and/or processes other stuff (validating coordinates, etc.).
Now the TCP side...
I also merged all TCP threads (thread per socket) into one NIO thread, but it gets complicated very fast.
First, the thread reads the data from the sockets, which then has to be buffered temporarily in a separate ByteBuffer. (I could receive either only a fragment of a full message or multiple messages at once.)
Then I have to loop over this buffer and parse the data to find out whether a full message has actually been received and is stored in that buffer.
Then I have to copy that message out, create a message object, put the bytes into it and pass it to the other thread, which then processes and responds to the message.
The remaining bytes in the temporary buffer then need to be shifted to the front (to fill the gap where the full message was "cut out"), otherwise the next parsing pass breaks.
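For reference, the buffer/parse/shift steps above can be sketched roughly like this. It is only a minimal sketch under assumptions: `FrameDecoder` is a made-up name, and the wire format (a 2-byte big-endian length followed by the payload) is hypothetical, since the actual protocol isn't described here. The "shift the bytes" step is what `ByteBuffer.compact()` does.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, one instance per connection.
// Assumed wire format: 2-byte unsigned length (big-endian), then the payload.
final class FrameDecoder {
    // Big enough for one maximum-size message (2-byte header + 65535 bytes).
    // Stays in "write mode" between calls.
    private final ByteBuffer pending = ByteBuffer.allocate(2 + 0xFFFF);

    /**
     * Appends freshly read bytes and returns every complete payload found.
     * Throws BufferOverflowException if 'fresh' does not fit into the
     * remaining space; a real server would have to bound its reads.
     */
    List<byte[]> feed(ByteBuffer fresh) {
        pending.put(fresh);
        pending.flip();                       // switch to read mode
        List<byte[]> messages = new ArrayList<>();
        while (pending.remaining() >= 2) {
            pending.mark();                   // remember where this frame starts
            int length = pending.getShort() & 0xFFFF;
            if (pending.remaining() < length) {
                pending.reset();              // only a fragment so far, wait for more
                break;
            }
            byte[] payload = new byte[length];
            pending.get(payload);             // copy the full message out
            messages.add(payload);
        }
        pending.compact();                    // move leftovers to the front, back to write mode
        return messages;
    }
}
```

Feeding it `{0, 3, 1, 2}` returns nothing (fragment), and feeding `{3, 0, 1, 9, 0}` afterwards returns the two payloads `{1, 2, 3}` and `{9}` while keeping the trailing `0` for the next call.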
All that stuff seems kinda heavyweight for a single thread which needs to handle up to 64 connections at once. (Especially the whole parsing and message creation process.)
My idea was to create a separate thread which does the parsing/message creation alongside the TCP listener thread in order to split that load across 2 cores, but then I need additional locks to avoid race conditions or errors, because the message buffer would be shared between two threads. (And I have no idea if the locks would make the situation even worse if they interfere with each other too much.)
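One possible way around the shared buffer (a different split than the one described above, so treat it as an assumption rather than a recommendation): the NIO thread keeps the raw byte buffer to itself and only hands finished messages to the second thread through a `java.util.concurrent.BlockingQueue`. The queue does the locking internally, so no hand-written locks are needed. `MessageHandoff`, `Message` and the capacity of 1024 are made-up names and values for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical handoff between the NIO thread and a processing thread.
final class MessageHandoff {
    // Immutable message; the payload array must not be modified after publishing.
    static final class Message {
        final int clientId;
        final byte[] payload;

        Message(int clientId, byte[] payload) {
            this.clientId = clientId;
            this.payload = payload;
        }
    }

    // Bounded, so a slow processing thread cannot make the queue grow forever.
    private final BlockingQueue<Message> inbox = new ArrayBlockingQueue<>(1024);

    /** Called by the NIO thread. Never blocks; returns false if the queue is full. */
    boolean publish(int clientId, byte[] payload) {
        return inbox.offer(new Message(clientId, payload));
    }

    /** Called by the processing thread. Blocks until a message is available. */
    Message take() throws InterruptedException {
        return inbox.take();
    }
}
```

The processing thread would then just loop on `take()`, while the NIO thread calls `publish(...)` for every complete message it cuts out of a connection's buffer.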
I have no idea how to proceed further. (I'm really considering ditching that.)
There are 3 options now:
a) Try to proceed further with the NIO stuff. (It's kinda complicated, and on top of that I really have no clue whether the general server architecture I'm building is that much better than the thread per client stuff. > Lack of experience and knowledge on my part.)
b) Use the thread per socket model but merge the other non-TCP threads into one single thread which iterates over all registered clients. (The server would then have a maximum of 64 threads, one per socket, + 3 to 4 other threads for all the other systems which are running in the background.)
c) Revert to the original architecture (1 reader and 1 processing/writer thread per client. Total number of threads would be 64 * 2 = 128 + 3 to 4 other threads.)
It would be a good idea to benchmark that stuff. (But I really have no clue how. I have never written any kind of server before, nor benchmarked a server application. Do I have to write software which emulates 64 clients? Is this the common way of doing things?)
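A client emulator like that could look roughly like the sketch below. It is only an illustration under assumptions: `LoadTest` is a made-up name, the 32-byte payload is arbitrary, and it assumes the server answers every message with a reply of the same size (echo-style), which the real game protocol may not do. It opens one connection per simulated client, performs a number of round trips on each, and returns the mean round-trip time.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical load generator: one thread and one socket per simulated client.
final class LoadTest {
    /** Runs all clients in parallel and returns the mean round-trip time in nanoseconds. */
    static double run(String host, int port, int clients, int roundTrips) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < clients; i++) {
                results.add(pool.submit(() -> oneClient(host, port, roundTrips)));
            }
            long totalNanos = 0;
            for (Future<Long> result : results) {
                totalNanos += result.get();   // rethrows any client failure
            }
            return (double) totalNanos / ((long) clients * roundTrips);
        } finally {
            pool.shutdownNow();
        }
    }

    /** Sends a fixed-size message, waits for a same-size reply, and repeats. */
    private static long oneClient(String host, int port, int roundTrips) throws IOException {
        byte[] payload = new byte[32];        // arbitrary test payload
        byte[] reply = new byte[payload.length];
        try (Socket socket = new Socket(host, port)) {
            socket.setTcpNoDelay(true);       // avoid Nagle delays on small messages
            OutputStream out = socket.getOutputStream();
            DataInputStream in = new DataInputStream(socket.getInputStream());
            long start = System.nanoTime();
            for (int i = 0; i < roundTrips; i++) {
                out.write(payload);
                out.flush();
                in.readFully(reply);          // block until the whole reply has arrived
            }
            return System.nanoTime() - start;
        }
    }
}
```

Something like `LoadTest.run("127.0.0.1", port, 64, 1000)` could then be run against each of the three architectures (with `port` being whatever the server listens on) to compare them under the same load.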