I wouldn't bother with anything beyond stereo unless you specifically want to support those speaker layouts in your game. I'd rather have a really nice stereo monitoring setup (and room treatment) than a mediocre 5.1 system. Atmos is not a good fit for games, and few people have such a setup, so in most cases it would have to be mixed down to 5.1, stereo, or binaural anyway.
In the space I work in (VR), binaural audio delivered over headphones is the primary listening method. Generally, mono audio is used for most sound effects, which are spatialized using an HRTF rendering system. For ambiences, 4-channel (first-order) ambisonics is very common. The ambisonic sound field is rotated to follow the user's head and then converted to binaural audio by convolution with an ambisonic HRTF decoder, so the ambience stays fixed in the world as the head turns (unlike stereo ambiences). Ambisonics is great because it is independent of the listening setup. It can be decoded to any speaker layout, unlike Atmos or other surround formats.
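To make the head-tracking part concrete, here is a minimal sketch of yaw compensation for a first-order ambisonic frame. I'm assuming ACN channel order (W, Y, Z, X) with SN3D normalization, X pointing forward, Y pointing left, and azimuth measured counterclockwise; other conventions (e.g. FuMa) order and scale the channels differently. The function names are my own, and a real renderer would also handle pitch and roll and then run the HRTF convolution, which is not shown.

```python
import math

def encode_foa(sample, azimuth_rad, elevation_rad=0.0):
    """Encode one mono sample as first-order ambisonics (assumed ACN/SN3D: W, Y, Z, X)."""
    cos_el = math.cos(elevation_rad)
    w = sample
    y = sample * math.sin(azimuth_rad) * cos_el
    z = sample * math.sin(elevation_rad)
    x = sample * math.cos(azimuth_rad) * cos_el
    return (w, y, z, x)

def compensate_head_yaw(frame, head_yaw_rad):
    """Rotate the sound field by -head_yaw so sources stay fixed in the world.

    Only X and Y change under a rotation about the vertical axis;
    W (omnidirectional) and Z (height) are left untouched.
    """
    w, y, z, x = frame
    c, s = math.cos(head_yaw_rad), math.sin(head_yaw_rad)
    x_rot = x * c + y * s
    y_rot = y * c - x * s
    return (w, y_rot, z, x_rot)

# A source 30 degrees to the left; the listener turns 30 degrees left to face it.
frame = encode_foa(1.0, math.radians(30))
facing = compensate_head_yaw(frame, math.radians(30))
```

After the rotation the source sits straight ahead of the listener: all of its directional energy is in X and none in Y. A plain stereo ambience has no equivalent operation, which is why it turns with the head.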
If I were building a studio, I'd spend my money on nice monitors (e.g. Genelec 8351B). For accurate listening, it's very important that your monitor system has a way to calibrate itself to your room/desk and remove low-mid resonances. Otherwise you will not get an accurate picture of frequency content, and you will tend to EQ those resonances out of your mix, which will degrade sound quality in other listening environments. You want the flattest, most linear system you can afford, combined with room treatment to keep RT60 as low as possible.
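For a rough feel of the numbers involved, here is a back-of-the-envelope sketch. It uses two textbook approximations: axial room modes at f = n*c/(2L), which is roughly where the low-frequency resonances of a rectangular room sit, and Sabine's formula RT60 = 0.161*V/A. The room dimensions and absorption coefficients are made-up illustrative values, and real rooms deviate from both formulas, so treat this as a sanity check and not a substitute for measurement.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at about 20 C

def axial_modes_hz(dimension_m, count=3):
    """First `count` axial mode frequencies along one room dimension."""
    return [n * SPEED_OF_SOUND_M_S / (2.0 * dimension_m)
            for n in range(1, count + 1)]

def sabine_rt60_s(volume_m3, surfaces):
    """Sabine estimate of RT60 in seconds.

    `surfaces` is a list of (area_m2, absorption_coefficient) pairs.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / total_absorption

# Hypothetical 4 m x 4 m x 3 m room: floor, ceiling, walls.
# The absorption coefficients are invented for illustration.
bare = [(16.0, 0.1), (16.0, 0.1), (48.0, 0.1)]
treated = [(16.0, 0.3), (16.0, 0.6), (48.0, 0.4)]

modes = axial_modes_hz(4.0)             # resonances along a 4 m dimension
rt60_bare = sabine_rt60_s(48.0, bare)
rt60_treated = sabine_rt60_s(48.0, treated)
```

The point is the trend and not the exact figures: more absorption means a shorter RT60, and small rooms put their first few modes right in the bass and low-mid range where they color what you hear.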
I'd also get at least a few nice pairs of headphones, especially if you want to do binaural audio. Almost no headphones have a flat response (due to physical limitations), so I wouldn't use them for doing EQ unless you can insert a calibration EQ on your master output (not easy to do unless you have very good ears or a measurement setup). I wouldn't mix on headphones either. I'd only use them to check the content after mixing on good monitors.
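As a sketch of what a calibration EQ boils down to: given a measured headphone response per band, the correction is the target minus the measurement, with boosts capped so you don't push the drivers or eat your headroom. The measurement values, the flat 0 dB target, and the 6 dB cap below are all assumptions for illustration; which target curve is appropriate for headphones is a separate question.

```python
def calibration_gains_db(measured_db, target_db=0.0, max_boost_db=6.0):
    """Per-band correction gain in dB: target minus measured, boosts capped.

    `measured_db` maps band center frequency (Hz) to measured level (dB).
    Cuts are applied in full; only boosts are limited.
    """
    return {freq: min(target_db - level, max_boost_db)
            for freq, level in measured_db.items()}

# Hypothetical measurement: bass bump, flat mids, deep treble dip.
measured = {100: 3.0, 1000: 0.0, 8000: -10.0}
gains = calibration_gains_db(measured)
```

Here the 100 Hz bump is cut by 3 dB, 1 kHz is left alone, and the 8 kHz dip is only boosted by the 6 dB cap instead of the full 10 dB.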