Reading through your discussion points suggests to me that you are optimizing the wrong problem in general. atan2 should not be on your list of problems; more than likely the real issue is simply that you haven't stepped back and looked at how often you are calling various functions at a higher level. That is, the quote "so far I've gone to selecting a target (which requires a range check against every active target) once every 5 seconds" should draw your scrutiny, and atan2 in general should be ignored. What you probably want to be looking at instead is how to remove calls to the range check in the first place.
The first thing that jumps out at me in that quote is that it describes a very bad "O" notation processing-time curve. Your curve is basically number of calls versus number of objects: 2 objects means 2 calls (i.e. distance from me to you and the inverse), 3 objects means 6 calls, 4 objects means 12, 5 objects means 20 calls, etc. That's an O(n^2 - n) pattern, which is very bad. By updating less frequently you are reducing the cost only by a constant factor, so no matter what you do, at some point the number of calls still drags everything to a standstill. What you really want is something that gets rid of the n^2 portion of this problem, so you are only calling the code in question once in a while, or preferably not at all.
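To make the call-count growth concrete, here is a minimal sketch of the naive all-pairs targeting loop described above (`rangeCheckCalls` is a hypothetical name used only for illustration); counting the iterations reproduces the 2/6/12/20 sequence, i.e. n*(n-1) checks per update:

```cpp
#include <cassert>
#include <cstddef>

// Naive targeting: every object range-checks every other active object.
// The number of distance checks per update is n*(n-1), i.e. O(n^2 - n).
std::size_t rangeCheckCalls(std::size_t n) {
    std::size_t calls = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            if (i != j)
                ++calls;  // one "distance from i to j" check
    return calls;
}
```

Doubling the object count roughly quadruples the work, which is why lowering the update rate (dividing by a constant) only delays the stall rather than preventing it.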
Before I suggest my solution, I'll make a specific point:
atan2 is not really evil; it's not even that slow unless you call it in massive amounts. What is more evil is that it is often a function where you are getting a cache miss, since it is used to figure out an angle between this object and another object, and that other object is unlikely to be in cache at the moment. A profiler will often glob the cache miss up into the atan2 call, because the compiler will often make the actual "access" to the other object while setting up the stack prior to calling atan2. Usually this means you are seeing the cost of a cache miss included in the atan2 call and misinterpreting what is actually slow. If you see atan2 in your profiles, check the call count: if it is very high (per frame, that is) then the cost is likely real; if it is a couple hundred, you are likely looking at a cache-miss problem and not a function-call problem. Profiling is critical, but reading "between the lines" is exceptionally important, since cache misses are more of a black art to figure out than gospel from any profiler or guess you can make.
Anyway, my suggestion is to stop optimizing things the way you are, because your first goal should be to reduce the big-O issue (i.e. the n^2 portion of the call count I described above). You also want to reduce cache misses, which again suggests you need to step back and work the problem at a higher level. That higher level would be an SAS (spatial awareness system), which works over all objects and only triggers your AI bits as actual "entry/exit" events happen. An SAS is optimized to figure out the distances between objects and fire events only when an object properly enters or leaves a given range, without going n^2 in processing time or constantly incurring the dreaded cache misses.
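To show what "only trigger the AI on entry/exit events" means in practice, here is a hypothetical sketch of the event-facing side of such a system (`AwarenessBubble`, `onEnter`, `onExit`, and `update` are all made-up names, not an API from the article linked below). The key idea is that the AI only hears about *transitions*, not about every range check:

```cpp
#include <functional>
#include <set>

// Hypothetical sketch: the SAS tracks which target ids are inside an
// awareness radius and fires onEnter/onExit only on the frame a target
// crosses the boundary. The AI reacts to events instead of re-scanning
// every object every update.
struct AwarenessBubble {
    std::set<int> inside;                     // targets currently in range
    std::function<void(int)> onEnter, onExit; // AI callbacks

    // Called by the SAS once it has determined a target's range status.
    void update(int id, bool inRange) {
        if (inRange && inside.insert(id).second && onEnter)
            onEnter(id);                      // transition: out -> in
        else if (!inRange && inside.erase(id) > 0 && onExit)
            onExit(id);                       // transition: in -> out
    }
};
```

Repeated updates with the same status fire nothing, so expensive reactions (target selection, atan2, pathing) run only on the handful of frames where something actually changes.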
You might want to check out http://codesuppository.blogspot.com/2008/03/spatial-awareness-system.html for an overall description of several variations. I tend to use a sweep-and-prune method myself, since flocking is part of my simulation; while it is slower overall, it doesn't get the big performance spikes other solutions show when many objects move together.
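For a flavor of how sweep-and-prune avoids the all-pairs test, here is a minimal single-axis sketch (my own illustration, not code from the linked article; `Interval` and `overlappingPairs` are hypothetical names). Sorting objects by their low endpoint and sweeping means each object is compared only against the intervals still "open" at the sweep point:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// One object's extent projected onto a single axis:
// [center - radius, center + radius].
struct Interval { int id; float lo, hi; };

// 1D sweep-and-prune: returns all pairs whose intervals overlap on this
// axis, without testing every object against every other object.
std::vector<std::pair<int, int>> overlappingPairs(std::vector<Interval> objs) {
    // Sort by low endpoint; this is the "sweep" order.
    std::sort(objs.begin(), objs.end(),
              [](const Interval& a, const Interval& b) { return a.lo < b.lo; });

    std::vector<std::pair<int, int>> pairs;
    std::vector<Interval> active;  // intervals still open at the sweep point
    for (const Interval& cur : objs) {
        // Prune intervals that ended before this one starts.
        active.erase(std::remove_if(active.begin(), active.end(),
                         [&](const Interval& a) { return a.hi < cur.lo; }),
                     active.end());
        // Everything still active overlaps cur on this axis.
        for (const Interval& a : active)
            pairs.emplace_back(a.id, cur.id);
        active.push_back(cur);
    }
    return pairs;
}
```

A real implementation sweeps all three axes and keeps the sorted lists persistent between frames (objects move only a little per frame, so re-sorting is nearly free), which is also where the "many objects moving together" spike the other schemes suffer gets smoothed out.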
Given the problems you seem to be facing, though, working the problem at a higher level seems to be required; the current direction is not likely to be good enough without sacrificing a lot of gameplay. My personal unoptimized single-core version deals with 5k objects at 500fps easily in debug, while release is 1700+ (STL usage in debug sucks). Given something like this in your game, you would simply add an awareness bubble around the character and, if there are any "entry" events for enemies, go back to normal update speed. No more n^2 issues, and optimizing small, repeatedly called functions won't be a big issue anymore.