particularly MMO and strategy games
Surprised this isn't in For Beginners. :-)
I'm thinking about starting with the following questions which lead to metrics:
Graphics (how realistic/fluid/beautiful? -- basically do they not get in the way?)
AI (how intelligent/realistic are a game's agents/actors?)
Fun (how much do people like playing it? how bug-free is the experience?)
Modability/Agility (how easily can users/devs create/change game dynamics?)
Introspection (how much can one learn about what happened from post-game analysis?)
Realism (how accurate are the physics built into the game at the right scale?)
Would love to find some papers or some other work that has been done to evaluate games and their associated tech.
Measure what matters and what can be measured.
For graphics: Do you have an objective measure for "beautiful" or "realistic"? Probably not, that is something an art director needs to manage. But you can measure things like frame rate in terms of minimum, maximum, and average milliseconds per frame. You can identify areas with a slow frame rate.
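As a rough sketch of the frame-time numbers above (the function names, the per-area sample format, and the 33.3 ms budget are my own assumptions for illustration, not anything from a particular engine):

```python
def frame_time_stats(frame_times_ms):
    """Return the minimum, maximum, and average frame time in milliseconds."""
    if not frame_times_ms:
        raise ValueError("no frames recorded")
    return {
        "min_ms": min(frame_times_ms),
        "max_ms": max(frame_times_ms),
        "avg_ms": sum(frame_times_ms) / len(frame_times_ms),
    }


def slow_areas(samples, budget_ms=33.3):
    """List areas whose average frame time exceeds the budget.

    samples: iterable of (area_name, frame_ms) pairs (hypothetical log format).
    budget_ms: assumed budget; 33.3 ms corresponds to roughly 30 frames per second.
    """
    totals = {}
    for area, ms in samples:
        total, count = totals.get(area, (0.0, 0))
        totals[area] = (total + ms, count + 1)
    return sorted(
        area for area, (total, count) in totals.items() if total / count > budget_ms
    )
```

You would feed it whatever your profiler or in-game logger records, for example `slow_areas([("town", 40.0), ("forest", 16.0)])`.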
For AI: Do you have an objective measure for "intelligent"? Probably not. You can instead create rules for AI state machines, measure the complexity of the state machines, and allow a designer to modify the thresholds for transitioning between states.
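One way to picture this is a state machine whose transitions live in a data table, so a designer can tune thresholds without touching code. The guard states, input names, and threshold values below are invented for the example, and counting states and transitions is just one crude complexity measure:

```python
# Hypothetical guard AI. Each row is:
# (from_state, to_state, input_name, comparison, threshold)
# A designer could edit the thresholds here or load them from a data file.
TRANSITIONS = [
    ("patrol", "chase", "enemy_distance", "<", 20.0),
    ("chase", "attack", "enemy_distance", "<", 2.0),
    ("chase", "patrol", "enemy_distance", ">", 30.0),
    ("attack", "flee", "health", "<", 0.25),
]


def next_state(state, inputs, transitions=TRANSITIONS):
    """Return the first matching transition target, or stay in the current state."""
    for src, dst, key, op, threshold in transitions:
        if src != state:
            continue
        value = inputs[key]
        if (op == "<" and value < threshold) or (op == ">" and value > threshold):
            return dst
    return state


def complexity(transitions=TRANSITIONS):
    """Crude complexity measure: number of distinct states and of transitions."""
    states = {s for src, dst, *_ in transitions for s in (src, dst)}
    return {"states": len(states), "transitions": len(transitions)}
```

Tracking `complexity()` over time at least tells you when an agent's behavior graph is growing faster than anyone can reason about it.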
For Fun: Do you have an objective measure for "fun"? Probably not. But you can ask testers what they think of it. You can have playtest sessions where people try different activities. You can track metrics for things like where players get stuck, where characters die or fail, and where people stop their play session. You can have metrics for where lots of items are changing at once, but recognize that good design has moments of intensity and moments of calmness, thoughtfully interspersed.
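For the "where do characters die" kind of metric, a minimal sketch is to bucket logged events into grid cells and look at the busiest cells. The event tuple format, the event names, and the cell size are assumptions for the example:

```python
from collections import Counter


def hotspots(events, event_type, cell_size=10.0, top=3):
    """Count events of one type per grid cell and return the busiest cells.

    events: iterable of (type, x, y) tuples (hypothetical telemetry format).
    Returns a list of ((cell_x, cell_y), count), most frequent first.
    """
    counts = Counter(
        (int(x // cell_size), int(y // cell_size))
        for kind, x, y in events
        if kind == event_type
    )
    return counts.most_common(top)
```

The same function works for "quit" or "stuck" events, assuming your game logs them with a position.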
For Modability: Again, what you describe is hard to measure. Instead you could measure the time in hours to add new features, the time to modify existing features, the time between finding new bugs, and other time measurements. A game where adding a standard feature commonly takes about four hours is in better shape than one where it commonly takes sixty. Of course not all features are equal, so account for that in your metrics.