After looking a little closer today, it just appears, to my inexperienced eyes, to be a throughput issue. I also made a mistake in my last post: there are ~1000 quads, not triangles, plus another ~350 d3dxtext.draw calls.
I set up some quick profiling/benchmarking around the meaty render() functions, as well as around networking, sound, etc. In each line of the log, the first number is the number of calls and the second is the total number of milliseconds spent in those calls.
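For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of an accumulating profiler that, per label, counts calls and sums elapsed milliseconds. It is written in Python purely for illustration and is not the code used for these numbers; the `Profiler` class and its `section` and `report` methods are hypothetical names.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Profiler:
    """Hypothetical accumulator: per label, counts calls and sums elapsed ms."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.total_ms = defaultdict(float)

    @contextmanager
    def section(self, label):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the call even if the timed block raises.
            self.calls[label] += 1
            self.total_ms[label] += (time.perf_counter() - start) * 1000.0

    def report(self):
        # One line per label, "<label>: <calls> <total ms>", like the log.
        return [f"{label}: {self.calls[label]} {self.total_ms[label]:.4f}"
                for label in self.calls]


if __name__ == "__main__":
    prof = Profiler()
    for _ in range(3):
        with prof.section("render"):
            sum(range(10_000))  # stand-in for real work
    print("\n".join(prof.report()))
```

Because sections can nest, a parent label wrapped around a whole render pass naturally includes the time of any child sections timed inside it, which is what makes comparing the children's sum against the parent's total meaningful.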
5/8/2006 2:10:08 PM - mem:5139596 - d3d_image: 408429 15280.4987
5/8/2006 2:10:08 PM - mem:5139596 - d3d_text: 240200 12158.3334
5/8/2006 2:10:08 PM - mem:5139596 - Blinky: 734 20.2063
5/8/2006 2:10:08 PM - mem:5139596 - AFE: 17899 214.0526
5/8/2006 2:10:08 PM - mem:5139596 - mu: 11854 91.8056
5/8/2006 2:10:08 PM - mem:5139596 - fps: 11854 57.3444
5/8/2006 2:10:08 PM - mem:5139596 - netdisplay: 11854 76.866
5/8/2006 2:10:08 PM - mem:5139596 - Ellipsis: 131 1.6541
5/8/2006 2:10:08 PM - mem:5139596 - oob: 11854 60.7877
5/8/2006 2:10:08 PM - mem:5147788 - Total render: 11854 29395.6588
5/8/2006 2:10:08 PM - mem:5147788 - Sound: 11854 314.957
5/8/2006 2:10:08 PM - mem:5147788 - Network: 11854 5633.3275
5/8/2006 2:10:08 PM - mem:5147788 - App Time: 40069
The first group is the set of non-trivial renderers [that is, ones that do actual work rather than just checking visibility and rendering their children]. The last group is: rendering of the queued execs that need to be done out of frame (oob), the total time in root.render(), the total time in sound.Update(), the total time in network.Update(), and the total time between the top and bottom of Main.
In an ideal world, the top times would sum to the "Total render" time, and the bottom four would sum to App Time [plus the time spent GC'ing, initializing, thread switching to the server, and handling user input]. They come close [27897 / 29395 and 35402 / 40069, or roughly 95% and 88%], so I'm inclined to think I'm not missing some giant slowdown somewhere.
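That sanity check amounts to asking how much of a parent timer its children account for. A small sketch using the aggregate figures quoted above; `coverage` is a hypothetical helper, and it takes a list so individual child timings could be passed directly (here the already-summed totals from the post are used as single entries).

```python
def coverage(child_ms, parent_ms):
    """Return (fraction of the parent explained by the children, unaccounted ms)."""
    explained = sum(child_ms)
    return explained / parent_ms, parent_ms - explained


if __name__ == "__main__":
    # Aggregate sums quoted in the post, in milliseconds.
    for name, children, parent in [("render", [27897], 29395),
                                   ("app", [35402], 40069)]:
        frac, gap = coverage(children, parent)
        print(f"{name}: {frac:.1%} covered, {gap:.0f} ms unaccounted")
```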
The average time per call seems reasonable too:
d3d_image: .038 ms
d3d_text: .050 ms
Blinky: .027 ms
AFE: .011 ms
mu: .007 ms
fps: .004 ms
net: .006 ms
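Each of these averages is just the total milliseconds divided by the call count. A sketch that parses one line of the log format shown earlier and computes the per-call cost; the regular expression is an assumption based only on the lines in this post, and `per_call_ms` is a hypothetical name.

```python
import re

# Assumed layout: "<date> <time> <AM|PM> - mem:<bytes> - <label>: <calls> <ms>"
LINE_RE = re.compile(
    r"mem:(?P<mem>\d+) - (?P<label>[^:]+): (?P<calls>\d+) (?P<ms>\d+(?:\.\d+)?)$"
)


def per_call_ms(line):
    """Return (label, calls, total_ms, ms_per_call) for one profiler log line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    calls = int(m["calls"])
    total_ms = float(m["ms"])
    return m["label"], calls, total_ms, total_ms / calls


if __name__ == "__main__":
    line = "5/8/2006 2:10:08 PM - mem:5139596 - d3d_text: 240200 12158.3334"
    label, calls, total, avg = per_call_ms(line)
    print(f"{label}: {avg:.4f} ms/call")
```

The final "App Time" line carries only a single number, so this pattern deliberately rejects it rather than guessing a call count.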
But going just by the numbers:
1000 images * .038 ms/image + 350 texts * .050 ms/text = 55.5 ms
1000 ms/s * 1 frame/55.5 ms = 18 frames/s
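The same back-of-the-envelope estimate as a sketch, generalised to any mix of per-frame draw counts and measured per-call costs; the `frame_estimate` name and the dictionary layout are illustrative assumptions, and the inputs are the counts and averages from this post.

```python
def frame_estimate(workload):
    """workload maps a label to (calls per frame, ms per call).

    Returns (estimated ms per frame, estimated frames per second).
    """
    frame_ms = sum(count * cost for count, cost in workload.values())
    return frame_ms, 1000.0 / frame_ms


if __name__ == "__main__":
    frame_ms, fps = frame_estimate({
        "d3d_image": (1000, 0.038),  # ~1000 quads per frame
        "d3d_text": (350, 0.050),    # ~350 text draws per frame
    })
    print(f"{frame_ms:.1f} ms/frame -> {fps:.0f} frames/s")
```

Written this way it is easy to see which term dominates: the images contribute 38 ms of the 55.5 ms per frame.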
Pokey.
[Boy, is it nice to see the calculations back up the perceived performance!]