Why not use the debugger? Unfortunately, the startup crashes seemed to kill the app even with the debugger attached, which is why I expected something external was going wrong at first. Oh how wrong I was, but that's getting ahead of myself. After refactoring though, the crashes were happening later as well, not just during startup, so I decided to see if the debugger could catch them again. Fired it up, waited a little bit, and:
FatalExecutionEngineError.
That's... not good.
Especially when that was only the error half the time. Other times I'd get NullReferenceException from things that can't possibly be null, StackOverflowExceptions when the callstack couldn't possibly be that deep, "this" reference being the wrong type or even missing altogether, or maybe just a good old fashioned AccessViolationException. It was absolute insanity.
The managed heap was obviously becoming deeply corrupted somehow. The emulator core has a fibers library to implement coroutines for synchronizing all the various parts of the system, which is part of the reason I was interested in it. I've always been a fan of coroutines, and don't think they get enough play in general, but I digress. I was wary in the beginning of what effect that kind of thing might have on the CLR, but I'd seen articles around that illustrated how to use fibers with .NET, so I perhaps it wasn't a problem. The coroutine library, in addition to implementing them in assembly, also has support for just using the windows fiber API directly. Just to make sure nothing was going wrong there, I switched it to use the windows fibers version, but no joy. Content for the moment that the problem wasn't there, I turned back to slimdx. I commented out the video code; still crashing. I commented out the audio code; still crashing. Updated to the latest version of the emulator source; still crashing.
What the FRACK. Seriously.
I was failing utterly at debugging. Maybe I'd have to use windbg or something and look at the core dump, but I'm just not that hardcore. Without any kind of clue as to what was REALLY wrong, I'd just have to shelf the whole thing and go back to working on the Marble4D editor.
Giving up on it for the night, I tried to get some sleep (which REALLY sucks when you have a mysterious unresolved bug). In the morning I took a step back and tried to reason through it. I just KNEW it had to be something relating to fibers, since in the beginning I was unsure if it'd work while hosted by the CLR at all. Perhaps there was something to that.
Wouldn't video and audio always have crashed though if that were the case, since the events are triggered by the core? Well, video refresh is triggered after the coroutine scheduler comes back to the main thread, and the way I was buffering up audio, I have the video refresh callback in the native interface also fire the audio event with all the samples buffered during that entire frame. If not those, then what...
Then it hit me. In one of the obscure crashes, the "this" reference for my worker thread manager object was corrupted and listed as a DevicePolledEventArgs.
INPUT! The emulator core waits as long as it can to poll input, to get the very latest state. The implication of this was that it was happening in a fiber, rather than the main thread. The input request triggers a callback that crosses the managed barrier back into my app, which fires an input event with a DevicePolledEventArgs. One of the things I changed in refactoring was having the input event create a new one each time the event fires, rather than reusing the same one over and over. I was getting that strange feeling where you just KNOW that the problem is there, even if you're not sure exactly why. I commented out the input callbacks, crossed my fingers, and ran it again.
No crash.
I ran it again and again and again and again both from inside the IDE and from explorer. Still nothing (yet). It would seem that fibers and other forms of "fake" threads interfere with the managed heap and the GC, at least in beta1. This is a definite case in support of my personal programming mantra: a brain is the best debugger. The unfortunate implication is that I'll have to poll input before the frame starts, rather than waiting until the last possible moment, but them's the breaks.
If anyone reads this and is an expert on fibers, I'd be very interested in your thoughts. Is calling back into CLR code from a native fiber known to be unsafe? Am I just doing something wrong?
At any rate, once I have some kind of input up and running, I'll post some screenshots to reveal what system and emulator specifically that I'm wrapping. I'll probably keep working on this mostly until around the new year perhaps, then switch gears back to Marble4D with a fresh outlook. Hopefully be able to get some kind of simple demo ready for DBP 10. It's definitely nice to crawl out from under all the limitations imposed by the 360 clr for a while and use more features and APIs. Definitely learning a LOT from this, and SlimDX rules. :)
Yes. Fibers aren't officially supported as part of the CLR, although attempts have been made to make them more compatible however those attempts were cut because with the light threading model the CLR already implements, fibers just aren't compatible.