I love how this thread ended up in premature optimization land.
Frankly IMO it kinda started off there ;-)
I thought the key system 'thread' overhead was the locks and such of making a system-level call
You don't need locks just because you have threads; it depends on your data structures.
Similarly, you don't avoid the need for locks when you use fibers, unless you apply very strict rules to when you may call fiber-switching functions. Which means things like, "don't do I/O in fibers," which ends up being too restrictive for most cases.
The benefit of fibers is that re-scheduling to another thread/fiber is cheaper in user space than in kernel space. That's it. If you are drowning in context switches, then fibers may help. But if that's the case, you're probably threading your application too finely anyway.
Mechanisms (data) for 'fibers' don't have to be on the stack
You are in a fiber. You make a function call. Where does the state of the current function get saved? Where do locals for the new function you call get allocated?
Separately, that other function you call may itself call a yielding function. Where does the OS save the locals of each function on the call stack while it switches to another fiber?
Of course fibers need stack space. You may allocate that from the heap or other unallocated memory regions for each fiber, just like the OS allocates stack space for threads from unallocated memory space.
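To make that concrete, here is a minimal sketch using the POSIX ucontext API (just one way to get fiber-style switching; deprecated on some platforms, but it shows where the stack lives). The fiber's stack is plain heap memory, and swapcontext is what saves and restores the registers:

#include <ucontext.h>
#include <cstdio>
#include <cstdlib>

static ucontext_t main_ctx, fiber_ctx;

static void fiber_func() {
    std::puts("in fiber");
    // returning here resumes uc_link, i.e. main_ctx
}

int main() {
    const size_t stack_size = 64 * 1024;
    void *stack = std::malloc(stack_size);   // the fiber's stack lives on the heap

    getcontext(&fiber_ctx);
    fiber_ctx.uc_stack.ss_sp = stack;
    fiber_ctx.uc_stack.ss_size = stack_size;
    fiber_ctx.uc_link = &main_ctx;           // where control goes when fiber_func returns
    makecontext(&fiber_ctx, fiber_func, 0);

    swapcontext(&main_ctx, &fiber_ctx);      // saves our registers, runs the fiber
    std::puts("back in main");
    std::free(stack);
}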
When a game object is a thread, it has to operate as a thread for everything it does (overkill for some uses), i.e. the stuff you mention.
So I had LOTS (10,000s) of objects of limited AI ability (hierarchical FSM control logic, some shared read-only mappings), each with its data (state) restricted to the base object's structure (heap), or a few with linked blocks (from a free pool, etc.). I/O was handled by another thread, with the relevant data injected locklessly into the object-processing thread (it's all sequenced and the data protected that way, so no locks).
They would run in a lock-step simulation. I knew the objects could run sequentially (message signalling when inter-object events happened, with the reaction on the next 'turn'; all function calls completed before moving on to the next object, so no blocking; and all behavior logic scripted with closely enforced 'sleep' patterns), thus one thread could handle them all (or a separate core's thread for a second set if scaling was needed). So no locks required here.
It's a specialized situation and was able to avoid a lot of the overhead of more generalized solutions (the performance requirements demanded the optimization).
I also had fewer (100s) higher-AI objects (planners, behavior logic magnitudes bigger, etc.), but these were done in a similar way (no per-object thread, no blocking, no stack-based state data, lockless as much as possible), again for the optimizations needed to achieve the size requirement of the game simulation. The logic WASN'T closely tied to I/O activity (data from that was fully metered and controlled as part of the overall design).
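Roughly, the shape of that design (a minimal sketch, not the actual code; all names are made up): each object's 'resume point' is a plain field in its struct rather than a call stack, and one thread steps every object in sequence.

#include <cstdint>
#include <vector>

// The object's state lives entirely in its struct (heap), so there is no
// per-object stack to save or restore; 'switching' is just the loop below.
struct GameObject {
    enum class State : uint8_t { Idle, Seek, Attack };
    State state = State::Idle;
    int   sleep_turns = 0;          // enforced 'sleep' pattern: skip turns cheaply

    void step() {
        if (sleep_turns > 0) { --sleep_turns; return; }
        switch (state) {            // flat FSM: the resume point is a field
            case State::Idle:   /* ... */ state = State::Seek;   break;
            case State::Seek:   /* ... */ state = State::Attack; sleep_turns = 4; break;
            case State::Attack: /* ... */ state = State::Idle;   break;
        }
    }
};

void run_turn(std::vector<GameObject>& objects) {
    for (auto& o : objects) o.step();   // strictly sequential, so no locks needed
}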
-
Might the above be called 'flat' fibers ??
Not that everything should (or can) be done this way, but depending on performance needs (and circumstances allowing it), doing this kind of optimization might make the difference in the program's goals being achievable.
Might the above be called 'flat' fibers ??
I don't understand the description of your system.
A fiber has a stack. Switching fibers involves saving the CPU registers to some memory, restoring other CPU registers for another fiber, changing the stack pointer to the new fiber, and finally jumping/returning to wherever that fiber was when it yielded.
It sounds a bit like you're simply describing having a lot of objects that get "poked" or "stepped" by a main thread, pulling state as needed. That's just object structure, not fibers.
'Flat' would imply doing very little in its context switch.
Well, they aren't system-level or language-level constructs (though the higher-level 'script' was macro-overlay-implemented, compiled to native code).
Implemented in a library independent of the objects' code.
Whatever all that might then be called (not Green Threads, as it's not a VM).
'Tasklet' would seem to be something lower, called from the 'objects'.
Stackless (no register caching either).
Non-concurrent, non-preemptive, lockless, soft blocking (resume from a 'yield' is handled differently).
Only a 'co-routine' insofar as they all run under the same thread+process.
Soft scheduled (by the main process).
I don't know about 'pulling state', as they are self-contained/independent (fibers 'pull state', don't they, one way or another?).
Anyway, a lot of overhead chopped out for a specific use in driving object behavior.
Coroutines would be a better name: they're the general concept of cooperatively interleaved calls, whereas fibres are a specific mechanism for implementing coroutines.
premature optimization
Not so much premature optimization as 'design considerations' (measure twice, cut once): investigating the problem early might save a LOT of redesign/retrofitting later.
I suggested some kind of quantized bucket chain, which might lessen processing if sufficient pre-knowledge of the timers' intervals/patterns is available (i.e., are many timers many seconds out or longer?).
You quantize by when the future timers come due: coarse (long-delay) timers go into the array of buckets (just a linked-list top insert, unsorted within those interval buckets). When each bucket set comes current (now a short time before due), THEN the contents of that bucket get fully sorted/merged into the immediately-due list.
That at least minimizes the size of the data set that has to be constantly sorted/inserted into/pulled from the priority queue. When there are thousands of timers it might pay off more than a little, IF the ratio of long delays to short delays is high enough.
The bilevel system obviously adds complexity and has to be customized (versus just using some standard priority queue object).
Pre-knowledge of the frequency/overhead of timer turnovers might decide whether it's worth implementing and profiling.
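A rough sketch of that bilevel scheme (all names hypothetical; the bucket width and merge policy would be tuned to the real timer distribution). Far-future timers sit unsorted in coarse buckets; only the bucket about to come due gets merged into the fully sorted immediate queue:

#include <functional>
#include <map>
#include <queue>
#include <vector>

struct Timer { long long due; /* action, etc. */ };

class BucketTimerQueue {
    static const long long kBucketWidth = 1024;        // coarse interval, in ticks
    std::map<long long, std::vector<Timer>> buckets_;  // keyed by due / kBucketWidth
    std::priority_queue<long long, std::vector<long long>,
                        std::greater<long long> > immediate_; // near-due, sorted
public:
    void add(const Timer& t, long long now) {
        if (t.due - now < kBucketWidth)
            immediate_.push(t.due);                      // short delay: sort it now
        else
            buckets_[t.due / kBucketWidth].push_back(t); // long delay: O(1) unsorted insert
    }
    void advance(long long now) {
        long long b = now / kBucketWidth + 1;            // bucket about to come due
        auto it = buckets_.find(b);
        if (it != buckets_.end()) {
            for (const Timer& t : it->second)
                immediate_.push(t.due);                  // only this set gets sorted
            buckets_.erase(it);
        }
        while (!immediate_.empty() && immediate_.top() <= now) {
            immediate_.pop();                            // fire the timer here; a real
        }                                                // version would carry the action,
    }                                                    // as in the priority_queue example below
};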
premature optimization
"Selecting the right algorithm" is not premature optimization; it's designing for performance.
Performance is a feature. You have to plan and build for it, at the high level.
The famous quote about premature optimization was in the context of worrying about things like whether a particular if statement would emit one extra instruction or not, and dropping to assembly to avoid it.
Hoare and Knuth would ABSOLUTELY tell you to choose the right algorithm for the particular problem you're trying to solve.
Finally: "add the element at the end, then sort the new array" is also bad advice, unless your sorting algorithm has explicit optimizations for sorting already-mostly-sorted containers.
And every time your computer crashes, it's often caused by someone who chose a higher-performing and harder-to-maintain system over a safer and adequately-performing one (e.g. using C or C++ when not needed). Every time you have to wait for a laggy computer, it's probably caused by someone not understanding the difference between "premature optimization" and "designing for performance."
This is a narrow interface, so try the simple thing first, and if it doesn't scale, swap it out:
#include <functional>  // std::greater
#include <queue>       // std::priority_queue
#include <utility>     // std::pair
#include <vector>

typedef std::pair<my_time_type, my_action_type *> queue_item;

// std::greater on the pair compares by time first, giving a min-heap:
// the soonest-due timer is always at top().
std::priority_queue<queue_item, std::vector<queue_item>, std::greater<queue_item> > my_timer_queue;

// inserting in queue
my_timer_queue.push(queue_item(time, action));

// checking which items should run
while (!my_timer_queue.empty() && my_timer_queue.top().first <= current_time) {
    auto ptr = my_timer_queue.top().second;
    my_timer_queue.pop();     // pop before running, in case the action re-arms itself
    ptr->timer_expired();
}
This will manage a heap in a linear container (vector), which additionally has the potential cache benefits suggested above.
Wow! I was on vacation and came back some weeks ago, but I had forgotten this topic!
I like to see how it evolved from managing timers to (prematurely) optimizing and (precociously) choosing the right algorithm.
Well, I've come to the (provisional) conclusion that I don't need timers to perform a lot of the actions I previously thought about:
- There's no need for a timer to rearm a shoot action after some delay: just write the time "now + delay" when you shoot, and test for "now > this stored time" whenever you try to shoot again (see the sketch after this list). That's true for all timed actions, and for the exhaustion of any buff/DoT/and so on.
- Environment events are way rarer and fit quietly in a priority queue.
- And so there is no need for a dedicated thread to run this stuff: that is good news!
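For the record, the "store the ready-time, compare on use" pattern as a minimal sketch (names are illustrative):

#include <chrono>

using Clock = std::chrono::steady_clock;

struct Weapon {
    Clock::duration   cooldown = std::chrono::milliseconds(500);
    Clock::time_point ready_at{};          // time at which firing is allowed again

    bool try_shoot(Clock::time_point now) {
        if (now < ready_at) return false;  // still cooling down: no timer needed
        ready_at = now + cooldown;         // "rearm" is just writing a timestamp
        /* fire ... */
        return true;
    }
};

// usage: Weapon w; w.try_shoot(Clock::now());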
Thank you all for all your inputs !