scruthut said:
I also want my threads to sleep when not executing objects, which means they would be idle. Creating more threads might be beneficial on my part with this system.
Usually it's best to create as many threads as you have CPU cores, but not more.
Each thread grabs a task, ideally using atomics to avoid the cost of a mutex, processes it, and when there is no more work left it goes to sleep.
If you have 1000 work items, that's much faster than creating 1000 threads.
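Something like this, as a minimal sketch (the function name and the "work" are made up; a real pool would also put idle threads to sleep on a condition variable instead of exiting):

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// Process all items with one thread per core. Each thread claims the next
// index with a single atomic fetch_add instead of taking a mutex, and
// simply returns once no work is left.
void process_parallel(std::vector<int>& items) {
    std::atomic<size_t> next{0};
    unsigned cores = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < cores; ++t) {
        workers.emplace_back([&] {
            for (size_t i; (i = next.fetch_add(1)) < items.size(); )
                items[i] *= 2;  // placeholder "work" on one item
        });
    }
    for (auto& w : workers) w.join();
}
```

With 1000 items this touches the shared counter 1000 times but only ever creates a handful of threads.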
scruthut said:
How would I go about calculating the allocated time for each object that is weighted based on priorities?
What's the expected benefit of estimating in advance how much time a job takes?
You can, of course, gather statistics from current work, which you then use to distribute future workloads better, e.g. to keep the application responsive or to hit a target framerate.
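For example, a running average of measured job durations can tell you how many jobs fit into the time budget of the next frame (struct and member names are just made up for illustration):

```cpp
// Incremental running average of job durations; feed it measured times
// from finished jobs, then ask how many jobs fit into a time budget.
struct JobStats {
    double avg_ms = 0.0;
    int count = 0;
    void record(double ms) { avg_ms += (ms - avg_ms) / ++count; }
    // How many jobs fit into budget_ms (e.g. the remaining frame time)?
    int jobs_for_budget(double budget_ms) const {
        return avg_ms > 0.0 ? static_cast<int>(budget_ms / avg_ms) : 1;
    }
};
```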
scruthut said:
Also you can place as many copies of the objects in the lists if needed to be done for increasing priority.
If there are multiple instances of a workload, you likely pay additional synchronization costs to check whether it was already processed by another list / thread. Those costs can be higher than the expected win.
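To make that concrete: if the same object sits in several lists, every worker has to "claim" it before processing. A single atomic exchange is enough (sketch, names made up), but it is still a cross-core synchronization point on every single item:

```cpp
#include <atomic>

// An item that may appear in several priority lists. Only the first
// thread that claims it actually processes it.
struct SharedItem {
    std::atomic<bool> claimed{false};
    bool try_claim() {
        // exchange returns the previous value, so only the first caller
        // sees false and wins the claim
        return !claimed.exchange(true, std::memory_order_acq_rel);
    }
};
```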
scruthut said:
I do know there is a time trade off for creating threads and it comes with a penalty.
Yes, and those costs are high. So why not use a standard job system, which creates one worker thread per core and keeps them alive all the time?
scruthut said:
I plan to use the pipeline model for parallel computing. I will handle the culling of renderables in one thread, then matrice mathematics in the second, and generating normals in the third thread and render the buffer objects from the fourth thread.
Ah, I see. For that a job system isn't needed, and spawning just 4 threads should work fine.
The alternative (which I had in mind until here), using a job system, could look like this:
All threads work on the culling. Each job culls N objects before the thread picks a new job (requiring you to figure out a good value for N).
The main thread prepares the matrix workload while the worker threads are busy, then it goes to sleep, waiting for the culling to finish.
Then the worker threads start the matrix work and so on.
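The culling phase could be sketched like this (a sketch with made-up names; every worker runs the same function, and each job claims a whole batch of N objects so the shared counter is only touched once per batch, not once per object):

```cpp
#include <algorithm>
#include <atomic>
#include <vector>

constexpr size_t N = 64;  // batch size; needs profiling to pick well

// One worker's share of the culling phase. All workers call this with
// the same shared counter and loop until no batches remain.
// vector<char> instead of vector<bool>, since vector<bool> packs bits
// and concurrent writes to neighboring elements would be a data race.
void cull_phase(std::vector<char>& visible, std::atomic<size_t>& next) {
    for (;;) {
        size_t begin = next.fetch_add(N);        // claim a batch of N
        if (begin >= visible.size()) return;     // no work left
        size_t end = std::min(begin + N, visible.size());
        for (size_t i = begin; i < end; ++i)
            visible[i] = (i % 2 == 0);           // placeholder cull test
    }
}
```

After all workers return, the main thread wakes up and hands them the matrix phase in the same way.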
Advantages:
The system scales automatically to any core count.
No need for frequent synchronization, assuming the jobs are lock free.
Because all threads do the same thing, cache utilization might be better. (But it can also be a disadvantage: the CPU needs to keep caches and memory coherent across cores, and multiple cores accessing the exact same memory can become a bottleneck.)
But that's just a proposal to consider. I have little experience comparing these two approaches to multithreading; I always use the fine-grained job system approach, so I can't tell. It's certainly possible to mix both approaches as well.