JoeJ said:
You say Russian Roulette is needed to avoid bias. Is this because RR could - in theory(?) - generate paths of infinite lengths to get infinite bounces? Can't be. (I'll never know the precise meaning of ‘biased’ in PT : )
In path tracing we are generally looking for an ACCURATE solution of light propagation through the scene into the eye. That means that with an infinite number of samples we would reach the exact result (in practice this happens long before infinity, since we have limited precision in how we display color!) - i.e. with unbiased rendering the expected error compared to ground truth is 0. With biased rendering we trade some accuracy for bias - we will never converge to the actual result (but the bias error can be consistent, usually in the form of blur or similar), which still makes such methods viable - you compute faster at the cost of accuracy.
Mathematically speaking, our path tracer is an estimator - the expected value of an unbiased estimator is the population mean, regardless of the number of observations. The error in each estimate is therefore only due to random statistical variance (high-frequency noise). Variance is reduced by averaging n samples - the standard deviation of the estimate scales as 1/sqrt(n) - i.e. to cut the standard deviation of the error in half we have to take 4 times as many samples. A biased estimator introduces a bias error in an effort to reduce high-frequency noise (i.e. you will never reach the accurate value, only get as close as your bias error allows - the main point being that as long as the bias error is consistent it is often considered “good enough” for some applications).
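As a toy illustration of that 1/sqrt(n) behaviour, here is a self-contained host-side sketch (nothing renderer-specific - the integrand and the sample counts are made up for the demo):

```cpp
// Toy demo of the unbiased-estimator error scaling: the standard deviation of
// a Monte Carlo estimate falls as 1/sqrt(n), so 4x the samples ~ half the noise.
#include <cmath>
#include <cstdio>
#include <random>

const double kPi = 3.14159265358979323846;

// Unbiased estimate of the integral of sin(x) over [0, pi] (exact value: 2).
double estimate(int n, std::mt19937& rng)
{
    std::uniform_real_distribution<double> dist(0.0, kPi);
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += std::sin(dist(rng));
    return kPi * sum / n;              // average times the measure of the domain
}

// Standard deviation of the estimator itself, measured over many independent runs.
double stddevOfEstimate(int n, int runs, std::mt19937& rng)
{
    double mean = 0.0, meanSq = 0.0;
    for (int r = 0; r < runs; ++r)
    {
        const double e = estimate(n, rng);
        mean   += e;
        meanSq += e * e;
    }
    mean   /= runs;
    meanSq /= runs;
    return std::sqrt(meanSq - mean * mean);
}

int main()
{
    std::mt19937 rng(42);
    std::printf("stddev with n samples:  %f\n", stddevOfEstimate(1000, 2000, rng));
    std::printf("stddev with 4n samples: %f\n", stddevOfEstimate(4000, 2000, rng));
}
```

The second number should come out at roughly half of the first - exactly the “quadruple the samples to halve the noise” rule.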
JoeJ said:
Let's say we support 4 bounces, and so no need for RR. Further assuming each traceRay call takes a similar amount of time, threads should not wait on each other too long.
Just FYI: if we only support 4 bounces, such an estimator is by definition already biased - all energy carried by paths longer than 4 bounces is simply missing from the estimate, so the expected value no longer equals the true result.
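For context on the RR question quoted at the top: a hard bounce cap silently drops the energy of longer paths, while Russian Roulette terminates paths probabilistically and divides the survivors' throughput by the survival probability, which keeps the expected value correct. A minimal sketch of that loop (everything here - Ray, Hit, RNG, traceRay, bsdfSample, rng01, maxComponent, the float3 operators such as those from CUDA's helper_math.h - is a placeholder, not anyone's actual code in this thread):

```cpp
// Sketch of unbiased termination via Russian Roulette versus a hard bounce cap.
__device__ float3 tracePath(Ray ray, RNG& rng)
{
    float3 radiance   = make_float3(0.0f, 0.0f, 0.0f);
    float3 throughput = make_float3(1.0f, 1.0f, 1.0f);

    for (int bounce = 0; ; ++bounce)
    {
        Hit hit;
        if (!traceRay(ray, hit))
            break;                                   // ray escaped the scene

        radiance += throughput * hit.emission;

        // Hard cap alternative: 'if (bounce == 4) break;' -> biased, because the
        // energy carried by longer paths is silently dropped from the estimate.

        // Russian Roulette: kill the path with probability (1 - p) and divide the
        // survivors' throughput by p, so the expected value stays unchanged.
        if (bounce > 3)
        {
            const float p = fminf(maxComponent(throughput), 0.95f);
            if (rng01(rng) >= p)
                break;
            throughput /= p;
        }

        throughput *= bsdfSample(hit, ray, rng);     // sample the next direction
                                                     // (also updates 'ray')
    }
    return radiance;
}
```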
Each traceRay can't take a similar amount of time, pretty much by definition - the scene would need to be equally complex everywhere (only single-primitive scenes without an acceleration structure, or things like spheres with infinite radius, would fit that). I'd like to point you at my traversal code (something similar will happen under the hood), e.g. here https://pastebin.com/43yRFBHu - the requirement of stack-based traversal over a generic scene means that the time taken in traceRay will vastly differ from ray to ray. This was described quite in depth here - https://research.nvidia.com/sites/default/files/pubs/2009-08_Understanding-the-Efficiency/aila2009hpg_paper.pdf
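To make that concrete, here is roughly the shape such a stack-based traversal takes (a simplified sketch, not the linked pastebin code - BVHNode, intersectAABB and intersectTriangle are placeholders). The loop count and the per-leaf work are entirely ray-dependent, which is where the divergent traceRay cost comes from:

```cpp
// Simplified shape of a stack-based BVH traversal.
__device__ bool traceRay(const Ray& ray, const BVHNode* nodes, Hit& hit)
{
    int stack[64];
    int stackPtr = 0;
    stack[stackPtr++] = 0;                            // push the root node

    bool found = false;
    while (stackPtr > 0)                              // iteration count varies per ray
    {
        const BVHNode& node = nodes[stack[--stackPtr]];
        if (!intersectAABB(ray, node.bounds, hit.t))  // skip nodes the ray misses
            continue;

        if (node.isLeaf())
        {
            for (int i = 0; i < node.triCount; ++i)   // dense leaves cost extra
                found |= intersectTriangle(ray, node.firstTri + i, hit);
        }
        else
        {
            stack[stackPtr++] = node.leftChild;       // descend into both children
            stack[stackPtr++] = node.rightChild;
        }
    }
    return found;
}
```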
JoeJ said:
This applies to individual samples, but not to the entire signal we try to integrate. We can imagine the signal as an environment map seen from the cluster of our pixels with similar normals. The light paths give us some information about this environment. A path that is good for a single pixel, likely is good for all pixels in the cluster. That's what i've meant, not the other problem caused be random eye path after the first hit. Sounds you have tried this idea, but with a larger number of light paths stored in VRAM. At a larger scale, it reminds me on Photon Mapping a bit.
Exactly. It was interactive back in 2014 or so - with today's GPUs you could probably get to real-time framerates (although the increased display resolution plays a bit against you … going from 1080p to 4K means you need 4 times the computational power and memory - GPUs back then had something like 1 GB - 3 GB of memory if I'm not mistaken (the Titan Z had 12 GB)). Nowadays I've got an RX 6800 here with 16 GB of memory … and the computational power is MUCH bigger.
My idea was that the light paths can be generated completely separately and regenerated only when needed, which saves you from re-tracing all light paths all the time (you could also regenerate more often the ones that are not yielding any connections, not just the ones affected by animated geometry/lights, if there are any). This can save quite a lot on the light-path side - plus it should yield better results for interactive rendering (with some filtering you could even get a smooth image in real time).
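A rough sketch of what such a persistent light-path cache could look like (all names - LightVertex, LightPathSlot, MAX_LIGHT_BOUNCES, STALE_FRAME_LIMIT - are made up for illustration, not the 2014 implementation):

```cpp
// Sketch of a persistent light-path cache kept in GPU memory.
// Paths are only re-traced when a heuristic flags them, not every frame.
struct LightVertex
{
    float3 position;
    float3 normal;
    float3 throughput;
};

struct LightPathSlot
{
    LightVertex vertices[MAX_LIGHT_BOUNCES];
    int         vertexCount;
    int         framesWithoutConnection;    // unproductive paths get recycled sooner
    bool        touchedByDynamicGeometry;   // set by the animation/update pass
};

__device__ bool needsRegeneration(const LightPathSlot& slot)
{
    return slot.touchedByDynamicGeometry
        || slot.framesWithoutConnection > STALE_FRAME_LIMIT;
}
```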
JoeJ said:
In Tabys example there are two paths per pixel, which is already a big ray budget. If every pixel needs to trace one more ray to 63 other light path vertices, or even #bounces times 63, that's surely too much. : )
The main problem this boils down to is: who generates the single light path for the block? If it is the same workgroup that then solves the eye paths and the connections, it doesn't make much sense, as the workgroup still needs to wait for the single thread that generates the light path first. The total time the workgroup takes would then depend on whether the light path is going to be “short” (i.e. minimum number of bounces, going through a minimal part of the acceleration structure) or “long” (i.e. many bounces, going through the “whole” acceleration structure). Keep in mind that the remaining 255 threads have to wait during that step - they have nothing to connect to yet.
This being said … you could possibly use the 1st thread to calculate the light path, and let the other threads calculate their own eye paths … once the 1st one finishes, the rest do the connection steps. You won't have any result for the 1st pixel that sample (but you could always pick a different thread to be the ‘light’ one … and after 256 samples you have 255 samples per pixel).
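A minimal sketch of that layout for one 16x16 tile (256 threads), written CUDA-style - Params, EyePath, LightVertex, traceLightPath, traceEyePath, connect, accumulate and MAX_LIGHT_BOUNCES are placeholders, not Taby's code:

```cpp
// "Rotating light thread" layout: one thread traces the shared light path while
// the other 255 trace eye paths, then everybody connects after a barrier.
__global__ void tileKernel(Params params, int sampleIndex)
{
    __shared__ LightVertex sharedLightPath[MAX_LIGHT_BOUNCES];
    __shared__ int         sharedLightVertexCount;

    const int lane      = threadIdx.x;            // 0..255 within the tile
    const int lightLane = sampleIndex % 256;      // rotate who traces the light path

    EyePath eye;
    if (lane == lightLane)
        sharedLightVertexCount = traceLightPath(params, sharedLightPath);
    else
        eye = traceEyePath(params, blockIdx.x, lane);   // 255 eye paths in parallel

    __syncthreads();                              // light path is now visible to all

    if (lane != lightLane)
    {
        const float3 c = connect(eye, sharedLightPath, sharedLightVertexCount, params);
        accumulate(params, blockIdx.x, lane, c);  // the 'light' pixel skips this sample
    }
}
```

The light-path thread still holds the barrier until it finishes, but at least the other 255 threads are doing useful eye-path work in the meantime instead of idling.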
The other approach would be to start a kernel where each thread generates one of N light paths (N being the number of tiles in your viewport) - and then each tile just traces its eye paths and connects to that single light path. The problem at that point is that you now store the light paths in global memory.
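Sketched as a separate pass, under the same placeholder names and caveats as above:

```cpp
// Two-pass variant: a small kernel first traces one light path per tile into a
// global buffer; the per-tile eye-path/connection kernel then only reads from it.
__global__ void generateLightPathsKernel(Params       params,
                                         LightVertex* lightVertices,      // tileCount * MAX_LIGHT_BOUNCES
                                         int*         lightVertexCounts)  // one count per tile
{
    const int tile = blockIdx.x * blockDim.x + threadIdx.x;
    if (tile >= params.tileCount)
        return;

    lightVertexCounts[tile] =
        traceLightPath(params, &lightVertices[tile * MAX_LIGHT_BOUNCES]);
}
// ...followed by a tile kernel like the one above, which now fetches its light
// path from global memory instead of tracing it itself.
```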
…
This all being said - there are generally 2 approaches, the first being “megakernels” (where you do the whole single-sample path tracing within a single kernel) and the other being … well … I can't remember the name, let's say "iterative" (where you keep paths alive, resolve them, and restart them once used … each time your kernel runs, you advance each path by a single segment). Both are applicable to bidirectional path tracing - yet the latter ends up with higher occupancy. Although I'm probably going too deep and too far here for Taby (sorry for the spam!).
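For reference, the scheme described as “iterative” above is usually called wavefront path tracing (Laine, Karras and Aila, “Megakernels Considered Harmful”, 2013). Its rough host-side structure - with PathState, grid() and all kernel names being assumptions made for this sketch - looks something like:

```cpp
// Rough shape of a wavefront ("iterative") path tracer: path state lives in
// global buffers and every kernel launch advances each live path by one segment.
struct PathState
{
    Ray    ray;
    float3 throughput;
    float3 radiance;
    int    bounce;
    bool   alive;
};

void renderOneSample(PathState* paths, int pathCount, int maxBounces)
{
    generateCameraRaysKernel<<<grid(pathCount), 256>>>(paths, pathCount);

    for (int bounce = 0; bounce < maxBounces; ++bounce)
    {
        // One segment per pass: short paths simply go dead and stop costing work,
        // instead of their threads idling inside a megakernel's bounce loop.
        extendPathsKernel    <<<grid(pathCount), 256>>>(paths, pathCount);  // traceRay
        shadeAndConnectKernel<<<grid(pathCount), 256>>>(paths, pathCount);  // BSDF + connections
        // Optionally compact away dead paths here to keep occupancy high.
    }

    resolveToFramebufferKernel<<<grid(pathCount), 256>>>(paths, pathCount);
}
```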
EDIT:
taby said:
Please let me have some time to go through your comments. I thank you for the plethora of ideas to try (and not to try).
Feel free to. I hope I don't sound too rude in the comments - if I do, then sorry. I literally wrote down what was on my mind while reading the lines and trying to understand the code. If you want any clarification or such, feel free to ask - I should be available.