JoeJ said:
Agreed, but which are the problems that affect you?
Dynamic geometries and acceleration structures (even when we talk about something like Nanite) … my most naive implementation some time back used HLBVH rebuilt each frame in compute. Since then I've moved to pre-built trees for clusters and assembling the BLAS from those each frame (rough sketch of the idea below). It's not miraculous and would need a lot more time invested from my side to improve (which I'll try to do in Q1 2025).
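For the curious, here is a minimal CPU-side sketch of what I mean by "assembling a BLAS from pre-built cluster trees". All structs and the naive pairing pass are made up for illustration and are not our actual code - a real build would run in compute and use SAH/LBVH over the cluster root bounds:

```cpp
// Illustrative sketch: assemble a per-mesh BVH ("BLAS") from pre-built cluster trees.
#include <algorithm>
#include <cstdint>
#include <vector>

struct AABB { float mn[3], mx[3]; };

// Node in a flattened BVH; leaves reference a triangle range, inner nodes reference children.
struct BvhNode {
    AABB bounds;
    int32_t left = -1, right = -1;   // child indices, -1 if none
    int32_t firstTri = -1;           // leaf payload (triangle range start), -1 for inner nodes
    int32_t triCount = 0;
};

// Offline-built cluster: a small BVH over ~100 triangles; nodes[0] is the cluster root.
struct ClusterTree { std::vector<BvhNode> nodes; };

static AABB Merge(const AABB& a, const AABB& b) {
    AABB r;
    for (int i = 0; i < 3; ++i) {
        r.mn[i] = std::min(a.mn[i], b.mn[i]);
        r.mx[i] = std::max(a.mx[i], b.mx[i]);
    }
    return r;
}

// Per frame: copy the pre-built subtrees unchanged and only build a shallow top tree
// over the selected cluster roots (here a trivial pairing pass, for brevity).
std::vector<BvhNode> AssembleBlas(const std::vector<const ClusterTree*>& clusters) {
    std::vector<BvhNode> out;

    // 1. Append pre-built cluster nodes, fixing up child indices to the new array.
    std::vector<int32_t> roots;
    for (const ClusterTree* c : clusters) {
        const int32_t base = (int32_t)out.size();
        for (BvhNode n : c->nodes) {
            if (n.left  >= 0) n.left  += base;
            if (n.right >= 0) n.right += base;
            out.push_back(n);
        }
        roots.push_back(base);
    }

    // 2. Build a small top tree over the cluster roots by naive pairing.
    std::vector<int32_t> level = roots;
    while (level.size() > 1) {
        std::vector<int32_t> next;
        for (size_t i = 0; i + 1 < level.size(); i += 2) {
            BvhNode inner;
            inner.left   = level[i];
            inner.right  = level[i + 1];
            inner.bounds = Merge(out[level[i]].bounds, out[level[i + 1]].bounds);
            next.push_back((int32_t)out.size());
            out.push_back(inner);
        }
        if (level.size() & 1) next.push_back(level.back()); // odd cluster carries over
        level = next;
    }
    return out; // the last pushed node is the BLAS root
}
```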
With HWRT you're stuck at roughly the level of that naive implementation (yes, you can group clusters together to make it “a little better” and not get punished too much by alignment costs, etc.), but it still sucks. Vendor-locked solutions might come (NVidia already presented one at CES - but honestly that's useless and bad … vendor-locked, and I assume UE5-only … not to mention UE5 performance is terrible everywhere) … adopting something like that into the core APIs? The party (NVidia) most guilty of pushing fixed function into ray tracers is now trying to fix the problems by introducing more fixed function? That's like extinguishing a fire by pouring tanks of gasoline onto it.
Caching acceleration structures is a joke (you can't do that in any reasonable way).
Intersectors on NVidia GPUs are on tensor cores (ray-triangle) and are meh. Try custom intersection code … now you're literally f***ed.
Is HWRT usable? To an extent, yes. Is it that much faster than compute? No, not really.
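For reference, "custom intersection code" in the compute path is just ordinary shader code along the lines of the scalar ray-triangle test below (written as plain C++ here for readability, purely illustrative). Once you need anything custom - quantized data, procedural primitives - you're writing and traversing with code like this yourself, with zero help from the fixed-function unit:

```cpp
// Illustrative only: the kind of software intersector you end up writing when the
// fixed-function ray/triangle path can't be used. Plain scalar Möller–Trumbore;
// in practice this lives in a compute shader, not C++.
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3  Sub(Vec3 a, Vec3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3  Cross(Vec3 a, Vec3 b) { return { a.y * b.z - a.z * b.y,
                                              a.z * b.x - a.x * b.z,
                                              a.x * b.y - a.y * b.x }; }
static float Dot(Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns true and writes tOut if the ray (orig, dir) hits triangle (v0, v1, v2).
bool RayTriangle(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2, float& tOut) {
    const float kEps = 1e-7f;
    const Vec3 e1 = Sub(v1, v0);
    const Vec3 e2 = Sub(v2, v0);
    const Vec3 p  = Cross(dir, e2);
    const float det = Dot(e1, p);
    if (std::fabs(det) < kEps) return false;      // ray parallel to the triangle plane
    const float invDet = 1.0f / det;
    const Vec3 s = Sub(orig, v0);
    const float u = Dot(s, p) * invDet;
    if (u < 0.0f || u > 1.0f) return false;
    const Vec3 q = Cross(s, e1);
    const float v = Dot(dir, q) * invDet;
    if (v < 0.0f || u + v > 1.0f) return false;
    const float t = Dot(e2, q) * invDet;
    if (t <= kEps) return false;                  // hit is behind the ray origin
    tOut = t;
    return true;
}
```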
JoeJ said:
Maybe it would be better to write one backend per vendor, than trying to have crossvendor APIs for the price of compromised performance and redundant complexity.
This would become a nightmare real fast, especially when you add new GPU vendors into the mix (the ones from China, like Moore Threads). They often keep documentation in Chinese only … CUDA is already a disaster that everyone is caught in (which also includes everyone remotely related to hallucinations … er … "ai").
Vendor-locking works only if there is no monopoly. If there is a monopoly, that means literally zero progress. Have you seen what NVidia presented at CES? The 5xxx generation brings almost no actual improvement (hallucinating 3 frames instead of 1 (both are bad) and advertising it as “better performance” is just laughable; they could do the same with DLSS on the 4xxx series - but of course that will be locked out, since they have to sell that minimal hardware improvement somehow). And of course the price shifts upwards (everyone expected that, as they now hold a monopoly (>80% share) in many areas). I wouldn't say HW progress is dead - NV just doesn't need to try to sell, so they don't. Add a few more tensor cores to make the text generators happy, a 40% price increase, and we're selling!
Their main competitors? AMD are lunatics as usual, who spent more time renaming their GPU series than actually fixing the bugs they have (I'm ranting about their broken amplification shader implementation in D3D12, where nothing gets passed into an indirect dispatch of them!). Intel still hasn't recovered from melting their own CPUs (although I have to admit Arc is finally becoming something now).
Speaking of Intel - the "nanite" BVH idea I'm playing with is similar to what they tried in Embree for similar dynamic-LOD meshes. They published a paper about it a few years back.
JoeJ said:
It's not enough to just complain about missing flexibility, we need to make specific points and proposals I think. Might happen behind closed doors, but there should be public discussion as well imo.
Our solution is about as crazy as you can imagine - dual support of compute and HWRT. Sometimes HWRT wins in performance, sometimes it loses. The generic GPU programming tools at this point are enough for us to do everything we need in compute. The addition of work graphs is very welcome.
Of course, maintaining the code is a bit harder due to the extra classes/code … but it's nothing terrible (the skeleton below shows roughly how the two paths are kept apart).
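Structurally there's nothing magical about the dual setup: one interface, two backends, chosen per device/content. A hypothetical skeleton follows - type and method names are invented for illustration, not our real interface:

```cpp
// Hypothetical skeleton of a dual compute/HWRT setup. Names are made up;
// a real interface carries far more state (queues, descriptors, ray flags, ...).
#include <memory>

struct Scene {};      // placeholder for scene / acceleration-structure inputs
struct RayBatch {};   // placeholder for a buffer of rays
struct HitBuffer {};  // placeholder for intersection results

class RayTracerBackend {
public:
    virtual ~RayTracerBackend() = default;
    virtual void BuildOrUpdate(const Scene& scene) = 0;            // BVH build / BLAS+TLAS update
    virtual void Trace(const RayBatch& rays, HitBuffer& hits) = 0; // dispatch the actual tracing
};

// Software path: BVH build and traversal entirely in compute shaders.
class ComputeBackend final : public RayTracerBackend {
public:
    void BuildOrUpdate(const Scene&) override { /* compute BVH build / cluster assembly */ }
    void Trace(const RayBatch&, HitBuffer&) override { /* compute traversal kernels */ }
};

// Hardware path: DXR / VK ray tracing acceleration structures and pipelines.
class HwrtBackend final : public RayTracerBackend {
public:
    void BuildOrUpdate(const Scene&) override { /* BLAS/TLAS builds via the API */ }
    void Trace(const RayBatch&, HitBuffer&) override { /* DispatchRays / ray queries */ }
};

// Picked once per device or content profile; the renderer only talks to the interface.
std::unique_ptr<RayTracerBackend> MakeBackend(bool preferHwrt) {
    if (preferHwrt) return std::make_unique<HwrtBackend>();
    return std::make_unique<ComputeBackend>();
}
```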
Proposing any solution? That's hard - because from experience, different scenarios require different solutions/approaches. Nobody has rushed to generalize this, because it is very hard.
JoeJ said:
5 years have passed, Indiana Jones is the first game requiring HWRT, but there were no improvements on the API side at all.
Players have not adopted the hardware, and the requirements of games have skyrocketed. UE5 games are incapable of running 4K@60fps on top-end hardware (and we're talking a 2.5k+ EUR GPU like the RTX 5090). Hallucinating frames is bad (latency issues, temporal smearing, etc.) … and you can see that in the sales of those games (there are exceptions, like Wukong).
You can't reasonably add HWRT at scale into this.
Also, I'm not sure whether HWRT and DLSS4 won't bump into each other - the tensor cores used for intersections are the same ones used for hallucinating data. This could be a problem.
JoeJ said:
Currently I think the fix will only come when DX12 and VK get phased out and replaced by new APIs again, which is something that happens only every 20-30 years it seems. I'll be dead till then, so using HWRT inefficiently seems the only way to use it at all.
Hang in there, buddy! You're still young.
EDIT: Sorry if this sounds like quite a rant - I wrote it while messing with some vendor-specific branches in code, which is my favorite thing…