
Software rendering resources and optimization

Started by September 13, 2021 08:06 PM
23 comments, last by Geri 3 years, 1 month ago

Probably the best-known post about SIMD software rendering was the one on triangle rasterization written by Nick Capens.
But I can't find it anymore; it seems to be gone.
IIRC, it was about rasterizing triangles in blocks of pixels, with each pixel evaluating all 3 edge (line) equations. Because that work parallelizes well, it was faster than the traditional scanline edge walk.
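For readers who have not seen the idea, here is a minimal scalar sketch of the "every pixel tests all three edge equations" approach described above. It is not the code from the original post; the names `Point`, `edge` and `fillTriangle`, the 8-bit framebuffer layout, and sampling at integer pixel coordinates are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Point { int x, y; };  // hypothetical integer screen coordinate

// Edge function: cross product of (b - a) and (p - a).
// Its sign tells on which side of the line a->b the point p lies.
inline int64_t edge(Point a, Point b, Point p) {
    return int64_t(b.x - a.x) * (p.y - a.y) - int64_t(b.y - a.y) * (p.x - a.x);
}

// Fills a triangle by testing every pixel of its clipped bounding box
// against all three edge functions. Returns the number of pixels written.
int fillTriangle(std::vector<uint8_t>& fb, int width, int height,
                 Point v0, Point v1, Point v2, uint8_t color) {
    assert(width > 0 && height > 0 && fb.size() == size_t(width) * height);

    int64_t area = edge(v0, v1, v2);
    if (area == 0) return 0;            // degenerate triangle, nothing to draw
    if (area < 0) std::swap(v1, v2);    // make "inside" mean all edges >= 0

    // Bounding box of the triangle, clipped to the framebuffer.
    int minX = std::max(0, std::min({v0.x, v1.x, v2.x}));
    int minY = std::max(0, std::min({v0.y, v1.y, v2.y}));
    int maxX = std::min(width - 1, std::max({v0.x, v1.x, v2.x}));
    int maxY = std::min(height - 1, std::max({v0.y, v1.y, v2.y}));

    int written = 0;
    for (int y = minY; y <= maxY; ++y) {
        for (int x = minX; x <= maxX; ++x) {
            Point p{x, y};
            // Each pixel is independent of its neighbours, which is what
            // makes this formulation easy to run in parallel.
            if (edge(v0, v1, p) >= 0 && edge(v1, v2, p) >= 0 && edge(v2, v0, p) >= 0) {
                fb[size_t(y) * width + x] = color;
                ++written;
            }
        }
    }
    return written;
}
```

A real implementation would add a fill rule so that shared edges are not drawn twice, and would step the edge values incrementally instead of recomputing them per pixel; this sketch leaves both out for brevity.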

I also remember a powerful SW renderer, Pixomatic, and the software rasterizer for Larrabee. Michael Abrash worked on both, among others. Papers are easy to find, e.g. https://www.cs.cmu.edu/afs/cs/academic/class/15869-f11/www/readings/abrash09_lrbrast.pdf Maybe this gives some good resources.

Oh… I also remember a really good and recent resource. It was an open source SW renderer, fast enough to run games with pretty detailed graphics, and it was posted here on this forum. But I can't give a name or search term. :( The project is on GitHub.

Thanks so much for this information, everybody. I think I've got what I need to make a decent rasterizer. There's no way I'm making it fully software; I should use the power of the GPU … I think I'm going to replace the old rasterization algorithm (scanline) with one better suited to parallelism and run it on GPU threads using CUDA or OpenCL. Would that be a good idea for getting my rasterizer good enough to at least show 2D graphics at a reasonable FPS?


I would suggest giving what @shaarigan said some more thought. Before worrying about low-level optimization, I would get a solid design down first and then optimize. Do some research into how ‘modern’ GPUs tackle the task; specifically, I would look at how a mobile GPU pipeline is structured, as there are multiple tiers of high-level optimization that can be done before delving into the likes of assembly. Most mobile GPUs implement a tile-based rendering approach, which I think would definitely suit your software renderer.
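To make the tile-based idea a bit more concrete, here is a rough sketch of the binning step: before any pixel work, each triangle is assigned to the screen tiles its bounding box touches, so each tile can later be rasterized on its own (and stay in cache). The names `TriBounds` and `binTriangles` and the bounding-box-only overlap test are assumptions for illustration, not how any particular GPU does it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical screen-space bounding box of a triangle (inclusive bounds).
struct TriBounds { int minX, minY, maxX, maxY; };

// Returns one list per tile (row-major tile order) holding the indices of
// all triangles whose bounding box overlaps that tile.
std::vector<std::vector<int>> binTriangles(const std::vector<TriBounds>& tris,
                                           int width, int height, int tileSize) {
    assert(width > 0 && height > 0 && tileSize > 0);
    const int tilesX = (width + tileSize - 1) / tileSize;   // round up
    const int tilesY = (height + tileSize - 1) / tileSize;
    std::vector<std::vector<int>> bins(size_t(tilesX) * tilesY);

    for (int i = 0; i < int(tris.size()); ++i) {
        const TriBounds& t = tris[i];
        // Skip triangles that are entirely off screen.
        if (t.maxX < 0 || t.maxY < 0 || t.minX >= width || t.minY >= height) continue;

        // Range of tiles covered by the clipped bounding box.
        const int tx0 = std::max(t.minX, 0) / tileSize;
        const int ty0 = std::max(t.minY, 0) / tileSize;
        const int tx1 = std::min(t.maxX, width - 1) / tileSize;
        const int ty1 = std::min(t.maxY, height - 1) / tileSize;

        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                bins[size_t(ty) * tilesX + tx].push_back(i);
    }
    return bins;
}
```

Once the bins exist, each tile is an independent unit of work, which is also a natural granularity for handing tiles to worker threads.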

Unless you really want to write a game engine, or have some very unusual problem to solve, get some existing game engine and go with that. This is a problem you no longer have to solve yourself.

My software renderer has no SIMD code whatsoever. I don't even see how that could be usable outside boosting some calculations in the initial vertex transformation… That's not even the bottleneck.

Geri said:
I don't even see how that could be usable outside boosting some calculations in the initial vertex transformation… That's not even the bottleneck.

To benefit from SIMD, every pixel can map to one SIMD lane, so you get 4 or more pixels for a similar number of instructions.
But like GPUs do, you may process pixels in blocks, and pixels in the block but outside the triangle are 'wasted'. Still a win if your triangles are not too small.
The other option would be mapping scanlines to SIMD rows of 4 pixels in length, so the waste would reduce to one half. Not sure when and if the non-optimal memory alignment matters then.
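As a rough illustration of the "one pixel per SIMD lane" mapping, the sketch below tests 4 horizontally adjacent pixels against three edge equations at once. It uses SSE2 intrinsics when the compiler reports SSE2 support and falls back to a scalar loop otherwise. The `Edge` struct, the `coverage4` function and the `E(x, y) = A*x + B*y + C >= 0` inside convention are assumptions for this example.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

// Hypothetical edge equation: E(x, y) = A*x + B*y + C, inside when E >= 0.
struct Edge { int A, B, C; };

// Returns a 4-bit mask; bit i is set when pixel (x + i, y) is inside
// all three edges. Assumes the edge values fit in 32 bits.
int coverage4(const Edge e[3], int x, int y) {
    assert(e != nullptr);
#ifdef HAVE_SSE2
    __m128i anyNegative = _mm_setzero_si128();
    for (int k = 0; k < 3; ++k) {
        // Edge value at the first pixel, computed once in scalar code.
        const int base = e[k].A * x + e[k].B * y + e[k].C;
        // Stepping one pixel to the right adds A, so the 4 lanes are
        // base, base + A, base + 2A, base + 3A.
        const __m128i lanes = _mm_add_epi32(
            _mm_set1_epi32(base),
            _mm_setr_epi32(0, e[k].A, 2 * e[k].A, 3 * e[k].A));
        // OR-ing keeps a lane's sign bit set if any edge value is negative.
        anyNegative = _mm_or_si128(anyNegative, lanes);
    }
    // movemask collects the 4 sign bits; invert to get the inside mask.
    const int outside = _mm_movemask_ps(_mm_castsi128_ps(anyNegative));
    return ~outside & 0xF;
#else
    int mask = 0;
    for (int i = 0; i < 4; ++i) {
        bool inside = true;
        for (int k = 0; k < 3; ++k)
            if (e[k].A * (x + i) + e[k].B * y + e[k].C < 0) inside = false;
        if (inside) mask |= 1 << i;
    }
    return mask;
#endif
}
```

The returned mask is where the 'wasted' lanes show up: a mask of 0 means the whole group can be skipped, 0xF means it can be written without per-pixel checks, and anything in between needs a masked write.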


I think micromanaging things to fit a SIMD-reliant algorithm would be slower than doing the rasterization with raw muscle. I am getting around 100-200 fps in 1080p with 100k-200k polygons on a 6-core i7, and that's relatively dumb but optimal classic C code with threads. Cache bandwidth is already the most limiting factor for the code, so SIMD wouldn't help.

On the other hand, I could use some trickery to increase the polygon count, because the situation there is not perfect: above 200k polygons, the speed starts to nosedive. But these are one-year-old results; I have made some optimizations since then, so maybe I have already reached 1 million since I took these measurements. This is enough for me, because I typically use only a few tens of thousands of polygons in a scene, but it would be problematic if I wished to use more.

To answer the OP's original question: with the game industry's increased demands for correct and beautiful 3D graphics came an increase in computational demand. New effects like per-pixel graphics computations, so-called pixel shaders, as well as increased screen resolutions are the main factors. But I would not say that decent software rendering is impossible in today's world. I am not the right person, though, if you want to ask about the optimization shortcuts of the early days that involved trickster programming: taking a lot of shortcuts, not solving the math properly, going by gut feeling, etc.

Geri said:
I think micromanaging things to fit a SIMD-reliant algorithm would be slower than doing the rasterization with raw muscle. I am getting around 100-200 fps in 1080p with 100k-200k polygons on a 6-core i7, and that's relatively dumb but optimal classic C code with threads. Cache bandwidth is already the most limiting factor for the code, so SIMD wouldn't help.

How do you measure that cache BW is saturated? Is there some VS profiler option showing this? I'm a bit lazy with figuring such things out…

Great post. Yes, I do recommend doing the single-thread case initially. It will be helpful for learning about profiling and understanding how memory access patterns affect the speed of the CPU. You can start with a single thread and then move on.
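A small single-threaded experiment along those lines could look like the sketch below: it sums the same framebuffer-sized array once along rows and once along columns, and provides a tiny timer to compare the two. The function names and the `timeMs` helper are made up for this example, and the actual timings depend on the machine, compiler flags and buffer size, so no numbers are claimed here.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sums the buffer walking along rows, so consecutive reads touch
// consecutive addresses.
uint64_t sumRowMajor(const std::vector<uint32_t>& buf, int width, int height) {
    assert(buf.size() == size_t(width) * height);
    uint64_t sum = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            sum += buf[size_t(y) * width + x];
    return sum;
}

// Sums the same buffer walking along columns, so consecutive reads are
// `width` elements apart in memory.
uint64_t sumColumnMajor(const std::vector<uint32_t>& buf, int width, int height) {
    assert(buf.size() == size_t(width) * height);
    uint64_t sum = 0;
    for (int x = 0; x < width; ++x)
        for (int y = 0; y < height; ++y)
            sum += buf[size_t(y) * width + x];
    return sum;
}

// Calls f once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double timeMs(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

To try it, fill a 1920x1080 buffer, time both functions with `timeMs`, and print the returned sums so the compiler cannot discard the loops. Both functions return the same value; only the order of memory accesses differs.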

This topic is closed to new replies.
