Software rendering resources and optimization

karimHamdallah · 2021-09-30T18:08:12

I want to learn 2D/3D software rendering, so I started with Tricks of the Windows Game Programming Gurus, which builds a software primitive-drawing library. My problem is that the book uses outdated technology (DirectDraw with an 8-bit graphics mode), so I used SDL instead. My problem now is how to optimize this small library's algorithms before going 3D: image blit, alpha blending, polygon filling. I searched a lot for SIMD material on this topic and on software rendering in general, and I can't find any resources: no books, nothing at all, and YouTube contains only demos showing off software rendering. Any help?

Graphics and GPU Programming Programming optimziation simd multithreading cpurendering softwarerendering

Started by karimHamdallah September 13, 2021 08:06 PM

23 comments, last by Geri 3 years, 1 month ago

JoeJ

4,344

September 14, 2021 01:40 PM

Probably the best-known post about SIMD SW rendering was about triangle rasterization, written by Nicolas Capens.
But I can't find it anymore. Seems gone.
IIRC, it was about rasterizing triangles in blocks of pixels, with each pixel evaluating the line equations of all 3 edges. But because it was parallelized, it was faster than the traditional scanline edge walk.
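A minimal scalar sketch of that per-pixel edge-equation test, not the original article's code. It assumes integer pixel coordinates and a 32-bit framebuffer stored row by row; `Point`, `edge` and `fillTriangle` are hypothetical names, and there is no fill rule, so pixels exactly on a shared edge would be drawn by both triangles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Point { int x, y; };

// Edge function: twice the signed area of (a, b, p).
// Its sign tells on which side of the line a->b the point p lies.
inline int edge(const Point& a, const Point& b, const Point& p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Fills a triangle by testing every pixel of its bounding box against
// all three edge functions. Returns the number of pixels written.
int fillTriangle(std::vector<uint32_t>& fb, int width, int height,
                 Point v0, Point v1, Point v2, uint32_t color) {
    assert(width > 0 && height > 0 &&
           fb.size() == static_cast<std::size_t>(width) * height);

    // Make the winding consistent so "inside" means all three values >= 0.
    if (edge(v0, v1, v2) < 0) std::swap(v1, v2);

    // Bounding box, clamped to the framebuffer.
    const int minX = std::max(0, std::min({v0.x, v1.x, v2.x}));
    const int maxX = std::min(width - 1, std::max({v0.x, v1.x, v2.x}));
    const int minY = std::max(0, std::min({v0.y, v1.y, v2.y}));
    const int maxY = std::min(height - 1, std::max({v0.y, v1.y, v2.y}));

    int written = 0;
    for (int y = minY; y <= maxY; ++y) {
        for (int x = minX; x <= maxX; ++x) {
            const Point p{x, y};
            if (edge(v0, v1, p) >= 0 && edge(v1, v2, p) >= 0 &&
                edge(v2, v0, p) >= 0) {
                fb[static_cast<std::size_t>(y) * width + x] = color;
                ++written;
            }
        }
    }
    return written;
}
```

Every pixel is tested independently of its neighbours, which is what makes this form easy to spread across SIMD lanes or threads, unlike an edge walk where each scanline depends on the previous one.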

I also remember Pixomatic, a powerful SW renderer, and the software rasterizer written for Larrabee. Michael Abrash worked on both, among others. Papers are easy to find, e.g. https://www.cs.cmu.edu/afs/cs/academic/class/15869-f11/www/readings/abrash09_lrbrast.pdf Maybe this gives some good resources.

Oh… I remember a really good and recent resource. It was an open-source SW renderer, fast enough to run games with pretty detailed graphics, and it was posted here on this forum. But I can't give any name or search term. :( The project is on GitHub.

karimHamdallah

Author

September 14, 2021 02:03 PM

Thanks so much for this information, everybody. I think I get it: there is no way to make a decent rasterizer fully in software, so I should use the power of the GPU … I think I'm going to replace the old rasterization algorithms (scanline) with ones more suitable for parallelism and use GPU threads through CUDA or OpenCL. Would that be a good idea to get my rasterizer good enough to at least show 2D graphics at a reasonable FPS?

cgrant

1,875

September 14, 2021 06:13 PM

I would suggest giving what @shaarigan said some more thought. Before worrying about low-level optimization, I would worry about getting a solid design down first, then optimize. Do some research into how ‘modern’ GPUs tackle the task; specifically, I would look at how the mobile GPU pipeline is structured, as there are multiple tiers of high-level optimization that can be done prior to delving into the likes of assembly etc. Most mobile GPUs implement a tile-based rendering approach, which I think would definitely suit your software renderer.
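A rough sketch of the binning step behind a tile-based approach, under stated assumptions: the screen is cut into square tiles, and each triangle is added to every tile its bounding box overlaps, so that tiles can later be rasterized independently (for example one tile per thread, with the tile's pixels staying in cache). `Tri` and `binTriangles` are hypothetical names, and bounding-box binning is conservative: a triangle may land in a tile it does not actually cover.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Tri { int x0, y0, x1, y1, x2, y2; };

// Returns one list of triangle indices per tile, tiles stored row by row.
std::vector<std::vector<int>> binTriangles(const std::vector<Tri>& tris,
                                           int width, int height,
                                           int tileSize) {
    assert(width > 0 && height > 0 && tileSize > 0);
    const int tilesX = (width + tileSize - 1) / tileSize;
    const int tilesY = (height + tileSize - 1) / tileSize;
    std::vector<std::vector<int>> bins(
        static_cast<std::size_t>(tilesX) * tilesY);

    for (std::size_t i = 0; i < tris.size(); ++i) {
        const Tri& t = tris[i];
        // Screen-clamped bounding box of the triangle.
        const int minX = std::max(0, std::min({t.x0, t.x1, t.x2}));
        const int maxX = std::min(width - 1, std::max({t.x0, t.x1, t.x2}));
        const int minY = std::max(0, std::min({t.y0, t.y1, t.y2}));
        const int maxY = std::min(height - 1, std::max({t.y0, t.y1, t.y2}));
        if (minX > maxX || minY > maxY) continue;  // fully off-screen

        // Add the triangle to every tile the box touches.
        for (int ty = minY / tileSize; ty <= maxY / tileSize; ++ty)
            for (int tx = minX / tileSize; tx <= maxX / tileSize; ++tx)
                bins[static_cast<std::size_t>(ty) * tilesX + tx]
                    .push_back(static_cast<int>(i));
    }
    return bins;
}
```

After binning, each tile only needs to look at its own list, which is the kind of high-level restructuring that can be done before any assembly-level work.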

Nagle

September 15, 2021 03:03 AM

Unless you really want to write a game engine, or have some very unusual problem to solve, get some existing game engine and go with that. This is a problem you no longer have to solve yourself.

Geri

408

September 17, 2021 03:37 PM

My software renderer has no SIMD code whatsoever. I don't even see how that could be usable outside of boosting some calculations in the initial vertex transformation… That's not even the bottleneck.

JoeJ

4,344

September 17, 2021 04:03 PM

Geri said:
I don't even see how that could be usable outside of boosting some calculations in the initial vertex transformation… That's not even the bottleneck.

To benefit from SIMD, every pixel can map to one SIMD lane, so you can get 4 or more pixels with a similar number of instructions.
But like GPUs do, you may process pixels in blocks, and pixels in the block but outside the triangle are 'wasted'. Still a win if your triangles are not too small.
The other option would be mapping scanlines to SIMD rows 4 pixels long, so the waste would be reduced to one half. Not sure when and if the non-optimal memory alignment matters then.
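A minimal sketch of the "rows of 4 pixels" variant using SSE2 intrinsics, assuming an x86 or x86-64 target, integer pixel coordinates and a 32-bit framebuffer. `Vec2i`, `edgeAt` and `fillTriangleSSE2` are hypothetical names. Each lane holds the edge values of one pixel, and the values are stepped incrementally so no SIMD multiply is needed. To keep the sketch short and safe at row ends, the write itself is still done per lane; a tuned version would presumably use a masked or blended store instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (x86 / x86-64 only)
#include <utility>
#include <vector>

struct Vec2i { int x, y; };

// Edge function of the line a->b evaluated at pixel (px, py).
static int edgeAt(Vec2i a, Vec2i b, int px, int py) {
    return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
}

// Tests 4 horizontally adjacent pixels per step. Returns pixels written.
int fillTriangleSSE2(std::vector<uint32_t>& fb, int width, int height,
                     Vec2i v0, Vec2i v1, Vec2i v2, uint32_t color) {
    assert(width > 0 && height > 0 &&
           fb.size() == static_cast<std::size_t>(width) * height);
    if (edgeAt(v0, v1, v2.x, v2.y) < 0) std::swap(v1, v2);

    const int minX = std::max(0, std::min({v0.x, v1.x, v2.x}));
    const int maxX = std::min(width - 1, std::max({v0.x, v1.x, v2.x}));
    const int minY = std::max(0, std::min({v0.y, v1.y, v2.y}));
    const int maxY = std::min(height - 1, std::max({v0.y, v1.y, v2.y}));

    const Vec2i a[3] = {v0, v1, v2};
    const Vec2i b[3] = {v1, v2, v0};
    int written = 0;

    for (int y = minY; y <= maxY; ++y) {
        __m128i e[3], step[3];
        for (int i = 0; i < 3; ++i) {
            const int dx = -(b[i].y - a[i].y);  // change per +1 in x
            const int e0 = edgeAt(a[i], b[i], minX, y);
            // Lane k holds the edge value at pixel minX + k.
            e[i] = _mm_set_epi32(e0 + 3 * dx, e0 + 2 * dx, e0 + dx, e0);
            step[i] = _mm_set1_epi32(4 * dx);
        }
        for (int x = minX; x <= maxX; x += 4) {
            // A pixel is outside if any edge value is negative, i.e. if
            // the sign bit of the OR of the three values is set.
            const __m128i anyNeg =
                _mm_or_si128(_mm_or_si128(e[0], e[1]), e[2]);
            const int outside = _mm_movemask_ps(_mm_castsi128_ps(anyNeg));
            for (int lane = 0; lane < 4 && x + lane <= maxX; ++lane) {
                if (!(outside & (1 << lane))) {
                    fb[static_cast<std::size_t>(y) * width + x + lane] = color;
                    ++written;
                }
            }
            for (int i = 0; i < 3; ++i) e[i] = _mm_add_epi32(e[i], step[i]);
        }
    }
    return written;
}
```

The inside test for four pixels costs two ORs and one movemask here; lanes that fall outside the triangle are the 'wasted' work mentioned above.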

Geri

408

September 17, 2021 04:14 PM

I think micromanaging things to fit a SIMD-reliant algorithm would be slower than doing the rasterization by raw muscle. I am getting around 100-200 fps in 1080p with 100k-200k polygons on a 6-core i7, and that's relatively dumb but optimal classic C code with threads. The code already runs up against cache bandwidth as the most limiting factor, so SIMD wouldn't help.

On the other hand, I could use some trickery to increase the polygon count, because the situation there is not perfect: above 200k polygons, the speed starts to nosedive. But these are one-year-old results. I have made some optimizations since then, so maybe I have already reached 1 million since I took these measurements. This is enough for me, because I typically use only a few tens of thousands of polygons in a scene, but it would be problematic if I wished to use more.

The_GTA

September 17, 2021 04:19 PM

To answer the OP's original question: with the increased demands from the game industry for correct and beautiful 3D graphics came an increase in computational demand. New effects like per-pixel graphics computations, so-called pixel shaders, as well as increased screen resolutions are the main factors. But I would not say that decent software rendering is impossible in today's world. I am not the right person, though, if you want to ask about the optimization shortcuts of the early days that involved trickster programming, taking a lot of shortcuts, not solving the math properly, going by gut feeling, etc.

JoeJ

4,344

September 17, 2021 05:22 PM

Geri said:
I think micromanaging things to fit a SIMD-reliant algorithm would be slower than doing the rasterization by raw muscle. I am getting around 100-200 fps in 1080p with 100k-200k polygons on a 6-core i7, and that's relatively dumb but optimal classic C code with threads. The code already runs up against cache bandwidth as the most limiting factor, so SIMD wouldn't help.

How do you measure that cache BW is saturated? Is there some VS profiler option showing this? I'm a bit lazy with figuring such things out…

Constantinou_CEO_CWM_FX

September 17, 2021 07:09 PM

Great post. Yes, I do recommend doing the single-threaded case initially. It will be helpful for learning about profiling and understanding how memory access patterns affect the speed of the CPU. You can start with a single thread and then move on.
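One small single-threaded experiment along those lines, as a sketch only: sum the same image once row by row and once column by column, and time both. The two functions do identical arithmetic and differ only in memory access order. `sumRowMajor`, `sumColumnMajor` and `elapsedMs` are hypothetical helper names, and the image is assumed to be stored row by row.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row by row: consecutive reads touch consecutive addresses.
uint64_t sumRowMajor(const std::vector<uint32_t>& img, int w, int h) {
    assert(w > 0 && h > 0 &&
           img.size() == static_cast<std::size_t>(w) * h);
    uint64_t sum = 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            sum += img[static_cast<std::size_t>(y) * w + x];
    return sum;
}

// Column by column: each read jumps a whole row (w pixels) ahead.
uint64_t sumColumnMajor(const std::vector<uint32_t>& img, int w, int h) {
    assert(w > 0 && h > 0 &&
           img.size() == static_cast<std::size_t>(w) * h);
    uint64_t sum = 0;
    for (int x = 0; x < w; ++x)
        for (int y = 0; y < h; ++y)
            sum += img[static_cast<std::size_t>(y) * w + x];
    return sum;
}

// Runs `work` once and returns the wall-clock time in milliseconds.
template <typename F>
double elapsedMs(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To try it, fill a large image (say 4096 x 4096), time each function with `elapsedMs`, and print the returned sums so the compiler cannot drop the loops. Any difference between the two timings comes from the access pattern alone, and it will vary with the machine and compiler settings.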

This topic is closed to new replies.
