Sure, a short summary then.
The first step is to generate sprites with diffuse, normal and height maps. This is painful to do by hand if you want normal maps for a more realistic look. In my Sandbox example, I made a program that generates 3D models from high-resolution images using two triangles per pixel. That is the same detail level Unreal Engine 5 will use, but on the CPU at higher frame rates, by pre-rasterizing into deep sprites.
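Roughly like this, just as a sketch and not the actual Sandbox tool (buildMesh, Vertex and the buffer layout are names I made up for illustration):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; uint32_t color; };
struct Triangle { Vertex a, b, c; };

// heights and colors are width*height samples from the height and diffuse images.
std::vector<Triangle> buildMesh(const std::vector<float>& heights,
                                const std::vector<uint32_t>& colors,
                                int width, int height) {
    std::vector<Triangle> triangles;
    triangles.reserve(std::size_t(width - 1) * std::size_t(height - 1) * 2);
    auto at = [&](int x, int y) {
        std::size_t i = std::size_t(y) * width + x;
        return Vertex{ float(x), float(y), heights[i], colors[i] };
    };
    // Two triangles for every pixel cell in the grid.
    for (int y = 0; y + 1 < height; y++) {
        for (int x = 0; x + 1 < width; x++) {
            Vertex v00 = at(x, y),     v10 = at(x + 1, y);
            Vertex v01 = at(x, y + 1), v11 = at(x + 1, y + 1);
            triangles.push_back({ v00, v10, v11 }); // upper triangle
            triangles.push_back({ v00, v11, v01 }); // lower triangle
        }
    }
    return triangles;
}
```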
By assigning the texture pixels as vertex colors, random memory access can be avoided, because the vertex colors are stored in triangle draw order. Isometric triangle rendering with only vertex colors also saves us the expensive depth division in each pixel, because we can interpolate colors by pre-generating DX and DY color offsets for each triangle. Rendering a few million triangles on the CPU is then done in an instant, and the result is saved to image files for the game to load.
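Since an isometric camera is orthographic, colors vary linearly in screen space, so the DX and DY offsets can be computed once per triangle from the plane through its three vertices. A sketch of that idea, with names of my own choosing:

```cpp
struct ScreenVertex { float x, y; float r, g, b; };

// Screen-space gradients for one attribute (shown for red; green, blue and
// depth use the same formula).
struct Gradients { float dRdx, dRdy; };

Gradients colorGradients(const ScreenVertex& a, const ScreenVertex& b, const ScreenVertex& c) {
    float denom = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
    Gradients g;
    g.dRdx = ((b.r - a.r) * (c.y - a.y) - (c.r - a.r) * (b.y - a.y)) / denom;
    g.dRdy = ((c.r - a.r) * (b.x - a.x) - (b.r - a.r) * (c.x - a.x)) / denom;
    return g;
}

// Inside the scanline loop the color is then just added, no divide per pixel:
//   float red = a.r + (xStart - a.x) * g.dRdx + (yScan - a.y) * g.dRdy;
//   for (int x = xStart; x < xEnd; x++) { writePixel(x, yScan, red); red += g.dRdx; }
```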
Then you keep a full-screen image for each property. The sprite's actual height in 3D is subtracted from its Y location on the screen, and added to the sprite's sampled depth pixel before comparing and writing. If the sprite's new pixel is closer than the value in the screen's depth buffer, you write to the diffuse, normal and depth buffers.
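As a sketch, compositing one deep sprite into the full-screen buffers could look like the loop below. The struct layouts and names are assumptions, and alpha blending and edge filtering are left out:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct GBuffer {
    int width, height;
    std::vector<uint32_t> diffuse;  // packed RGBA per pixel
    std::vector<uint32_t> normal;   // packed world-space normal per pixel
    std::vector<float>    depth;    // camera depth per pixel
};

struct DeepSprite {
    int width, height;
    std::vector<uint32_t> diffuse;
    std::vector<uint32_t> normal;
    std::vector<float>    depth;    // per-pixel depth offsets baked at pre-rasterization
};

// screenX/screenY: where the sprite lands on screen after projecting its 3D
// location (the 3D height already subtracted from Y, as described above).
// baseDepth: the camera depth of that 3D location.
void drawDeepSprite(GBuffer& g, const DeepSprite& s, int screenX, int screenY, float baseDepth) {
    for (int y = 0; y < s.height; y++) {
        int ty = screenY + y;
        if (ty < 0 || ty >= g.height) continue;
        for (int x = 0; x < s.width; x++) {
            int tx = screenX + x;
            if (tx < 0 || tx >= g.width) continue;
            std::size_t src = std::size_t(y) * s.width + x;
            std::size_t dst = std::size_t(ty) * g.width + tx;
            float newDepth = baseDepth + s.depth[src];  // offset the sampled depth pixel
            if (newDepth < g.depth[dst]) {              // closer than what is already there?
                g.depth[dst]   = newDepth;
                g.diffuse[dst] = s.diffuse[src];
                g.normal[dst]  = s.normal[src];
            }
        }
    }
}
```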
To speed up rendering of items that rarely move, the background has a set of pre-drawn blocks containing all static items in that region. When the camera moves, you just use memcpy calls to draw the visible background blocks. Render them when they become visible and recycle them when they are too far away from the camera. Knowing which static items to draw for a region can be solved using either a 2D grid with a maximum height (easy to manage) or an octree structure (more reusable).
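Blitting one visible block is then a memcpy per row. Shown here for a single 32-bit buffer as an example; you would repeat it for diffuse, normal and height, and clamp the copied region against the screen edges in a real version:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Strides are in pixels, both images are tightly packed 32-bit pixels.
void blitBlock(uint32_t* screen, int screenStride,
               const uint32_t* block, int blockStride,
               int destX, int destY, int blockWidth, int blockHeight) {
    for (int y = 0; y < blockHeight; y++) {
        std::memcpy(screen + std::size_t(destY + y) * screenStride + destX,
                    block + std::size_t(y) * blockStride,
                    std::size_t(blockWidth) * sizeof(uint32_t));
    }
}
```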
When you have your diffuse, normal and height images for the frame, you add a light image and fill it with black using memset. Then make a draw call onto the light image for each visible light source. The height image gives you each pixel's world position: start from the flat zero plane at that screen location and extrude along the up vector, scaled by the stored height. Then unpack the normals and you know the scene pixel's relation to the light source. If it is close enough and not occluded in the shadow depth-map, you add that light's contribution to the light image.
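A sketch of the per-pixel pass for one point light, with the shadow depth-map test left out. The ground-plane basis vectors, the linear falloff and the buffer formats are all assumptions made for the example:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
static Vec3  add(Vec3 a, Vec3 b)    { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
static Vec3  sub(Vec3 a, Vec3 b)    { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3  scale(Vec3 a, float s) { return { a.x * s, a.y * s, a.z * s }; }
static float dot(Vec3 a, Vec3 b)    { return a.x * b.x + a.y * b.y + a.z * b.z; }

// groundOrigin/groundDx/groundDy: where a screen pixel lands on the flat zero
// plane and how that point moves per pixel step. up: the world's up axis.
void addPointLight(std::vector<float>& lightImage,      // one channel of the light image
                   const std::vector<float>& heightMap, // world height per pixel
                   const std::vector<Vec3>& normalMap,  // normals already unpacked
                   int width, int height,
                   Vec3 groundOrigin, Vec3 groundDx, Vec3 groundDy, Vec3 up,
                   Vec3 lightPos, float lightRadius, float intensity) {
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            std::size_t i = std::size_t(y) * width + x;
            // Position on the zero plane, extruded along up by the stored height.
            Vec3 flat = add(groundOrigin, add(scale(groundDx, float(x)), scale(groundDy, float(y))));
            Vec3 position = add(flat, scale(up, heightMap[i]));
            Vec3 toLight = sub(lightPos, position);
            float dist = std::sqrt(dot(toLight, toLight));
            if (dist < 0.0001f || dist > lightRadius) continue;  // too far away
            Vec3 direction = scale(toLight, 1.0f / dist);
            float lambert = dot(normalMap[i], direction);
            if (lambert <= 0.0f) continue;                       // surface faces away
            float falloff = 1.0f - dist / lightRadius;           // simple linear falloff
            // The shadow depth-map occlusion test would go right here.
            lightImage[i] += intensity * lambert * falloff;
        }
    }
}
```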
Then you multiply diffuse with light to get the final image, upscale it, and tell a background thread to upload the result to the window while you move on with game logic for the next frame.
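The compose step can start as a plain scalar loop. This sketch assumes 8-bit RGB diffuse and a one-float-per-pixel light image, which may not match your own layout:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

void composeFrame(std::vector<uint8_t>& out,            // final RGB image, 3 bytes per pixel
                  const std::vector<uint8_t>& diffuse,  // diffuse RGB, 3 bytes per pixel
                  const std::vector<float>& light,      // accumulated light per pixel
                  int pixelCount) {
    for (int i = 0; i < pixelCount; i++) {
        for (int c = 0; c < 3; c++) {
            std::size_t k = std::size_t(i) * 3 + c;
            float value = diffuse[k] * light[i];
            out[k] = uint8_t(std::min(value, 255.0f));  // clamp to the 8-bit range
        }
    }
}
```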
Dynamic light equations in isometric CPU rendering are pretty much like deferred lighting using post effects on the GPU. You just go through all pixels in memory using SIMD intrinsics and multi-threading. Intel and ARM have reference manuals for SSE/NEON vectorization. Any article about “deferred rendering” will explain the theory, and you just do the same without the 3D rendering pipeline. You can begin with a basic pixel loop until you learn CPU optimization, because it's still fast at retro resolutions.
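Once the scalar version works, the same multiply vectorizes directly, for example with SSE. This sketch assumes per-channel float buffers whose length is a multiple of four:

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// count must be a multiple of 4; a real version handles the remainder separately.
void composeFrameSSE(float* out, const float* diffuse, const float* light, int count) {
    for (int i = 0; i < count; i += 4) {
        __m128 d = _mm_loadu_ps(diffuse + i);      // load four diffuse values
        __m128 l = _mm_loadu_ps(light + i);        // load four light values
        _mm_storeu_ps(out + i, _mm_mul_ps(d, l));  // multiply and store four results
    }
}
```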