frob said:
While it is important for your implementation to match your design, it is also important for it to match the hardware.
Decades ago the hardware was oriented around tiles and sprites. On many systems you would fill up a sprite array, then provide another array saying which sprite elements to use to fill the screen. Scrolling was done by a shift value, where you could scroll the entire screen by a fraction of a tile. It was quite efficient, especially on systems that could access memory on their cartridge directly. They could simply point the sprite buffer at a location in the cartridge for rendering, and use a small array to represent the active screen. A small bit-field would flip the sprite horizontally, flip it vertically, or rotate it in 45 degree increments. The implementation wasn't made to match the design, it was made to match the hardware.
These days hardware is more oriented toward meshes and sprite clouds with support for multi-gigabyte megatextures. The arrays of tiles and sprites are distant memories. As you have realized, a naive implementation trying to mimic hardware from decades past will result in tons of draw calls and quickly bog the system down.
Smarter use of system resources, based on how today's hardware actually works, can give amazing performance. Instead of drawing 760 individual tiles every frame, composite all the stationary tiles into a single texture just once, then draw that texture with a single call each frame. Don't stop at a single screen, either: you can easily build one very large composite image on the graphics card, perhaps two or even four screens wide, say 4096x2048. A single draw call then handles all of it, and you only need to generate the next version as you approach the boundary of the current region. Building new regions can be done gradually rather than as a massive single-frame task. You might occasionally need to update small subsections (e.g. a Super Mario Bros block gets broken and needs to be removed from the static image and replaced with an active sprite), but that's a very small operation compared to a thousand render calls per frame. Alternatively, instance the tiles. It is more complex and relies on shaders, but then you're drawing each sprite once per screen, trading video memory against data arrays.
Whatever direction you go, have your implementation match what today's hardware is actually doing. Just like how decades ago, implementations went with sprites and tiles to match what their hardware was actually doing.
Thank you @frob
This is really interesting.
I could draw the entire background, let's say 2 screens high and 4 screens wide, with an art package, save it as a PNG file, and load it once before the first render.
I'm currently using this line to render my graphics to the screen:
SDL_RenderCopy(Window::renderer, _image_texture, nullptr, &asset);
The format of SDL_RenderCopy is:
int SDL_RenderCopy(SDL_Renderer * renderer,
                   SDL_Texture * texture,
                   const SDL_Rect * srcrect,
                   const SDL_Rect * dstrect);
In the two-panel version I'm calling this twice, once for each panel (where each panel is a separate image texture), with an 1800-pixel-wide strip being rendered outside the viewing area (Window::renderer). I'm not sure whether SDL_RenderCopy is clipping/truncating all pixels outside the viewable area, but it's not causing any issues that I can see - it just feels expensive.
Perhaps I can load the entire 2-screen by 4-screen background into a single image texture, but instead of passing a nullptr argument for the const SDL_Rect* srcrect parameter, use it to point to a one-screen-sized rectangle of the image texture to be rendered.
This would be extremely efficient, as it would only use one SDL_RenderCopy call per frame, and only render within the viewable window.
I'm only now seeing that SDL_Rect is a structure with x and y fields that I can manipulate to point at the portion of the image texture I want rendered to the screen.
I'll go and try this.
Thank you!