2 hours ago, VoxycDev said:
Yes, since I was looking for a way to draw all sprites with one call, I decided to make mvMatrix an attribute.
I still don't see why it is needed. Maybe you are trying to over-optimise this. Sprite batching is needed because you can have thousands, even tens of thousands, of visible sprites in a scene, even after basic culling, and that many state changes and draw calls is too many. A handful, even dozens, of draw calls is fine.
In something 2D (for simplicity of the example) like your YouTube video or other 2D games, I might have separate draw calls at least for the background tiles (easy to cull and to calculate coordinates for CPU side, and you might even cache them), world objects, and the UI.
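To show how cheap CPU-side culling of a tile grid can be, here is a minimal sketch. It assumes square tiles of a fixed size laid out from the world origin and an axis-aligned camera rectangle; `TileRange` and `visibleTiles` are names I made up for the example, not anything from your engine.

```cpp
#include <algorithm>
#include <cmath>

// Inclusive range of tile indices that overlap the camera rectangle.
struct TileRange {
    int x0, y0, x1, y1;
    bool empty() const { return x1 < x0 || y1 < y0; }
};

// Assumptions: tile (0,0) starts at the world origin, tiles are tileSize
// units square, the map is mapW x mapH tiles, and (camX, camY) is the
// minimum corner of the camera rectangle in world units.
TileRange visibleTiles(float camX, float camY, float camW, float camH,
                       float tileSize, int mapW, int mapH) {
    TileRange r;
    r.x0 = std::max(0, static_cast<int>(std::floor(camX / tileSize)));
    r.y0 = std::max(0, static_cast<int>(std::floor(camY / tileSize)));
    r.x1 = std::min(mapW - 1, static_cast<int>(std::floor((camX + camW) / tileSize)));
    r.y1 = std::min(mapH - 1, static_cast<int>(std::floor((camY + camH) / tileSize)));
    return r;
}
```

You would then loop only over `x0..x1` and `y0..y1` when filling the vertex buffer for the background layer, and skip the layer entirely when `empty()` is true.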
2 hours ago, VoxycDev said:
Thing is, even though right now it's just a grid, the sprites are supposed to be stretchable/bendable, like they were in my old fixed pipeline code.
Sounds like it could still be done without a unique matrix per vertex, but I am not clear exactly what you are doing. The old code you posted just draws a normal tile grid with no deform unless I missed something.
Would you need these deformed positions CPU side anyway, e.g. for collision detection? In which case just use those directly.
Is the deformation limited to, say, moving the 4 corner points of a large object? Or something else that can be determined on the fly from a small dataset?
And surely you can't deform every sprite in the game? If some things need a more complex and expensive routine, avoid letting that add significant cost to the thousands of other things being rendered.
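If the deformation really is just moving the four corners, any position inside the quad can be derived from those four points, with no matrix per vertex. A minimal sketch, assuming plain bilinear interpolation is an acceptable deform (I don't know what your old fixed pipeline code did); `Vec2`, `lerp2` and `deformPoint` are hypothetical names.

```cpp
struct Vec2 { float x, y; };

// Linear interpolation between a and b; t = 0 gives a, t = 1 gives b.
Vec2 lerp2(Vec2 a, Vec2 b, float t) {
    return { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t };
}

// Bilinear interpolation between four (possibly moved) corner points.
// (u, v) are in [0, 1]: (0,0) is bottomLeft, (1,0) is bottomRight,
// (0,1) is topLeft and (1,1) is topRight.
Vec2 deformPoint(Vec2 bottomLeft, Vec2 bottomRight,
                 Vec2 topLeft, Vec2 topRight, float u, float v) {
    Vec2 bottom = lerp2(bottomLeft, bottomRight, u);
    Vec2 top = lerp2(topLeft, topRight, u);
    return lerp2(bottom, top, v);
}
```

The same few lines work CPU side (where you could reuse the result for collision) or translated into the vertex shader, with the four corners as the small per-object dataset.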
2 hours ago, VoxycDev said:
Quote
- What is `texAtlas.add(obj->textureName);`? You're not rebuilding a texture dynamically, are you? Even if it is not every frame, you need to be careful not to cause slow frames / stutter. It also looks like a string; if it's doing string map lookups for every sprite, that is not ideal.
It makes sure the texture is in the texture atlas. It's rebuilt as-needed (only when a brand new texture is added). You're right, I probably should get rid of string map lookup here. But in this particular case there is only one texture so array size is 1, so it's not the bottleneck.
Normally if I had an atlas I'd build it at load time. Doing it dynamically is a lot more complex. "Rebuilt as-needed" can be perceived as stutter if you are not careful, because that "as needed" frame takes longer than the other frames that didn't rebuild anything.
One of the reasons I hate string comparison is that even the best case is fairly expensive. Say you have a hash map with one entry: in the case of a "hit" you just did an O(n) hash computation and an O(n) string comparison (to rule out a hash collision), where n is the length of the string. If you have a fairly long string like a filename, or worse a path, that is a fair bit of work. Probably not the bottleneck, but things like that add up a lot if they are spread throughout a program (some languages and/or programs "intern" strings so they can use reference equality instead, essentially turning such strings into integers).
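As a rough idea of what interning could look like here: do the string lookup once when the object is loaded, store the resulting integer on the object, and use only that integer per sprite per frame. This is a minimal sketch; `TextureNameTable` is a made-up class, not part of your `texAtlas`.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Maps each distinct texture name to a small stable integer id.
class TextureNameTable {
public:
    // Returns the id for name, assigning the next free id on first use.
    // This is the only place a string hash/compare happens.
    std::uint32_t intern(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        std::uint32_t id = static_cast<std::uint32_t>(names_.size());
        names_.push_back(name);
        ids_.emplace(name, id);
        return id;
    }

    // Reverse lookup, e.g. for logging or the editor UI.
    const std::string& name(std::uint32_t id) const { return names_[id]; }

private:
    std::unordered_map<std::string, std::uint32_t> ids_;
    std::vector<std::string> names_;
};
```

Per-frame code can then index a plain array by id instead of hashing a string for every sprite.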
2 hours ago, VoxycDev said:
Quote
- Also not sure on the cost of things like `setVertexAttrib`. You should be able to do this once, and it is saved with the `GL_ARRAY_BUFFER` (possibly all in one go, e.g. `glVertexAttribPointer`)
setVertexAttrib just calls all the gl functions needed to set up an attribute. Good point, though. I should try to do this once if I can. This is not the only program/renderer that runs in the engine though, so I assumed I have to re-set-up all the attributes on every frame for every program. Is that not the case?
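This is where `glVertexAttribPointer` etc. come in. Despite first appearances, it does not have to be treated as global state: the attribute setup is recorded in the currently bound vertex array object (VAO), together with whichever buffer was bound to `GL_ARRAY_BUFFER` when you made the call, and what you set will still be there next time you bind that VAO. If you have no VAOs (plain OpenGL ES 2.0 without the `OES_vertex_array_object` extension), then it is context state and you do need to set the attributes up again whenever another renderer has changed them.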
2 hours ago, VoxycDev said:
Quote
- Any sort of dynamic branch in a shader is usually bad if adjacent/nearby data will branch differently. GPU cores are not like CPU ones and can't all independently do their own thing. I didn't look closely at your data, but something to be aware of.
I'm not super worried about the gaps between the sprites. This is only for an editor, not for rendering in the game. As long as it's smooth and I can quickly build vast landscapes and cities out of voxels, that's all I care about.
Wrong quote? The gaps are what you get when a translation is combined with other things in a matrix and causes rounding errors.
Dynamic branching in a GPU program / shader can have a serious performance impact. If the GPU runs, say, 32 threads together, then all 32 threads must do the exact same thing each cycle; they just get different registers (and there are some memory access rules as well). If you have a condition such that some threads will do one thing and others something else, then it basically has to "pause" one set of threads and do the first thing, then "pause" the other threads and do the other thing, on separate cycles.
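To put a number on that, here is a toy cost model written as plain CPU code. It is only an illustration of the "both sides run one after the other" effect, not how any particular GPU schedules work; `warpBranchCost` and the cycle costs are invented for the example.

```cpp
#include <vector>

// Toy model: a group of threads pays for a side of the branch if at
// least one thread in the group takes that side. Real hardware differs
// in the details, but the worst case is similar: both sides are paid.
int warpBranchCost(const std::vector<bool>& takesIf, int ifCost, int elseCost) {
    bool anyIf = false;
    bool anyElse = false;
    for (bool t : takesIf) {
        if (t) anyIf = true;
        else anyElse = true;
    }
    return (anyIf ? ifCost : 0) + (anyElse ? elseCost : 0);
}
```

With 32 threads that all take the same side, the group pays for that side only; flip a single thread and the whole group pays for both sides. That is why a branch is much less of a problem when nearby data branches the same way.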
2 hours ago, VoxycDev said:
@SyncViews, just an idea. What if I send mvMatrix as a uniform array, and even though I can only send 32 or 64 matrices at once, I can then break it up into, let's say, 4 draw calls, to do 128 or 256 sprites? Maybe worth a try.
With only 6 vertices using the same matrix, I am not sure that is a great help. You would need to test it. There may also be a penalty for that uniform/memory access pattern, not sure.