Fixed Function Pipeline Faster For Sprites?

Dimitri Lozovoy · 2019-04-10T06:22:01

I'm getting some unexpected results with my new sprite renderer, which uses OpenGL ES 2.0. It performs much worse than my old sprite renderer from 5 years ago, which uses OpenGL ES 1.1 (no shaders). All I'm doing is displaying a 16x16 grid of quads and moving and zooming it around a little bit. You can see the difference in the video below:

Video to demonstrate the issue

Clearly, the fixed pipeline runs smoothly, but my supposedly fast one-draw-call shader program chugs (when I tried one draw call per quad, it was naturally even slower). This is not what I expected.

How can I speed up my new sprite renderer? Is the fixed-function pipeline naturally just better adapted to vertex data that changes often (like a new VBO on every frame)?

I could just rewrite the new renderer in OpenGL ES 1.1 again, but then I would lose compatibility with desktop OpenGL. This is a bad idea, right?

Can I emulate the fixed-function pipeline with shaders? Is there code out there that does this? What tricks did they use in it to get sprites to render so fast?
Old Fixed-Function Code:

    for (int z = 0; z <= mTileEdit.mCurLevel; z++) {
        for (int y = 0; y < tm.mSizeY; y++) {
            for (int x = 0; x < tm.mSizeX; x++) {
                int t = tm.get(x, y, z);
                if (t != 0 && t > 0 && t < 256) {
                    // Set alpha
                    float alpha = 1.0f;
                    if (Lozoware.getMP().get("name").equals("pixeledit")
                            || Lozoware.getMP().get("name").equals("edit3d")) {
                        alpha = 1.0f - ((float)z / (float)tm.mSizeZ);
                    }

                    // Set color
                    gl.glColor4f(tm.mPalette.mRed[t], tm.mPalette.mGreen[t], tm.mPalette.mBlue[t], alpha);

                    // Vertex buffer
                    bb = ByteBuffer.allocateDirect((6 * 3) * 3 * 4);
                    bb.order(ByteOrder.nativeOrder());
                    FloatBuffer buf = bb.asFloatBuffer();

                    float bottomLeftX = x * mGLTileSizeX;
                    float bottomLeftY = y * mGLTileSizeY;
                    float topLeftX = x * mGLTileSizeX;
                    float topLeftY = y * mGLTileSizeY + mGLTileSizeY;
                    float bottomRightX = x * mGLTileSizeX + mGLTileSizeX;
                    float bottomRightY = y * mGLTileSizeY;
                    float topRightX = x * mGLTileSizeX + mGLTileSizeX;
                    float topRightY = y * mGLTileSizeY + mGLTileSizeY;

                    buf.position(0);
                    buf.put(topLeftX); buf.put(topLeftY); buf.put(0);
                    buf.put(bottomRightX); buf.put(bottomRightY); buf.put(0);
                    buf.put(bottomLeftX); buf.put(bottomLeftY); buf.put(0);
                    buf.put(topLeftX); buf.put(topLeftY); buf.put(0);
                    buf.put(topRightX); buf.put(topRightY); buf.put(0);
                    buf.put(bottomRightX); buf.put(bottomRightY); buf.put(0);
                    buf.position(0);

                    // Draw
                    gl.glEnableClientState(GL10.GL_VERTEX_ARRAY);
                    gl.glVertexPointer(3, GL10.GL_FLOAT, 0, buf);
                    gl.glDrawArrays(GL10.GL_TRIANGLES, 0, 6 * 3);
                    gl.glDisableClientState(GL10.GL_VERTEX_ARRAY);
                }
            }
        }
    }

    gl.glFlush();

New OpenGL ES 2.0 Code:

    int numVerts = 0;
    int numQuads = 0;

    // Alloc enough data for all sprites
    for (const auto& pair : objects) {
        Object* obj = pair.second;
        if (obj != nullptr && obj->visible && obj->type == OBJTYPE_SPRITE) {
            numVerts += 6;
            numQuads += 1;
        }
    }

    int floatsPerVert = 26;
    float* data = new float[numVerts * floatsPerVert];
    int cursor = 0;

    // Quad/sprite index
    int q = 0;

    // Fill data for all sprites
    for (const auto& pair : objects) {
        Object* obj = pair.second;
        if (obj != nullptr && obj->visible && obj->type == OBJTYPE_SPRITE) {
            // Add sprite
            texAtlas.add(obj->textureName);
            if (texAtlas.getNeedsRefresh())
                texAtlas.refresh();

            // Set modelview matrix
            glm::mat4 mvMatrix;
            glm::mat4 scaleToNDC;
            glm::mat4 cameraRotate;
            glm::mat4 cameraTranslate;
            glm::mat4 rotate;

    #ifdef PLATFORM_OPENVR
            scaleToNDC = glm::scale(glm::mat4(), glm::vec3(VRSCALE, VRSCALE, VRSCALE));
    #else
            scaleToNDC = glm::scale(glm::mat4(), glm::vec3(NDC_SCALE, NDC_SCALE, NDC_SCALE));
    #endif

            if (obj->alwaysFacePlayer)
                rotate = glm::rotate(glm::mat4(), glm::radians(-camera->yaw), glm::vec3(0, 1, 0)) // Model yaw
                    * glm::rotate(glm::mat4(), glm::radians(camera->pitch), glm::vec3(1, 0, 0)); // Model pitch
            else
                rotate = glm::rotate(glm::mat4(), glm::radians(-obj->yaw), glm::vec3(0, 1, 0)) // Model yaw
                    * glm::rotate(glm::mat4(), glm::radians(-obj->pitch), glm::vec3(1, 0, 0)); // Model pitch

            cameraRotate = glm::rotate(glm::mat4(), glm::radians(camera->roll), glm::vec3(0, 0, 1)) // Camera roll
                * glm::rotate(glm::mat4(), -glm::radians(camera->pitch), glm::vec3(1, 0, 0)) // Camera pitch
                * glm::rotate(glm::mat4(), glm::radians(camera->yaw), glm::vec3(0, 1, 0)); // Camera yaw

            cameraTranslate = glm::translate(glm::mat4(), glm::vec3(-camera->position.x, -camera->position.y, -camera->position.z)); // Camera translate

    #ifdef PLATFORM_OPENVR
            mvMatrix = glm::make_mat4((const GLfloat*)g_poseEyeMatrix.get())
                * scaleToNDC
                * cameraRotate
                * cameraTranslate
                * glm::translate(glm::mat4(), glm::vec3(obj->position.x, obj->position.y, obj->position.z)) // World translate
                * rotate
                * glm::scale(glm::mat4(), obj->scale / glm::vec3(2.0, 2.0, 2.0)); // Scale
    #else
            mvMatrix = scaleToNDC
                * cameraRotate
                * cameraTranslate
                * glm::translate(glm::mat4(), glm::vec3(obj->position.x, obj->position.y, obj->position.z)) // World translate
                * rotate
                * glm::scale(glm::mat4(), obj->scale / glm::vec3(2.0, 2.0, 2.0)); // Scale
    #endif

            //  ______
            // |\\5   4|
            // |0\\    |
            // |  \\   |
            // |   \\  |
            // |    \\3|
            // |1__2_\\|

            // Triangle 1

            // Vertex 0
            data[cursor + 0] = -1.0f; data[cursor + 1] = 1.0f; data[cursor + 2] = 0.0f; data[cursor + 3] = 1.0f;
            UV input;
            input.u = 0.0f;
            input.v = 1.0f;
            UV output = texAtlas.getUV(obj->textureName, input);
            data[cursor + 4] = output.u; data[cursor + 5] = output.v;
            data[cursor + 6] = mvMatrix[0][0]; data[cursor + 7] = mvMatrix[0][1]; data[cursor + 8] = mvMatrix[0][2]; data[cursor + 9] = mvMatrix[0][3];
            data[cursor + 10] = mvMatrix[1][0]; data[cursor + 11] = mvMatrix[1][1]; data[cursor + 12] = mvMatrix[1][2]; data[cursor + 13] = mvMatrix[1][3];
            data[cursor + 14] = mvMatrix[2][0]; data[cursor + 15] = mvMatrix[2][1]; data[cursor + 16] = mvMatrix[2][2]; data[cursor + 17] = mvMatrix[2][3];
            data[cursor + 18] = mvMatrix[3][0]; data[cursor + 19] = mvMatrix[3][1]; data[cursor + 20] = mvMatrix[3][2]; data[cursor + 21] = mvMatrix[3][3];
            data[cursor + 22] = obj->color.r; data[cursor + 23] = obj->color.g; data[cursor + 24] = obj->color.b; data[cursor + 25] = obj->color.a;
            cursor += floatsPerVert;

            // Vertex 1
            data[cursor + 0] = -1.0f; data[cursor + 1] = -1.0f; data[cursor + 2] = 0.0f; data[cursor + 3] = 1.0f;
            input.u = 0.0f;
            input.v = 0.0f;
            output = texAtlas.getUV(obj->textureName, input);
            data[cursor + 4] = output.u; data[cursor + 5] = output.v;
            data[cursor + 6] = mvMatrix[0][0]; data[cursor + 7] = mvMatrix[0][1]; data[cursor + 8] = mvMatrix[0][2]; data[cursor + 9] = mvMatrix[0][3];
            data[cursor + 10] = mvMatrix[1][0]; data[cursor + 11] = mvMatrix[1][1]; data[cursor + 12] = mvMatrix[1][2]; data[cursor + 13] = mvMatrix[1][3];
            data[cursor + 14] = mvMatrix[2][0]; data[cursor + 15] = mvMatrix[2][1]; data[cursor + 16] = mvMatrix[2][2]; data[cursor + 17] = mvMatrix[2][3];
            data[cursor + 18] = mvMatrix[3][0]; data[cursor + 19] = mvMatrix[3][1]; data[cursor + 20] = mvMatrix[3][2]; data[cursor + 21] = mvMatrix[3][3];
            data[cursor + 22] = obj->color.r; data[cursor + 23] = obj->color.g; data[cursor + 24] = obj->color.b; data[cursor + 25] = obj->color.a;
            cursor += floatsPerVert;

            // Vertex 2
            data[cursor + 0] = 1.0f; data[cursor + 1] = -1.0f; data[cursor + 2] = 0.0f; data[cursor + 3] = 1.0f;
            input.u = 1.0f;
            input.v = 0.0f;
            output = texAtlas.getUV(obj->textureName, input);
            data[cursor + 4] = output.u; data[cursor + 5] = output.v;
            data[cursor + 6] = mvMatrix[0][0]; data[cursor + 7] = mvMatrix[0][1]; data[cursor + 8] = mvMatrix[0][2]; data[cursor + 9] = mvMatrix[0][3];
            data[cursor + 10] = mvMatrix[1][0]; data[cursor + 11] = mvMatrix[1][1]; data[cursor + 12] = mvMatrix[1][2]; data[cursor + 13] = mvMatrix[1][3];
            data[cursor + 14] = mvMatrix[2][0]; data[cursor + 15] = mvMatrix[2][1]; data[cursor + 16] = mvMatrix[2][2]; data[cursor + 17] = mvMatrix[2][3];
            data[cursor + 18] = mvMatrix[3][0]; data[cursor + 19] = mvMatrix[3][1]; data[cursor + 20] = mvMatrix[3][2]; data[cursor + 21] = mvMatrix[3][3];
            data[cursor + 22] = obj->color.r; data[cursor + 23] = obj->color.g; data[cursor + 24] = obj->color.b; data[cursor + 25] = obj->color.a;
            cursor += floatsPerVert;

            // Triangle 2

            // Vertex 3
            data[cursor + 0] = 1.0f; data[cursor + 1] = -1.0f; data[cursor + 2] = 0.0f; data[cursor + 3] = 1.0f;
            input.u = 1.0f;
            input.v = 0.0f;
            output = texAtlas.getUV(obj->textureName, input);
            data[cursor + 4] = output.u; data[cursor + 5] = output.v;
            data[cursor + 6] = mvMatrix[0][0]; data[cursor + 7] = mvMatrix[0][1]; data[cursor + 8] = mvMatrix[0][2]; data[cursor + 9] = mvMatrix[0][3];
            data[cursor + 10] = mvMatrix[1][0]; data[cursor + 11] = mvMatrix[1][1]; data[cursor + 12] = mvMatrix[1][2]; data[cursor + 13] = mvMatrix[1][3];
            data[cursor + 14] = mvMatrix[2][0]; data[cursor + 15] = mvMatrix[2][1]; data[cursor + 16] = mvMatrix[2][2]; data[cursor + 17] = mvMatrix[2][3];
            data[cursor + 18] = mvMatrix[3][0]; data[cursor + 19] = mvMatrix[3][1]; data[cursor + 20] = mvMatrix[3][2]; data[cursor + 21] = mvMatrix[3][3];
            data[cursor + 22] = obj->color.r; data[cursor + 23] = obj->color.g; data[cursor + 24] = obj->color.b; data[cursor + 25] = obj->color.a;
            cursor += floatsPerVert;

            // Vertex 4
            data[cursor + 0] = 1.0f; data[cursor + 1] = 1.0f; data[cursor + 2] = 0.0f; data[cursor + 3] = 1.0f;
            input.u = 1.0f;
            input.v = 1.0f;
            output = texAtlas.getUV(obj->textureName, input);
            data[cursor + 4] = output.u; data[cursor + 5] = output.v;
            data[cursor + 6] = mvMatrix[0][0]; data[cursor + 7] = mvMatrix[0][1]; data[cursor + 8] = mvMatrix[0][2]; data[cursor + 9] = mvMatrix[0][3];
            data[cursor + 10] = mvMatrix[1][0]; data[cursor + 11] = mvMatrix[1][1]; data[cursor + 12] = mvMatrix[1][2]; data[cursor + 13] = mvMatrix[1][3];
            data[cursor + 14] = mvMatrix[2][0]; data[cursor + 15] = mvMatrix[2][1]; data[cursor + 16] = mvMatrix[2][2]; data[cursor + 17] = mvMatrix[2][3];
            data[cursor + 18] = mvMatrix[3][0]; data[cursor + 19] = mvMatrix[3][1]; data[cursor + 20] = mvMatrix[3][2]; data[cursor + 21] = mvMatrix[3][3];
            data[cursor + 22] = obj->color.r; data[cursor + 23] = obj->color.g; data[cursor + 24] = obj->color.b; data[cursor + 25] = obj->color.a;
            cursor += floatsPerVert;

            // Vertex 5
            data[cursor + 0] = -1.0f; data[cursor + 1] = 1.0f; data[cursor + 2] = 0.0f; data[cursor + 3] = 1.0f;
            input.u = 0.0f;
            input.v = 1.0f;
            output = texAtlas.getUV(obj->textureName, input);
            data[cursor + 4] = output.u; data[cursor + 5] = output.v;
            data[cursor + 6] = mvMatrix[0][0]; data[cursor + 7] = mvMatrix[0][1]; data[cursor + 8] = mvMatrix[0][2]; data[cursor + 9] = mvMatrix[0][3];
            data[cursor + 10] = mvMatrix[1][0]; data[cursor + 11] = mvMatrix[1][1]; data[cursor + 12] = mvMatrix[1][2]; data[cursor + 13] = mvMatrix[1][3];
            data[cursor + 14] = mvMatrix[2][0]; data[cursor + 15] = mvMatrix[2][1]; data[cursor + 16] = mvMatrix[2][2]; data[cursor + 17] = mvMatrix[2][3];
            data[cursor + 18] = mvMatrix[3][0]; data[cursor + 19] = mvMatrix[3][1]; data[cursor + 20] = mvMatrix[3][2]; data[cursor + 21] = mvMatrix[3][3];
            data[cursor + 22] = obj->color.r; data[cursor + 23] = obj->color.g; data[cursor + 24] = obj->color.b; data[cursor + 25] = obj->color.a;
            cursor += floatsPerVert;

            q++;
        }
    }

    #if defined PLATFORM_WINDOWS || defined PLATFORM_OSX
    // Generate VAO
    glGenVertexArrays(1, (GLuint*)&vao);
    checkGLError("glGenVertexArrays");
    glBindVertexArray(vao);
    checkGLError("glBindVertexArray");
    #endif

    // Generate VBO
    glGenBuffers(1, (GLuint*)&vbo);
    checkGLError("glGenBuffers");
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    checkGLError("glBindBuffer");

    // Load data into VBO
    glBufferData(GL_ARRAY_BUFFER, sizeof(float) * 6 * floatsPerVert * q, data, GL_STATIC_DRAW);
    checkGLError("glBufferData");

    // Delete data
    delete data;

    // Get aspect
    float width = PLAT_GetWindowWidth();
    float height = PLAT_GetWindowHeight();
    #ifdef PLATFORM_OPENVR
    float aspect = 1.0;
    #else
    float aspect = width / height;
    #endif

    // DRAW
    glEnable(GL_CULL_FACE);
    checkGLError("glEnable");
    glFrontFace(GL_CCW);
    checkGLError("glFrontFace");
    glCullFace(GL_BACK);
    checkGLError("glCullFace");

    glEnable(GL_BLEND);
    checkGLError("ShapeRenderer glEnable");
    #ifndef PLATFORM_ANDROID
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    checkGLError("ShapeRenderer glBlendFunc");
    #endif

    // Add program to OpenGL environment
    int curProgram = -1;
    curProgram = programMain;

    glUseProgram(curProgram);
    checkGLError("SpriteRenderer glUseProgram");

    #if defined PLATFORM_WINDOWS || defined PLATFORM_OSX
    // Bind the VAO
    glBindVertexArray(vao);
    checkGLError("glBindVertexArray");
    #endif

    // Bind the VBO
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    checkGLError("glBindBuffer");

    // Set the projection matrix
    glm::mat4 projMatrix;

    #if defined PLATFORM_OPENVR
    projMatrix = glm::make_mat4((const GLfloat*)g_projectionMatrix.get());
    #else
    projMatrix = glm::perspective(VIEW_FOV, aspect, 0.001f, 1000.0f);
    #endif

    setMatrix(curProgram, "projectionMatrix", projMatrix);

    setUniform4f(curProgram, "globalColor", globalColor.x, globalColor.y, globalColor.z, globalColor.w);

    int t = texAtlas.getGlTexId();
    glActiveTexture(GL_TEXTURE0);
    checkGLError("glActiveTexture");
    glBindTexture(GL_TEXTURE_2D, t);

    setUniform2f(curProgram, "vTexSpan", 1.0, 1.0);
    setUniform1f(curProgram, "useTexture", 1.0);
    setUniform1f(curProgram, "fadeNear", 600.0 * NDC_SCALE);
    setUniform1f(curProgram, "fadeFar", 900.0 * NDC_SCALE);

    // Set attributes
    setVertexAttrib(curProgram, "vPosition", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 0);
    setVertexAttrib(curProgram, "vTexCoords", 2, GL_FLOAT, false, floatsPerVert * sizeof(float), 4);
    setVertexAttrib(curProgram, "mvMatrixPt1", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 6);
    setVertexAttrib(curProgram, "mvMatrixPt2", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 10);
    setVertexAttrib(curProgram, "mvMatrixPt3", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 14);
    setVertexAttrib(curProgram, "mvMatrixPt4", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 18);
    setVertexAttrib(curProgram, "vColor", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 22);

    // Draw
    glDrawArrays(GL_TRIANGLES, 0, q * 6);
    checkGLError("glDrawArrays");

    #if defined PLATFORM_WINDOWS || defined PLATFORM_OSX
    // Reset
    glBindVertexArray(0);
    glBindTexture(GL_TEXTURE_2D, 0);
    glUseProgram(0);
    #endif

    // Delete VAO and VBO
    glDeleteBuffers(1, (GLuint*)&vbo);
    #if defined PLATFORM_WINDOWS || defined PLATFORM_OSX
    glDeleteVertexArrays(1, (GLuint*)&vao);
    #endif

Shader Code:

    //
    // VERTEX SHADER ES 2.0
    //

    const char* vertexShaderCodeES20 =
        "attribute vec4 vPosition;"\
        "varying lowp vec4 posOut; "\
        "attribute vec2 vTexCoords;"\
        "varying lowp vec2 vTexCoordsOut; "\
        "uniform vec2 vTexSpan;"\
        "attribute vec4 vNormal;"\
        "varying vec4 vNormalOut;"\
        "attribute vec4 vVertexLight; "\
        "varying vec4 vVertexLightOut; "\
        "uniform mat4 projectionMatrix; "\
        "varying lowp float distToCamera; "\
        "attribute vec4 mvMatrixPt1; "\
        "attribute vec4 mvMatrixPt2; "\
        "attribute vec4 mvMatrixPt3; "\
        "attribute vec4 mvMatrixPt4; "\
        "attribute vec4 vColor; "\
        "varying vec4 vColorOut;"\
        "attribute mat4 oldmvMatrix; "\
        "void main() {"\
        " mat4 mvMatrix; "\
        " mvMatrix[0] = mvMatrixPt1; "\
        " mvMatrix[1] = mvMatrixPt2; "\
        " mvMatrix[2] = mvMatrixPt3; "\
        " mvMatrix[3] = mvMatrixPt4; "\
        " gl_Position = projectionMatrix * mvMatrix * vPosition; "
        " vTexCoordsOut = vTexCoords * vTexSpan; "\
        " posOut = gl_Position; "\
        " vec4 posBeforeProj = mvMatrix * vPosition;"\
        " distToCamera = -posBeforeProj.z; "\
        " vColorOut = vColor; "\
        "}\n";

    //
    // FRAGMENT SHADER ES 2.0
    //

    const char* fragmentShaderCodeES20 =
        "uniform sampler2D uTexture; "\
        "uniform lowp vec4 vColor; "\
        "uniform lowp vec4 globalColor; "\
        "varying lowp vec2 vTexCoordsOut; "\
        "varying lowp vec4 posOut; "\
        "uniform lowp float useTexture; "\
        "uniform lowp float fadeNear; "\
        "uniform lowp float fadeFar; "\
        "varying lowp float distToCamera; "\
        "varying lowp vec4 vColorOut; "\
        "void main() {"\
        " lowp vec4 f = texture2D(uTexture, vTexCoordsOut.st); "\
        " if (f.a == 0.0) "\
        " discard; "\
        " lowp float visibility = 1.0; "\
        " lowp float alpha = 1.0; "\
        " if (distToCamera >= fadeFar) discard; "\
        " if (distToCamera >= fadeNear) "\
        " alpha = 1.0 - (distToCamera - fadeNear) * 3.0; "\
        " if (useTexture == 1.0)"\
        " {"\
        " gl_FragColor = texture2D(uTexture, vTexCoordsOut.st) * vColorOut * vec4(visibility, visibility, visibility, alpha) * globalColor; "\
        " }"\
        " else"\
        " {"\
        " gl_FragColor = vColorOut * vec4(visibility, visibility, visibility, alpha) * globalColor; "\
        " }"\
        "}\n";

The rest of the new code is here: TextureAtlas.cpp Renderer.cpp https://github.com/dimitrilozovoy/Voxyc/

Started by VoxycDev March 23, 2019 01:53 AM

15 comments, last by VoxycDev 5 years, 10 months ago

VoxycDev

Author

March 23, 2019 06:30 PM

2 minutes ago, lawnjelly said:
The other thing is that you appear to be recreating and compiling the shader on every frame, which will probably kill performance. Again, move this to one-off code and reuse the shader. After all this is done you can reassess whether there are any bottlenecks.

Oh no, I'm definitely not doing that. Shader compilation happens once at init time. I did not include that part for brevity's sake.

lawnjelly

March 23, 2019 06:32 PM

1 minute ago, VoxycDev said:
Oh no, I'm definitely not doing that. Shader compilation happens once at init time. I did not include that part for brevity's sake.

Ah yep, sorry you are right, I didn't read thoroughly enough.

VoxycDev

Author

March 23, 2019 06:35 PM

1 minute ago, lawnjelly said:
Ah yep, sorry you are right, I didn't read thoroughly enough.

Cool. Thank you for your input. Overall, this has been a very helpful thread for me. I think I am on my way to that blazing fast particle system I wanted.

SyncViews

March 23, 2019 08:10 PM

2 hours ago, VoxycDev said:
Yes, since I was looking for a way to draw all sprites with one call, I decided to make mvMatrix an attribute.

I still don't see why it is needed. Maybe you are trying to over-optimise this. Sprite batching is needed because you can have thousands, even tens of thousands, of visible sprites in a scene, even after basic culling, and that many state changes and draw calls is too many. A handful, even dozens, of draw calls is fine.

In something 2D (for simplicity of the example) like your YouTube video or other 2D games, I might have separate drawing at least for the background tiles (easy to cull and to calculate coordinates for on the CPU side, and might even be cached), world objects, and the UI.

2 hours ago, VoxycDev said:
Thing is, even though right now it's just a grid, the sprites are supposed to be stretchable/bendable, like they were in my old fixed pipeline code

Sounds like it could still be done without a unique matrix per vertex, but I am not clear exactly what you are doing. The old code you posted just draws a normal tile grid with no deform unless I missed something.

Would you need these deformed positions CPU side anyway, e.g. for collision detection? In which case just use those directly.

Is the deformation limited, to say moving the 4 corner points of a large object? Or something else that can be determined on the fly from a small dataset?

And surely you can't deform every sprite in the game? If some things need a more complex and expensive routine, avoid letting that add significant cost to the thousands of other things being rendered.

2 hours ago, VoxycDev said:
Quote
What is `texAtlas.add(obj->textureName);`? You're not rebuilding a texture dynamically, are you? Even if it is not every frame, you need to be careful not to cause slow frames / stutter. It also looks like a string; if it's doing string map lookups for every sprite, that is not ideal.
It makes sure the texture is in the texture atlas. It's rebuilt as-needed (only when a brand new texture is added). You're right, I probably should get rid of string map lookup here. But in this particular case there is only one texture so array size is 1, so it's not the bottleneck.

Normally if I had an atlas I'd do it at load time. Doing it dynamically is a lot more complex. "Rebuilt as-needed" can be perceived as stutter, if you are not careful, when that "as needed" frame takes longer than the other frames that didn't rebuild anything.

One of the reasons I hate string comparison is that even the best case is fairly expensive. Say you have a hash map with one entry: in the case of a "hit" you just did an O(n) hash computation and an O(n) string comparison (to rule out a hash collision), where n is the string length, and for a fairly long string like a filename, or worse a path, that is a fair bit of work. It is probably not the bottleneck, but things like that add up a lot when they are spread throughout a program. (Some languages and/or programs "intern" strings so they can use reference equality instead, essentially turning such strings into integers.)
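A minimal sketch of the interning idea, assuming a hypothetical `NameInterner` class (not part of the engine in question): each texture name is hashed once when the object is created, and per-sprite code then works with a small integer id.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps each distinct name to a small integer id.
class NameInterner {
public:
    // Returns the id for `name`, assigning a new one the first time it is seen.
    // This is the only place that hashes and compares strings.
    std::uint32_t intern(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        const std::uint32_t id = static_cast<std::uint32_t>(names_.size());
        names_.push_back(name);
        ids_.emplace(name, id);
        return id;
    }

    // Reverse lookup (for logging or saving); plain array indexing.
    const std::string& name(std::uint32_t id) const {
        assert(id < names_.size());
        return names_[id];
    }

private:
    std::unordered_map<std::string, std::uint32_t> ids_;
    std::vector<std::string> names_;
};
```

An object would store the id returned by `intern` once, and the atlas could then index a plain array by that id inside the per-sprite loop.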

2 hours ago, VoxycDev said:
Quote
Also not sure on the cost of things like `setVertexAttrib`. You should be able to do this once, and it is saved with the `GL_ARRAY_BUFFER` (possibly all in one go, e.g. `glVertexAttribPointer`)
setVertexAttrib just calls all the gl functions needed to set up an attribute. Good point, though. I should try to do this once if I can. This is not the only program/renderer that runs in the engine though, so I assumed I have to re-set-up all the attributes on every frame for every program. Is that not the case?

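This is where `glVertexAttribPointer` etc. come in. With a vertex array object (VAO) bound, as in your Windows/OSX path, it is not just setting global state: the attribute pointers are recorded in the VAO together with the buffer each one refers to, and what you set will be there the next time you bind that VAO. On plain OpenGL ES 2.0 without a VAO the pointers are context state, so there you do have to set them again whenever another renderer has changed them.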

2 hours ago, VoxycDev said:
Quote
Any sort of dynamic branch in a shader is usually bad if adjacent/nearby data will branch differently. GPU cores are not like CPU ones and can't all independently do their own thing. I didn't look closely at your data, but something to be aware of.

I'm not super worried about the gaps between the sprites. This is only for an editor, not for rendering in the game. As long as it's smooth and I can quickly build vast landscapes and cities out of voxels, that's all I care about.

Wrong quote? The gaps appear when you let a translation get combined with other things in a matrix and it causes rounding errors.

Dynamic branching in a GPU program / shader can have a serious performance impact. If the GPU runs, say, 32 threads together, then all 32 threads must do the exact same thing each cycle; they just get different registers (and there are some memory access rules as well). If you have a condition of some sort such that some threads will do one thing and others something else, then it basically has to "pause" one set of threads and do the first thing, then "pause" the other threads and do the other thing, on separate cycles.
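The cost of divergence can be illustrated with a toy model. This is only a sketch under a simplifying assumption (a group of threads pays for every branch path that at least one of its threads takes); the function name and cost numbers are made up, and real GPUs differ in the details.

```cpp
#include <cassert>
#include <vector>

// Toy model, not a description of any specific GPU: threads in one group run
// in lockstep, so the group pays for each branch path taken by any thread.
int lockstepCost(const std::vector<bool>& takesIfPath, int ifCost, int elseCost) {
    bool anyIf = false;
    bool anyElse = false;
    for (bool takesIf : takesIfPath) {
        if (takesIf) {
            anyIf = true;
        } else {
            anyElse = true;
        }
    }
    return (anyIf ? ifCost : 0) + (anyElse ? elseCost : 0);
}
```

In this model a group where every thread takes the same path pays for one path only, while a single thread going the other way makes the whole group pay for both.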

2 hours ago, VoxycDev said:
@SyncViews, just an idea. What if I send mvMatrix as a uniform array, and even though I can only send 32 or 64 matrices at once, I can then break it up into, let's say, 4 draw calls, to do 128 or 256 sprites? Maybe worth a try.

With only 6 vertices using the same matrix, I am not sure if that is a great help. You would need to test it. Also there may be a penalty for that uniform/memory access pattern, not sure.
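The batch arithmetic for the uniform-array idea is simple enough to sketch. `splitIntoBatches` and `DrawBatch` are hypothetical names, and the per-draw limit is whatever the uniform budget of the target device allows; each batch would upload its own slice of matrices and issue one draw call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct DrawBatch {
    std::size_t firstSprite; // index of the first sprite in this draw call
    std::size_t spriteCount; // number of sprites in this draw call
};

// Splits `spriteCount` sprites into draw calls of at most `maxPerDraw` sprites.
std::vector<DrawBatch> splitIntoBatches(std::size_t spriteCount, std::size_t maxPerDraw) {
    assert(maxPerDraw > 0);
    std::vector<DrawBatch> batches;
    for (std::size_t first = 0; first < spriteCount; first += maxPerDraw) {
        batches.push_back({first, std::min(maxPerDraw, spriteCount - first)});
    }
    return batches;
}
```

For the 16x16 grid from the question (256 sprites) and 64 matrices per draw, this yields the 4 draw calls mentioned above. Whether that beats per-vertex attributes still has to be measured.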

VoxycDev

Author

March 23, 2019 10:38 PM

2 hours ago, SyncViews said:
I still don't see why it is needed. Maybe you are trying to over-optimise this.

Perhaps. I guess I jumped on the whole "as few draw calls as possible" wagon and took it a bit too far. It's still a lot faster than drawing every quad with a separate draw call, though.

Quote
In something 2D (for simplicity of the example) like your YouTube video or other 2D games, I might have separate drawing at least for the background tiles (easy to cull and to calculate coordinates for on the CPU side, and might even be cached), world objects, and the UI.

Ultimately, this is for a universal sprite/particle renderer class that I can use for:

1. Regular sprites in the game
2. Particle system in the game
3. Flexible stretchable 2D tiles in the orthographic voxel editor (the part that I need most at the moment)

Quote
Sounds like it could still be done without a unique matrix per vertex, but I am not clear exactly what you are doing.

To be able to hit all 3 cases above, I need at least a matrix per quad. I will do the stretched corners with vertex coordinates. As @JohnnyCode pointed out, if they are 2D and do not rotate (yes and yes for case 3), I can save on memory by using a smaller data structure. This may allow me to cram enough of them into a uniform array and still do all of them with one draw call.
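One possible shape for such a smaller structure, as a sketch only: `SpriteTransform2D` and `applyTransform` are hypothetical names, and the layout (centre plus half-size) is an assumption. It fits in 4 floats, i.e. a single `vec4` per sprite instead of the 16 floats of a `mat4`.

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// Hypothetical compact transform for a 2D sprite that never rotates.
struct SpriteTransform2D {
    Vec2 center;   // sprite centre
    Vec2 halfSize; // half width and half height
};

// 4 floats per sprite, versus 16 for a full 4x4 matrix.
static_assert(sizeof(SpriteTransform2D) == 4 * sizeof(float),
              "expected a tightly packed 4-float struct");

// `corner` is a unit-quad corner with components in [-1, 1], like the
// positions used in the posted vertex data.
Vec2 applyTransform(const SpriteTransform2D& t, Vec2 corner) {
    return {t.center.x + corner.x * t.halfSize.x,
            t.center.y + corner.y * t.halfSize.y};
}
```

The same arithmetic could run in the vertex shader on a `vec4` element of a uniform array, which lets roughly four times as many sprites fit in the same uniform budget as full matrices.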

Quote
The old code you posted just draws a normal tile grid with no deform unless I missed something.

Yes, the example does not deform. I had trouble finding the old piece of code that deforms (it's old code). But the original question was only to find out why the quads draw so slowly.

Quote
Would you need these deformed positions CPU side anyway, e.g. for collision detection? In which case just use those directly.

Yes, I do and I will.

Quote
Is the deformation limited, to say moving the 4 corner points of a large object? Or something else that can be determined on the fly from a small dataset?

Deformation will be everywhere for terrain, but less so for buildings (when designing either in the orthographic editor).

Quote
And surely you can't deform every sprite in the game? If some things need a more complex and expensive routine, avoid letting that add significant cost to the thousands of other things being rendered.

This may be needed if I'm designing a sophisticated landscape for case 3.

Quote
Normally if I had an atlas I'd do it at load time. Doing it dynamically is a lot more complex. "Rebuilt as-needed" can be perceived as stutter, if you are not careful, when that "as needed" frame takes longer than the other frames that didn't rebuild anything.

Yes, it does stutter, especially in Evertank. But you cannot predict what textures the user is going to load in the editor and when, so it has to be on demand. In a game release, I try to remedy this by pre-loading all the required textures in Lua at the start of the game, but not sure if this is working right now.

Quote
One of the reasons I hate string comparison is that even the best case is fairly expensive. Say you have a hash map with one entry: in the case of a "hit" you just did an O(n) hash computation and an O(n) string comparison (to rule out a hash collision), where n is the string length, and for a fairly long string like a filename, or worse a path, that is a fair bit of work. It is probably not the bottleneck, but things like that add up a lot when they are spread throughout a program. (Some languages and/or programs "intern" strings so they can use reference equality instead, essentially turning such strings into integers.)

I'd be happy to get rid of all the string lookups.

VoxycDev

Author

April 10, 2019 06:22 AM

Collision detection was causing most of the performance slow down in my original question. Once I disabled it, the frame-rate went way up. Here is the new version of fast sprite renderer, with all suggestions included (not re-creating VBO, dynamic draw and so on):

https://github.com/dimitrilozovoy/Voxyc/blob/master/engine/SpriteRenderer.cpp
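The "not re-creating VBO, dynamic draw" suggestion can be sketched as simple bookkeeping around one long-lived buffer. This is not the code from the linked file; `StreamBufferPlan` is a hypothetical helper, and the GL calls appear only in comments (`glBufferData` with `GL_DYNAMIC_DRAW` to grow the buffer, `glBufferSubData` to overwrite it).

```cpp
#include <cassert>
#include <cstddef>

enum class UploadKind { Reallocate, UpdateInPlace };

// Hypothetical bookkeeping for a vertex buffer that is created once and
// reused every frame instead of being generated and deleted per frame.
class StreamBufferPlan {
public:
    // Decides how to upload `bytes` of vertex data this frame.
    UploadKind plan(std::size_t bytes) {
        if (bytes > capacity_) {
            // Here one would call:
            // glBufferData(GL_ARRAY_BUFFER, bytes, data, GL_DYNAMIC_DRAW);
            capacity_ = bytes;
            return UploadKind::Reallocate;
        }
        // Here one would call:
        // glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, data);
        return UploadKind::UpdateInPlace;
    }

    std::size_t capacity() const { return capacity_; }

private:
    std::size_t capacity_ = 0; // size of the GPU allocation so far, in bytes
};
```

With a stable sprite count, only the first frame allocates; later frames just overwrite the existing storage.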

It works pretty well. I also tried to put mvMatrix into uniforms and here is that version of the fast sprite renderer:

https://github.com/dimitrilozovoy/Voxyc/blob/master/engine/SpriteRenderer2.cpp

The one above can do at most 10 sprites per draw call due to the uniform limit, but may be even faster (further testing needed). Thank you everyone for your input.
