Why does 64-bit Vulkan give a performance boost?

Started by June 01, 2019 06:28 AM
5 comments, last by Krypt0n 5 years, 8 months ago

I have a test (3000+ draw calls with simple materials) on mobile (Snapdragon 845, Adreno™ 630).

If I run the 32-bit version, it runs at 35 fps. If I run the 64-bit Vulkan version, it goes beyond 60 fps.

Does anyone know why Vulkan runs so much faster in 64-bit?

1) That's waaaaay too many draw calls per frame for ideal performance. You should be aiming for sub-100, or even sub-30. Have you profiled your code yet?

2) At a wild guess, based on the near-zero information provided... there might be a large conversion or thunking cost of translating your 32-bit API calls to the native library's 64-bit API. Have you profiled your code yet?

3) The bottleneck might be in your code, not in Vulkan or the driver. Have you profiled your code yet?
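A first profiling pass doesn't need special tools. Here is a minimal sketch, assuming your draw submission lives in one loop you can wrap: time it with std::chrono and compare the per-draw CPU cost against the frame budget. The helper names (frame_budget_ms, per_draw_budget_us, time_ms) are made up for illustration, and the per-draw figure is only an upper bound, since it pretends submission is the only thing the CPU does in a frame.

```cpp
#include <cassert>
#include <chrono>

// Frame budget in milliseconds for a target frame rate.
double frame_budget_ms(double target_fps) {
    assert(target_fps > 0.0);
    return 1000.0 / target_fps;
}

// Upper bound on CPU time per draw call, in microseconds, if draw
// submission alone were allowed to fill the whole frame.
double per_draw_budget_us(double target_fps, int draw_calls) {
    assert(draw_calls > 0);
    return frame_budget_ms(target_fps) * 1000.0 / draw_calls;
}

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
// Wrap your own submission loop in the callable you pass in.
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

With the numbers from the original post, per_draw_budget_us(60.0, 3000) is roughly 5.6 microseconds per draw, against roughly 9.5 at 35 fps. If the time_ms measurement of your submission loop divided by the draw count lands near those figures, the CPU side is a likely suspect.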

RIP GameDev.net: launched 2 unusably-broken forum engines in as many years, and now has ceased operating as a forum at all, happy to remain naught but an advertising platform with an attached social media presence, headed by a staff who by their own admission have no idea what their userbase wants or expects. Here's to the good times; shame they exist in the past.
1 hour ago, Wyrframe said:

1) That's waaaaay too many draw calls per frame for ideal performance. You should be aiming for sub-100, or even sub-30. Have you profiled your code yet?

Do you have any information that supports this "sub-100, or even sub-30"? Best I could find was a Unity post from 2014 that claims 2000 draw calls is a good number: https://answers.unity.com/questions/694570/unity-mobile-draw-call-limit.html

I am really interested in getting a feeling for a very rough draw call count on mobile. It seems on PC the numbers are really high (for example: https://www.anandtech.com/show/11223/quick-look-vulkan-3dmark-api-overhead), but I don't have a good feeling for mobile. That said, draw call count isn't really a representative number, since many factors are involved. There is the CPU overhead of setting render state and setting up resources (textures, buffers, etc.), which makes the driver overhead vary wildly between, say, a simple draw call that just calls draw and one that sets a lot of state. Then there is obviously the GPU side, which varies even more wildly, but is also not really related to the original poster's problem, as he/she doesn't change the scene.

In any case I hope someone has a good explanation for this, or that iGrfx posts the results he finds, because that is an interesting question.

3 hours ago, deadc0deh said:

Do you have any information that supports this "sub-100, or even sub-30"? Best I could find was a Unity post from 2014 that claims 2000 draw calls is a good number: https://answers.unity.com/questions/694570/unity-mobile-draw-call-limit.html

That user says he's using a high-end smartphone or tablet (of his era), is probably targeting 30fps, probably doesn't care about battery drain... and you're citing some random guy on his personal project of unspecified complexity.

 

UE4 suggests batch counts sub-1000, sometimes even sub-200. Because the goal is not to just barely hold 30fps, or 60fps, or 72fps... it's to run the processor as slow as it can possibly be while getting that performance, because that's how you save battery life and keep heat down.

https://docs.unrealengine.com/en-US/Platforms/Mobile/Performance/index.html (see "Additional Suggestions" near the bottom).

 

Oculus, for the Quest and Go (mobile Android VR headsets), suggests 50-100 draw calls, using under 100k vertices/tris per frame (whichever is the higher, though naturally overdraw and occlusion will hinder and help, respectively), if you want to run your game for tens or sixties of minutes. More than that, and the device will have to throttle for both heat and battery reasons, and you really, really want to maintain 72fps without ever dipping.

https://developer.oculus.com/documentation/quest/latest/concepts/unreal-debug-quest/

5 hours ago, Wyrframe said:

https://docs.unrealengine.com/en-US/Platforms/Mobile/Performance/index.html (see "Additional Suggestions" near the bottom).

Thank you for that link, that was a good read. I'm guessing you are referring to this section:

Quote
  • Draw calls of the entire scene should be <=700 for any single view. Areas with poor occlusion, like looking over a large landscape, is going to be biggest challenges for this. This can be seen with Stat OpenGLRHI on device or Stat D3D11RHI in the Previewer on PC.

  • Triangle count of the entire scene should be <=500k for any view. This has been determined to be the maximum poly count that can hit 30fps on both iPad4 and iPad Air. This can be seen with Stat OpenGLRHI on device or Stat D3D11RHI in the Previewer on PC.

The only issue that I have with this is that the hardware mentioned is again from around 2012, which perhaps doesn't make it too representative of 2019. The original poster quoted an Adreno 630 which is from 2018.

Nevertheless, your link set me onto an interesting search and I found some more relevant numbers for Unreal Engine in this presentation where they talk about holding 30fps at a comfortable CPU usage for Fortnite. (Note that the timestamp points to the draw call count averages, which are for fairly dated hardware.)

It seems they vary from ~1000-2000 root draw calls (not counting instances/draw indirect) for mainstream hardware from around 2015-2016.

Anyway, thanks again Wyrframe for the information, it set me down a path of watching videos and reading articles in which I learned more things about cool ways of optimizing content too!

The UE link refers to OpenGL and D3D11 rendering. That was the time when you were heavily limited by draw calls due to driver overhead, and it's one reason why the industry moved to Mantle/Metal/Vulkan/D3D12. On consoles, with the same hardware, you could easily reach 5k draw calls. Hence, don't apply those performance budgets 1:1 to modern APIs and hardware; it's misleading.

You can find some reference numbers, e.g. from PowerVR (2015) running 13500 draw calls at 30Hz: https://www.imgtec.com/blog/gnomes-per-second-in-vulkan-and-opengl-es/

 

With Vulkan, it's not just a matter of switching the API and magic happens; if that's all you do, you leave a lot of new opportunities on the table.

1. Vulkan allows you to generate command buffers on multiple threads. Even if you don't need it for performance, you still get the work done quicker and therefore allow the CPU to go into a power-saving state. This might leave more room for the GPU, since on a mobile SoC the two share a power and thermal budget.

2. You should cache most of the render setup. For most static objects, you could keep pipelines, descriptors etc. unchanged. Only a few matrices might change per frame, and those can be copied into per-frame uniform buffers, possibly during the post-process work to hide latency. That way your draw calls become even cheaper and therefore even faster.

3. Vulkan allows you to set some special flags, e.g. the attachment load/store ops on render passes (such as VK_ATTACHMENT_STORE_OP_DONT_CARE), that discard buffer content if it's not needed. That way, especially on mobile, depth/stencil buffers don't need to be written out to memory, color buffers don't need to be loaded from memory, fast clears can be applied, and with some extensions even tiled registration can be avoided.
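To make point 2 a bit more concrete, here is a rough sketch of the caching idea in plain C++. Everything in it is a stand-in: RenderSetupKey, RenderSetupCache and the integer handle are hypothetical names, not Vulkan types, and a real key would have to cover far more state. The point is only that the expensive object gets created on first use and every later draw with the same state just looks it up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical, heavily simplified description of the state a draw needs.
struct RenderSetupKey {
    std::uint32_t shader_id;
    std::uint32_t blend_mode;
    bool depth_test;

    bool operator==(const RenderSetupKey& o) const {
        return shader_id == o.shader_id && blend_mode == o.blend_mode &&
               depth_test == o.depth_test;
    }
};

// Combines the fields into one hash value for use in std::unordered_map.
struct RenderSetupKeyHash {
    std::size_t operator()(const RenderSetupKey& k) const {
        std::size_t h = std::hash<std::uint32_t>{}(k.shader_id);
        h = h * 31 + std::hash<std::uint32_t>{}(k.blend_mode);
        h = h * 31 + std::hash<bool>{}(k.depth_test);
        return h;
    }
};

// Stand-in for an expensive-to-create object; NOT a Vulkan handle.
using SetupHandle = std::uint64_t;

class RenderSetupCache {
public:
    // Returns the cached handle for `key`, creating it only on first use.
    SetupHandle get_or_create(const RenderSetupKey& key) {
        const auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;  // cheap path for repeat draws
        const SetupHandle handle = ++created_;      // placeholder for the real, costly creation
        cache_.emplace(key, handle);
        assert(cache_.size() == created_);          // one created object per distinct key
        return handle;
    }

    std::uint64_t created_count() const { return created_; }

private:
    std::unordered_map<RenderSetupKey, SetupHandle, RenderSetupKeyHash> cache_;
    std::uint64_t created_ = 0;
};
```

For a static scene, created_count() should stop growing after the first frame or two; if it keeps climbing every frame, something in the key is changing that probably shouldn't be.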

 

But it's hard to tell why the 32-bit version is slow. Maybe it really is the API translation, maybe the 32-bit driver is less optimized, maybe your code was compiled with some suboptimal flags, or maybe you run into some memory limits and that causes swapping. (That's still an area where Vulkan could improve: you can't be sure how memory is managed on the driver side.)

In general, for mobile, the variance in hardware speed is huge. Even on the same device, the performance available to your game/engine will vary with battery state, power-saving options, temperature and other activity (e.g. a Wi-Fi download). Don't spend too much time investigating why perf is bad on every device; rather, make sure your game/engine adapts to the available power. Adapt LOD levels, draw distance, shader quality, resolution, frequency of shadow updates etc. to keep the frame rate close to the FPS target. Nobody will tinker with the options and everyone expects a smooth framerate, so beyond making your code "fast", make it "adaptive".
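As a rough illustration of what "adaptive" can look like, here is a small sketch that nudges a render-resolution scale based on measured frame time. The class name AdaptiveQuality, the thresholds and the step sizes are all assumptions picked for the example; you would tune them per game, and the same pattern could drive LOD bias or shadow update rate instead.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical controller: tracks a smoothed frame time and adjusts a
// resolution scale in [0.5, 1.0] to stay near the target frame time.
class AdaptiveQuality {
public:
    explicit AdaptiveQuality(double target_ms)
        : target_ms_(target_ms), smoothed_ms_(target_ms) {
        assert(target_ms > 0.0);
    }

    // Feed the last measured frame time; returns the scale for the next frame.
    double update(double frame_ms) {
        // Exponential smoothing so a single spike does not cause visible popping.
        smoothed_ms_ = 0.9 * smoothed_ms_ + 0.1 * frame_ms;

        if (smoothed_ms_ > target_ms_ * 1.05) {
            scale_ -= 0.05;  // over budget: drop quality quickly
        } else if (smoothed_ms_ < target_ms_ * 0.85) {
            scale_ += 0.02;  // clear headroom: recover quality slowly
        }
        scale_ = std::max(0.5, std::min(1.0, scale_));
        return scale_;
    }

    double scale() const { return scale_; }

private:
    double target_ms_;
    double smoothed_ms_;
    double scale_ = 1.0;
};
```

Dropping quality faster than raising it is deliberate in this sketch: a missed frame is more noticeable than a slightly softer image, and the gap between the two thresholds keeps the scale from oscillating when the frame time sits right at the target.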

 

 

 
