The UE link refers to OpenGL and D3D11 rendering. Back then you were heavily limited by the number of draw calls because of driver overhead, which is one reason the industry moved to Mantle/Metal/Vulkan/D3D12. On consoles, with the same hardware, you could easily reach 5k draw calls. Hence, don't apply those performance budgets 1:1 to modern APIs and hardware; it's misleading.
You can find some reference numbers, e.g. from PowerVR (2015), running 13,500 draw calls at 30 Hz: https://www.imgtec.com/blog/gnomes-per-second-in-vulkan-and-opengl-es/
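To put that figure in perspective, here is the simple arithmetic. The per-call number assumes, as an upper bound, that the entire frame time goes into draw submission, which the linked post does not claim:

$$
13\,500 \times 30\ \text{Hz} = 405\,000\ \text{draw calls/s},
\qquad
\frac{1000\ \text{ms} / 30}{13\,500} \approx \frac{33.3\ \text{ms}}{13\,500} \approx 2.5\ \mu\text{s per draw call}
$$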
With Vulkan, it's not just a matter of switching the API and watching magic happen; with a straight port you would leave a lot of new opportunities on the table:
1. Vulkan allows you to record command buffers on multiple threads. Even if you don't need this for throughput, the work finishes sooner, which lets the CPU drop into a power-saving state earlier. On mobile, where CPU and GPU typically share a power and thermal budget, this may leave more headroom for the GPU.
2. You should cache most of the render setup. For most static objects, you can keep pipelines, descriptors, etc. unchanged (only a few matrices might change per frame; these can be copied into per-frame uniform buffers, possibly during the post-processing work to hide latency). That way your draw calls become even cheaper, and therefore even faster.
3. Vulkan lets you set flags, e.g. the load/store operations of render pass attachments (such as VK_ATTACHMENT_STORE_OP_DONT_CARE), that discard buffer contents which aren't needed afterwards. That way, especially on mobile (tile-based GPUs), depth/stencil buffers don't need to be written out to memory, color buffers don't need to be loaded from memory, and fast clears can be applied; with some extensions, even tiled registration can be avoided.
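Point 1 can be sketched without any Vulkan calls. This is a minimal, hypothetical model: CommandList, DrawItem, recordRange and recordParallel are illustrative names, not Vulkan API. Each worker records a contiguous slice of the draws into its own list, mirroring the common Vulkan practice of giving each recording thread its own VkCommandPool, since access to a command pool must be externally synchronized.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a recorded command buffer (not a Vulkan type).
struct CommandList {
    std::vector<std::string> commands;
};

// Hypothetical draw item; a real engine would reference a pipeline,
// descriptor sets, vertex buffers, etc.
struct DrawItem {
    int id;
};

// Records draws [begin, end) into `out`. Each thread writes only to its own
// CommandList, so no locking is needed while recording.
void recordRange(const std::vector<DrawItem>& items, std::size_t begin,
                 std::size_t end, CommandList& out) {
    for (std::size_t i = begin; i < end; ++i) {
        out.commands.push_back("draw " + std::to_string(items[i].id));
    }
}

// Splits the draw list into contiguous chunks, records them in parallel and
// returns the per-thread lists in chunk order, so submission order stays
// deterministic regardless of which thread finishes first.
std::vector<CommandList> recordParallel(const std::vector<DrawItem>& items,
                                        unsigned threadCount) {
    assert(threadCount > 0);
    std::vector<CommandList> lists(threadCount);  // sized up front: no reallocation
    std::vector<std::thread> workers;
    const std::size_t chunk = (items.size() + threadCount - 1) / threadCount;
    for (unsigned t = 0; t < threadCount; ++t) {
        const std::size_t begin = std::min(items.size(), t * chunk);
        const std::size_t end = std::min(items.size(), begin + chunk);
        workers.emplace_back(recordRange, std::cref(items), begin, end,
                             std::ref(lists[t]));
    }
    for (auto& worker : workers) worker.join();
    return lists;
}
```

In real Vulkan code, the per-thread lists would typically become secondary command buffers executed from a primary one with vkCmdExecuteCommands, or several primary command buffers handed to a single vkQueueSubmit.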
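For point 2, a hedged sketch of the caching idea: the pipeline and descriptor-set IDs in CachedDraw are built once and never rebuilt, and each frame only the matrices are copied into a per-frame buffer. CachedDraw, alignUp, writeFrameUniforms and the CPU-side byte vector are assumptions for illustration. The returned offsets are the kind of value a dynamic uniform buffer (VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC) consumes as per-draw dynamic offsets in vkCmdBindDescriptorSets, one common way to keep descriptor sets unchanged while matrices move; each offset must be a multiple of the device's minUniformBufferOffsetAlignment.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Mat4 = std::array<float, 16>;

// Hypothetical cached draw state for a static object, built once at load
// time. In Vulkan these would be handles such as VkPipeline and
// VkDescriptorSet; plain integers keep the sketch self-contained.
struct CachedDraw {
    std::uint32_t pipelineId;
    std::uint32_t descriptorSetId;
};

// Rounds `size` up to a multiple of `alignment`, e.g. the device's
// minUniformBufferOffsetAlignment when sub-allocating one uniform buffer.
std::size_t alignUp(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) / alignment * alignment;
}

// Per frame, only the matrices are written into the per-frame buffer; the
// cached draws are taken by const reference and never rebuilt. Returns the
// byte offset of each object's matrix (usable as a dynamic offset).
std::vector<std::uint32_t> writeFrameUniforms(
        const std::vector<CachedDraw>& draws, const std::vector<Mat4>& matrices,
        std::size_t alignment, std::vector<unsigned char>& frameBuffer) {
    const std::size_t stride = alignUp(sizeof(Mat4), alignment);
    frameBuffer.assign(stride * draws.size(), 0);
    std::vector<std::uint32_t> offsets(draws.size());
    for (std::size_t i = 0; i < draws.size(); ++i) {
        offsets[i] = static_cast<std::uint32_t>(i * stride);
        std::memcpy(frameBuffer.data() + offsets[i], matrices[i].data(),
                    sizeof(Mat4));
    }
    return offsets;
}
```

Keeping one such buffer per frame in flight lets the CPU write frame N+1 while the GPU may still be reading frame N.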
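Point 3 is mostly about memory bandwidth on tile-based GPUs. The model below uses its own simplified enums that only mirror the names of VkAttachmentLoadOp/VkAttachmentStoreOp; it is not the Vulkan API, and the traffic formula is a rough assumption that ignores compression, MSAA resolves and partial tiles.

```cpp
#include <cstdint>

// Simplified mirrors of Vulkan's load/store op names, for estimation only.
enum class LoadOp { Load, Clear, DontCare };
enum class StoreOp { Store, DontCare };

// Hypothetical description of one render pass attachment.
struct Attachment {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t bytesPerPixel;
    LoadOp load;
    StoreOp store;
};

// Rough external-memory traffic per render pass on a tile-based GPU:
// Load reads the attachment from memory into tile memory, Store writes it
// back. Clear and DontCare avoid the read; DontCare on store avoids the write.
std::uint64_t estimatedTrafficBytes(const Attachment& a) {
    const std::uint64_t size =
        std::uint64_t{a.width} * a.height * a.bytesPerPixel;
    std::uint64_t bytes = 0;
    if (a.load == LoadOp::Load) bytes += size;
    if (a.store == StoreOp::Store) bytes += size;
    return bytes;
}
```

With a 1920x1080 attachment at 4 bytes per pixel, loading and storing it costs about 16.6 MB per pass (about 1 GB/s at 60 fps), while Clear plus DontCare brings the estimate to zero. In actual Vulkan code these choices are the loadOp/storeOp (and stencilLoadOp/stencilStoreOp) fields of VkAttachmentDescription.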
But it's hard to tell why the 32-bit version is slow. Maybe it really is the API translation, maybe the 32-bit driver is less optimized, maybe your code was compiled with suboptimal flags, or maybe you run into memory limits that cause swapping. (That's still an area where Vulkan could improve: you can't be sure how memory is managed on the driver side.)
In general, on mobile the variance in hardware speed is huge. Even on the same device, the performance available to your game/engine will vary with battery state, power-saving options, temperature and other activity (e.g. a Wi-Fi download). Don't spend too much time investigating why performance is bad on every device; rather, make sure your game/engine adapts to the available power. Adapt LOD levels, draw distance, shader quality, resolution, frequency of shadow updates, etc. to keep the frame rate close to the target. Nobody will tinker with the options; everyone expects a smooth frame rate. Hence, beyond making your code "fast", make it "adaptive".
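A minimal sketch of such an adaptive loop, under stated assumptions: QualitySettings, AdaptiveQuality, the two knobs, the thresholds (5% over and 20% under budget), the step sizes and the 30-frame cooldown are all illustrative, not tuned values. It smooths frame time with an exponential moving average, re-evaluates only once per cooldown window so a change has time to show up in the measurements, and uses a hysteresis band so quality doesn't oscillate around the target.

```cpp
#include <algorithm>

// Hypothetical quality knobs; a real engine would also drive LOD bias,
// draw distance, shader variants, etc.
struct QualitySettings {
    float resolutionScale = 1.0f;  // fraction of native resolution
    int shadowUpdateInterval = 1;  // re-render shadow maps every N frames
};

// Minimal adaptive controller. Thresholds, step sizes and the cooldown are
// illustrative assumptions, not tuned values.
class AdaptiveQuality {
public:
    explicit AdaptiveQuality(float targetFps, int cooldownFrames = 30)
        : budgetMs_(1000.0f / targetFps), cooldownFrames_(cooldownFrames) {}

    void onFrame(float frameMs, QualitySettings& q) {
        // Exponential moving average so single spikes don't trigger changes.
        avgMs_ = avgMs_ < 0.0f ? frameMs : 0.9f * avgMs_ + 0.1f * frameMs;

        // Re-evaluate only once per cooldown window, giving the previous
        // change time to show up in the average.
        if (++framesSinceCheck_ < cooldownFrames_) return;
        framesSinceCheck_ = 0;

        if (avgMs_ > budgetMs_ * 1.05f) {
            // Over budget: update shadows less often first, then drop resolution.
            if (q.shadowUpdateInterval < 4) {
                ++q.shadowUpdateInterval;
            } else {
                q.resolutionScale = std::max(0.5f, q.resolutionScale - 0.05f);
            }
        } else if (avgMs_ < budgetMs_ * 0.8f) {
            // Clear headroom: restore resolution first, then shadow frequency.
            if (q.resolutionScale < 1.0f) {
                q.resolutionScale = std::min(1.0f, q.resolutionScale + 0.05f);
            } else if (q.shadowUpdateInterval > 1) {
                --q.shadowUpdateInterval;
            }
        }
        // Between 80% and 105% of the budget nothing changes (hysteresis).
    }

private:
    float budgetMs_;
    int cooldownFrames_;
    int framesSinceCheck_ = 0;
    float avgMs_ = -1.0f;  // negative means "no sample yet"
};
```

The order of the knobs is a design choice: here shadow update frequency is sacrificed before resolution because it is usually less visible, but a real game would pick the order based on its own content.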