Of course, performance was really bad once I used it in my actual game test level, which contains a lot of line segments, but I found a solution for that. I use two additional uniform grids to sort in the fixtures and their proxy indices, using the exact same approach I used to find particle neighbors efficiently. Even as a first implementation this works pretty well: simulating 1500 particles with 200 line segments runs at 60 fps without any multithreading at all.
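To give a rough idea of the grid approach, here is a minimal sketch, not the actual game code. It uses a single hash-based grid instead of the two grids mentioned above, and the class name `SegmentGrid`, the methods `insert` and `query`, and the cell size are all made up for illustration. Each line segment is put into every cell its bounding box overlaps, and a particle only looks at the cells around its own position.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: buckets line segments by the grid cells their bounding box overlaps. */
final class SegmentGrid {
    private final float cellSize;
    private final Map<Long, List<Integer>> cells = new HashMap<>();

    SegmentGrid(float cellSize) {
        this.cellSize = cellSize;
    }

    // World coordinate -> integer cell coordinate (floor also handles negative positions).
    private int cellCoord(float v) {
        return (int) Math.floor(v / cellSize);
    }

    // Pack both cell coordinates into one long so it can be used as a map key.
    private static long key(int cx, int cy) {
        return ((long) cx << 32) | (cy & 0xFFFFFFFFL);
    }

    /** Registers a segment index in every cell covered by the segment's bounding box. */
    void insert(int segmentIndex, float x0, float y0, float x1, float y1) {
        int minX = cellCoord(Math.min(x0, x1)), maxX = cellCoord(Math.max(x0, x1));
        int minY = cellCoord(Math.min(y0, y1)), maxY = cellCoord(Math.max(y0, y1));
        for (int cy = minY; cy <= maxY; cy++) {
            for (int cx = minX; cx <= maxX; cx++) {
                cells.computeIfAbsent(key(cx, cy), k -> new ArrayList<>()).add(segmentIndex);
            }
        }
    }

    /** Returns the candidate segments near a particle; exact collision tests still have to follow. */
    Set<Integer> query(float px, float py, float radius) {
        Set<Integer> result = new LinkedHashSet<>();
        int minX = cellCoord(px - radius), maxX = cellCoord(px + radius);
        int minY = cellCoord(py - radius), maxY = cellCoord(py + radius);
        for (int cy = minY; cy <= maxY; cy++) {
            for (int cx = minX; cx <= maxX; cx++) {
                List<Integer> bucket = cells.get(key(cx, cy));
                if (bucket != null) {
                    result.addAll(bucket);
                }
            }
        }
        return result;
    }
}
```

The point is that each particle only tests the handful of segments in nearby cells instead of all 200. A real version would probably use flat arrays instead of a HashMap to avoid allocations and boxing.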
The only thing that bugs me a bit: when I start the game, the fps instantly drops to < 1 for the first 3-5 seconds and then stabilizes at around 60 fps. There may be several reasons for this, like:
- Creating the particles initially using a single active/inactive buffer may not be so good for performance?
- Uploading every final particle position to the GPU using a FloatBuffer is not good for performance either (Java is very bad at this)
- Even with the particle system disabled, the fps also starts at < 1 for the first few frames (drawing just a tile map including Box2D physics)
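For the FloatBuffer point, one common way to keep the per-frame cost down is to allocate a single direct buffer once and refill it with one bulk `put` every frame, instead of creating buffers or writing floats one by one. The sketch below only shows that idea. The class name `ParticleUploadBuffer` and the interleaved x/y array layout are assumptions, the actual GL upload call is left out because it depends on the binding in use, and whether this has anything to do with the slow start is untested.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Hypothetical sketch: one reusable direct buffer for particle positions (x, y interleaved). */
final class ParticleUploadBuffer {
    private final FloatBuffer buffer;

    ParticleUploadBuffer(int maxParticles) {
        // Allocate once, off-heap and in native byte order, so nothing is allocated per frame.
        buffer = ByteBuffer.allocateDirect(maxParticles * 2 * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Copies the first particleCount positions in one bulk put and prepares the buffer for reading. */
    FloatBuffer fill(float[] positionsXY, int particleCount) {
        buffer.clear();
        buffer.put(positionsXY, 0, particleCount * 2);
        buffer.flip();
        return buffer;
    }
}
```

The returned buffer would then be handed to whatever buffer upload call the GL binding provides.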
But for the moment I am happy and I can focus on the actual gameplay mechanics.
Fixing performance will come later. I also have plans to maybe move the entire physics (particles and rigid bodies) to the GPU to get rid of any CPU>GPU buffer uploads, and maybe port the game to C++11 to have a much better way to handle memory and such.
One thing I will try, which may take only a few minutes to add (thanks to Java), is moving the PBF solver code into thread pools, so performance should not be an issue in the meantime.
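The thread pool idea could look roughly like the sketch below. It is not the actual solver code: `ParallelParticlePass`, `RangeTask` and `forEachRange` are made-up names, and it assumes each solver pass can be split into index ranges where every task only writes to its own particles. The particle range is cut into one chunk per worker, and the call blocks until all chunks are done, so the next solver pass sees complete results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: runs one solver pass over particle index ranges on a fixed thread pool. */
final class ParallelParticlePass {
    /** Work for the particles in [start, end); must only write to those particles. */
    interface RangeTask {
        void run(int start, int end);
    }

    private final ExecutorService pool;
    private final int workers;

    ParallelParticlePass(int workers) {
        this.workers = workers;
        this.pool = Executors.newFixedThreadPool(workers);
    }

    /** Splits [0, count) into roughly equal chunks and waits until all of them have finished. */
    void forEachRange(int count, RangeTask task) {
        List<Callable<Void>> jobs = new ArrayList<>();
        int chunk = (count + workers - 1) / workers; // ceiling division
        for (int start = 0; start < count; start += chunk) {
            final int s = start;
            final int e = Math.min(count, start + chunk);
            jobs.add(() -> {
                task.run(s, e);
                return null;
            });
        }
        try {
            // invokeAll blocks until every job is done; get() rethrows any failure from a job.
            for (Future<Void> f : pool.invokeAll(jobs)) {
                f.get();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the particle pass", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("particle pass failed", ex.getCause());
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

A pass would then be called like `pass.forEachRange(particleCount, (start, end) -> { ... })`, once per solver step. Steps that write to neighboring particles would need extra care, since the split is only safe when each range touches just its own data.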
Stay tuned.