JoeJ said:
To be clear, you'd need to show from where you read your data and how you index it.
particle.xyzw = bufferInVRAM[(i * 64) + flatThreadID];
No problem - all the data is parsed and pushed. You can use 32 threads too.
It's just an example to show that it works. GPUs hate large loops, so it will be super slow, but it's enough to prove my point.
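To make the indexing concrete, here is a rough CPU-side sketch of what `(i * 64) + flatThreadID` does when one group of 64 threads loops over a buffer. It is not the actual shader: `particleIndex`, `countReads`, `kGroupSize` and the bounds check are names and details I made up for illustration, and the 1000-particle count in the usage note is arbitrary.

```cpp
#include <cstddef>
#include <vector>

// Thread count taken from the post; 32 works the same way.
constexpr std::size_t kGroupSize = 64;

// Hypothetical helper: the buffer index that thread `flatThreadID`
// reads on loop iteration `i`, i.e. (i * groupSize) + flatThreadID.
constexpr std::size_t particleIndex(std::size_t i, std::size_t flatThreadID,
                                    std::size_t groupSize = kGroupSize) {
    return i * groupSize + flatThreadID;
}

// Hypothetical helper: emulates one thread group walking the whole buffer
// and counts how many times each element is read.
std::vector<int> countReads(std::size_t particleCount,
                            std::size_t groupSize = kGroupSize) {
    std::vector<int> reads(particleCount, 0);
    // Number of loop iterations, rounded up to cover a partial last chunk.
    const std::size_t iterations = (particleCount + groupSize - 1) / groupSize;
    for (std::size_t i = 0; i < iterations; ++i) {
        // On a GPU these threads run in parallel; here they run one by one.
        for (std::size_t t = 0; t < groupSize; ++t) {
            const std::size_t idx = particleIndex(i, t, groupSize);
            if (idx < particleCount) {  // assumed guard, not in the original line
                ++reads[idx];
            }
        }
    }
    return reads;
}
```

Calling `countReads(1000)` should give a vector where every entry is 1: each particle is read exactly once, and in any given iteration neighbouring threads read neighbouring elements. Passing 32 as the second argument gives the same coverage with twice as many loop iterations.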